184

I need a way to get the binary representation of a string in Python, e.g.:

st = "hello world"
toBinary(st)

Is there a module or some neat way of doing this?

  • What do you expect the output to be, specifically? Commented Sep 15, 2013 at 18:20
  • By "binary", do you mean 0101010 type or the ordinal number of each character in (e.g. hex)? Commented Sep 15, 2013 at 18:23
  • Assuming that you actually mean binary (zeros and ones), do you want a binary representation of each character (8 bits per character) one after another? e.g. h is ascii value 104 would be 01101000 in binary Commented Sep 15, 2013 at 18:30
  • This question has been answered many times on stackoverflow: stackoverflow.com/questions/11599226/… stackoverflow.com/questions/8553310/… Commented Sep 15, 2013 at 18:32
  • possible duplicate of Convert Binary to ASCII and vice versa (Python) Commented Mar 12, 2014 at 10:59

10 Answers

172

Something like this?

>>> st = "hello world"
>>> ' '.join(format(ord(x), 'b') for x in st)
'1101000 1100101 1101100 1101100 1101111 100000 1110111 1101111 1110010 1101100 1100100'

#using `bytearray`
>>> ' '.join(format(x, 'b') for x in bytearray(st, 'utf-8'))
'1101000 1100101 1101100 1101100 1101111 100000 1110111 1101111 1110010 1101100 1100100'

7 Comments

Or if you want each binary number to be 1 byte: ' '.join(format(ord(i),'b').zfill(8) for i in st)
For full bytes you can also use ' '.join('{0:08b}'.format(ord(x)) for x in st), which is about 35% faster than the zfill(8) solution (at least on my machine).
What about converting more-than-one-byte chars, like β, e.g., which seems to me represented by 11001110 10110010 internally?
I know this was posted a long time ago, but what about non-ASCII characters?
Is there a way to reconstruct the original string from the bytearray output, e.g. '1101000 1100101 1101100'?
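
On the comments about multi-byte characters and reversing the conversion: here is a minimal sketch, assuming UTF-8 and space-separated groups. The names to_bits and from_bits are made up for illustration and do not come from any answer on this page.

```python
def to_bits(text, encoding="utf-8"):
    """Return the encoded bytes of text as space-separated 8-bit groups."""
    return " ".join(f"{byte:08b}" for byte in text.encode(encoding))


def from_bits(bits, encoding="utf-8"):
    """Parse each space-separated group as base 2 and decode the bytes."""
    return bytes(int(group, 2) for group in bits.split()).decode(encoding)


print(to_bits("β"))                           # 11001110 10110010
print(from_bits("1101000 1100101 1101100"))   # hel
print(from_bits(to_bits("hello world")))      # hello world
```

Because int(group, 2) ignores missing leading zeros, from_bits accepts both the padded and the unpadded space-separated forms shown above.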
133

If by binary you mean the bytes type, you can just use the string's encode method, which encodes the string as a bytes object using the encoding you pass. You just need to make sure you pass an appropriate encoding to encode.

In [9]: "hello world".encode('ascii')
Out[9]: b'hello world'

In [10]: byte_obj = "hello world".encode('ascii')

In [11]: byte_obj
Out[11]: b'hello world'

In [12]: byte_obj[0]
Out[12]: 104

Otherwise, if you want zeros and ones (a binary representation), a more Pythonic way is to first convert your string to a byte array and then use the bin function with map (the output below is from Python 2, where map returns a list):

>>> st = "hello world"
>>> map(bin,bytearray(st))
['0b1101000', '0b1100101', '0b1101100', '0b1101100', '0b1101111', '0b100000', '0b1110111', '0b1101111', '0b1110010', '0b1101100', '0b1100100']
 

Or you can join it:

>>> ' '.join(map(bin,bytearray(st)))
'0b1101000 0b1100101 0b1101100 0b1101100 0b1101111 0b100000 0b1110111 0b1101111 0b1110010 0b1101100 0b1100100'

Note that in Python 3 you need to specify an encoding for the bytearray function:

>>> ' '.join(map(bin,bytearray(st,'utf8')))
'0b1101000 0b1100101 0b1101100 0b1101100 0b1101111 0b100000 0b1110111 0b1101111 0b1110010 0b1101100 0b1100100'

You can also use the binascii module in Python 2:

>>> import binascii
>>> bin(int(binascii.hexlify(st),16))
'0b110100001100101011011000110110001101111001000000111011101101111011100100110110001100100'

hexlify returns the hexadecimal representation of the binary data, which you can convert to an int by specifying 16 as its base and then convert to binary with bin.
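
For Python 3, a comparable sketch (my own variation, not part of the answer above) skips the hex step and uses int.from_bytes on the encoded string. It assumes UTF-8 and big-endian byte order so that the bits appear in reading order.

```python
st = "hello world"
data = st.encode("utf-8")

# Interpret the whole byte string as one big-endian integer, then format it.
n = int.from_bytes(data, "big")
bits = bin(n)
print(bits)

# Going back: work out how many bytes the integer needs, then decode them.
restored = n.to_bytes((n.bit_length() + 7) // 8, "big").decode("utf-8")
print(restored)  # hello world
```

As with the hexlify version, the result is a single number, so leading zero bits of the first byte are not shown.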

4 Comments

Not only is this more Pythonic, but it is also "more" correct for multi-byte non-ASCII strings.
Just to note that (at least for the current version, 3.7.4): (1) bytearray expects an encoding (not just a string) and (2) map(bin, ...) returns a map object. For the first point, I use for instance 'bob'.encode('ascii') as suggested by @Tao. For the second point, using the join method, as in the other examples by @Kasramvd, displays the desired result.
the "hello world".encode('ascii') is perfect
This is odd. In python3, I can do >>> bin(bytearray("g", 'utf8')[0]) # '0b1100111'. But, I cannot do >>> bin("g".encode("utf8"))
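
On the last comment: bin() accepts an integer, and a bytes object is a sequence of integers rather than an integer itself. A short sketch of the difference (variable names are illustrative):

```python
data = "g".encode("utf8")  # bytes object: b'g'

# Indexing or iterating over bytes yields integers, which bin() accepts.
first = bin(data[0])
all_bits = [bin(byte) for byte in data]
print(first)     # 0b1100111
print(all_bits)  # ['0b1100111']

# Passing the bytes object itself raises TypeError.
try:
    bin(data)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```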
55

We just need to encode it.

'string'.encode('ascii')

1 Comment

For me (v3.7.4), this returns a bytes object (with the ascii representations of each byte, if available), and in order to display its binary representation, I need bin, e.g. with ' '.join(item[2:] for item in map(bin, 'bob'.encode('ascii'))) (note that 0b needs to be removed at the beginning of the binary representation of each character).
16

You can access the code values for the characters in your string using the ord() built-in function. If you then need to format this in binary, the built-in format() function (or the str.format() method) will do the job.

a = "test"
print(' '.join(format(ord(x), 'b') for x in a))

(Thanks to Ashwini Chaudhary for posting that code snippet.)

While the above code works in Python 3, this matter gets more complicated if you're assuming any encoding other than UTF-8. In Python 2, strings are byte sequences, and ASCII encoding is assumed by default. In Python 3, strings are assumed to be Unicode, and there's a separate bytes type that acts more like a Python 2 string. If you wish to assume any encoding other than UTF-8, you'll need to specify the encoding.

In Python 3, then, you can do something like this:

a = "test"
a_bytes = bytes(a, "ascii")
print(' '.join(["{0:b}".format(x) for x in a_bytes]))

The differences between UTF-8 and ascii encoding won't be obvious for simple alphanumeric strings, but will become important if you're processing text that includes characters not in the ascii character set.
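
To make that difference concrete, here is a small sketch using "é" (U+00E9) as an arbitrarily chosen non-ASCII character; the variable names are illustrative.

```python
text = "é"  # U+00E9, outside the ASCII range

# ord() gives the Unicode code point, independent of any encoding.
code_point_bits = format(ord(text), "b")
print(code_point_bits)  # 11101001

# UTF-8 encodes the same character as two bytes.
utf8_bits = " ".join(f"{b:08b}" for b in text.encode("utf-8"))
print(utf8_bits)        # 11000011 10101001

# ASCII cannot represent it at all.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print("cannot encode as ASCII:", exc.reason)
```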


9

In Python 3.6 and above you can use an f-string to format the result.

str = "hello world"
print(" ".join(f"{ord(i):08b}" for i in str))

01101000 01100101 01101100 01101100 01101111 00100000 01110111 01101111 01110010 01101100 01100100
  • The left side of the colon, ord(i), is the actual object whose value will be formatted and inserted into the output. Using ord() gives you the base-10 code point for a single str character.

  • The right hand side of the colon is the format specifier. 08 means width 8, 0 padded, and the b functions as a sign to output the resulting number in base 2 (binary).
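
A few variations on that format specifier, as a quick sketch (the value 104 is just ord("h") from the example string):

```python
n = ord("h")  # 104

plain = f"{n:b}"          # no padding
padded = f"{n:08b}"       # zero-padded to width 8
prefixed = f"{n:#010b}"   # '#' adds the 0b prefix; width 10 includes it
print(plain, padded, prefixed)  # 1101000 01101000 0b01101000

# format() and str.format() accept the same specifier as the f-string.
same = format(n, "08b") == "{:08b}".format(n) == padded
print(same)  # True
```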

1 Comment

Note that you are overriding str
3
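def method_a(sample_string):
    # Format each character's code point individually.
    binary = ' '.join(format(ord(x), 'b') for x in sample_string)
    return binary

def method_b(sample_string):
    # Encode once, then map bin over the bytes (output keeps the 0b prefix).
    binary = ' '.join(map(bin,bytearray(sample_string,encoding='utf-8')))
    return binary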


if __name__ == '__main__':

    from timeit import timeit

    sample_string = 'Convert this ascii string to binary.'

    print(
        timeit(f'method_a("{sample_string}")',setup='from __main__ import method_a'),
        timeit(f'method_b("{sample_string}")',setup='from __main__ import method_b')
    )

# 9.564299999998184 2.943955828988692

In this run, method_b was roughly three times faster than method_a, likely because it encodes the string in a single bytearray call and lets map drive bin, rather than evaluating ord and format for every character in a Python-level generator expression.


2

This is an update for the existing answers that used bytearray(), which no longer work that way in Python 3:

>>> st = "hello world"
>>> map(bin, bytearray(st))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: string argument without an encoding

Because, as explained in the link above, if the source is a string, you must also give the encoding:

>>> map(bin, bytearray(st, encoding='utf-8'))
<map object at 0x7f14dfb1ff28>
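
In Python 3, map is lazy, so the REPL shows only the map object. A small sketch of two ways to see the values: wrap it in list(), or join it as other answers do.

```python
st = "hello world"

# list() consumes the map object and materializes the results.
bits = list(map(bin, bytearray(st, encoding="utf-8")))
print(bits[:3])  # ['0b1101000', '0b1100101', '0b1101100']

# Joining works directly on the lazy map object as well.
joined = " ".join(map(bin, bytearray(st, encoding="utf-8")))
print(joined)
```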


0
''.join(format(i, 'b') for i in bytearray(st, encoding='utf-8'))  # st is the input string

This works okay because no leading zeros are added to pad each value to 8 bits, so there are no padding zeros to strip when converting back to the string.
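
One caveat worth checking before relying on this for a round trip: with an empty separator and no padding, the boundaries between bytes are not recorded, so different inputs can produce the same output. The sketch below (helper names are hypothetical) shows a collision and a fixed-width alternative that can be reversed by slicing every 8 characters.

```python
def unpadded(text):
    # Same approach as above: variable-width groups, no separator.
    return "".join(format(b, "b") for b in bytearray(text, encoding="utf-8"))


def padded(text):
    # Fixed 8-bit groups, no separator.
    return "".join(format(b, "08b") for b in bytearray(text, encoding="utf-8"))


def from_padded(bits):
    # Slice the string into 8-character chunks and decode the bytes.
    chunks = (bits[i:i + 8] for i in range(0, len(bits), 8))
    return bytes(int(chunk, 2) for chunk in chunks).decode("utf-8")


# Two different inputs collapse to the same unpadded output.
print(unpadded("\x01\x01"), unpadded("\x03"))  # 11 11
print(from_padded(padded("hello world")))      # hello world
```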


0

Here is a comparison of bit lengths in various encodings of ASCII 127 (delete). Note the respective 24, 16, and 32 bit byte order mark (BOM) in UTF-8-SIG, UTF-16, and UTF-32:

>>> for encoding in ('utf-8', 'utf-8-sig', 'utf-16', 'utf-16-le', 'utf-16-be', 'utf-32', 'utf-32-le', 'utf-32-be'): print(''.join(' '.join((f'{encoding:9}', f'{len(bs):2}', bs)) for bs in [''.join(f'{byte:08b}' for byte in '\x7f'.encode(encoding))]))
...
utf-8      8 01111111
utf-8-sig 32 11101111101110111011111101111111
utf-16    32 11111111111111100111111100000000
utf-16-le 16 0111111100000000
utf-16-be 16 0000000001111111
utf-32    64 1111111111111110000000000000000001111111000000000000000000000000
utf-32-le 32 01111111000000000000000000000000
utf-32-be 32 00000000000000000000000001111111
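
A more spelled-out sketch of the same check, using the BOM constants from the standard codecs module to confirm where the extra bits come from. The dictionary and variable names are illustrative; codecs.BOM_UTF16 and codecs.BOM_UTF32 are in the platform's native byte order, which is also what the plain 'utf-16' and 'utf-32' codecs write.

```python
import codecs

sample = "\x7f"
boms = {
    "utf-8-sig": codecs.BOM_UTF8,
    "utf-16": codecs.BOM_UTF16,  # native byte order
    "utf-32": codecs.BOM_UTF32,  # native byte order
}

for name, bom in boms.items():
    encoded = sample.encode(name)
    bits = "".join(f"{byte:08b}" for byte in encoded)
    # Report the BOM size in bits, the total size, and whether the BOM leads.
    print(f"{name:9} BOM bits: {len(bom) * 8:2}  total bits: {len(bits):2}  "
          f"starts with BOM: {encoded.startswith(bom)}")
```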


-2
a = list(input("Enter a string\t: "))
def fun(a):
    c =' '.join(['0'*(8-len(bin(ord(i))[2:]))+(bin(ord(i))[2:]) for i in a])
    return c
print(fun(a))

1 Comment

Would you like to augment this unreadable code-only answer with some explanation? That would help fighting the misconception that StackOverflow is a free code writing service. In case you want to improve readability, try the info provided here: stackoverflow.com/editing-help
