How to convert string to binary?

Question

I am in need of a way to get the binary representation of a string in python. e.g.

st = "hello world"
toBinary(st)

Is there a module or some neat way of doing this?

By "binary", do you mean 0101010 type or the ordinal number of each character in (e.g. hex)?
– cdarke, Commented Sep 15, 2013 at 18:23
Assuming that you actually mean binary (zeros and ones), do you want a binary representation of each character (8 bits per character) one after another? e.g. h has ASCII value 104, which would be 01101000 in binary.
– ChrisProsser, Commented Sep 15, 2013 at 18:30
This question has been answered many times on Stack Overflow: stackoverflow.com/questions/11599226/… stackoverflow.com/questions/8553310/…
– 0xcaff, Commented Sep 15, 2013 at 18:32
Possible duplicate of Convert Binary to ASCII and vice versa (Python)
– jfs, Commented Mar 12, 2014 at 10:59

Akshay Pratap Singh · Accepted Answer · 2020-05-22 18:56:06Z

172

Something like this?

>>> st = "hello world"
>>> ' '.join(format(ord(x), 'b') for x in st)
'1101000 1100101 1101100 1101100 1101111 100000 1110111 1101111 1110010 1101100 1100100'

#using `bytearray`
>>> ' '.join(format(x, 'b') for x in bytearray(st, 'utf-8'))
'1101000 1100101 1101100 1101100 1101111 100000 1110111 1101111 1110010 1101100 1100100'
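If the result has to be turned back into the original string, one option is to pad every byte to 8 bits so that each group has the same width. The following is a minimal sketch, assuming Python 3 and UTF-8; the helper names `to_bits` and `from_bits` are hypothetical and not part of the answer above:

```python
def to_bits(text, encoding="utf-8"):
    """Return the encoded bytes of text as space-separated 8-bit groups."""
    return " ".join(format(byte, "08b") for byte in text.encode(encoding))


def from_bits(bits, encoding="utf-8"):
    """Inverse of to_bits: parse each group as base 2, then decode the bytes."""
    return bytes(int(group, 2) for group in bits.split()).decode(encoding)


bits = to_bits("hello world")
print(bits)
print(from_bits(bits))
```

Because the conversion goes through encoded bytes rather than `ord()`, the round trip should also hold for characters outside the ASCII range.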

edited May 22, 2020 at 18:56

Akshay Pratap Singh


answered Sep 15, 2013 at 18:24

Ashwini Chaudhary



7 Comments

ChrisProsser Over a year ago

Or if you want each binary number to be 1 byte: ' '.join(format(ord(i),'b').zfill(8) for i in st)

max Over a year ago

For full bytes you can also use ' '.join('{0:08b}'.format(ord(x)) for x in st), which is about 35% faster than the zfill(8) solution (at least on my machine).

Sergey Bushmanov Over a year ago

What about converting more-than-one-byte chars, like β, e.g., which seems to me represented by 11001110 10110010 internally?

Mia Over a year ago

I know this was posted long time ago, but what about non-ASCII characters?

E. Erfan Over a year ago

Is there a way to reconstruct the original string from the bytearray one: '1101000 1100101 1101100'?


Kasravnd · Accepted Answer · 2021-02-02 19:38:36Z

133

If by binary you mean the bytes type, you can use the string's encode method, which returns a bytes object encoded with the encoding you pass. Just make sure you pass an encoding that can represent your string.

In [9]: "hello world".encode('ascii')
Out[9]: b'hello world'

In [10]: byte_obj = "hello world".encode('ascii')

In [11]: byte_obj
Out[11]: b'hello world'

In [12]: byte_obj[0]
Out[12]: 104

Otherwise, if you want the zeros-and-ones binary representation, a more Pythonic way is to first convert your string to a byte array and then apply the bin function with map (Python 2 shown):

>>> st = "hello world"
>>> map(bin,bytearray(st))
['0b1101000', '0b1100101', '0b1101100', '0b1101100', '0b1101111', '0b100000', '0b1110111', '0b1101111', '0b1110010', '0b1101100', '0b1100100']

Or you can join it:

>>> ' '.join(map(bin,bytearray(st)))
'0b1101000 0b1100101 0b1101100 0b1101100 0b1101111 0b100000 0b1110111 0b1101111 0b1110010 0b1101100 0b1100100'

Note that in Python 3 you need to specify an encoding for the bytearray function:

>>> ' '.join(map(bin,bytearray(st,'utf8')))
'0b1101000 0b1100101 0b1101100 0b1101100 0b1101111 0b100000 0b1110111 0b1101111 0b1110010 0b1101100 0b1100100'

You can also use the binascii module in Python 2:

>>> import binascii
>>> bin(int(binascii.hexlify(st),16))
'0b110100001100101011011000110110001101111001000000111011101101111011100100110110001100100'

hexlify returns the hexadecimal representation of the binary data; you can then convert it to an int by specifying 16 as the base, and convert that to binary with bin.
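In Python 3, binascii.hexlify needs a bytes object rather than a str, so the string has to be encoded first. The same "one big integer" idea can also be sketched with int.from_bytes, which skips the hexadecimal step. This is an illustrative variant, not part of the answer above, and it assumes UTF-8:

```python
st = "hello world"
data = st.encode("utf-8")

# Interpret the whole byte string as one big-endian integer.
as_int = int.from_bytes(data, byteorder="big")
print(bin(as_int))

# bin() drops leading zeros, so pad to 8 bits per byte if full bytes are needed.
padded = format(as_int, "0{}b".format(8 * len(data)))
print(padded)
```

The padded form keeps the leading zero of the first byte, which matters if the bits are later split back into 8-bit groups.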

edited Feb 2, 2021 at 19:38

answered Jun 4, 2015 at 10:58

Kasravnd


4 Comments

Sergey Bushmanov Over a year ago

Not only this is more pythonic, but this is "more" correct for multi-byte non-ASCII strings.

Antoine Over a year ago

Just to note that (at least for the current version 3.7.4): (1) bytearray expects an encoding (not just a string) and (2) map(bin, ...) will return the map object. For the first point, I use for instance 'bob'.encode('ascii') as suggested by @Tao. For the second point, using the join method, as in the other examples of @Kasramvd, will display the desired result.

F.Tamy Over a year ago

the "hello world".encode('ascii') is perfect

Slackware Over a year ago

This is odd. In python3, I can do >>> bin(bytearray("g", 'utf8')[0]) # '0b1100111'. But, I cannot do >>> bin("g".encode("utf8"))

Tao · Accepted Answer · 2018-10-11 13:51:10Z

55

We just need to encode it.

'string'.encode('ascii')

answered Oct 11, 2018 at 13:51

Tao


1 Comment

Antoine Over a year ago

For me (v3.7.4), this returns a bytes object (with the ascii representations of each byte, if available), and in order to display its binary representation, I need bin, e.g. with ' '.join(item[2:] for item in map(bin, 'bob'.encode('ascii'))) (note that 0b needs to be removed at the beginning of the binary representation of each character).

Mark R. Wilkins · Accepted Answer · 2013-09-15 18:42:26Z

You can access the code values for the characters in your string using the ord() built-in function. If you then need to format this in binary, the built-in format() function will do the job.

a = "test"
print(' '.join(format(ord(x), 'b') for x in a))

(Thanks to Ashwini Chaudhary for posting that code snippet.)

While the above code works in Python 3, this matter gets more complicated if you're assuming any encoding other than UTF-8. In Python 2, strings are byte sequences, and ASCII encoding is assumed by default. In Python 3, strings are assumed to be Unicode, and there's a separate bytes type that acts more like a Python 2 string. If you wish to assume any encoding other than UTF-8, you'll need to specify the encoding.

In Python 3, then, you can do something like this:

a = "test"
a_bytes = bytes(a, "ascii")
print(' '.join(["{0:b}".format(x) for x in a_bytes]))

The differences between UTF-8 and ascii encoding won't be obvious for simple alphanumeric strings, but will become important if you're processing text that includes characters not in the ascii character set.
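To make the difference concrete, here is a small sketch using the Greek letter β (the character mentioned in the comments on another answer). The variable names are illustrative only:

```python
ch = "\u03b2"  # Greek small letter beta, outside the ASCII range

# ord() gives the Unicode code point, independent of any encoding.
print(format(ord(ch), "b"))  # 1110110010

# Encoding first gives the bytes that UTF-8 would actually store.
print(" ".join(format(b, "08b") for b in ch.encode("utf-8")))  # 11001110 10110010

# ASCII cannot represent the character at all.
try:
    ch.encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii failed:", exc.reason)
```

So for non-ASCII text, the ord()-based snippet and the bytes-based snippet give different bit strings, and which one is wanted depends on whether code points or encoded bytes are meant.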

Markus Dutschke · Accepted Answer · 2020-02-04 09:41:07Z

9

In Python version 3.6 and above you can use an f-string to format the result.

str = "hello world"
print(" ".join(f"{ord(i):08b}" for i in str))

01101000 01100101 01101100 01101100 01101111 00100000 01110111 01101111 01110010 01101100 01100100

The left side of the colon, ord(i), is the actual object whose value will be formatted and inserted into the output. Using ord() gives you the base-10 code point for a single str character.
The right hand side of the colon is the format specifier. 08 means width 8, 0 padded, and the b functions as a sign to output the resulting number in base 2 (binary).

edited Feb 4, 2020 at 9:41

Markus Dutschke


answered Jun 20, 2019 at 19:23

Vlad Bezden


1 Comment

meni181818 Over a year ago

Note that you are overriding str

Ben · Accepted Answer · 2018-07-31 13:31:32Z

def method_a(sample_string):
    binary = ' '.join(format(ord(x), 'b') for x in sample_string)

def method_b(sample_string):
    binary = ' '.join(map(bin,bytearray(sample_string,encoding='utf-8')))


if __name__ == '__main__':

    from timeit import timeit

    sample_string = 'Convert this ascii strong to binary.'

    print(
        timeit(f'method_a("{sample_string}")',setup='from __main__ import method_a'),
        timeit(f'method_b("{sample_string}")',setup='from __main__ import method_b')
    )

# 9.564299999998184 2.943955828988692

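In this run method_b is roughly three times faster, because it makes low-level function calls on a byte array instead of manually transforming every character to an integer and then converting that integer into its binary value. Note that the two methods do not produce identical output: bin keeps the 0b prefix on each value, while format(..., 'b') does not.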

Billal BEGUERADJ · Accepted Answer · 2018-05-11 11:13:02Z

2

This is an update for the existing answers that use bytearray(), which no longer works that way in Python 3:

>>> st = "hello world"
>>> map(bin, bytearray(st))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: string argument without an encoding

Because, as explained in the link above, if the source is a string, you must also give the encoding:

>>> map(bin, bytearray(st, encoding='utf-8'))
<map object at 0x7f14dfb1ff28>
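Since map is lazy in Python 3, the map object has to be consumed before the values become visible. A minimal sketch of two common ways to do that, assuming UTF-8:

```python
st = "hello world"

# Materialise the lazy map object as a list to inspect the values.
as_list = list(map(bin, bytearray(st, encoding="utf-8")))
print(as_list[:3])  # ['0b1101000', '0b1100101', '0b1101100']

# Or join directly; str.join accepts any iterable of strings.
joined = " ".join(map(bin, bytearray(st, encoding="utf-8")))
print(joined)
```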

answered May 11, 2018 at 11:13

Billal BEGUERADJ



Francis kinyuru · Accepted Answer · 2022-07-07 17:34:57Z

0

''.join(format(i, 'b') for i in bytearray(st, encoding='utf-8'))

Here st is the string to convert. This works okay, since no leading zeros are added to pad each value to 8 bits, which avoids the complexity of removing them when reverting to the string.

answered Jul 7, 2022 at 17:34

Francis kinyuru



Cees Timmerman · Accepted Answer · 2023-10-17 21:18:27Z

Here is a comparison of bit lengths in various encodings of ASCII 127 (delete). Note the respective 24, 16, and 32 bit byte order mark (BOM) in UTF-8-SIG, UTF-16, and UTF-32:

>>> for encoding in ('utf-8', 'utf-8-sig', 'utf-16', 'utf-16-le', 'utf-16-be', 'utf-32', 'utf-32-le', 'utf-32-be'): print(''.join(' '.join((f'{encoding:9}', f'{len(bs):2}', bs)) for bs in [''.join(f'{byte:08b}' for byte in '\x7f'.encode(encoding))]))
...
utf-8      8 01111111
utf-8-sig 32 11101111101110111011111101111111
utf-16    32 11111111111111100111111100000000
utf-16-le 16 0111111100000000
utf-16-be 16 0000000001111111
utf-32    64 1111111111111110000000000000000001111111000000000000000000000000
utf-32-le 32 01111111000000000000000000000000
utf-32-be 32 00000000000000000000000001111111
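For readability, the one-liner above can be unrolled into a small loop. This is a sketch that should print the same three columns; the helper name `bit_string` is hypothetical:

```python
ENCODINGS = ("utf-8", "utf-8-sig", "utf-16", "utf-16-le", "utf-16-be",
             "utf-32", "utf-32-le", "utf-32-be")


def bit_string(text, encoding):
    """Concatenate the 8-bit binary form of every encoded byte."""
    return "".join(f"{byte:08b}" for byte in text.encode(encoding))


for encoding in ENCODINGS:
    bits = bit_string("\x7f", encoding)
    # Columns: encoding name (padded to 9), bit count (width 2), the bits.
    print(f"{encoding:9} {len(bits):2} {bits}")
```

The byte order used by plain utf-16 and utf-32 (and therefore the order of the BOM bytes) follows the platform, so those two rows can differ between machines.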

Tomerikoo · Accepted Answer · 2019-07-30 22:23:08Z

-2

a = list(input("Enter a string\t: "))
def fun(a):
    c =' '.join(['0'*(8-len(bin(ord(i))[2:]))+(bin(ord(i))[2:]) for i in a])
    return c
print(fun(a))

edited Jul 30, 2019 at 22:23

Tomerikoo


answered Jul 30, 2019 at 18:34

Solo Ship


1 Comment

Yunnosch Over a year ago

Would you like to augment this unreadable code-only answer with some explanation? That would help fighting the misconception that StackOverflow is a free code writing service. In case you want to improve readability, try the info provided here: stackoverflow.com/editing-help
