How to turn a binary string into a byte?

Question

If I take the letter 'à' and encode it in UTF-8 I obtain the following result:

'à'.encode('utf-8')
>> b'\xc3\xa0'

Now from a bytearray I would like to convert 'à' into a binary string and turn it back into 'à'. To do so I execute the following code:

byte = bytearray('à','utf-8')
for x in byte:
    print(bin(x))

I get 0b11000011 and 0b10100000, which are 195 and 160. I then concatenate them, strip the 0b prefixes, and execute this code:

s = '1100001110100000'
value1 =  s[0:8].encode('utf-8')
value2 =  s[9:16].encode('utf-8')
value = value1 + value2
print(chr(int(value, 2)))
>> 憠

No matter how I rework the latter part, I get other symbols and never get my 'à' back. Why is that, and how can I recover the 'à'?

Mark Ransom · Accepted Answer · 2018-11-21 23:50:33Z

3

>>> bytes(int(s[i:i+8], 2) for i in range(0, len(s), 8)).decode('utf-8')
'à'

There are multiple parts to this. The bytes constructor creates a byte string from a sequence of integers. The integers are formed from strings using int with a base of 2. The range combined with the slicing peels off 8 characters at a time. Finally decode converts those bytes back into Unicode characters.
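To make the parts easier to see, here is the same one-liner unrolled into separate steps. This is only a sketch; the intermediate names (chunks, values, raw, text) are mine and do not appear in the original expression.

```python
s = '1100001110100000'

# Peel off 8 characters at a time with range() and slicing.
chunks = [s[i:i + 8] for i in range(0, len(s), 8)]

# Parse each 8-character chunk as a base-2 integer.
values = [int(chunk, 2) for chunk in chunks]

# bytes() builds a byte string from a sequence of integers in range(256).
raw = bytes(values)

# decode() interprets those bytes as UTF-8 and returns a str.
text = raw.decode('utf-8')

print(chunks)  # ['11000011', '10100000']
print(values)  # [195, 160]
print(raw)     # b'\xc3\xa0'
print(text)    # à
```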



1 Comment

torek Over a year ago

Note also that the OP can use ''.join('{:08b}'.format(i) for i in byte) on the original byte-array object. This is pretty similar: we take the byte-array apart, one byte at a time, and format each one using :08b to get an eight-bit zero-filled string representation, then join all the strings without whitespace.
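Putting the comment's formatting trick together with the answer above gives a round trip. The helper names to_bits and from_bits are hypothetical, chosen here for illustration. The zero fill in :08b matters because bin() drops leading zeros, so a byte such as 10 would otherwise come out as fewer than eight characters.

```python
def to_bits(text, encoding='utf-8'):
    """Encode text and render every byte as exactly 8 binary digits."""
    return ''.join('{:08b}'.format(b) for b in text.encode(encoding))


def from_bits(bits, encoding='utf-8'):
    """Inverse of to_bits: split into 8-digit groups, rebuild bytes, decode."""
    if len(bits) % 8 != 0:
        raise ValueError('bit string length must be a multiple of 8')
    raw = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return raw.decode(encoding)


bits = to_bits('à')
print(bits)             # 1100001110100000
print(from_bits(bits))  # à

# Without zero fill, bin() drops leading zeros:
print(bin(10))               # 0b1010
print('{:08b}'.format(10))   # 00001010
```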

Joran Beasley · Accepted Answer · 2018-11-21 23:51:54Z

0

Your second slice needs to be s[8:16] (or just s[8:]); s[9:16] skips the bit at index 8, so you get only the seven characters 0100000.

You also need to convert your "bit string" back to an integer, with something like int("0010101",2), before treating it as a byte.

s = '1100001110100000'
value1 = bytearray([int(s[:8], 2),   # bits 0..7 (8 total)
                    int(s[8:], 2)])  # bits 8..15 (8 total)
print(value1.decode("utf8"))
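For the "why" part of the question, the sketch below traces what the original attempt computes. It joins plain str slices rather than encoded bytes, which I am assuming is equivalent here since int() parses the same digits either way. The variable names are mine. The result is the code point U+61A0, which appears to be the stray character the question printed. Note that even with the slice fixed, chr() would still not return 'à', because chr() expects a Unicode code point, not the UTF-8 bytes read as a single number.

```python
s = '1100001110100000'

# The question's slices: s[9:16] starts one character too late.
first = s[0:8]    # '11000011'
second = s[9:16]  # '0100000' (7 characters; index 8 was skipped)
code_point = int(first + second, 2)
print(hex(code_point))  # 0x61a0

# With all 16 bits, chr() would still be the wrong tool:
print(hex(int(s, 2)))   # 0xc3a0, both UTF-8 bytes read as one number
print(hex(ord('à')))    # 0xe0, the actual code point of 'à'

# Decoding is what maps the bytes 0xC3 0xA0 back to U+00E0.
recovered = bytes([int(s[:8], 2), int(s[8:], 2)]).decode('utf-8')
print(recovered)        # à
```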


Mark Tolonen · Accepted Answer · 2018-11-22 07:29:19Z

0

Convert the base-2 value back to an integer with int(s,2), convert that integer to a number of bytes (int.to_bytes) based on the original length divided by 8 and big-endian conversion to keep the bytes in the right order, then .decode() it (default in Python 3 is utf8):

>>> s = '1100001110100000'
>>> int(s,2)
50080
>>> int(s,2).to_bytes(len(s)//8,'big')
b'\xc3\xa0'
>>> int(s,2).to_bytes(len(s)//8,'big').decode()
'à'
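The same idea can be wrapped in a pair of helpers. This is a sketch with hypothetical names (bits_to_bytes, bytes_to_bits). It also shows why the byte count is taken from len(s) rather than from the integer itself: leading zero bytes would otherwise be lost, since the integer does not remember them.

```python
def bits_to_bytes(bits):
    """Convert a bit string to bytes via one big-endian integer."""
    if len(bits) % 8 != 0:
        raise ValueError('bit string length must be a multiple of 8')
    # The byte count comes from the string length so leading zeros survive.
    return int(bits, 2).to_bytes(len(bits) // 8, 'big')


def bytes_to_bits(data):
    """Inverse: one big-endian integer, zero-padded to 8 bits per byte."""
    width = len(data) * 8
    return format(int.from_bytes(data, 'big'), '0{}b'.format(width))


print(bits_to_bytes('1100001110100000').decode())  # à
print(bytes_to_bits('à'.encode()))                 # 1100001110100000

# A leading zero byte is preserved because the length is passed explicitly.
print(bits_to_bytes('0000000001100001'))           # b'\x00a'
```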

