
I have a word in Russian: "привет". Encoding it to UTF-8 with 'привет'.encode('utf-8') gives a Python bytes object, represented as:

b'\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'

I then saved it to a file, and when I read that file back I get this string instead: "b'\\xd0\\xbf\\xd1\\x80\\xd0\\xb8\\xd0\\xb2\\xd0\\xb5\\xd1\\x82'"

How do I decode this string into the original word?

What I'm trying to decode is a string, not a bytes object, so

"b'\\xd0\\xbf\\xd1\\x80\\xd0\\xb8\\xd0\\xb2\\xd0\\xb5\\xd1\\x82'".decode('utf-8')

raises AttributeError: 'str' object has no attribute 'decode'

I save it to the file simply by calling logger.info(x.encode('utf-8')), where logger is

import logging
logger = logging.getLogger('GENERATOR_DYNAMICS')

and this is how I read the file:

with open('file.log') as f:
    logs = f.readlines()
  • The first hit on DuckDuckGo for "python decode byte string utf8" was the dupe. You did not really search a lot, did you? Please read How to Ask: the first line of duty is doing research. (Not my downvote, by the way.) Commented Oct 6, 2020 at 14:19
  • @PatrickArtner the problem is not decoding a bytes object; I'm trying to decode a string. Commented Oct 6, 2020 at 14:26
  • Maybe you could edit your post to show how you write to the file, how you read from it, and what the exact problem is. If you write binary data to a file and read binary data from it, you get the binary values back. Commented Oct 6, 2020 at 14:29
  • If you write the string representation of your bytes object into a text file, you need to turn it back into a bytes object again: import ast + print("b'\\xd0\\xbf\\xd1\\x80\\xd0\\xb8\\xd0\\xb2\\xd0\\xb5\\xd1\\x82'", ast.literal_eval("b'\\xd0\\xbf\\xd1\\x80\\xd0\\xb8\\xd0\\xb2\\xd0\\xb5\\xd1\\x82'").decode("utf8")) Commented Oct 6, 2020 at 14:34
  • @PatrickArtner thank you, that is exactly what I was looking for Commented Oct 6, 2020 at 14:38

1 Answer

Your problem is twofold:

  • you have the string representation of a bytes object (it came from a file, but that is largely irrelevant)
  • you want to turn those bytes back into UTF-8 text

So the solution has two steps as well:

import ast

# parse the string representation back into a bytes object
string_rep = "b'\\xd0\\xbf\\xd1\\x80\\xd0\\xb8\\xd0\\xb2\\xd0\\xb5\\xd1\\x82'"
as_binary = ast.literal_eval(string_rep)

# decode the UTF-8 bytes into text
text = as_binary.decode("utf8")

to get 'привет' again.
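
Lines read from a log file usually carry a prefix and a trailing newline around the bytes literal, so the literal may need to be extracted before it is parsed. The following is a minimal sketch, not a definitive implementation: the helper name decode_logged_bytes, the regular expression, and the "INFO:GENERATOR_DYNAMICS:" prefix in the sample line are assumptions made for illustration and do not come from the question.

```python
import ast
import re

# Matches a bytes literal such as b'\xd0\xbf' or b"..." anywhere in a line.
# Escaped characters (a backslash plus one character) are skipped as a unit.
BYTES_LITERAL = re.compile(r"""b(['"])(?:\\.|(?!\1).)*\1""")


def decode_logged_bytes(line, encoding="utf-8"):
    """Return the decoded text of the first bytes literal in line, or None.

    Hypothetical helper: the name and behaviour are illustrative only.
    """
    match = BYTES_LITERAL.search(line)
    if match is None:
        return None
    # literal_eval only evaluates Python literals, so it is safe on log text.
    return ast.literal_eval(match.group(0)).decode(encoding)


# Assumed log line format; the real prefix depends on the logging configuration.
line = "INFO:GENERATOR_DYNAMICS:b'\\xd0\\xbf\\xd1\\x80\\xd0\\xb8\\xd0\\xb2\\xd0\\xb5\\xd1\\x82'\n"
print(decode_logged_bytes(line))
```

Applied to the result of f.readlines(), this could look like [decode_logged_bytes(l) for l in logs], with None marking lines that contain no bytes literal.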

The last part is a duplicate of Python3: Decode UTF-8 bytes converted as string
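
If the logging side can be changed, the round trip through ast.literal_eval may be avoidable altogether by logging the str itself and letting the handler write UTF-8. The sketch below assumes a logging.FileHandler with encoding="utf-8" and writes to a temporary path instead of file.log; both choices are assumptions for illustration, not part of the question's setup.

```python
import logging
import os
import tempfile

# Temporary location used only for this sketch; the question uses 'file.log'.
log_path = os.path.join(tempfile.mkdtemp(), "file.log")

logger = logging.getLogger("GENERATOR_DYNAMICS")
logger.setLevel(logging.INFO)

# The handler encodes the text as UTF-8 when writing to disk.
handler = logging.FileHandler(log_path, encoding="utf-8")
logger.addHandler(handler)

logger.info("привет")  # log the str itself, not x.encode('utf-8')

# Flush and detach the handler so the file is complete before reading.
handler.close()
logger.removeHandler(handler)

with open(log_path, encoding="utf-8") as f:
    logs = f.readlines()

print(logs)
```

Opening the file with the same encoding when reading keeps the result independent of the platform's default encoding.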
