Why in Python 3 would the following code

print(str(b"Hello"))

output b'Hello' instead of just Hello, as it would for a regular text string? Explicitly creating a str object from the most closely related binary string type looks like it should be easy, yet the result is counter-intuitive.

  • It prints Hello in py2 but b'Hello' in py3. Commented Jan 9, 2015 at 13:50
  • @BhargavRao: that's because in Python 2, b'Hello' is already a string. b'' is just an alias for '', for forward compatibility. Commented Jan 9, 2015 at 13:52

3 Answers

In Python 3, bytes does not provide a __str__ that decodes the data, so calling str() on a bytes object gives you the same text as repr(). Note that print() also calls str() on the objects passed in, so the explicit str() call is redundant here.
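To make that concrete, here is a small sketch (the variable names are only for illustration, and it assumes Python is run without the -b or -bb options) showing that the one-argument str() call returns the same text as repr(), quotes and prefix included:

```python
data = b"Hello"

as_str = str(data)    # no encoding given, so no decoding takes place
as_repr = repr(data)

print(as_str)             # b'Hello'
print(as_str == as_repr)  # True
print(len(as_str))        # 8: the b prefix and both quotes are real characters
```

The result is a perfectly valid str, it just contains the representation of the bytes object rather than the decoded text.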

If you are expecting text, decode explicitly instead:

print(b'Hello'.decode('ascii'))

The str() type can decode bytes objects for you, but only if you (again) provide an explicit codec to decode the bytes with:

print(str(b'Hello', 'ascii'))

The documentation is very explicit about this behaviour:

If neither encoding nor errors is given, str(object) returns object.__str__(), which is the “informal” or nicely printable string representation of object. For string objects, this is the string itself. If object does not have a __str__() method, then str() falls back to returning repr(object).

If at least one of encoding or errors is given, object should be a bytes-like object (e.g. bytes or bytearray). In this case, if object is a bytes (or bytearray) object, then str(bytes, encoding, errors) is equivalent to bytes.decode(encoding, errors).

and

Passing a bytes object to str() without the encoding or errors arguments falls under the first case of returning the informal string representation.

Emphasis mine.
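As a quick check of the equivalence the documentation describes, the following sketch compares the two spellings. The sample strings and variable names are my own, chosen only for illustration:

```python
data = "caf\u00e9".encode("utf-8")  # the word "café" as UTF-8 bytes

via_str = str(data, "utf-8")
via_decode = data.decode("utf-8")
print(via_str == via_decode)  # True

# The errors argument is passed through in the same way.
broken = b"caf\xe9"  # Latin-1 bytes, which are not valid UTF-8
same = str(broken, "utf-8", "replace") == broken.decode("utf-8", "replace")
print(same)  # True
```

Which spelling to use is mostly a matter of taste; bytes.decode() tends to read more clearly when the value is known to be bytes.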

3 Comments

So I do get a Unicode text string from str(b"Hello") but its contents are simply confusing?
@DesmondHume: you get a representation. A debugging tool. You get the same information for any other object that implements a helpful __repr__ representation. For bytes that is a string that lets you recreate the exact same value in another Python script or in the interactive interpreter.
One more thing to know: if you run Python with the -b option, calling str() on a bytes object without an encoding issues a warning, and with the -bb option it raises an error.

Why do you want this to "work"? A bytes object is a bytes object, and that is what its string representation looks like in Python 3. To convert its contents to a proper text string (a str in Python 3, which would be a "unicode" object in Python 2), you have to decode it to text.

And for that you need to know the encoding.

Try the following instead:

print(b"Hello".decode("latin-1"))

Note the assumed "latin-1" text codec, which transparently maps every byte outside the ASCII range (128-255) to the Unicode code point with the same number, so decoding never fails. It is close to, but not the same as, cp1252, the default Windows code page for western-European languages.

The "utf-8" codec can represent a much larger range of characters and is the preferred encoding for international text, but if your byte string is not made up of valid UTF-8 sequences, decoding will raise a UnicodeDecodeError.
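A short sketch of the difference between the two codecs, using the single letter é as made-up sample data. It shows that Latin-1 never raises but can silently produce the wrong text, while UTF-8 refuses bytes that are not valid UTF-8:

```python
utf8_bytes = "\u00e9".encode("utf-8")      # the letter é as two bytes: b'\xc3\xa9'
latin1_bytes = "\u00e9".encode("latin-1")  # the same letter as one byte: b'\xe9'

# Wrong codec, Latin-1: no error, but two wrong characters come back.
mojibake = utf8_bytes.decode("latin-1")
print(ascii(mojibake))  # '\xc3\xa9'

# Wrong codec, UTF-8: the lone byte is rejected.
try:
    latin1_bytes.decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True
print(failed)  # True
```

So a decode that succeeds with Latin-1 tells you nothing about whether the encoding guess was right.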

Please read http://www.joelonsoftware.com/articles/Unicode.html to properly understand what text is about.


Beforehand, sorry for my English...

Hey, I had this problem some weeks ago. It works as the people above said. Here is a tip for when you don't care about errors in the decoding process and are happy to lose the bytes that cannot be decoded. In that case you can use:

bytesText.decode(textEncoding, 'ignore')

Example:

>>> # The lone byte \xab is not valid UTF-8 (it is « only in Latin-1), so 'ignore' silently drops it.
>>> b'text \xab text'.decode('utf-8', 'ignore')
'text  text'
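If silently dropping bytes is too lossy, other standard error handlers keep a trace of the bad input. A small sketch comparing three of them on the same bytes (the variable names are mine; 'backslashreplace' for decoding needs Python 3.5 or later):

```python
data = b"text \xab text"

ignored = data.decode("utf-8", "ignore")            # bad byte removed
replaced = data.decode("utf-8", "replace")          # bad byte becomes U+FFFD
escaped = data.decode("utf-8", "backslashreplace")  # bad byte becomes the text \xab

print(ascii(ignored))   # 'text  text'
print(ascii(replaced))  # 'text \ufffd text'
print(ascii(escaped))   # 'text \\xab text'
```

With 'replace' or 'backslashreplace' you can at least see afterwards that something went wrong, which 'ignore' hides.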
