python, UnicodeEncodeError, converting unicode to ascii

Question

Firstly, I am pretty new to Python, so forgive me for all the n00b stuff. The application logic in Python goes like this:

I send an SQL SELECT to the database and it returns an array of data.
I need to take this data and use it in another SQL INSERT statement.

Now the problem is that the SQL query returns unicode strings. The output from the SELECT is something like this:

(u'Abc', u'Lololo', u'Fjordk\xe6r')

So first I tried to convert it to a string, but that fails because the third element contains the 'æ' (ae) letter:

str_data = []
for x in data[0]:
    str_data.append(str(x))

I am getting: UnicodeEncodeError: 'ascii' codec can't encode character u'\xe6' in position 6: ordinal not in range(128)

I can't pass the unicode values straight into the INSERT either, as a TypeError occurs. TypeError: coercing to Unicode: need string or buffer, NoneType found

Any ideas?

stackoverflow.com/questions/2365411/…

– Ofiris, commented May 22, 2013 at 17:21

Mezgrman · Accepted Answer · 2013-05-22 17:27:02Z

7

In my experience, Python and Unicode are often a problem.

Generally speaking, if you have a Unicode string, you can convert it to a normal (byte) string like this:

normal_string = unicode_string.encode('utf-8')

And convert a normal string to a Unicode string like this:

unicode_string = normal_string.decode('utf-8')
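
As a quick check of the two conversions above, here is a minimal round-trip sketch. It reuses the sample value from the question; the variable names are only illustrative, and the literals are written with `u''` and `b''` prefixes so the snippet should run on both Python 2 and Python 3.

```python
# Round trip between unicode text and UTF-8 bytes.
text = u'Fjordk\xe6r'              # unicode text, 8 characters

encoded = text.encode('utf-8')     # text -> bytes
decoded = encoded.decode('utf-8')  # bytes -> text

# The non-ASCII character takes two bytes in UTF-8, so the byte
# string is one element longer than the text.
print(len(text), len(encoded))
print(decoded == text)
```

Note that the round trip only works if you decode with the same encoding that was used to encode.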


3 Comments

Mark Ransom Over a year ago

'utf-8' is usually the right choice, but not always. You should use the same character set that your database is configured for.

Erki M. Over a year ago

Ok, I finally found how to force Python to use UTF-8 by default:

def set_default_encoding():
    import sys
    reload(sys)  # to make setdefaultencoding available; IDK why
    sys.setdefaultencoding("UTF-8")

Mezgrman Over a year ago

This sounds useful. I'll try it out too!

Community · Accepted Answer · 2017-05-23 10:29:23Z

The issue here is that the str function tries to convert the unicode string using the ASCII codec, and ASCII has no mapping for u'\xe6' (æ).

Therefore you need to encode it with a codec that supports the character. Nowadays the most common choice is UTF-8.

>>> x = (u'Abc', u'Lololo', u'Fjordk\xe6r')
>>> print x[2].encode("utf8")
Fjordkær
>>> x[2].encode("utf-8")
'Fjordk\xc3\xa6r'

Alternatively, you can encode it as cp1252, the Western European (Latin) code page, which also supports the character:

>>> x[2].encode("cp1252")
'Fjordk\xe6r'

But the Eastern European code page cp1250 doesn't support it:

>>> x[2].encode("cp1250")
...
UnicodeEncodeError: 'charmap' codec can't encode character u'\xe6' in position 6: character maps to <undefined>
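
The comparison above can be sketched as a small helper that tries several codecs and reports which ones can represent the string. `try_encode` is a hypothetical name introduced only for this illustration; the snippet is written to run on both Python 2 and Python 3.

```python
def try_encode(text, codec):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None

word = u'Fjordk\xe6r'
results = {}
for codec in ('ascii', 'utf-8', 'cp1252', 'cp1250'):
    results[codec] = try_encode(word, codec)
    print(codec, repr(results[codec]))

# If losing the character is acceptable, an error handler avoids the
# exception, but the result is lossy: the character becomes '?'.
lossy = word.encode('ascii', 'replace')
```

Here `ascii` and `cp1250` come back as None, while `utf-8` and `cp1252` succeed with different byte sequences for the same character.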

Problems with unicode in Python are very common, and I would suggest the following:

understand what unicode is
understand what utf-8 is (it is not unicode)
understand ascii and other codepages
recommended conversion workflow: input (any cp) -> convert to unicode -> (process) -> output to utf-8
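
The workflow in the last point could look roughly like the sketch below. It assumes the input arrives as cp1252 bytes, which is only an example; `to_text` is a hypothetical helper name, and the uppercase step stands in for whatever processing you actually do. It should run on both Python 2 and Python 3.

```python
def to_text(raw, source_encoding):
    """Decode bytes to unicode text; pass through values that are already text."""
    if isinstance(raw, bytes):
        return raw.decode(source_encoding)
    return raw

# 1. Input: bytes in some code page (assumed to be cp1252 here).
raw_input_bytes = b'Fjordk\xe6r'

# 2. Convert to unicode as early as possible.
text = to_text(raw_input_bytes, 'cp1252')

# 3. Process while everything is unicode text.
processed = text.upper()

# 4. Encode to UTF-8 only at the output boundary.
output_bytes = processed.encode('utf-8')
print(repr(output_bytes))
```

The point of the design is that the code in the middle never has to care which code page the data came from.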
