Firstly, I am pretty new to Python, so forgive me for all the n00b stuff. The application logic in Python goes like this:

  1. I send an SQL SELECT to the database and it returns an array of data.
  2. I need to take this data and use it in another SQL INSERT statement.

Now the problem is that the SQL query returns unicode strings. The output from the SELECT is something like this:

(u'Abc', u'Lololo', u'Fjordk\xe6r')

So first I tried to convert each element to a string, but that fails because the third element contains the letter 'æ' (u'\xe6'):

str_data = []
for x in data[0]:
    # str() implicitly encodes with ASCII in Python 2, so u'\xe6' fails here
    str_data.append(str(x))

I am getting: UnicodeEncodeError: 'ascii' codec can't encode character u'\xe6' in position 6: ordinal not in range(128)

I can't pass the unicode values straight into the INSERT either, because a TypeError occurs: TypeError: coercing to Unicode: need string or buffer, NoneType found

Any ideas?

2 Answers

7

In my experience, Unicode is a frequent source of problems in Python.

Generally speaking, if you have a Unicode string, you can convert it to a normal string like this:

normal_string = unicode_string.encode('utf-8')

And convert a normal string to a Unicode string like this:

unicode_string = normal_string.decode('utf-8')
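
To make the round trip concrete, here is a minimal sketch using the row from the question. The variable names are only for illustration, and it assumes the target encoding is UTF-8, which may not match your database.

```python
# A Unicode string holds code points; encode() turns it into bytes.
text = u'Fjordk\xe6r'
raw = text.encode('utf-8')      # b'Fjordk\xc3\xa6r' (two bytes for the one character)

# decode() goes the other way and restores the original Unicode string.
back = raw.decode('utf-8')

# Encoding a whole row, as returned by the SELECT in the question.
row = (u'Abc', u'Lololo', u'Fjordk\xe6r')
encoded_row = [item.encode('utf-8') for item in row]
```

The key point is that the codec name must be the same in both directions, otherwise the decoded text will not match the original.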

3 Comments

'utf-8' is usually the right choice, but not always. You should use the same character set that your database is configured for.
Ok, I finally found how to force Python to use UTF-8 by default:
def set_default_encoding():
    import sys
    reload(sys)  # to make setdefaultencoding available; IDK why
    sys.setdefaultencoding("UTF-8")
This sounds useful. I'll try it out too!
4

The issue here is that str() tries to encode the unicode string with the ASCII codec, and ASCII has no mapping for u'\xe6' (æ).

Therefore you need to encode it with a codec that supports the character. Nowadays the most common choice is UTF-8.

>>> x = (u'Abc', u'Lololo', u'Fjordk\xe6r')
>>> print x[2].encode("utf8")
Fjordkær
>>> x[2].encode("utf-8")
'Fjordk\xc3\xa6r'

Alternatively, you can encode it to cp1252, the Western European code page, which also supports the character:

>>> x[2].encode("cp1252")
'Fjordk\xe6r'

But the Eastern European code page cp1250 doesn't support it:

>>> x[2].encode("cp1250")
...
UnicodeEncodeError: 'charmap' codec can't encode character u'\xe6' in position 6: character maps to <undefined>
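
If you are not sure which codecs can represent your data, you can simply try them. The sketch below uses a hypothetical helper, can_encode (my own name, not a library function), and also shows the 'replace' error handler, which substitutes '?' instead of raising. That fallback loses information, so treat it as a last resort.

```python
def can_encode(text, codec):
    """Return True if every character of text exists in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

word = u'Fjordk\xe6r'
results = dict(
    (codec, can_encode(word, codec))
    for codec in ('ascii', 'cp1250', 'cp1252', 'utf-8')
)
# ascii and cp1250 report False; cp1252 and utf-8 report True

# Lossy fallback: unsupported characters become '?'
lossy = word.encode('ascii', 'replace')   # b'Fjordk?r'
```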

Unicode problems in Python are very common, and I would suggest the following:

  • understand what unicode is
  • understand what utf-8 is (it is not unicode)
  • understand ascii and other codepages
  • recommended conversion workflow: input (any cp) -> convert to unicode -> (process) -> output to utf-8
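
The workflow in the last bullet could look roughly like this. The function names and the cp1252 input are assumptions made for the example; in real code the source encoding is whatever your input actually uses, and the processing step is whatever your application does.

```python
def decode_input(raw_bytes, source_encoding):
    # Step 1: turn incoming bytes into Unicode as early as possible.
    return raw_bytes.decode(source_encoding)

def process(text):
    # Step 2: work only with Unicode internally (upper-casing is a stand-in).
    return text.upper()

def encode_output(text):
    # Step 3: encode to UTF-8 only at the output boundary.
    return text.encode('utf-8')

incoming = b'Fjordk\xe6r'                  # assumed to arrive as cp1252 bytes
text = decode_input(incoming, 'cp1252')    # u'Fjordk\xe6r'
result = encode_output(process(text))      # b'FJORDK\xc3\x86R'
```

Keeping bytes at the edges and Unicode in the middle means the rest of the program never has to guess which encoding a string is in.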
