I used lxml to parse a web page as follows:

>>> import lxml.html
>>> doc = lxml.html.fromstring(htmldata)
>>> element = doc.cssselect(sometag)[0]
>>> text = element.text_content()
>>> print text
u'Waldenstr\xf6m'

Why does it print u'Waldenstr\xf6m' rather than "Waldenström" here?

After that, I tried to add this text to a MySQL table with the UTF-8 character set and utf8_general_ci collation. Users is a Django model:

>>> Users.objects.create(last_name=text)
'ascii' codec can't encode character u'\xf6' in position 9: ordinal not in range(128)

What was I doing wrong here? How can I get the correct data, "Waldenström", and write it to the database?

2 Answers

2

You want text.encode('utf8'), which converts the unicode object into a UTF-8 encoded byte string.
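As a minimal sketch of what that call does: the example below is written for Python 3, where Python 2's unicode type is simply str, and the variable names are illustrative assumptions rather than anything from the question. Encoding turns text into UTF-8 bytes, and decoding with the same codec reverses it.

```python
# Python 3 sketch: str holds text (Python 2's unicode), bytes holds encoded data.
text = 'Waldenstr\xf6m'

# Encode the text to UTF-8; the single character U+00F6 becomes the bytes C3 B6.
encoded = text.encode('utf-8')

# Decoding with the same codec recovers the original text.
decoded = encoded.decode('utf-8')

# The bytes repr is pure ASCII, so it prints safely on any console.
print(encoded)
```

The encoded form is one byte longer than the text is in characters, because the ö takes two bytes in UTF-8.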

0
>>> print text
u'Waldenstr\xf6m'

There is a difference between echoing a value in the interactive shell (which uses its repr) and printing it (which just writes out the string itself):

>>> u'Waldenstr\xf6m'
u'Waldenstr\xf6m'

>>> print u'Waldenstr\xf6m'
Waldenström
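The same distinction can be sketched in a script. This is a Python 3 illustration under one assumption: the built-in ascii() is used to mimic the escaped form that Python 2's repr() produces for unicode strings (without the u prefix). The variable names are hypothetical.

```python
# Python 3 sketch; ascii() escapes non-ASCII characters much like Python 2's repr().
text = 'Waldenstr\xf6m'

# Quoted and escaped, similar to what the shell echoes for a bare expression.
shown_by_shell = ascii(text)

# The characters themselves, which is what print writes out.
shown_by_print = str(text)

# Printing the escaped form is safe on any console because it is pure ASCII.
print(shown_by_shell)
```

The two values are different strings: the first contains quotes and a literal backslash sequence, while the second contains the actual ö.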

So, I'm not sure your snippet above is really what happened. If it definitely is, then your XHTML must contain exactly that string:

<div class="something">u'Waldenstr\xf6m'</div>

(maybe it was incorrectly generated by Python using a string's repr() instead of its str()?)

If this is right and intentional, you would need to parse that Python string literal into a simple string. One way of doing that would be:

>>> r = r"u'Waldenstr\xf6m'"
>>> print r[2:-1].decode('unicode-escape')
Waldenström
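A different technique from the slicing above, sketched here for Python 3, is to let the standard library parse the whole literal with ast.literal_eval. This is an alternative rather than the method used in this answer, and it assumes the scraped text is a well-formed Python string literal; the variable names are illustrative.

```python
import ast

# The raw text as it would appear in the page: a Python literal, not the name itself.
raw = r"u'Waldenstr\xf6m'"

# literal_eval only evaluates Python literals and never executes arbitrary code.
parsed = ast.literal_eval(raw)

# Show the result in escaped form so the output is safe on any console.
print(ascii(parsed))
```

This handles the u prefix, the quotes, and the escape sequence in one step, and it raises an exception if the input is not a valid literal.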

If the snippet at the top is actually not quite right and you are simply asking why Python's repr escapes all non-ASCII characters, the answer is that printing non-ASCII characters to the console is unreliable across environments, so the escaped form is safer. In the examples above, you might have seen ?s or worse instead of the ö if you were unlucky.
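To illustrate how that unreliability shows up, here is a Python 3 sketch that encodes the text for an ASCII-only target, which is an assumption standing in for a console or connection that cannot represent ö. A strict encode fails with the same kind of error reported in the question, while errors='replace' yields the ? mentioned above.

```python
text = 'Waldenstr\xf6m'

# A strict ASCII encode fails on the non-ASCII character at index 9.
try:
    text.encode('ascii')
    strict_error = None
except UnicodeEncodeError as exc:
    strict_error = exc

# errors='replace' substitutes "?" for anything the codec cannot represent.
replaced = text.encode('ascii', errors='replace')

print(strict_error)
print(replaced)
```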

In Python 3 this changes, and the repr shows printable non-ASCII characters as they are:

>>> 'Waldenstr\xf6m'
'Waldenström'
