I have data in an SQL database (MariaDB), some of which contains UTF-8 characters (mostly ÄÖÅ). When I print this data in Python, I don't get the correct characters. However, if I print UTF-8 characters directly (for example, print("ÖÖ ää öö")), it works.

In my .py file I have # -*- coding: utf-8 -*-, and in my .sql file I have SET character_set_server = "utf8";

1 Answer

http://mysql.rjweb.org/doc.php/charcoll#python says

1st or 2nd line in source code: # -*- coding: utf-8 -*-

Python 2 code for dumping the hex code point (etc.) of each character in a Unicode string u:

    import unicodedata

    for i, c in enumerate(u):
        print i, '%04x' % ord(c), unicodedata.category(c),
        print unicodedata.name(c)
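
The dump above uses Python 2 print statements. A rough Python 3 sketch of the same idea follows; the helper name `dump_chars` and the `<unnamed>` placeholder are illustrative choices, not part of the linked document. It can help show whether a string fetched from the database really contains Ä (U+00C4) or a two-character garbled sequence.

```python
import unicodedata


def dump_chars(text):
    """Return one line per character: index, hex code point, category, name."""
    lines = []
    for i, ch in enumerate(text):
        # unicodedata.name() raises ValueError for unnamed characters
        # (e.g. control codes) unless a default is supplied.
        name = unicodedata.name(ch, "<unnamed>")
        lines.append("%d %04x %s %s" % (i, ord(ch), unicodedata.category(ch), name))
    return lines


for line in dump_chars("Äö"):
    print(line)
```

If a value that should be a single Ä shows up as two entries (for instance an Ã followed by an unnamed control character), the bytes were probably decoded with the wrong character set somewhere between the server and Python.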

Miscellany notes on coding for utf8:

⚈  db = MySQLdb.connect(host=DB_HOST, user=DB_USER, passwd=DB_PASS, db=DB_NAME, charset="utf8", use_unicode=True)
⚈  conn = MySQLdb.connect(host="localhost", user='root', password='', db='', charset='utf8')
⚈  cursor.execute("SET NAMES utf8mb4;") -- not as good as using `charset`
⚈  db.set_character_set('utf8'), implies use_unicode=True
⚈  Literals should be u'...'
⚈  MySQL-python 1.2.4 fixes a bug wherein varchar(255) CHARACTER SET utf8 COLLATE utf8_bin is treated like a BLOB.
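
As a minimal sketch of the first bullet, the connection arguments can be collected in one place so the character set is never forgotten. The helper name `utf8_connect_kwargs` and the sample credentials are hypothetical; the keyword names (`host`, `user`, `passwd`, `db`, `charset`, `use_unicode`) are the ones shown in the list above. The actual `connect()` call is left commented out because it needs a running server.

```python
def utf8_connect_kwargs(host, user, passwd, db, charset="utf8"):
    """Build keyword arguments for MySQLdb.connect() with an explicit charset."""
    return {
        "host": host,
        "user": user,
        "passwd": passwd,
        "db": db,
        "charset": charset,    # makes the connection itself use UTF-8
        "use_unicode": True,   # return text columns as Unicode strings
    }


kwargs = utf8_connect_kwargs("localhost", "app_user", "secret", "mydb")
print(kwargs["charset"])

# With a reachable server and the MySQLdb module installed, assumed usage:
# import MySQLdb
# db = MySQLdb.connect(**kwargs)
```

Passing `charset="utf8mb4"` instead may be preferable when the data can contain characters outside the Basic Multilingual Plane, since MySQL's `utf8` stores at most three bytes per character.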

Checklist:

⚈  `# -*- coding: utf-8 -*-` -- (you have that)
⚈  `charset='utf8'` in `connect()` call -- Is that buried in `bottle_mysql.Plugin`? (Note: Try 'utf-8' and 'utf8')
⚈  Text encoded in utf8.
⚈  No need for encode() or decode() if you are willing to accept utf8 everywhere.
⚈  `u'...'` for literals
⚈  `<meta charset="utf-8" />` near start of html page
⚈  Content-Type: text/html; charset=UTF-8 (in HTTP response header)
⚈  header('Content-Type: text/html; charset=UTF-8'); (in PHP to get that response header)
⚈  `CHARACTER SET utf8 COLLATE utf8_general_ci` on column (or table) definition in MySQL.
⚈  utf8 all the way through
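
To see why one missing item in the checklist is enough to garble ÄÖÅ, here is a small Python 3 sketch that reproduces the symptom without a database. It assumes the mismatch is UTF-8 bytes being read as Latin-1; the function names are illustrative, and Python's `latin-1` codec is only a stand-in for MySQL's `latin1`, which differs for a few byte values.

```python
def mojibake(text, wrong_encoding="latin-1"):
    """Simulate UTF-8 bytes being interpreted with the wrong character set."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair(garbled, wrong_encoding="latin-1"):
    """Undo the misinterpretation, assuming no bytes were lost on the way."""
    return garbled.encode(wrong_encoding).decode("utf-8")


original = "Äö"
garbled = mojibake(original)
print(repr(garbled))                 # two characters per original letter
print(repair(garbled) == original)
```

If the output from the database looks like the garbled form here, the connection character set is the first thing to check. Repairing strings after the fact is a workaround at best; fixing the `charset` setting is the cleaner route.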

References:

⚈  https://docs.python.org/2/howto/unicode.html#the-unicode-type
⚈  http://stackoverflow.com/questions/9154998/python-encoding-mysql
⚈  http://dev.mysql.com/doc/connector-python/en/connector-python-connectargs.html

The Python language environment has officially used only UCS-2 internally since version 2.0, but the UTF-8 decoder to "Unicode" produces correct UTF-16. Since Python 2.2, "wide" builds of Unicode are supported, which use UTF-32 instead; these are primarily used on Linux. Python 3.3 no longer ever uses UTF-16; instead, strings are stored in one of ASCII/Latin-1, UCS-2, or UTF-32, depending on which code points are in the string, with a UTF-8 version also included so that repeated conversions to UTF-8 are fast.
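
The per-string storage choice described above can be observed indirectly. The sketch below assumes CPython 3.3 or later and relies on `sys.getsizeof`, whose results are an implementation detail; the helper name `bytes_per_char` is made up for illustration.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate per-character storage by comparing two lengths of a repeated character."""
    # The fixed object header cancels out in the subtraction.
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


for ch in ("a", "Ä", "\u0100", "\U0001F600"):
    print("U+%04X" % ord(ch), bytes_per_char(ch))
```

On such an interpreter, the ASCII and Latin-1 characters should report 1 byte each, U+0100 should report 2, and the character outside the Basic Multilingual Plane should report 4, matching the three storage widths named in the paragraph.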
