2

I'm having problems with reading utf-8 data from a MySQL DB using Python. My DB contains one table named Videos, and the table contains at least one row that has Unicode chars, i.e.

[KR] Samsung Galaxy Beam 2 간단 리뷰 [4K]

The collation of the table is utf8_general_ci, just like the collation of the fields in the table.

This is the code I wrote in order to fetch all the data from my table:

# Open database connection
db = MySQLdb.connect("localhost","matan","pass","youtube", charset = 'utf8',use_unicode=True)

# prepare a cursor object using cursor() method
cursor = db.cursor()

# Prepare SQL query to INSERT a record into the database.
sql = "SELECT * FROM VIDEOS"
try:
   # Execute the SQL command
   cursor.execute(sql)
   # Fetch all the rows in a list of lists.
   results = cursor.fetchall()
   for row in results:
      title = row[0]
      link = row[1]
      # Now print fetched result
      print ("title=%s\nlink=%s\n\n" % \
            (title, link))
except:
   print "Error: unable to fecth data"

# disconnect from server
db.close()

When I run the above code, it prints all the rows that contain only "ascii" chars, but when it gets to a row that contains Unicode char (i.e. the line I mentioned above), it prints :

File "C:\Users\Matan\Dropbox\Code\Python\youtube.py", line 28, in printall (title, link)) File "C:\Python27\lib\encodings\cp862.py", line 12, in encode return codecs.charmap_encode(input,errors,encoding_map) UnicodeEncodeError: 'charmap' codec can't encode characters in position 33-34: c haracter maps to <undefined>

and doesn't continue to the next row.

I'm using PhpMyAdmin version 4.1.14, MySQL version 5.6.17, and Python version 2.7.8 .

Edit: I dropped the except clause, and updated the error I've got.

2
  • If you really want to know what goes wrong, first get rid of this more-than-useless bare except clause and let the real exception propagate. Then please come back and post the full traceback. Just for the record: since you pass use_unicode=True to your connection, all you get back from the db are unicode strings (type Unicode, not type str). You have to encode them to the proper encoding before printing them (the proper encoding being system-dependant). Commented Dec 3, 2014 at 16:05
  • I edited my question with your advice. thanks. Commented Dec 3, 2014 at 17:01

1 Answer 1

3

Your problem is with your terminal (sys.stdout) encoding (cf http://en.wikipedia.org/wiki/Code_page_862), which depends on your system's settings. The best solution (as explained here : https://stackoverflow.com/a/15740694/41316) is to explicitely encode your unicode data before printing them to sys.stdout.

If you can't use a more usable encoding (utf-8 comes to mind, as it has been designed to handle all unicode characters), you can at least use an alternative error handling like "replace" (replaces non-encodable characters with '?') or "ignore" (suppress non-encodable characters).

Here's a corrected version of your code, you can play with the encoding and on_error settings to find out what solution works for you:

import sys
import MySQLdb

# set desired output encoding here
# it looks like your default encoding is "cp862"
# but you may want to first try 'utf-8' first
# encoding = "cp862"
encoding = "utf-8" 

# what do when we can't encode to the desired output encoding
# options are:
# - 'strict' : raises a UnicodeEncodeError (default)
# - 'replace': replaces missing characters with '?'
# - 'ignore' : suppress missing characters
on_error = "replace" 

db = MySQLdb.connect(
   "localhost","matan","pass","youtube", 
   charset='utf8',
   use_unicode=True
   )
cursor = db.cursor()
sql = "SELECT * FROM VIDEOS"
try:
   cursor.execute(sql)
   for i, row in enumerate(cursor):
      try:
         # encode unicode data to the desired output encoding
         title = row[0].encode(encoding, on_error)
         link = row[1].encode(encoding, on_error)
      except UnicodeEncodeError as e:
         # only if on_error='strict'
         print >> sys.stderr, "failed to encode row #%s - %s" % (i, e)
      else:
         print "title=%s\nlink=%s\n\n" % (title, link))
finally:
   cursor.close()
   db.close()

NB : you may also want to read this (specially the comments) http://drj11.wordpress.com/2007/05/14/python-how-is-sysstdoutencoding-chosen/ for more on Python, strings, unicode, encoding, sys.stdout and terminal issues.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.