Properly getting blobs from mysql database with mysql connector in python

Question

When executing the following code:

import mysql.connector
connection = mysql.connector.connect(...) # connection params here
cursor = connection.cursor()
cursor.execute('create table test_table(value blob)')
cursor.execute('insert into test_table values (_binary %s)', (np.random.sample(10000).astype('float').tobytes(),))
cursor.execute('select * from test_table')
cursor.fetchall()

I get the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf7 in position 1: invalid start byte

(...and then a stack trace which I don't think is useful here)

It seems that mysql connector converts my blob to string (and fails to do so). How can I fetch this data as bytes without any conversion?

AHalvar · Accepted Answer · 2018-11-25 14:31:50Z

We ran into the same issue that BLOBs were mistakenly read back as UTF-8 strings with MySQL 8.0.13, mysql-connector-python 8.0.13 and sqlalchemy 1.2.14.

What did the trick for us was enabling the use_pure option of MySQL Connector. The default of use_pure had changed in 8.0.11 with the new default being to use the C Extension. Thus, we set back the option:

create_engine(uri, connect_args={'use_pure': True}, ...)

Details of our error and stack trace:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9c in position 1: invalid start byte
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
    ....
    File "/usr/local/lib/python3.6/site-packages/mysql/connector/cursor_cext.py", line 272, in execute
        self._handle_result(result)
    File "/usr/local/lib/python3.6/site-packages/mysql/connector/cursor_cext.py", line 163, in _handle_result
        self._handle_resultset()
    File "/usr/local/lib/python3.6/site-packages/mysql/connector/cursor_cext.py", line 651, in _handle_resultset
        self._rows = self._cnx.get_rows()[0]
    File "/usr/local/lib/python3.6/site-packages/mysql/connector/connection_cext.py", line 273, in get_rows
        row = self._cmysql.fetch_row()
    SystemError: <built-in method fetch_row of _mysql_connector.MySQL object at 0x5627dcfdf9f0> returned a result with an error set

Eric Leschinski · Accepted Answer · 2019-03-14 13:42:24Z

I reproduced above error:

Traceback (most recent call last):
File "demo.py", line 16, in <module>
    cursor.execute(query, ())
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte '0xff ... ' 
in position 0: invalid start byte

Using versions:

$  python --version
Python 2.7.10

>>> mysql.connector.__version__
'8.0.15'

With python code

#!/usr/bin/python
# -*- coding: utf-8 -*-
import mysql.connector
conn = mysql.connector.connect(
      user='asdf', 
      password='asdf',
      host='1.2.3.4',
      database='the_db',
      connect_timeout=10)

cursor = conn.cursor(buffered=True)                     #error is raised here
try:
    query = ("SELECT data_blob FROM blog.cmd_table")
    cursor.execute(query, ())                         
except mysql.connector.Error as err:                    #error is caught here
    #error is caught here, and printed:
    print(err)                                          #printed thustly

Using a python variable "raw byte binary" populated by python's open( like this:

def read_file_as_blob(filename):
    #r stands for read
    #b stands for binary
    with open(filename, 'rb') as f:
        data = f.read()
    return data

So the problem is somewhere between the encoding transform of data in the file -> the encoding of data for mysql blob -> and how mysql lifts that blob and converts it back to utf-8.

Two solutions:

Solution 1 is exactly as AHalvar said, set use_pure=True parameter and pass to mysql.connector.connect( ... ). Then mysteriously it just works. But good programmers will note that deferring to mysterious incantation is a bad code smell. Fixes by brownian motion incur technical debt.

Solution 2 is to encode your data early and often, and prevent double re-encoding and double data decoding which is the source of these problems. Lock it down to a common encoding format as soon as possible.

The gratifying solution for me was forcing utf-8 encoding earlier in the process. Enforcing UTF-8 everywhere.

data.encode('UTF-8')

The unicode pile of poo represents my opinion on such babysitting of character encoding between various devices on different operating systems and encoding schemes.

Had to use_pure already because CEXT doesn't support prepared statements. But that didn't help me. Seems as if there's a bug in MySQL Python Connector that breaks all BINARY fields when using prepared statements. It just insists on decoding them as some kind of codec. Worked around it by (cringe) changing to character encoding latin1 which can't fail to decode. Pile of unicode poo indeed.

gastro · Accepted Answer · 2018-10-25 14:59:43Z

2

Apparently, this is a known issue with the Python 'mysql' module. Try to use 'pymysql' instead.

answered Oct 25, 2018 at 14:59

gastro

1164 bronze badges

1 Comment

Amir Forsati Over a year ago

Add such advice under comments, not answers.

iagerogiannis · Accepted Answer · 2020-08-12 20:08:18Z

1

Another way is to use raw=True parameter at initialization of connection:

connection = mysql.connector.connect(
    host="localhost",
    user="user",
    password="password",
    database="database",
    raw=True
)

answered Aug 12, 2020 at 20:08

iagerogiannis

3743 silver badges21 bronze badges

Collectives™ on Stack Overflow

Properly getting blobs from mysql database with mysql connector in python

4 Answers 4

Comments

I reproduced above error:

Two solutions:

1 Comment

1 Comment

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

Comments

I reproduced above error:

Two solutions:

1 Comment

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related