70

I can connect to my local MySQL database from Python, and I can create, select from, and insert individual rows.

My question is: can I directly instruct MySQLdb to take an entire dataframe and insert it into an existing table, or do I need to iterate over the rows?

In either case, what would the Python script look like for a very simple table with an ID and two data columns, and a matching dataframe?

1 Comment

Do you need to use mysqldb, or are you ok with another MySQL connector? Commented Jan 23, 2018 at 2:11

9 Answers

113

Update:

There is now a to_sql method, which is the preferred way to do this, rather than write_frame:

df.to_sql(con=con, name='table_name_for_df', if_exists='replace', flavor='mysql')

Also note: the syntax may change in pandas 0.14...

You can set up the connection with MySQLdb:

from pandas.io import sql
import MySQLdb

con = MySQLdb.connect()  # may need to add some other options to connect

Setting the flavor of write_frame to 'mysql' means you can write to mysql:

sql.write_frame(df, con=con, name='table_name_for_df', 
                if_exists='replace', flavor='mysql')

The argument if_exists tells pandas how to behave if the table already exists:

if_exists: {'fail', 'replace', 'append'}, default 'fail'
     fail: If table exists, do nothing.
     replace: If table exists, drop it, recreate it, and insert data.
     append: If table exists, insert data. Create if does not exist.

Although the write_frame docs currently suggest it only works on sqlite, mysql appears to be supported and in fact there is quite a bit of mysql testing in the codebase.
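
For newer pandas versions, where write_frame and the flavor argument have been removed, here is a minimal sketch of the equivalent call through a SQLAlchemy engine (the connection string and table name are placeholders, and the pymysql driver is assumed):

from sqlalchemy import create_engine

# Placeholder connection string -- substitute your own user, password, host, and database.
engine = create_engine('mysql+pymysql://user:password@localhost/db_name')

# Same idea as write_frame above, but via to_sql and a SQLAlchemy connectable;
# df is the dataframe you want to write.
df.to_sql(name='table_name_for_df', con=engine, if_exists='replace', index=False)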

6 Comments

Andy - any thoughts on how to handle this with pandas 0.20.2's deprecation of the 'mysql' flavor?
Excellent, not sure why I didn't figure that from my searches, but that did the trick, thank you!
Such approach (sql.write_frame) is deprecated - stackoverflow.com/questions/38487878/…
What is the default schema? From pandas docs: "schemastr, optional Specify the schema (if database flavor supports this). If None, use default schema."
30

Andy Hayden mentioned the correct function (to_sql). In this answer, I'll give a complete example, which I tested with Python 3.5 but which should also work for Python 2.7 and other 3.x versions:

First, let's create the dataframe:

# Create dataframe
import pandas as pd
import numpy as np

np.random.seed(0)
number_of_samples = 10
frame = pd.DataFrame({
    'feature1': np.random.random(number_of_samples),
    'feature2': np.random.random(number_of_samples),
    'class':    np.random.binomial(2, 0.1, size=number_of_samples),
    },columns=['feature1','feature2','class'])

print(frame)

Which gives:

   feature1  feature2  class
0  0.548814  0.791725      1
1  0.715189  0.528895      0
2  0.602763  0.568045      0
3  0.544883  0.925597      0
4  0.423655  0.071036      0
5  0.645894  0.087129      0
6  0.437587  0.020218      0
7  0.891773  0.832620      1
8  0.963663  0.778157      0
9  0.383442  0.870012      0

To import this dataframe into a MySQL table:

# Import dataframe into MySQL
import sqlalchemy
database_username = 'ENTER USERNAME'
database_password = 'ENTER USERNAME PASSWORD'
database_ip       = 'ENTER DATABASE IP'
database_name     = 'ENTER DATABASE NAME'
database_connection = sqlalchemy.create_engine('mysql+mysqlconnector://{0}:{1}@{2}/{3}'.
                                               format(database_username, database_password, 
                                                      database_ip, database_name))
frame.to_sql(con=database_connection, name='table_name_for_df', if_exists='replace')

One catch is that MySQLdb doesn't work with Python 3.x, so we use mysql-connector instead, which can be installed as follows:

pip install mysql-connector==2.1.4  # version avoids Protobuf error

Output:

[screenshot of the resulting table in MySQL]

Note that to_sql creates the table as well as the columns if they do not already exist in the database.
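
To verify the write, one option (a sketch reusing the database_connection engine and placeholder table name from above) is to read the table straight back into pandas:

# Read the table back to confirm the rows were written.
check = pd.read_sql('SELECT * FROM table_name_for_df', con=database_connection)
print(check.head())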

2 Comments

Getting ValueError: invalid literal for int() with base 10: '' from sqlalchemy/engine/url.py when it parses the port — problem with the port number; how do I specify the port?
@pyd the format of the string to include the port is as follows: 'mysql+mysqldb://{user}:{password}@{server}:{port}/{database}'.format(user='<user>', password='<password>', server='<server>', port='<port>', database='<database>')
5

You can do it by using pymysql:

For example, let's suppose you have a MySQL server with the following user, password, host, and port, and you want to write to the database 'data_2', whether or not it already exists.

import pymysql
user = 'root'
passw = 'my-secret-pw-for-mysql-12ud'
host =  '172.17.0.2'
port = 3306
database = 'data_2'

If you already have the database created:

conn = pymysql.connect(host=host,
                       port=port,
                       user=user, 
                       passwd=passw,  
                       db=database,
                       charset='utf8')

data.to_sql(name=database, con=conn, if_exists = 'replace', index=False, flavor = 'mysql')

If you have NOT created the database yet (this also works when the database is already there):

conn = pymysql.connect(host=host, port=port, user=user, passwd=passw)

conn.cursor().execute("CREATE DATABASE IF NOT EXISTS {0} ".format(database))
conn = pymysql.connect(host=host,
                       port=port,
                       user=user, 
                       passwd=passw,  
                       db=database,
                       charset='utf8')

data.to_sql(name=database, con=conn, if_exists = 'replace', index=False, flavor = 'mysql')
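
Note that recent pandas versions won't accept this call as written: the flavor keyword has been removed, name should be the table name rather than the database name, and to_sql expects a SQLAlchemy connectable (or a sqlite3 connection) rather than a raw pymysql connection, as the comments below also point out. A minimal corrected sketch along the same lines, with 'data_table' as a placeholder table name:

# Sketch: same credentials as above, but through a SQLAlchemy engine, and with
# the table name (not the database name) passed to to_sql.
from sqlalchemy import create_engine

engine = create_engine(
    'mysql+pymysql://{u}:{p}@{h}:{prt}/{db}?charset=utf8'.format(
        u=user, p=passw, h=host, prt=port, db=database))

data.to_sql(name='data_table', con=engine, if_exists='replace', index=False)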

Similar threads:

  1. Writing to MySQL database with pandas using SQLAlchemy, to_sql
  2. Writing a Pandas Dataframe to MySQL

3 Comments

The name parameter should be the table name, not the database name.
The flavor kwarg for to_sql no longer is permitted.
Thank you! The index=False helped me solve the following error: (mysql.connector.errors.ProgrammingError) 1170 (42000): BLOB/TEXT column "index" used in key specification without a key length
3

This should do the trick:

import pandas as pd
import pymysql
pymysql.install_as_MySQLdb()
from sqlalchemy import create_engine

# Create engine
engine = create_engine('mysql://USER_NAME_HERE:PASS_HERE@HOST_ADRESS_HERE/DB_NAME_HERE')

# Create the connection and close it when done (whether it succeeds or fails)
with engine.begin() as connection:
    df.to_sql(name='INSERT_TABLE_NAME_HERE/INSERT_NEW_TABLE_NAME', con=connection, if_exists='append', index=False)

Comments

2

The to_sql method works for me.

However, keep in mind that it looks like using it with a raw DBAPI connection is going to be deprecated in favor of SQLAlchemy:

FutureWarning: The 'mysql' flavor with DBAPI connection is deprecated and will be removed in future versions. MySQL will be further supported with SQLAlchemy connectables.

Comments

2

Python 2 + 3

Prerequisites

  • Pandas
  • MySQL server
  • sqlalchemy
  • pymysql: pure-Python MySQL client

Code

from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://{user}:{pw}@localhost/{db}"
                       .format(user="root",
                               pw="your_password",
                               db="pandas"))
df.to_sql(con=engine, name='table_name', if_exists='replace')
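
For larger frames, to_sql also accepts a chunksize argument so the rows are inserted in batches rather than in a single statement; a small sketch reusing the engine above (the batch size of 1000 is arbitrary):

# Write in batches of 1000 rows per round-trip (sketch).
df.to_sql(con=engine, name='table_name', if_exists='replace',
          index=False, chunksize=1000)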

Comments

1

This has worked for me. I created only the database beforehand; there was no predefined table.

from platform import python_version
print(python_version())
3.7.3

import pandas as pd
path = 'glass.data'
df = pd.read_csv(path)
df.head()


!conda install sqlalchemy
!conda install pymysql

import sqlalchemy
pd.__version__
'0.24.2'

sqlalchemy.__version__
'1.3.20'

I restarted the kernel after the installation.

from sqlalchemy import create_engine
engine = create_engine('mysql+pymysql://USER:PASSWORD@HOST:PORT/DATABASE_NAME', echo=False)

try:
    df.to_sql(name='glasstable', con=engine, index=False, if_exists='replace')
    print('Successfully written to the database!')
except Exception as e:
    print(e)

Comments

0

You might write your DataFrame out as a CSV file and then use mysqlimport to load that CSV into MySQL.
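
A rough sketch of that route (all file, table, database, and credential names below are placeholders; note that mysqlimport derives the target table name from the CSV file name):

import subprocess

# Step 1: dump the DataFrame to CSV without the index.
df.to_csv('my_table.csv', index=False)

# Step 2: bulk-load the CSV with mysqlimport (placeholder credentials).
subprocess.run([
    'mysqlimport', '--local', '--ignore-lines=1',
    '--fields-terminated-by=,',
    '-u', 'user', '-pPASSWORD',
    'db_name', 'my_table.csv',
], check=True)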

EDIT

It seems pandas's built-in sql util provides a write_frame function, but it only works with sqlite.

I found something useful; you might try this.

1 Comment

Thanks, this is how I've been doing this so far. I'm looking for a way to directly insert into mysql without the csv detour.
-1

df.to_sql(name='owner', con=db_connection, schema='aws', if_exists='replace', index=True, index_label='id')

Comments
