
I have a dataframe with about 200M rows, for example:

Date         tableName    attributeName
29/03/2019   tableA       attributeA
....

and I want to save the dataframe to a table in a MySQL database. This is what I've tried to insert the dataframe into the table:

import mysql.connector

def insertToTableDB(tableName, dataFrame):
    mysqlCon = mysql.connector.connect(host='localhost', user='root', passwd='')
    cursor = mysqlCon.cursor()
    # Build the statement once; only the row values are parameterized.
    query = ("INSERT INTO `{0}` (`Date`, `tableName`, `attributeName`) "
             "VALUES (%s, %s, %s);").format(tableName)
    try:
        for index, row in dataFrame.iterrows():
            myList = [row.Date, row.tableName, row.attributeName]
            cursor.execute(query, myList)
        mysqlCon.commit()
        print("Done")
        return tableName, dataFrame
    except mysql.connector.Error as err:
        mysqlCon.rollback()
        print("Fail:", err)
    finally:
        cursor.close()
        mysqlCon.close()

This code worked when I inserted a dataframe with 2M rows. But when I inserted a dataframe with 200M rows, I got this error:

File "C:\Users\User\Anaconda3\lib\site-packages\mysql\connector\cursor.py", line 569, in execute
self._handle_result(self._connection.cmd_query(stmt))

File "C:\Users\User\Anaconda3\lib\site-packages\mysql\connector\connection.py", line 553, in cmd_query
result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))

File "C:\Users\User\Anaconda3\lib\site-packages\mysql\connector\connection.py", line 442, in _handle_result
raise errors.get_exception(packet)

ProgrammingError: Unknown column 'nan' in 'field list'

My dataframe doesn't have any 'nan' values. Could someone help me solve this problem?
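
A quick way to double-check is to count the missing values directly; this is a minimal diagnostic sketch, assuming the dataFrame variable passed to the function above is a pandas DataFrame:

# Count missing values per column; any non-zero count means NaN will be
# sent to MySQL and can trigger the "Unknown column 'nan'" error.
print(dataFrame.isna().sum())

# Inspect the offending rows, if any.
print(dataFrame[dataFrame.isna().any(axis=1)])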

Thank you so much.

3 Comments
  • Can you print dataFrame.columns? Commented Jul 29, 2019 at 5:46
  • @tawab_shakeel Yes, of course. I have already updated the question. Commented Jul 29, 2019 at 6:31
  • Put the for loop in a try block, or add a check after the loop to verify that all expected fields are present in the dataframe. I think one of your columns is holding NaN (i.e. not a number). Commented Jul 29, 2019 at 6:38

3 Answers


Replace every NaN with the string 'empty':

df = df.replace(np.nan, 'empty')

Remember to:

import numpy as np
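
If the goal is a SQL NULL rather than the literal text 'empty', a hedged alternative (it assumes pandas 1.3 or newer) is to map NaN to Python None, which mysql-connector transmits as NULL:

import numpy as np
import pandas as pd

df = pd.DataFrame({"attributeName": ["attributeA", np.nan]})

# None (unlike the string 'empty') is written to MySQL as NULL.
df = df.replace({np.nan: None})
print(df)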

1 Comment

This is not what you want; it should be NULL.

Try these steps:

  1. Drop rows containing NaN using dropna.
  2. Filter out rows whose values are the string 'nan'.
  3. Convert NaN into None.

import pandas as pd

df.dropna(inplace=True)

df = df[(df['Date'] != 'nan') & (df['tableName'] != 'nan') & (df['attributeName'] != 'nan')]

df1 = df.where(pd.notnull(df), None)

1 Comment

The third one does not work.
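
A likely reason the third step seems not to work: on numeric or datetime columns pandas coerces None straight back to NaN, so the conversion silently undoes itself. A minimal sketch of a workaround, using a hypothetical numeric column ('value') and assuming pandas and numpy are imported:

import numpy as np
import pandas as pd

df = pd.DataFrame({"Date": ["29/03/2019", np.nan],
                   "value": [1.5, np.nan]})   # 'value' is a hypothetical numeric column

# Without a cast, the float column keeps NaN instead of None:
print(df.where(pd.notnull(df), None))

# Casting to object first preserves real Python None values:
print(df.astype(object).where(pd.notnull(df), None))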

df = df.astype(str) solved the problem for me, assuming you've already set up your table schema.
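
One thing to be aware of, as a hedge: astype(str) turns missing values into the literal string 'nan', so the rows insert without error but the table stores the text 'nan' rather than NULL. A tiny illustration:

import numpy as np
import pandas as pd

s = pd.Series(["attributeA", np.nan])
print(s.astype(str).tolist())   # ['attributeA', 'nan'] -- text, not NULL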

