
I'm writing a workflow that first converts a flat CSV into a .db file for downstream processing. The file seems to write OK, but when I try to read it back, it throws errors saying that the table doesn't exist. Here is my code:

import pandas as pd
import sqlite3

# simulate some data
df = pd.DataFrame({'DATE': ['01-01-2000', '01-02-2000', '01-03-2000', '01-04-2000', '01-05-2000'],
                   'A1': [1, 1, 1, 1, 1],
                   'A2': [1, 2, 3, 4, 5],
                   'B': [2, 2, 3, 3, 2]})
df.to_csv('my_test.csv', index=False)

# function to write csv to .db file
def pandas_csv_to_db_converter(csvfile, table_name):
    df = pd.read_csv(csvfile)
    conn = sqlite3.connect(table_name)
    df.to_sql(table_name, conn, if_exists='append', index=False)
    return

pandas_csv_to_db_converter('my_test.csv', 'my_test.db')

So far, so good: I have a CSV and a .db file of similar sizes. Now I want to read the table back in to make sure it's all OK.

def pandas_db_reader(db_file):
    conn = sqlite3.connect(db_file)
    c = conn.cursor()
    c.execute("SELECT name FROM sqlite_master WHERE type='table';")
    names = [tup[0] for tup in c.fetchall()]
    print(names)

    table = pd.read_sql_query("SELECT * from {}".format(db_file), conn)
    print(table)
    conn.close()
    return

out_db = 'my_test.db'
pandas_db_reader(out_db)

The string 'my_test.db' is printed in the list of table names on the console, suggesting that it is a legitimate table, but then the code throws an error:

    cur.execute(*args)
sqlite3.OperationalError: no such table: my_test.db

During handling of the above exception, another exception occurred:

    cur.execute(*args)
pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT * from my_test.db': no such table: my_test.db

It's not very intuitive why this error is occurring. I've looked at quite a lot of tutorials and Stack Overflow questions on how to do this, but it still doesn't seem to run. I don't have much experience using SQL, so any help would be really appreciated! Thanks in advance!

EDIT: typo corrected:
pandas_csv_to_db_converter('my_test.csv', my_test_db.db) 

is now

pandas_csv_to_db_converter('my_test.csv', 'my_test.db')
  • why not just do pandas.DataFrame.to_sql? Commented Jun 24, 2019 at 14:42
  • basically since you already have the dataframe, you can push it into sqlite tables like this: engine = sqlalchemy.create_engine('sqlite:///my_db.sqlite') and then df.to_sql('Resolved', engine, if_exists='append') (a runnable sketch follows this comment list). Commented Jun 24, 2019 at 14:44
  • Thanks for your help; the CSV comes from elsewhere, I was just making some test data for a working example. Commented Jun 24, 2019 at 15:03
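
For reference, a runnable version of the SQLAlchemy approach suggested in the comment above, as a minimal sketch; the file name my_db.sqlite and table name 'Resolved' are just the placeholders from that comment:

import pandas as pd
import sqlalchemy

df = pd.DataFrame({'A': [1, 2, 3]})

# 'sqlite:///my_db.sqlite' creates/opens the file my_db.sqlite in the working directory
engine = sqlalchemy.create_engine('sqlite:///my_db.sqlite')

# write the frame to a table named 'Resolved', appending if it already exists
df.to_sql('Resolved', engine, if_exists='append', index=False)

# read it back to confirm the write
print(pd.read_sql_query('SELECT * FROM Resolved', engine))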

1 Answer


You're passing db_file as the table name in pandas_db_reader(). You need to pass the correct TABLE_NAME to the SQL query below:

table = pd.read_sql_query("SELECT * from {}".format(TABLE_NAME), conn)

my_test.db is not a valid unquoted table name: SQLite parses the dot as a database.table separator, so the query goes looking for a table db in an attached database called my_test. Change the table name value in pandas_csv_to_db_converter and pandas_db_reader, i.e.

...
df.to_sql("test", conn, if_exists='append', index=False)
...

and

...
table = pd.read_sql_query("SELECT * from {}".format("test"), conn)
...
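
Putting it together, a minimal end-to-end sketch of the corrected workflow, keeping the database file name and the table name as separate arguments (my_test.db and test here, following the question and the fix above):

import pandas as pd
import sqlite3

# write the csv into a named table inside the .db file
def pandas_csv_to_db_converter(csvfile, db_file, table_name):
    # db_file is the SQLite file on disk; table_name is the table inside it
    df = pd.read_csv(csvfile)
    conn = sqlite3.connect(db_file)
    df.to_sql(table_name, conn, if_exists='append', index=False)
    conn.close()

# read the named table back out of the .db file
def pandas_db_reader(db_file, table_name):
    conn = sqlite3.connect(db_file)
    table = pd.read_sql_query("SELECT * from {}".format(table_name), conn)
    print(table)
    conn.close()

# assumes my_test.csv from the question exists in the working directory
pandas_csv_to_db_converter('my_test.csv', 'my_test.db', 'test')
pandas_db_reader('my_test.db', 'test')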

6 Comments

How would I find out what the TABLE_NAME is? This is basically what I thought, which is why I put some lines into 'pandas_db_reader' to find out the names. I guess the list 'names' isn't giving me what I thought it was!
FYI I already tried: c.execute("SELECT * FROM INFORMATION_SCHEMA.TABLES ") and c.execute("SELECT * FROM sys.Tables ")
Your table name is defined here: pandas_csv_to_db_converter('my_test.csv', my_test_db.db). Where is the my_test_db object from? Note: my_test_db.db is not being passed as a string.
Ah, sorry, my_test_db.db is a typo on Stack Overflow; it should be 'my_test.db' everywhere. I'll edit the question for consistency. It's correct in my program and still fails to run.
Thanks for your help, this test example now works. I think the problem was starting in 'pandas_csv_to_db_converter': I was passing in full file names. If I move into the working directory and run the whole workflow from there, with just the filename and no full path, it works fine! It looks like it was using the full file path as the table name, which is why I couldn't recover the table in 'pandas_db_reader' (a short demonstration of this failure mode follows these comments); either that, or maybe it was saving to working memory rather than to a file, which gets deleted when the function has finished running. Anyway, many thanks!
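
For anyone hitting the same failure mode, a small sketch (using an in-memory database) of what seems to have happened: df.to_sql() quotes the table name when creating the table, so a dotted name like 'my_test.db' (or a full file path) is accepted on write, but the same name then fails in an unquoted SELECT:

import pandas as pd
import sqlite3

df = pd.DataFrame({'A': [1, 2, 3]})
conn = sqlite3.connect(':memory:')

# to_sql quotes the identifier, so even a dotted name is accepted as a table name
df.to_sql('my_test.db', conn, index=False)

# unquoted, SQLite parses the dot as database.table and raises
# sqlite3.OperationalError: no such table: my_test.db
# pd.read_sql_query('SELECT * FROM my_test.db', conn)

# quoting the identifier recovers the table
print(pd.read_sql_query('SELECT * FROM "my_test.db"', conn))
conn.close()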