I want to convert a CSV file to a db (database) file using Python. How should I do it?
- Do you want to change the file name or change the way the data is formatted? – Graeme Stuart, Mar 16, 2014 at 21:34
- What is a "db (database)" file? – Raul Guiu, Mar 16, 2014 at 21:36
- @GraemeStuart I need to change the way the data is formatted. – Sentient07, Mar 17, 2014 at 16:55
2 Answers
First, parse the CSV file: either find a library that does it for you, or read the file line by line and parse it with standard Python. In the simplest case that means splitting each line on commas, although that breaks as soon as a quoted field contains a comma.
Then insert the rows into the SQLite database. The Python documentation covers the sqlite3 module. You could also use SQLAlchemy or another ORM.
Another way would be to use the sqlite shell itself.
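As a rough illustration of the parse-then-insert route, here is a minimal sketch using only the standard csv and sqlite3 modules. The function name csv_to_sqlite, the default table name "data", and the choice to store every column as TEXT are assumptions made for the example, not something the question specifies. It also assumes the first row is a header and that every row has the same number of fields.

```python
import csv
import sqlite3


def csv_to_sqlite(csv_path, db_path, table="data"):
    """Load a CSV file with a header row into a SQLite table of TEXT columns."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # assumption: first row holds the column names
        rows = list(reader)

    def quote(identifier):
        # Double-quote identifiers so names with spaces or keywords still work.
        return '"{}"'.format(identifier.replace('"', '""'))

    columns = ", ".join("{} TEXT".format(quote(name)) for name in header)
    placeholders = ", ".join("?" for _ in header)

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS {} ({})".format(quote(table), columns)
            )
            # Values go through ? placeholders, never string formatting.
            conn.executemany(
                "INSERT INTO {} VALUES ({})".format(quote(table), placeholders),
                rows,
            )
    finally:
        conn.close()
    return len(rows)
```

Calling csv_to_sqlite("input.csv", "output.db") would create output.db if it does not exist and return the number of rows inserted. The file names here are placeholders.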
1 Comment
I don't think this can be done in full generality without out-of-band information or just treating everything as strings/text. That is, the information contained in the CSV file won't, in general, be sufficient to create a semantically “satisfying” solution. It might be good enough to infer what the types probably are for some cases, but it'll be far from bulletproof.
I would use Python's csv and sqlite3 modules, and try to:
- convert the cells in the first CSV line into names for the SQL columns (strip “oddball” characters)
- infer the types of the columns by going through the cells in the second CSV file line (first line of data), attempting to convert each one first to an int; if that fails, try a float, and if that fails too, fall back to strings
- this would give you a list of names and a list of corresponding probable types, from which you can roll a CREATE TABLE statement and execute it
- try to INSERT the first and subsequent data lines from the CSV file
There are many things to criticize in such an approach (e.g. no keys or indexes, and it guesses wrong if a column that generally holds strings just so happens to contain a value that's Python-convertible to an int or float in the first data line), but it'll probably work passably for the majority of CSV files.
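The steps above could be sketched roughly as follows. The helper names (infer_sql_type, clean_name, csv_to_typed_table), the default table name "data", and the mapping of int/float/str to INTEGER/REAL/TEXT are assumptions chosen for illustration. The sketch also assumes every row has as many fields as the header and that cleaned column names do not collide.

```python
import csv
import re
import sqlite3


def infer_sql_type(cell):
    """Guess a column type from one sample value: INTEGER, REAL, or TEXT."""
    for caster, sql_type in ((int, "INTEGER"), (float, "REAL")):
        try:
            caster(cell)
            return sql_type
        except ValueError:
            continue
    return "TEXT"


def clean_name(cell, index):
    """Strip "oddball" characters from a header cell, keeping [0-9A-Za-z_]."""
    name = re.sub(r"[^0-9A-Za-z_]+", "_", cell.strip()).strip("_")
    return name or "col{}".format(index)  # fallback for empty headers


def csv_to_typed_table(csv_path, db_path, table="data"):
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        names = [clean_name(c, i) for i, c in enumerate(next(reader))]
        rows = list(reader)
    if not rows:
        raise ValueError("no data rows to infer column types from")

    # Types are guessed from the first data row only, as described above.
    types = [infer_sql_type(c) for c in rows[0]]
    columns = ", ".join('"{}" {}'.format(n, t) for n, t in zip(names, types))
    placeholders = ", ".join("?" for _ in names)
    quoted_table = '"{}"'.format(table.replace('"', '""'))

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("CREATE TABLE {} ({})".format(quoted_table, columns))
            conn.executemany(
                "INSERT INTO {} VALUES ({})".format(quoted_table, placeholders),
                rows,
            )
    finally:
        conn.close()
    return list(zip(names, types))
```

One design note: with SQLite's default type affinity, a non-numeric string inserted into an INTEGER or REAL column is stored as text rather than rejected, so a wrong guess from the first data line tends to produce mixed-type columns instead of an error.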