
I am using SQLite (sqlite3) through Python to hold parameters in a table, which I use for processing a large amount of data. Suppose I have already populated the table initially, but then change the parameters and want to update the table. If I create a Python list holding the updated parameters, one entry for every row and column in the table, how do I update the table?

I have looked here and here (though the latter refers to C++ rather than Python), but these don't really answer my question.

To make this concrete, I show some of my code below:

import sqlite3 as sql
import numpy as np

db = sql.connect('./db.sq3')
cur = db.cursor()

#... Irrelevant Processing Code ...#

cur.execute("""CREATE TABLE IF NOT EXISTS process_parameters (
                 parameter_id            INTEGER PRIMARY KEY,
                 exciton_bind_energy     REAL,
                 exciton_bohr_radius     REAL,
                 exciton_mass            REAL,
                 exciton_density_per_QW  REAL,
                 box_trap_side_length    REAL,
                 electron_hole_overlap   REAL,
                 dipole_matrix_element   REAL,
                 k_cutoff                REAL)""")

#Parameter list
process_params = [(E_X/1.6e-19, a_B/1e-9, m_exc/9.11e-31, 1./(np.sqrt(rho_0)*a_B), D/1e-6, phi0/1e8, d/1e-28, k_cut/(1./a_B)) for i in range(0,14641)]

#Check to see if table is populated or not
count = cur.execute("""SELECT COUNT (*) FROM process_parameters""").fetchone()[0]

#If it's not, fill it up
if count == 0:
    cur.executemany("""INSERT INTO process_parameters VALUES(NULL, ?, ?, ?, ?, ?, ?, ?, ?);""", process_params)
    db.commit()

Now, suppose that on a subsequent processing run, I change one or more of the parameters in process_params. On any subsequent run, I'd like Python to update the database with the most recent version of the parameters. So I do

else:
    cur.executemany("""UPDATE process_parameters SET exciton_bind_energy=?, exciton_bohr_radius=?, exciton_mass=?, exciton_density_per_QW=?, box_trap_side_length=?, electron_hole_overlap=?, dipole_matrix_element=?, k_cutoff=?;""", process_params)
    db.commit()
db.close()

But when I do this, the script seems to hang (or run very slowly), to the point that Ctrl+C won't even quit the script (being run via ipython).

I know that in this case updating from a huge Python list may be overkill, but it's the principle I want to clarify, since at another time I may not be updating every row with the same values. If someone could help me understand what's happening and/or how to fix this, I'd really appreciate it. Thank you.

1 Answer

cur.executemany("""
    UPDATE process_parameters SET 
        exciton_bind_energy=?, 
        exciton_bohr_radius=?, 
        exciton_mass=?, 
        exciton_density_per_QW=?, 
        box_trap_side_length=?, 
        electron_hole_overlap=?, 
        dipole_matrix_element=?,
        k_cutoff=?
   ;
""", process_params)

You forgot the WHERE clause in your UPDATE. Without a WHERE clause, an UPDATE statement updates every row in the table, every time it runs. Since you supply 14641 parameter sets, the SQLite driver performs 14641 (parameter sets) × 14641 (rows in the table) ≈ 214 million row updates, which explains why it is so slow.
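You can see this multiplication directly with cursor.rowcount, which for executemany() sums the rows modified across all executions. A minimal in-memory sketch (the table `t` here is hypothetical, not from the question):

```python
import sqlite3

# Hypothetical 5-row table to demonstrate the no-WHERE behavior.
db = sqlite3.connect(":memory:")
cur = db.cursor()
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val REAL)")
cur.executemany("INSERT INTO t VALUES (?, ?)", [(i, 0.0) for i in range(5)])

# 5 parameter sets against a 5-row table: executemany runs the UPDATE
# once per parameter set, and each run rewrites all 5 rows.
cur.executemany("UPDATE t SET val=?", [(float(i),) for i in range(5)])
total = cur.rowcount
print(total)  # 25 row updates for only 5 rows of data
db.close()
```

Scale 5 up to 14641 and the quadratic cost is obvious.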

The proper way is to update only the relevant row every time:

cur.executemany("""
    UPDATE process_parameters SET 
        exciton_bind_energy=?, 
        exciton_bohr_radius=?, 
        exciton_mass=?, 
        exciton_density_per_QW=?, 
        box_trap_side_length=?, 
        electron_hole_overlap=?, 
        dipole_matrix_element=?,
        k_cutoff=?
   WHERE parameter_id=?        
-- ^~~~~~~~~~~~~~~~~~~~ don't forget this
   ;
""", process_params)

Of course, this means process_params must include the parameter IDs (as the last element of each tuple, to match the placeholder order), and you need to modify the INSERT statement to insert the parameter ID as well.
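Putting the pieces together, a minimal self-contained sketch of the corrected flow (the table is cut down to a single REAL column, and the values are made up for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
cur = db.cursor()
cur.execute("""CREATE TABLE process_parameters (
                 parameter_id        INTEGER PRIMARY KEY,
                 exciton_bind_energy REAL)""")

# Each tuple carries its row id as the LAST element,
# matching the placeholder order in the UPDATE below.
process_params = [(0.5 * i, i) for i in range(1, 4)]

# Name the columns explicitly so the tuple order (value, id) works
# for the INSERT as well.
cur.executemany(
    "INSERT INTO process_parameters (exciton_bind_energy, parameter_id) "
    "VALUES (?, ?)", process_params)
db.commit()

# On a later run: update only the matching row via the WHERE clause.
updated = [(1.5 * i, i) for i in range(1, 4)]
cur.executemany(
    "UPDATE process_parameters SET exciton_bind_energy=? "
    "WHERE parameter_id=?", updated)
db.commit()

rows = cur.execute("SELECT exciton_bind_energy FROM process_parameters "
                   "ORDER BY parameter_id").fetchall()
print(rows)  # [(1.5,), (3.0,), (4.5,)]
db.close()
```

Each UPDATE execution now touches exactly one row, so the total work is linear in the number of parameter sets rather than quadratic.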


3 Comments

Thanks, this seemed to work. Not surprisingly, though, updating row by row did take some time compared to just creating and populating the table. A follow-up question: in the INSERT statement, I've changed parameter_id from INTEGER PRIMARY KEY to INTEGER and put it as the last column in the table. Is there anything "bad" about using an INTEGER instead of an INTEGER PRIMARY KEY in this case, and is it bad practice to have the id column at the end of the table? Thanks!
@pvasudev Keep the PRIMARY KEY, otherwise you will need O(n) to find a row instead of O(log n) or O(1). You can use the syntax INSERT INTO process_parameters (exciton_bind_energy, <snip>, k_cutoff, parameter_id) VALUES (?, <snip>, ?, ?) to specify the column order while inserting.
@pvasudev I suggest you find a tutorial or book about SQL first.
