
I am using SQLite (sqlite3) through Python to hold parameters in a table, which I use for processing a large amount of data. Suppose I have already populated the table initially, but then change the parameters and want to update the table. If I create a Python list holding the updated parameters, one entry for every row and column in the table, how do I update the table?

I have looked here and here (though the latter refers to C++ rather than Python), but these don't really answer my question.

To make this concrete, I show some of my code below:

import sqlite3 as sql
import numpy as np

db = sql.connect('./db.sq3')
cur = db.cursor()

#... Irrelevant Processing Code ...#

cur.execute("""CREATE TABLE IF NOT EXISTS process_parameters (
                 parameter_id            INTEGER PRIMARY KEY,
                 exciton_bind_energy     REAL,
                 exciton_bohr_radius     REAL,
                 exciton_mass            REAL,
                 exciton_density_per_QW  REAL,
                 box_trap_side_length    REAL,
                 electron_hole_overlap   REAL,
                 dipole_matrix_element   REAL,
                 k_cutoff                REAL)""")

#Parameter list
process_params = [(E_X/1.6e-19, a_B/1e-9, m_exc/9.11e-31, 1./(np.sqrt(rho_0)*a_B), D/1e-6, phi0/1e8, d/1e-28, k_cut/(1./a_B)) for i in range(0,14641)]

#Check to see if table is populated or not
count = cur.execute("""SELECT COUNT (*) FROM process_parameters""").fetchone()[0]

#If it's not, fill it up
if count == 0:
    cur.executemany("""INSERT INTO process_parameters VALUES(NULL, ?, ?, ?, ?, ?, ?, ?, ?);""", process_params)
    db.commit()

Now, suppose that on a subsequent processing run, I change one or more of the parameters in process_params. On any subsequent run, I'd like Python to update the database with the most recent version of the parameters. So I do

else:
    cur.executemany("""UPDATE process_parameters SET exciton_bind_energy=?, exciton_bohr_radius=?, exciton_mass=?, exciton_density_per_QW=?, box_trap_side_length=?, electron_hole_overlap=?, dipole_matrix_element=?, k_cutoff=?;""", process_params)
    db.commit()
db.close()

But when I do this, the script seems to hang (or run very slowly), to the point that Ctrl+C won't even quit the script (being run via ipython).

I know that in this case updating from a huge Python list may be overkill, but it's the principle I want to clarify, since at another time I may not be updating every row with the same values. If someone could help me understand what's happening and/or how to fix this, I'd really appreciate it. Thank you.

1 Answer

cur.executemany("""
    UPDATE process_parameters SET 
        exciton_bind_energy=?, 
        exciton_bohr_radius=?, 
        exciton_mass=?, 
        exciton_density_per_QW=?, 
        box_trap_side_length=?, 
        electron_hole_overlap=?, 
        dipole_matrix_element=?,
        k_cutoff=?
   ;
""", process_params)

You forgot the WHERE clause in your UPDATE. Without a WHERE clause, an UPDATE statement updates every row in the table, every time it runs. Since you supply 14641 parameter sets, the SQLite driver performs 14641 (parameter sets) × 14641 (rows in the table) ≈ 214 million row updates, which explains why it is so slow.
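You can see this multiplication directly with cursor.rowcount, which for executemany() sums the rows modified across all executions. A minimal in-memory sketch (the table `t` here is hypothetical, not from the question):

```python
import sqlite3

# Hypothetical 5-row table to demonstrate the no-WHERE behavior.
db = sqlite3.connect(":memory:")
cur = db.cursor()
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val REAL)")
cur.executemany("INSERT INTO t VALUES (?, ?)", [(i, 0.0) for i in range(5)])

# 5 parameter sets against a 5-row table: executemany runs the UPDATE
# once per parameter set, and each run rewrites all 5 rows.
cur.executemany("UPDATE t SET val=?", [(float(i),) for i in range(5)])
total = cur.rowcount
print(total)  # 25 row updates for only 5 rows of data
db.close()
```

Scale 5 up to 14641 and the quadratic cost is obvious.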

The proper way is to update only the relevant row every time:

cur.executemany("""
    UPDATE process_parameters SET 
        exciton_bind_energy=?, 
        exciton_bohr_radius=?, 
        exciton_mass=?, 
        exciton_density_per_QW=?, 
        box_trap_side_length=?, 
        electron_hole_overlap=?, 
        dipole_matrix_element=?,
        k_cutoff=?
   WHERE parameter_id=?        
-- ^~~~~~~~~~~~~~~~~~~~ don't forget this
   ;
""", process_params)

Of course, this means process_params must include the parameter IDs (as the last element of each tuple, to match the placeholder order), and you need to modify the INSERT statement to insert the parameter ID as well.
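Putting the pieces together, a minimal self-contained sketch of the corrected flow (the table is cut down to a single REAL column, and the values are made up for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
cur = db.cursor()
cur.execute("""CREATE TABLE process_parameters (
                 parameter_id        INTEGER PRIMARY KEY,
                 exciton_bind_energy REAL)""")

# Each tuple carries its row id as the LAST element,
# matching the placeholder order in the UPDATE below.
process_params = [(0.5 * i, i) for i in range(1, 4)]

# Name the columns explicitly so the tuple order (value, id) works
# for the INSERT as well.
cur.executemany(
    "INSERT INTO process_parameters (exciton_bind_energy, parameter_id) "
    "VALUES (?, ?)", process_params)
db.commit()

# On a later run: update only the matching row via the WHERE clause.
updated = [(1.5 * i, i) for i in range(1, 4)]
cur.executemany(
    "UPDATE process_parameters SET exciton_bind_energy=? "
    "WHERE parameter_id=?", updated)
db.commit()

rows = cur.execute("SELECT exciton_bind_energy FROM process_parameters "
                   "ORDER BY parameter_id").fetchall()
print(rows)  # [(1.5,), (3.0,), (4.5,)]
db.close()
```

Each UPDATE execution now touches exactly one row, so the total work is linear in the number of parameter sets rather than quadratic.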


3 Comments

Thanks, this seemed to work. Not surprisingly, though, updating row by row did take some time compared to just creating and populating the table. A follow-up question: in the INSERT statement, I've changed parameter_id from INTEGER PRIMARY KEY to INTEGER and put it as the last column in the table. Is there anything "bad" about using an INTEGER instead of an INTEGER PRIMARY KEY in this case, and is it bad practice to have the id column at the end of the table? Thanks!
@pvasudev Keep the PRIMARY KEY, otherwise you will need O(n) to find a row instead of O(log n) or O(1). You can use the syntax INSERT INTO process_parameters (exciton_bind_energy, <snip>, k_cutoff, parameter_id) VALUES (?, <snip>, ?, ?) to specify the column order while inserting.
@pvasudev I suggest you find a tutorial or book about SQL first.
