
The most common SQLite interface I've seen in Python is sqlite3, but is there anything that works well with NumPy arrays or recarrays? By that I mean one that recognizes data types and does not require inserting row by row, and extracts into a NumPy (rec)array...? Kind of like R's SQL functions in the RDB or sqldf libraries, if anyone is familiar with those (they import/export/append whole tables or subsets of tables to or from R data tables).

2 Comments
  • I'd recommend having a look at PyTables. It uses HDF5 as a backend, not SQLite, but it also supports powerful querying. Commented Oct 26, 2011 at 11:21
  • Thanks, but I want it to be in SQLite so R can have a go at it (R does much better with SQL tables than HDF5 files). Commented Oct 26, 2011 at 14:03

4 Answers


Why not give redis a try?

Drivers for both platforms of interest are available: Python (the redis package, via the Python Package Index) and R (rredis, on CRAN).

The genius of redis is not that it will magically recognize NumPy data types and let you insert and extract multi-dimensional NumPy arrays as if they were native redis datatypes; its genius is the remarkable ease with which you can build such an interface yourself with just a few lines of code.

There are several tutorials on redis in Python; the one on the DeGizmo blog is particularly good.

# start the redis server first (e.g., from a shell prompt):
#   $> cd /usr/local/bin      # default install directory on 'nix
#   $> redis-server           # starts the redis server

import numpy as NP
from redis import Redis

# create some data
A = NP.random.randint(0, 10, 40).reshape(8, 5)

# a couple of utility functions to (i) flatten a NumPy row to a compact string
# prior to insertion into the redis db for more compact storage &
# (ii) restore the original integer values upon retrieval from the redis db
fnx = lambda v: ''.join(map(str, v))      # e.g., [6, 3, 2, 9, 5] -> '63295'
fnx2 = lambda v: list(map(int, v))        # e.g., '63295' -> [6, 3, 2, 9, 5]

# start the redis client; decode_responses=True returns str rather than bytes
r0 = Redis(db=0, port=6379, host='localhost', decode_responses=True)

# to insert an item using the redis 'string' datatype, call 'set' on the
# database, r0, and just pass in a key and the item to insert
r0.set('k1', fnx(A[0, :]))

# for row-wise insertion of the 2D array into redis, iterate over its rows:
for c in range(A.shape[0]):
    r0.set("k{0}".format(c), fnx(A[c, :]))

# or insert all rows at once:
# use 'mset' ('multi set') and pass in a key-value mapping
x = {"k{0}".format(i): fnx(row) for i, row in enumerate(A.tolist())}
r0.mset(x)

# to retrieve a row, pass its key to 'get'
>>> r0.get('k0')
  '63295'

# to retrieve the entire array from redis, fetch all keys; redis returns
# keys in arbitrary order, so sort them to preserve row order
kx = sorted(r0.keys('*'))

# to rebuild the array in its original form:
A = []
for key in kx:
    A.append(fnx2(r0.get(key)))

>>> A = NP.array(A)
>>> A
  array([[6, 2, 3, 3, 9],
         [4, 9, 6, 2, 3],
         [3, 7, 9, 5, 0],
         [5, 2, 6, 3, 4],
         [7, 1, 5, 0, 2],
         [8, 6, 1, 5, 8],
         [1, 7, 6, 4, 9],
         [6, 4, 1, 3, 6]])

1 Comment

This is a great idea in general for exchanging data/information between programs! -- thanks for the example also. Unfortunately redis is an in-memory solution? And the data I want to exchange is quite large so being able to use a hard-disk file is desirable...

Doug's suggestion with redis is quite good, but I think his code is a bit complicated and, as a result, rather slow. For my purposes, I had to serialize+write and then grab+deserialize a square matrix of about a million floats in less than a tenth of a second, so I did this:

For writing:

import numpy as np
import redis

rs = redis.Redis()                  # connect to a local redis server
snapshot = np.random.randn(1024, 1024)
serialized = snapshot.tobytes()
rs.set('snapshot_key', serialized)

Then for reads:

s = rs.get('snapshot_key')
deserialized = np.frombuffer(s).astype(np.float32)
rank = np.sqrt(deserialized.size).astype(int)
snap = deserialized.reshape(rank, rank)

You can do some basic performance testing in IPython with %time; neither tobytes nor frombuffer takes more than a few milliseconds.
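
As an aside (a sketch, not from the original answer): if you'd rather not reconstruct dtype and shape by hand, you can let NumPy serialize its own header with np.save/np.load; the key name 'snapshot_key' is reused purely for illustration.

import io
import numpy as np
import redis

rs = redis.Redis()

snapshot = np.random.randn(1024, 1024).astype(np.float32)

# write: np.save records dtype and shape in the buffer's header
buf = io.BytesIO()
np.save(buf, snapshot)
rs.set('snapshot_key', buf.getvalue())

# read: np.load restores dtype and shape, no manual bookkeeping
snap = np.load(io.BytesIO(rs.get('snapshot_key')))
assert snap.dtype == np.float32 and snap.shape == (1024, 1024)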

2 Comments

Great suggestion, but watch out with np.frombuffer(s).astype(np.float32): this will parse the buffer s as if it had dtype np.float64 and then cast it to np.float32. If the original matrix had dtype np.float32, this will return a matrix that is half the size. It is better to use the dtype argument of np.frombuffer (i.e. np.frombuffer(s, dtype=np.float32)).
Good catch, I was being sloppy here, especially with the Python int mixed in too.

This question is a bit older, but is there any reason you cannot just do a fetchall() instead of iterating, and then initialize the NumPy array on declaration? A sketch of that idea follows.
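
A minimal sketch of the idea, assuming a table named data with two REAL columns (the database filename, table, and column names are all illustrative):

import sqlite3
import numpy as np

conn = sqlite3.connect('example.db')          # illustrative filename
cur = conn.execute('SELECT x, y FROM data')   # assumes a table "data" with REAL columns x, y

# fetchall() returns a list of tuples, which np.array consumes in one shot;
# no row-by-row appends needed
rows = cur.fetchall()
arr = np.array(rows, dtype=np.float64)        # shape: (n_rows, 2)
conn.close()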



I found at least three Python packages to interface SQLite and NumPy, among them RecSQL and esutil's sqlite_util.

Each of these packages has to deal with the problem that SQLite (by default) only understands standard Python types and not the NumPy data types such as numpy.int64.
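
For example (a sketch using the standard library's adapter hook; the table and values are illustrative), you can teach sqlite3 to accept NumPy scalars:

import sqlite3
import numpy as np

# sqlite3 raises an error for unsupported types such as numpy.int64;
# registering adapters converts them to plain Python types on the way in
sqlite3.register_adapter(np.int64, int)
sqlite3.register_adapter(np.float64, float)

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (a INTEGER, b REAL)')
conn.execute('INSERT INTO t VALUES (?, ?)', (np.int64(1), np.float64(2.5)))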

RecSQL 0.7.8+ works for me (most of the time), but I consider it a pretty bad hack; glancing over the code, esutil.sqlite_util appears to be more mature.

