
I created an HDF5 file with a simple dataset that looks like this:

>>> pd.read_hdf('STORAGE2.h5', 'table')
   A  B
0  0  0
1  1  1
2  2  2
3  3  3
4  4  4

I created it using this script:

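import pandas as pd
from pandas.io.pytables import Term

# Open the store (used for store.select below)
store = pd.HDFStore('STORAGE2.h5')

# Two integer columns, A and B, each holding 0..4
df_tl = pd.DataFrame(dict(A=list(range(5)), B=list(range(5))))

# append=True writes the data in 'table' format
df_tl.to_hdf('STORAGE2.h5','table',append=True)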

I know I can select columns using

x = pd.read_hdf('STORAGE2.h5', 'table',  columns=['A'])

or

x = store.select('table', where = 'columns=A')

How would I select all rows where column 'A' equals 3, or rows where column 'A' holds a specific string such as 'foo'? With a pandas DataFrame I would use df[df["A"]==3] or df[df["A"]=='foo'].

Also, does it make a difference in efficiency whether I use read_hdf() or store.select()?

  • Have a read of the extensive docs on this: pandas.pydata.org/pandas-docs/stable/io.html#querying-a-table; FYI, Term is an older (< 0.13.0) way of doing this. Commented Oct 10, 2014 at 15:27
  • @Jeff OK, so I guess it is going to be deprecated, so I removed it. Thanks! Commented Oct 10, 2014 at 15:40
  • No, it's compatible; it's just an 'older' syntax (and the new one is more natural, IMHO). Commented Oct 10, 2014 at 15:42

1 Answer


You need to specify data_columns= when you write the table; only the columns listed there can be used in a where query (you can also pass True to make all columns searchable).

(FYI, mode='w' starts the file over; it is there just for my example.)

In [50]: df_tl.to_hdf('STORAGE2.h5','table',append=True,mode='w',data_columns=['A'])

In [51]: pd.read_hdf('STORAGE2.h5','table',where='A>2')
Out[51]: 
   A  B
3  3  3
4  4  4
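
The same approach should cover the equality and string cases from the question. The following is a minimal sketch, not part of the original answer: the string column C, its values, and the temporary file path are made up for illustration, and it assumes PyTables is installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data: the question's A and B plus an assumed string column C.
df = pd.DataFrame({
    "A": [0, 1, 2, 3, 4],
    "B": [0, 1, 2, 3, 4],
    "C": ["foo", "bar", "foo", "baz", "bar"],
})

# Write to a throwaway file so the example does not touch STORAGE2.h5.
path = os.path.join(tempfile.mkdtemp(), "example.h5")

# format="table" plus data_columns makes A and C usable in where clauses.
df.to_hdf(path, key="table", mode="w", format="table", data_columns=["A", "C"])

# Rough equivalent of df[df["A"] == 3], evaluated while reading from disk.
rows_eq_3 = pd.read_hdf(path, key="table", where="A == 3")

# Rough equivalent of df[df["C"] == "foo"]; the string literal is quoted inside the query.
rows_foo = pd.read_hdf(path, key="table", where='C == "foo"')

print(rows_eq_3)
print(rows_foo)
```

Column B is not listed in data_columns here, so a query on B would not be expected to work with this file.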

3 Comments

I see. So if I do not pass data_columns when I store it with to_hdf, the query will not work, and it would not work with store['table2'] = df_tl either. I am surprised pandas does not default to data_columns=True.
I am guessing I cannot use store['table2'] = df_tl unless I want to read the whole table and edit it in memory.
Correct, that stores it in 'fixed' format. It depends on your goal. If you need appending or querying, or have huge data, then use 'table'. That said, 'fixed' is faster for reading and writing, so for a set that isn't huge it works well.
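
To illustrate the 'fixed' versus 'table' distinction from these comments, here is a small sketch. The key names fixed_copy and table_copy and the temporary path are hypothetical, and it assumes PyTables is installed.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"A": list(range(5)), "B": list(range(5))})
path = os.path.join(tempfile.mkdtemp(), "formats.h5")

with pd.HDFStore(path, mode="w") as store:
    # Dict-style assignment stores the frame in 'fixed' format.
    store["fixed_copy"] = df

    # put(..., format="table", data_columns=True) makes every column queryable.
    store.put("table_copy", df, format="table", data_columns=True)

    # A where clause on a 'fixed' node is rejected by pandas.
    try:
        store.select("fixed_copy", where="A == 3")
        fixed_query_ok = True
    except TypeError:
        fixed_query_ok = False

    # The 'table' node can be filtered on disk.
    table_rows = store.select("table_copy", where="A == 3")

    # With 'fixed', read the whole frame and filter in memory instead.
    whole_fixed = store["fixed_copy"]
    in_memory_rows = whole_fixed[whole_fixed["A"] == 3]

print(fixed_query_ok)
print(table_rows)
print(in_memory_rows)
```

Both routes end up with the same rows; the difference is whether the filtering happens while reading from disk or after the whole frame is loaded.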
