When loading a filtered subset of a CSV file into memory, I always do it this way:
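import pandas as pd

# chunksize makes read_csv return an iterator of DataFrames
df = pd.read_csv('data.csv', chunksize=10**3)
chunk1 = df.get_chunk()
chunk1 = chunk1[chunk1['Col1'] > someval]
for chunk in df:
    # the combined result has to be reassigned, otherwise the filtered rows are lost
    chunk1 = pd.concat([chunk1, chunk[chunk['Col1'] > someval]])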
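To make the pattern concrete, here is a minimal self-contained sketch of what I mean. The sample data, the chunk size, and the value of someval are made up for illustration; only the Col1 column name comes from my real code. It filters each chunk as it is read and combines the surviving rows once at the end.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for data.csv
csv_text = "Col1,Col2\n1,a\n5,b\n3,c\n8,d\n2,e\n9,f\n"
someval = 4  # hypothetical threshold

# chunksize makes read_csv return an iterator of DataFrames
reader = pd.read_csv(io.StringIO(csv_text), chunksize=2)

# Filter each chunk as it is read, then concatenate the filtered pieces once
filtered = pd.concat([chunk[chunk["Col1"] > someval] for chunk in reader])

print(filtered)
```

Only the rows that pass the filter are kept in memory, which is the whole point of reading in chunks.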
I recently started playing around with HDF5 and cannot do the same thing, because the TableIterator object has no get_chunk() method and does not work with next().
df = pd.read_hdf('data.h5', chunksize=10**3)
df.get_chunk()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-19-xxxxxxxx> in <module>()
----> 1 df.get_chunk()
AttributeError: 'TableIterator' object has no attribute 'get_chunk'
Any ideas for a workaround? (I know that I can query HDF5 on disk using pandas, but for this purpose I would like to try it this way.)