
Hi folks,

I've got a Python process which generates matrices. These are stacked on top of each other and saved as a tensor. Here is the code:

import numpy as n
import tables

h5file = tables.open_file("data/tensor.h5", mode="w", title="tensor")
atom = tables.Atom.from_dtype(n.dtype('int16'))
tensor_shape = (N, 3, MAT_SIZE, MAT_SIZE)
# create the array that the loop below fills row by row
tensor = h5file.create_carray(h5file.root, 'tensor', atom, tensor_shape)

for i in range(N):
    mat = generate(i)
    tensor[i, :, :, :] = mat

The problem is that when it hits 8 GB, it goes out of memory. Shouldn't the HDF5 format never run out of memory? That is, move data from memory to disk when required?

1 Answer


When you are using PyTables, the HDF5 file is kept in memory until the file is closed (see more here: In-memory HDF5 files).

I would recommend having a look at the append and flush methods of PyTables, as I think that's exactly what you want. Be aware that flushing the buffer on every loop iteration will significantly reduce the performance of your code, due to the constant I/O that has to be performed.
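For example, here is a minimal sketch of that approach using an EArray with an extendable first axis. N, MAT_SIZE and generate() are taken from your question; the flush interval of 100 is an arbitrary assumption, not a recommendation:

import numpy as np
import tables

h5file = tables.open_file("data/tensor.h5", mode="w", title="tensor")
atom = tables.Atom.from_dtype(np.dtype('int16'))

# first axis has length 0 and is extendable; rows are added with append()
tensor = h5file.create_earray(h5file.root, "tensor", atom,
                              shape=(0, 3, MAT_SIZE, MAT_SIZE))

for i in range(N):
    mat = generate(i)                    # shape (3, MAT_SIZE, MAT_SIZE)
    tensor.append(mat[np.newaxis, ...])  # append one slab along the first axis
    if i % 100 == 0:                     # flush periodically, not every iteration
        tensor.flush()

h5file.close()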

Writing the file in chunks (just like when reading data into DataFrames in pandas) might also pique your interest. See more here: PyTables optimization
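If you want more control over how those chunks are laid out, create_earray also accepts expectedrows and chunkshape hints. A rough sketch, assuming (not necessarily optimally) one matrix stack per chunk:

tensor = h5file.create_earray(h5file.root, "tensor", atom,
                              shape=(0, 3, MAT_SIZE, MAT_SIZE),
                              expectedrows=N,
                              chunkshape=(1, 3, MAT_SIZE, MAT_SIZE))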
