I would like to get the byte contents of a pandas dataframe exported as hdf5, ideally without actually saving the file (i.e., in-memory).

On Python >=3.6, <3.9 (with pandas==1.2.4 and pytables==3.6.1), the following used to work:

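```python
import pandas as pd

# df is the DataFrame to export (defined elsewhere).
with pd.HDFStore(
    "in-memory-save-file",  # only a label: nothing is written to disk
    mode="w",
    driver="H5FD_CORE",  # HDF5 in-memory (core) driver
    driver_core_backing_store=0,  # do not persist to disk on close
) as store:
    store.put("my_key", df, format="table")
    # _handle is the underlying tables.File (a private pandas attribute)
    binary_data = store._handle.get_file_image()
```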

Here `df` is the DataFrame to be converted to HDF5, and the last line calls the PyTables method `File.get_file_image()`.
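For completeness, here is a minimal sketch of the full round trip on a build where `get_file_image()` works: the same write as above wrapped in a function, plus the reverse direction, which passes the bytes back through the PyTables `driver_core_image` option. The helper names `dataframe_to_hdf5_bytes` and `hdf5_bytes_to_dataframe` and the label `"in-memory-load-file"` are made up for illustration; `store._handle` is still a private pandas attribute, so treat this as a sketch rather than a supported API.

```python
import pandas as pd


def dataframe_to_hdf5_bytes(df, key="my_key"):
    """Serialize df to HDF5 in memory and return the raw file image."""
    with pd.HDFStore(
        "in-memory-save-file",  # label only; no file is created
        mode="w",
        driver="H5FD_CORE",
        driver_core_backing_store=0,
    ) as store:
        store.put(key, df, format="table")
        # Private attribute: the underlying tables.File object.
        return store._handle.get_file_image()


def hdf5_bytes_to_dataframe(image, key="my_key"):
    """Rebuild a DataFrame from an HDF5 file image held in memory."""
    with pd.HDFStore(
        "in-memory-load-file",  # label only; nothing is read from disk
        mode="r",
        driver="H5FD_CORE",
        driver_core_backing_store=0,
        driver_core_image=image,  # the bytes to open as a file
    ) as store:
        return store[key]
```

Reading the bytes back this way is a quick check that the image is a complete HDF5 file and not a truncated buffer.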

However, starting with Python 3.9, the snippet above fails with the following error:

```
File "tables/hdf5extension.pyx", line 523, in tables.hdf5extension.File.get_file_image
tables.exceptions.HDF5ExtError: Unable to retrieve the size of the buffer for the file image.  Plese note that not all drivers provide support for image files.
```

The error is raised by the same PyTables method, `get_file_image()`, apparently because it cannot retrieve the size of the buffer for the file image. I don't understand the underlying reason for it, though.

I have tried other alternatives, such as saving to a BytesIO file object, so far without success.
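As a stopgap while the in-memory route is broken, the bytes can also be obtained by writing to a temporary file and reading it back. This does touch the disk, so it does not satisfy the "ideally in-memory" part of the question; it is only a fallback sketch. The function name `dataframe_to_hdf5_bytes_via_tempfile` and the file name `export.h5` are invented for this example.

```python
import os
import tempfile

import pandas as pd


def dataframe_to_hdf5_bytes_via_tempfile(df, key="my_key"):
    """Write df to a throwaway HDF5 file and return the file's bytes."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "export.h5")
        # to_hdf opens, writes, and closes the file before returning.
        df.to_hdf(path, key=key, mode="w", format="table")
        with open(path, "rb") as fh:
            return fh.read()
    # The temporary directory and the file are removed on exit.
```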

How can I keep the HDF5 binary of a pandas DataFrame in memory on Python 3.9?

  • This is a related (albeit old) discussion on GitHub where the call to get_file_image() is suggested. (Commented May 12, 2021 at 9:13)

1 Answer

The fix was to install PyTables with `conda install -c conda-forge pytables` instead of `pip install pytables`. I still don't understand the underlying reason for the error, though.
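To compare the two installs, it may help to print which PyTables and HDF5 versions each environment actually loads. This is only a diagnostic sketch: it does not confirm a cause, and the idea that the pip and conda-forge packages differ in their HDF5 build is an unverified assumption. The helper name `describe_pytables_build` is made up; `tables.__version__` and `tables.hdf5_version` are existing PyTables attributes.

```python
import sys

import tables


def describe_pytables_build():
    """Collect the versions relevant to the get_file_image() failure."""
    return {
        "python": sys.version.split()[0],
        "pytables": tables.__version__,
        # Version of the HDF5 C library this PyTables build is linked to.
        "hdf5": tables.hdf5_version,
    }


if __name__ == "__main__":
    for name, version in describe_pytables_build().items():
        print(f"{name}: {version}")
```

Running it once in the pip environment and once in the conda-forge environment shows at a glance whether the two differ in anything beyond the installer.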
