
I want to get a DataFrame as HDF in memory. The code below fails with "AttributeError: '_io.BytesIO' object has no attribute 'put'". I am using Python 3.5 and pandas 0.17.

import pandas as pd
import numpy as np
import io

df = pd.DataFrame(np.arange(8).reshape(-1, 2), columns=['a', 'b'])
buf = io.BytesIO()
df.to_hdf(buf, 'some_key')

Update: As UpSampler pointed out, "path_or_buf" cannot be an io stream (which I find confusing, since buf can usually be an io stream; see to_csv). Other than writing to disk and reading it back in, how can I get a DataFrame as HDF in memory?

  • Out of curiosity - why would you want to do that? Commented Jan 6, 2017 at 15:15
  • I just came to the same point, did you manage to solve it? Commented Sep 25, 2017 at 15:14

3 Answers


Your first argument to df.to_hdf() has to be a "path (string) or HDFStore object" not an io stream. Documentation: http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.to_hdf.html


2 Comments

Ahh, didn't see that. to_csv has a similar argument, path_or_buf, where buf can be an io stream, which is why I got confused.
CSV writing is handled by pandas itself and can target any file-like object, while HDF access goes through an external library (PyTables), which expects a file path or an HDFStore.
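To illustrate the contrast the comments describe: to_csv happily accepts an in-memory buffer, which is exactly what to_hdf rejects. A minimal sketch:

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(8).reshape(-1, 2), columns=['a', 'b'])

# to_csv accepts any file-like object, so a StringIO buffer works fine.
buf = io.StringIO()
df.to_csv(buf, index=False)

csv_text = buf.getvalue()
print(csv_text.splitlines()[0])  # header row: a,b
```

The same pattern with io.BytesIO and to_hdf raises the AttributeError from the question, because the HDF writer hands the argument to PyTables as a path.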

You can manually manage an in-memory HDFStore to do this. (See the relevant section of the PyTables documentation and this GitHub issue).

import pandas as pd
import numpy as np
import tempfile

df = pd.DataFrame(np.arange(8).reshape(-1, 2), columns=['a', 'b'])
# Although we provide a filename to pd.HDFStore, nothing gets written to disk
# because of the driver_core_backing_store=0 parameter.
with pd.HDFStore(
    tempfile.mktemp(),
    driver="H5FD_CORE",
    driver_core_backing_store=0
) as store:
    store.put("some_key", df, errors="strict", encoding="UTF-8")
    # Grab the in-memory HDF5 file as bytes via the underlying PyTables handle.
    result = store._handle.get_file_image()
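To round-trip the image back into a DataFrame without touching disk, the same core driver can load a byte image via driver_core_image, which pd.HDFStore forwards to tables.open_file. A hedged sketch (assumes a PyTables version that supports driver_core_image; the filename passed to the reading store is a dummy that is never opened):

```python
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(8).reshape(-1, 2), columns=['a', 'b'])

# Write: an in-memory HDF5 file; backing_store=0 keeps it off disk.
with pd.HDFStore(
    tempfile.mktemp(suffix=".h5"),
    driver="H5FD_CORE",
    driver_core_backing_store=0,
) as store:
    store.put("some_key", df)
    image = store._handle.get_file_image()  # the file as bytes

# Read: feed the byte image back in through the core driver.
with pd.HDFStore(
    "in_memory.h5",  # required by the API, but never opened on disk
    mode="r",
    driver="H5FD_CORE",
    driver_core_image=image,
    driver_core_backing_store=0,
) as store:
    restored = store["some_key"]

print(restored.equals(df))
```

Note that store._handle is a private attribute, so this relies on pandas internals and could break across versions.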

Comments


Just try this:

df = pd.DataFrame(np.arange(8).reshape(-1, 2), columns=['a', 'b'])
df.to_hdf(path_or_buf=r'path\to\your\file', key='some_key')

(The key argument is required, and a raw string keeps backslashes in the Windows path from being read as escape sequences.)

Refer to pandas.DataFrame.to_hdf.

1 Comment

I want it in memory, not on disk, and would prefer not writing to disk and then reading it back in
