
I have a CSV file with "Date", "Time", and about 10 other columns, for example:

Date,Time,C
20020515,123000000,10293
20020515,160000000,10287
20020516,111800000,10270
20020516,160000000,10260
20020517,130500000,10349
20020517,160000000,10276
20020520,123700000,10313
20020520,160000000,10258
20020521,114500000,10223

I am trying to load this into an HDF5 file, and I want the Date and Time columns to have a string type rather than integer32. So far I am doing this:

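import h5py
import numpy as np

# dtype=None makes genfromtxt guess a type per column; names=True reads the header row
my_data = np.genfromtxt("/tmp/data.txt", delimiter=",", dtype=None, names=True)
myFile = "/tmp/data.h5"
with h5py.File(myFile, "a") as f:
    dset = f.create_dataset('foo', data=my_data)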
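To see why Date and Time end up as integers, here is a minimal sketch that only uses NumPy. It assumes the first three sample rows above, read from an in-memory `io.StringIO` buffer instead of `/tmp/data.txt`. Because every column consists only of digits, `genfromtxt` guesses an integer type for all of them; the exact width (for example `<i8` vs `<i4`) may depend on the platform and NumPy version.

```python
import io

import numpy as np

# First three sample rows from the question, held in memory instead of a file.
csv_text = """Date,Time,C
20020515,123000000,10293
20020515,160000000,10287
20020516,111800000,10270
"""

# dtype=None asks genfromtxt to infer each column's type; all-digit
# columns are inferred as integers, which is what ends up in the HDF5 file.
my_data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", dtype=None, names=True)
print(my_data.dtype)
```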

I would like to store "Date" and "Time" as type "String" in HDF5, not Int32.

  • I don't think it is possible. According to the docs: Datasets are very similar to NumPy arrays. They are homogenous collections of data elements, with an immutable datatype and (hyper)rectangular shape. This means that all columns must have the same dtype. Commented Dec 22, 2015 at 18:26
  • Do you want to change the way that you are storing the data in the HDF5 file, or do you want to be able to convert those columns to strings after reading them from the file? Commented Dec 22, 2015 at 18:47
  • I want to change the way I am storing the data. I want to store them as String instead of integer. Commented Dec 22, 2015 at 19:07

1 Answer


One simple solution would be to change the dtype of my_data before writing it to the file:

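# Time values such as 123000000 have 9 digits, so they need 'S9'; 'S8' would truncate them
newtype = np.dtype([('Date', 'S8'), ('Time', 'S9'), ('C', '<i8')])
with h5py.File(myFile, "a") as f:
    dset2 = f.create_dataset('foo2', data=my_data.astype(newtype))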
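The string width in `newtype` has to fit the longest value in each column. A minimal NumPy-only sketch, using a small hypothetical structured array built to mirror the question's columns, shows that NumPy does not raise an error when the width is too small; it simply produces a string that no longer matches the original 9-digit Time value:

```python
import numpy as np

# Hypothetical array mirroring the question's Date/Time/C columns.
my_data = np.array([(20020515, 123000000, 10293),
                    (20020516, 111800000, 10270)],
                   dtype=[('Date', '<i8'), ('Time', '<i8'), ('C', '<i8')])

too_narrow = np.dtype([('Date', 'S8'), ('Time', 'S8'), ('C', '<i8')])
wide_enough = np.dtype([('Date', 'S8'), ('Time', 'S9'), ('C', '<i8')])

print(my_data.astype(too_narrow)['Time'])   # S8 only has room for 8 of the 9 digits
print(my_data.astype(wide_enough)['Time'])  # S9 keeps the full value
```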

You could also create an empty dataset by passing the appropriate dtype= and shape= parameters to f.create_dataset, then fill in the values from my_data:

with h5py.File(myFile, "a") as f:
    dset3 = f.create_dataset('foo3', shape=my_data.shape, dtype=newtype)
    dset3[:] = my_data.astype(newtype)  # cast on the NumPy side before writing

Note that I still have to cast my_data to newtype before writing it. h5py doesn't seem to be able to do the integer-to-string conversion itself:

In [15]: dset3[:] = my_data
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-15-6e62dae3d59a> in <module>()
----> 1 dset3[:] = my_data

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/tmp/pip-build-aayglkf0/h5py/h5py/_objects.c:2579)()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/tmp/pip-build-aayglkf0/h5py/h5py/_objects.c:2538)()

/home/alistair/.venvs/core3/lib/python3.4/site-packages/h5py/_hl/dataset.py in __setitem__(self, args, val)
    584         mspace = h5s.create_simple(mshape_pad, (h5s.UNLIMITED,)*len(mshape_pad))
    585         for fspace in selection.broadcast(mshape):
--> 586             self.id.write(mspace, fspace, val, mtype)
    587 
    588     def read_direct(self, dest, source_sel=None, dest_sel=None):

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/tmp/pip-build-aayglkf0/h5py/h5py/_objects.c:2579)()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/tmp/pip-build-aayglkf0/h5py/h5py/_objects.c:2538)()

h5py/h5d.pyx in h5py.h5d.DatasetID.write (/tmp/pip-build-aayglkf0/h5py/h5py/h5d.c:3421)()

h5py/_proxy.pyx in h5py._proxy.dset_rw (/tmp/pip-build-aayglkf0/h5py/h5py/_proxy.c:1794)()

h5py/_proxy.pyx in h5py._proxy.H5PY_H5Dwrite (/tmp/pip-build-aayglkf0/h5py/h5py/_proxy.c:1501)()

OSError: Can't prepare for writing data (No appropriate function for conversion path)
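Since the cast has to happen in NumPy before the data reaches h5py, one option is to build the string dtype from the data instead of hard-coding widths like `'S8'`/`'S9'`. The helper below, `stringify_fields`, is a hypothetical sketch (not part of NumPy or h5py) that assumes the selected fields hold integers and sizes each string field to the longest decimal representation in that column:

```python
import numpy as np


def stringify_fields(arr, names):
    """Return a copy of structured array `arr` in which the fields listed in
    `names` are cast to fixed-width byte strings ('S<n>'), with n set to the
    length of the longest value's decimal representation in that field.
    Hypothetical helper; other fields keep their original dtype."""
    new_fields = []
    for name in arr.dtype.names:
        if name in names:
            # Width of the longest value as text; fall back to 1 for empty arrays.
            width = max(len(str(v)) for v in arr[name]) if arr.size else 1
            new_fields.append((name, f"S{width}"))
        else:
            new_fields.append((name, arr.dtype[name]))
    return arr.astype(np.dtype(new_fields))
```

The result can then be passed as `data=` to `f.create_dataset` exactly like `my_data.astype(newtype)` above, e.g. `stringify_fields(my_data, ['Date', 'Time'])`.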