I am trying to read an HDF5-format MATLAB file in Python, using the h5py library. The file is called "Q_visSDF_accurate.mat" and has two keys: "filename" and "sdf". "filename" contains a cell array of strings. "sdf" is a [6001, 49380] matrix of floats. I had no problem extracting the variable sdf with the following code:

import h5py
data = h5py.File("Q_visSDF_accurate.mat", 'r')
sdf = data.get("sdf")[:,:]
sdf = sdf.astype(float)

However, I can't read the filename variable. I tried:

filename = data.get("filename")[0]

but the code returns:

array([<HDF5 object reference>, <HDF5 object reference>,
   <HDF5 object reference>, ..., <HDF5 object reference>,
   <HDF5 object reference>, <HDF5 object reference>], dtype=object)

How can I dereference the contents of the filename variable? Using the hdf5storage package is not a solution, as it works only with 32-bit Python and can only read a subset of MATLAB variables.

  • Have you tried using hdf5storage? It can read hdf5-based .mat files into a more usable form. Commented Dec 8, 2016 at 1:31
  • I edited my original post accordingly. Commented Dec 8, 2016 at 23:25
  • Did you manage to solve this? I'm still stuck on your exact problem. Commented Mar 11, 2018 at 5:39

1 Answer

In Octave I created a file containing a matrix and a cell:

>> xmat = [1,2,3;4,5,6;7,8,9];
>> xcell = {1,2,3;4,5,6;7,8,9};
>> save -hdf5 testmat.h5 xmat xcell

In IPython with h5py, I find that this file contains 2 groups:

In [283]: F = h5py.File('../testmat.h5','r')
In [284]: list(F.keys())
Out[284]: ['xcell', 'xmat']

The matrix group has a type and value dataset:

In [285]: F['xmat']
Out[285]: <HDF5 group "/xmat" (2 members)>
In [286]: list(F['xmat'].keys())
Out[286]: ['type', 'value']
In [287]: F['xmat']['type']
Out[287]: <HDF5 dataset "type": shape (), type "|S7">
In [288]: F['xmat']['value']
Out[288]: <HDF5 dataset "value": shape (3, 3), type "<f8">
In [289]: F['xmat']['value'][:]
Out[289]: 
array([[ 1.,  4.,  7.],
       [ 2.,  5.,  8.],
       [ 3.,  6.,  9.]])

The cell has the same type and value, but value is another group:

In [291]: F['xcell']['type']
Out[291]: <HDF5 dataset "type": shape (), type "|S5">
In [292]: F['xcell']['value']
Out[292]: <HDF5 group "/xcell/value" (10 members)>

In [294]: list(F['xcell']['value'].keys())
Out[294]: ['_0', '_1', '_2', '_3', '_4', '_5', '_6', '_7', '_8', 'dims']
...
In [296]: F['xcell']['value']['dims'][:]
Out[296]: array([3, 3])

I had to use the [...] to fetch the value of a cell, since it is a 0d array:

In [301]: F['xcell']['value']['_0']['value'][...]
Out[301]: array(1.0)

To really replicate the question I should have created string cell values, but I think this illustrates well enough how cells are stored: as named datasets within a data group.
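The walk through the cell group above can be collected into one helper. This is only a sketch for the Octave layout shown in this session (a `value` group holding `dims` plus entries named `_0`, `_1`, ...); `octave_cell_items` is a hypothetical name, and the order in which the elements map back onto rows and columns is not checked here.

```python
def octave_cell_items(cell_group):
    """Return (dims, items) for a cell saved by Octave's `save -hdf5`.

    `cell_group` is expected to behave like F['xcell'] above: it has a
    'value' group containing 'dims' and one subgroup per element.
    """
    value = cell_group["value"]

    # 'dims' holds the cell's shape, e.g. array([3, 3]) in the session above.
    dims = tuple(int(d) for d in value["dims"][...])

    # Element groups are named '_0', '_1', ...; sort numerically so that
    # '_10' comes after '_9' rather than after '_1'.
    names = sorted(
        (k for k in value.keys() if k.startswith("_")),
        key=lambda k: int(k[1:]),
    )

    # Each element group has its own 'value' dataset; [...] also works
    # for the 0-d datasets seen above.
    items = [value[name]["value"][...] for name in names]
    return dims, items
```

With the file from this answer, `octave_cell_items(F['xcell'])` should return the dims `(3, 3)` together with nine 0-d arrays.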

I'm assuming the Octave h5 storage is compatible with MATLAB's.
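That assumption may not hold for the file in the question: the output there shows "filename" holding HDF5 object references rather than named subgroups. For that case, here is a minimal sketch, assuming the usual MATLAB v7.3 layout in which each reference points to a dataset of uint16 character codes. `chars_to_str` and `read_cell_strings` are hypothetical helper names, not part of h5py; the only h5py features relied on are indexing a file with a reference (`f[ref]`) and reading a whole dataset with `[()]`.

```python
def chars_to_str(codes):
    """Join MATLAB character codes into a Python str.

    Accepts a flat sequence of ints or an (N, 1)-style nested sequence,
    which is how a MATLAB char row vector typically appears in h5py.
    """
    flat = []
    for item in codes:
        if hasattr(item, "__iter__"):
            flat.extend(int(c) for c in item)  # one row of a 2-D array
        else:
            flat.append(int(item))             # already a scalar code
    return "".join(chr(c) for c in flat)


def read_cell_strings(path, name):
    """Dereference every object reference stored in dataset `name`."""
    import h5py  # imported here so chars_to_str works without h5py

    with h5py.File(path, "r") as f:
        refs = f[name][()]  # object array of HDF5 references
        # f[ref] follows a reference to the dataset it points at.
        return [chars_to_str(f[ref][()]) for ref in refs.flat]
```

For the file in the question this would be called as `read_cell_strings("Q_visSDF_accurate.mat", "filename")`. Empty MATLAB strings are stored differently and are not handled by this sketch.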
