I have a set of MAT-files, each of which contains a MATLAB struct. The struct holds a number of arrays. I would like to open a file and load all of them into NumPy arrays. So far I have written the following code:

>>> import h5py
>>> fs = h5py.File('statistics_VAD.mat','r')
>>> list(fs.keys())
['#refs#', 'data']
>>> 
>>> fs['data'].visititems(lambda n,o:print(n, o))
C <HDF5 dataset "C": shape (100, 1), type "|O">
P <HDF5 dataset "P": shape (100, 1), type "|O">
V <HDF5 dataset "V": shape (100, 1), type "|O">
Wn <HDF5 dataset "Wn": shape (100, 1), type "|O">
X <HDF5 dataset "X": shape (100, 1), type "|O">
a <HDF5 dataset "a": shape (100, 1), type "|O">
dn <HDF5 dataset "dn": shape (100, 1), type "|O">
>>> struArray = fs['data']
>>> print(struArray['P'])
<HDF5 dataset "P": shape (100, 1), type "|O">

I don't know how to convert the HDF5 dataset "P" to a NumPy array. Any suggestions would be appreciated.

  • What does arr=struArray['P'][:] do? Commented Feb 28, 2021 at 16:02
  • @hpaulj the output is >>> arr=struArray['P'][:] >>> arr array([[<HDF5 object reference>], [<HDF5 object reference>], [<HDF5 object reference>], [<HDF5 object reference>], [<HDF5 object reference>], [<HDF5 object reference>], .... Commented Feb 28, 2021 at 16:05
  • Those 'object refs' are probably items in the #refs# group, but I don't know if h5py can fetch them for you. scipy.io.loadmat can handle older-style .mat files, but even there the result can have 'opaque' elements. Not everything that MATLAB saves to a file is translatable into NumPy. Commented Feb 28, 2021 at 16:14
  • @hpaulj I saved the data in MATLAB with save( 'statistics_VAD.mat','data', '-v7.3'); so calling mat_contents = sio.loadmat(mat_fname) with scipy.io.loadmat raises this error: NotImplementedError: Please use HDF reader for matlab v7.3 files Commented Feb 28, 2021 at 16:19
  • I wasn't recommending you use that reader. Commented Feb 28, 2021 at 16:28

1 Answer

The code below is the example mentioned in my comment (dated 2021-03-01). It creates two datasets from NumPy arrays, then a dataset holding two object references, one to each array dataset. It then shows how to use the object references to access the data. A second dataset holding region references is included for completeness.

Notice how h5f[] is used twice: the inner h5f['O_refs'][0] reads the object reference, and the outer h5f[...] dereferences it to return the referenced dataset, which can then be sliced like any other dataset. It's a subtlety that often trips up users new to references.

import numpy as np
import h5py

with h5py.File('SO_66410592.h5','w') as h5f :
    # Create 2 datasets using numpy arrays
    arr = np.arange(100).reshape(20,5)
    h5f.create_dataset('array1',data=arr)    
    arr = np.arange(100,0,-1).reshape(20,5)
    h5f.create_dataset('array2',data=arr) 
    
    # Create a dataset of OBJECT references: 
    h5f.create_dataset('O_refs', (10,), dtype=h5py.ref_dtype)
    h5f['O_refs'][0] = h5f['array1'].ref
    print (h5f['O_refs'][0])
    print (h5f[ h5f['O_refs'][0] ])
    print (h5f[ h5f['O_refs'][0] ][0,:])
    h5f['O_refs'][1] = h5f['array2'].ref
    print (h5f['O_refs'][1])
    print (h5f[ h5f['O_refs'][1] ])
    print (h5f[ h5f['O_refs'][1] ][-1,:])

    # Create a dataset of REGION references: 
    h5f.create_dataset('R_refs', (10,), dtype=h5py.regionref_dtype)
    h5f['R_refs'][0] = h5f['array1'].regionref[0,:]
    print (h5f['R_refs'][0])
    print (h5f[ h5f['R_refs'][0] ])    
    print (h5f[ h5f['R_refs'][0] ] [ h5f['R_refs'][0] ]) 
    h5f['R_refs'][1] = h5f['array2'].regionref[-1,:]
    print (h5f['R_refs'][1])
    print (h5f[ h5f['R_refs'][1] ])    
    print (h5f[ h5f['R_refs'][1] ] [ h5f['R_refs'][1] ]) 
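
The same double-indexing pattern can be applied to a dataset of object references like "P" in the question. What follows is only a minimal sketch, not tested against the asker's file: the helper name deref_dataset, the file name demo_refs.h5, and the item0..item2 targets are all made up for illustration, and the stand-in file merely imitates the layout shown in the question (a 'data' group whose "P" dataset has shape (N, 1) and points at datasets in '#refs#').

```python
import os
import tempfile

import h5py
import numpy as np


def deref_dataset(h5file, ref_dataset):
    """Return a list of NumPy arrays, one per object reference in ref_dataset.

    Hypothetical helper: assumes every reference points at a plain dataset.
    """
    refs = ref_dataset[()]  # NumPy object array of h5py references
    # h5file[ref] dereferences to a dataset; [()] reads it fully into memory
    return [h5file[ref][()] for ref in refs.flat]


# Build a small stand-in file (assumed layout and names, for illustration only)
path = os.path.join(tempfile.mkdtemp(), "demo_refs.h5")
with h5py.File(path, "w") as h5f:
    targets = h5f.create_group("#refs#")
    data = h5f.create_group("data")
    p = data.create_dataset("P", (3, 1), dtype=h5py.ref_dtype)
    for i in range(3):
        dset = targets.create_dataset(
            f"item{i}", data=np.full((2, 4), i, dtype=np.float64)
        )
        p[i, 0] = dset.ref

# Read the references back and turn each one into a NumPy array
with h5py.File(path, "r") as h5f:
    arrays = deref_dataset(h5f, h5f["data"]["P"])

print(len(arrays), arrays[0].shape)
```

For the file in the question this would presumably be called as deref_dataset(fs, fs['data']['P']). Arrays written by MATLAB usually appear with their dimensions reversed when read through h5py, so a transpose may be needed, and references that point at groups (nested structs or cells) would need extra handling that this sketch does not attempt.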