2

I am having difficulty loading the string variables 'Et' (end time) and 'St' (start time) from a MATLAB .mat file into Python.

I want the same output as in MATLAB, but so far I have not been able to get it. My Python code and its output are below.

# Import numpy and h5py to load in .mat files
import numpy as np
import h5py 

# Load in Matlab ('-v7.3') data
fname = 'directory/file.mat'
f = h5py.File(fname,'r') 

# create dictionary for data
data= {"average":np.array(f.get('average')),"median":np.array(f.get('median')), \
             "stdev":np.array(f.get('stdev')),"P10":np.array(f.get('p10')), \
             "P90":np.array(f.get('p90')),"St":np.str(f.get('stime')), \
             "Et":np.str(f.get('etime'))}
# All other variables are arrays

print(data["Et"])

output:

<HDF5 dataset "etime": shape (1, 6), type "<u4">

I want a string in Python equal to the string in MATLAB. In other words, I want print(data["Et"]) to output '01011212000000', which is the date and time.

How can I solve this?

An example of the data in MATLAB: example

  • 1
    At least with an Octave 'hdf5' file, f['average'] has 2 datasets, 'type' and 'value'. It's a good idea to read both separately. For a string, type is b'sq_string' and value is an (n,1) array of 'int8' dtype. That could, I think, be cast to a Python bytestring. There have been a few SO questions that explore loading hdf5 mat files, though I don't recall if any looked at strings. Commented Feb 13, 2019 at 1:36
  • 1
    What is f.get('etime')? Is it a group or a dataset? If a group, does it have any keys? Commented Feb 13, 2019 at 2:42
  • 1
    Try np.array(f.get('etime')). Load it as an array; we might be able to 'decode' it after, as I do in my In[138]. Commented Feb 13, 2019 at 6:27
  • 1
    Let's refine that: np.array(f.get('etime'), dtype='<u4'). Or use bytes as suggested by @machnic. Commented Feb 13, 2019 at 23:52

3 Answers

2

If you don't mind changing the type of etime and stime stored in file.mat, and you can store them as type char instead of string, you can read them in Python with bytes(f.get(your_variable)[:]).decode('utf-8'). In your case:

data = {
    "average": np.array(f.get('average')),
    "median": np.array(f.get('median')),
    "stdev": np.array(f.get('stdev')),
    "P10": np.array(f.get('p10')),
    "P90": np.array(f.get('p90')),
    "St": bytes(f.get('stime')[:]).decode('utf-8'),
    "Et": bytes(f.get('etime')[:]).decode('utf-8')
}

I'm sure there is also a way of reading the string type, but this might be the simplest solution.
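As a minimal sketch of the decoding step, assuming the dataset holds one integer character code per character (an assumption about how the char array was saved): the helper name codes_to_str is hypothetical and not part of h5py or NumPy. Going through chr() one code at a time avoids depending on the integer width of the stored codes, which may differ between files.

```python
def codes_to_str(codes):
    """Join integer character codes into a Python str.

    `codes` is any flat iterable of integers, for example the result of
    f['etime'][:].ravel() (assuming 'etime' was saved as a char array).
    """
    return "".join(chr(int(c)) for c in codes)


# Character codes for the date string from the question, written out by hand.
example_codes = [48, 49, 48, 49, 49, 50, 49, 50, 48, 48, 48, 48, 48, 48]
print(codes_to_str(example_codes))  # prints 01011212000000
```

With an open file this might be used as "Et": codes_to_str(f['etime'][:].ravel()), provided 'etime' really is a char array.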

1

In Octave

>> x = 1:10;
>> y = reshape(1:12, 3,4);
>> et = '0101121200000';
>> xt = 'a string';
>> save -hdf5 testh5.mat x y et xt

In a numpy session:

In [130]: f = h5py.File('testh5.mat','r')
In [131]: list(f.keys())
Out[131]: ['et', 'x', 'xt', 'y']
In [132]: list(f['y'].keys())
Out[132]: ['type', 'value']
In [133]: f['x/type'].value
Out[133]: b'range'
In [134]: f['y/type'].value
Out[134]: b'matrix'
In [135]: f['y/value'].value
Out[135]: 
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.],
       [ 7.,  8.,  9.],
       [10., 11., 12.]])
In [136]: f['et/type'].value
Out[136]: b'sq_string'
In [137]: f['et/value'].value
Out[137]: 
array([[48],
       [49],
       [48],
       [49],
       [49],
       [50],
       [49],
       [50],
       [48],
       [48],
       [48],
       [48],
       [48]], dtype=int8)
In [138]: f['et/value'].value.ravel().view('S13')
Out[138]: array([b'0101121200000'], dtype='|S13')
In [139]: f['xt/value'].value.ravel().view('S8')
Out[139]: array([b'a string'], dtype='|S8')
In [140]: f.close()

Related SO questions:

how to import .mat-v7.3 file using h5py

Opening a mat file using h5py and convert data into a numpy matrix

====

bytes also works in my file

In [220]: bytes(f['xt/value'].value)
Out[220]: b'a string'
In [221]: bytes(f['et/value'].value)
Out[221]: b'0101121200000'

5 Comments

This doesn't work for me. When using list(f['average'].keys()) I get the following error: AttributeError: 'Dataset' object has no attribute 'keys'.
OK, in my version, f['average'] is a group with 2 datasets. Apparently in yours f['average'] is the dataset itself. I don't have your file so can't explore it myself.
'average' is a 9 x 365 matrix containing mostly NaNs with a few floats here and there.
Digging around I see there's a greater difference between MATLAB v7.3 and Octave's hdf5. Without a sample file I can't help.
@hpualj I have added an image of the data in MATLAB. I couldn't find a way to attach a .mat file
0

When I need to load a .mat file I use scipy, and it works fine:

import scipy.io
mat = scipy.io.loadmat('fileName.mat')
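scipy.io.loadmat does not read files saved with '-v7.3', so it can be worth checking the format first. The sketch below is only a guess based on the leading bytes of the file: guess_mat_format is a hypothetical helper, and the header strings it looks for are, as far as I know, what MATLAB writes at the start of its MAT-files. Plain HDF5 files (such as Octave's save -hdf5 output) start with the HDF5 signature instead.

```python
import os
import tempfile

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def guess_mat_format(path):
    """Guess how a .mat file was saved by looking at its first bytes."""
    with open(path, "rb") as fh:
        head = fh.read(128)
    if head.startswith(b"MATLAB 7.3 MAT-file"):
        return "v7.3"   # HDF5-based: try h5py
    if head.startswith(b"MATLAB 5.0 MAT-file"):
        return "v5"     # older format: try scipy.io.loadmat
    if head.startswith(HDF5_SIGNATURE):
        return "hdf5"   # plain HDF5 without a MATLAB header: try h5py
    return "unknown"


# Demo on a throwaway file that contains only a made-up header line.
with tempfile.NamedTemporaryFile(suffix=".mat", delete=False) as tmp:
    tmp.write(b"MATLAB 7.3 MAT-file, Platform: demo")
    demo_path = tmp.name
print(guess_mat_format(demo_path))
os.remove(demo_path)
```

If this reports "v5" for a file, loadmat is the reader to try; for "v7.3" or "hdf5", h5py is.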

3 Comments

Sounds like the OP has saved the .mat with the newer hdf5 mode, not a loadmat compatible one.
I cannot see any string variables when following this procedure. Output: dict_keys(['__header__', '__version__', '__globals__', 'average', 'stdev', 'median', 'P90', 'P10', 'None', '__function_workspace__'])
No Et or St. Note: don't worry about the NaNs - they are supposed to be.
