
I am trying to create .mat data files using python. The matlab code expects the data to have a certain format, where two-dimensional ndarrays of non-uniform sizes are stored as objects in a column vector. So, in my case, there would be k numpy arrays of shape (m_i, n) - with different m_i for each array - stored in a numpy array with dtype=object of shape (k, 1). I then add this object array to a dictionary and pass it to scipy.io.savemat().

This works fine so long as the m_i are indeed different. If all k arrays happen to have the same number of rows m_i, the behaviour becomes strange. First of all, it requires very explicit assignment to a numpy array of dtype=object that has been initialised to the final size k, otherwise numpy simply creates a three-dimensional array. But even when I have the correct format in python and store it to a .mat file using savemat, there is some kind of problem in the translation to the matlab format.

When I reload the data from the .mat file using scipy.io.loadmat, I find that I still have an object array of shape (k, 1), which still has elements of shape (m, n). However, each element is no longer an int or a float but is instead a numpy array of shape (1, 1) that has to be further indexed to access the contained int or float. So an individual element of an object vector that was supposed to be a numpy array of shape (2, 4) would look something like this:

[array([[array([[0.82374894]]), array([[0.50730055]]),
        array([[0.36721625]]), array([[0.45036349]])],
       [array([[0.26119276]]), array([[0.16843872]]),
        array([[0.28649524]]), array([[0.64239569]])]], dtype=object)]

This also poses a problem for the matlab code that I am trying to build my data files for. It runs fine for the arrays of objects that have different shapes but will break when there are arrays containing arrays of the same shape.

I know this is a rather obscure and possibly unavoidable issue but I figured I would see if anyone else has encountered it and found a fix. Thanks.

  • Regardless of the shape/format of the data, what's the problem with building an "adapter" function/class that would convert the information stored in the .mat file to whatever the rest of the code expects? Also, consider that you could write a script that pre-processes the files (by e.g. loading them in MATLAB and saving them in the format that the rest of the code expects). The key here is using the right tool for the job, which might be writing a simple function that turns whatever you have into whatever you need, instead of wasting time on getting python/MATLAB to work "just right". Commented May 21, 2019 at 14:32
  • Also, it might be worth tagging this with mat-file instead of scipy (which is the least relevant here, imho) - but it's up to you. One last thing that I think could improve your question - please provide a minimal reproducible example, and show us what is the structure you expect to get in MATLAB vs what you're actually getting. Commented May 21, 2019 at 14:35
  • I find it useful to create a sample file in MATLAB (I use Octave) and load it with loadmat to see what the numpy equivalent is. Commented May 21, 2019 at 15:12
  • @Dev, scipy is the source package for loadmat. It's a broad category, but I check it regularly. Commented May 21, 2019 at 15:22
  • @Dev-iL, I expected this to work "just right" as I assumed savemat was an appropriate tool and I could thereby save myself a lot more time than I would spend figuring it all out in matlab/octave, which I am unfamiliar with. @hpaulj I have sample files that work with the matlab code (that I also run with octave), and I can see their structure by loading with scipy's loadmat. The files I create also appear to be working fine, apart from the edge cases I mentioned where all subarrays have the same shape. Commented May 21, 2019 at 16:03

1 Answer


I'm not quite clear about the problem. Let me try to recreate your case:

In [57]: import numpy as np
In [58]: from scipy.io import loadmat, savemat
In [59]: A = np.empty((2,1), object)     
In [61]: A[0,0]=np.arange(4).reshape(2,2)                                    
In [62]: A[1,0]=np.arange(6).reshape(3,2)                                    
In [63]: A                                                                   
Out[63]: 
array([[array([[0, 1],
       [2, 3]])],
       [array([[0, 1],
       [2, 3],
       [4, 5]])]], dtype=object)
In [64]: B=A[[0,0],:]                                                        
In [65]: B                                                                   
Out[65]: 
array([[array([[0, 1],
       [2, 3]])],
       [array([[0, 1],
       [2, 3]])]], dtype=object)

As I explained earlier today, creating an object dtype array from arrays of matching size requires special handling. np.array(...) tries to create a higher dimensional array. https://stackoverflow.com/a/56243305/901925
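To make the point about matching sizes concrete, here is a minimal sketch. The names blocks, merged, merged_obj and cells are made up for illustration and are not from the question. It shows np.array collapsing same-shape arrays into one 3d array, even with dtype=object, while preallocating the object array and assigning element by element keeps each 2d array intact.

```python
import numpy as np

# Two blocks that happen to share the same shape (2, 2)
blocks = [np.arange(4).reshape(2, 2), np.arange(4, 8).reshape(2, 2)]

# np.array stacks same-shape blocks into a single 3d numeric array
merged = np.array(blocks)
print(merged.shape)

# Asking for dtype=object does not help: the result is still 3d,
# and every scalar becomes its own Python object
merged_obj = np.array(blocks, dtype=object)
print(merged_obj.shape, merged_obj.dtype)

# Preallocate the (k, 1) object array and fill it one element at a time
cells = np.empty((len(blocks), 1), dtype=object)
for i, block in enumerate(blocks):
    cells[i, 0] = block
print(cells.shape, cells[0, 0].shape)
```

The loop looks clumsy, but it is the only one of the three forms here that gives a (k, 1) object array whose elements are ordinary 2d numeric arrays, whether or not the shapes match.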

Saving:

In [66]: savemat('foo.mat', {'A':A, 'B':B})                                  

Loading:

In [74]: loadmat('foo.mat')                                                  
Out[74]: 
{'__header__': b'MATLAB 5.0 MAT-file Platform: posix, Created on: Tue May 21 11:20:42 2019',
 '__version__': '1.0',
 '__globals__': [],
 'A': array([[array([[0, 1],
        [2, 3]])],
        [array([[0, 1],
        [2, 3],
        [4, 5]])]], dtype=object),
 'B': array([[array([[0, 1],
        [2, 3]])],
        [array([[0, 1],
        [2, 3]])]], dtype=object)}
In [75]: _74['A'][1,0]                                                       
Out[75]: 
array([[0, 1],
       [2, 3],
       [4, 5]])

Your problem case looks like it's an object dtype array containing numbers:

In [89]: C = np.arange(4).reshape(2,2).astype(object)                        
In [90]: C                                                                   
Out[90]: 
array([[0, 1],
       [2, 3]], dtype=object)
In [91]: savemat('foo1.mat', {'C': C})                                       
In [92]: loadmat('foo1.mat')                                                 
Out[92]: 
{'__header__': b'MATLAB 5.0 MAT-file Platform: posix, Created on: Tue May 21 11:39:31 2019',
 '__version__': '1.0',
 '__globals__': [],
 'C': array([[array([[0]]), array([[1]])],
        [array([[2]]), array([[3]])]], dtype=object)}

Evidently savemat has converted the integer objects into 2d MATLAB compatible arrays. In MATLAB everything, even scalars, is at least 2d.
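If that guess is right and the inner arrays ended up with dtype=object, one possible workaround is to cast each inner array to a numeric dtype before saving. The sketch below assumes this is what happened in your code, which I can't confirm without seeing it. The helper as_cell_column and the file name cells.mat are hypothetical.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat


def as_cell_column(blocks, dtype=float):
    """Hypothetical helper: pack 2d blocks into a (k, 1) object array."""
    cells = np.empty((len(blocks), 1), dtype=object)
    for i, block in enumerate(blocks):
        # Cast to a numeric dtype so savemat writes a plain matrix
        # instead of a nested cell of 1x1 arrays
        cells[i, 0] = np.asarray(block, dtype=dtype)
    return cells


# Two same-shape blocks that accidentally carry dtype=object
blocks = [np.arange(8).reshape(2, 4).astype(object) for _ in range(2)]
cells = as_cell_column(blocks)

# Round trip through a temporary .mat file
path = os.path.join(tempfile.mkdtemp(), "cells.mat")
savemat(path, {"cells": cells})
loaded = loadmat(path)["cells"]
print(loaded.shape, loaded[0, 0].shape, loaded[0, 0].dtype)
```

After the round trip each element of the outer object array should be a plain 2d float array again, so it should load as a cell of matrices on the MATLAB/Octave side.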

===

And in Octave, the object dtype arrays all produce cells, and the 2d numeric arrays produce matrices:

>> load foo.mat
>> A
A =
{
  [1,1] =

    0  1
    2  3

  [2,1] =

    0  1
    2  3
    4  5

}
>> B
B =
{
  [1,1] =

    0  1
    2  3

  [2,1] =

    0  1
    2  3

}
>> load foo1.mat
>> C
C =
{
  [1,1] = 0
  [2,1] = 2
  [1,2] = 1
  [2,2] = 3
}

"Python: Issue reading in str from MATLAB .mat file using h5py and NumPy" is a relatively recent SO question that showed there's a difference between the Octave HDF5 and MATLAB.
