Convert structured numpy array (containing sub-arrays) to pandas dataframe

Question

Problem

As an example, consider the following structured numpy array (containing sub-arrays):

import numpy as np
import pandas as pd

data = [
    (1, (5., 3., 7.), 6),
    (2, (2., 1., 3.), 9),
    (3, (3., 8., 4.), 3),
    (4, (1., 7., 4.), 2),
]
dtype = [('A', '<i8'), ('B', '<f8', (3,)), ('C', '<i8')]
arr = np.array(data, dtype=dtype)

I would like to convert this array arr into a pandas dataframe that looks like this:

   A  B_1  B_2  B_3  C
0  1  5.0  3.0  7.0  6
1  2  2.0  1.0  3.0  9
2  3  3.0  8.0  4.0  3
3  4  1.0  7.0  4.0  2

Tried thus far

I've tried to use pandas' method from_records to perform the conversion:

df = pd.DataFrame.from_records(arr)

but this throws the error Exception: Data must be 1-dimensional.

Question

What would be a good way to perform such a conversion to pandas dataframe?

BENY · Accepted Answer · 2020-03-12 23:25:16Z

3

This can be flattened with two pd.DataFrame calls:

df=pd.DataFrame(arr.tolist())
df=df.join(pd.DataFrame(df[1].tolist()).add_prefix('B'))
Out[404]: 
   0                1  2   B0   B1   B2
0  1  [5.0, 3.0, 7.0]  6  5.0  3.0  7.0
1  2  [2.0, 1.0, 3.0]  9  2.0  1.0  3.0
2  3  [3.0, 8.0, 4.0]  3  3.0  8.0  4.0
3  4  [1.0, 7.0, 4.0]  2  1.0  7.0  4.0
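The output above still carries the original list column and positional labels (0, 1, 2). As a minimal sketch of one way to reach the layout asked for in the question, the same two-call idea can be combined with the field names from arr.dtype.names. The names b and result are illustrative, and the sketch assumes that B is the only sub-array field.

```python
import numpy as np
import pandas as pd

data = [
    (1, (5., 3., 7.), 6),
    (2, (2., 1., 3.), 9),
    (3, (3., 8., 4.), 3),
    (4, (1., 7., 4.), 2),
]
dtype = [('A', '<i8'), ('B', '<f8', (3,)), ('C', '<i8')]
arr = np.array(data, dtype=dtype)

# Label the columns with the structured field names instead of 0, 1, 2.
df = pd.DataFrame(arr.tolist(), columns=list(arr.dtype.names))

# Expand the list-valued column "B" into B_1, B_2, B_3 (assumption: only B needs this).
b = pd.DataFrame(df["B"].tolist(), index=df.index)
b.columns = [f"B_{i + 1}" for i in range(b.shape[1])]

# Reassemble in the original field order, leaving the list column out.
result = pd.concat([df[["A"]], b, df[["C"]]], axis=1)
print(result)
```

Passing index=df.index when building b keeps the rows aligned during the concat even if df does not use the default index.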


Georgina Skibinski · Accepted Answer · 2020-03-12 23:46:19Z

1

You can do the following, assuming you know that column B is the one to expand. To automate it further, iterate over arr.dtype to find the fields that have a sub-array type.

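# Build the frame from plain Python rows, keeping the field names as column labels.
df = pd.DataFrame.from_records(arr.tolist(), columns=arr.dtype.names)
# Expand the list-valued column B into its own frame with columns B_1, B_2, B_3.
df2 = pd.DataFrame(df["B"].tolist())
df2.columns = [f"B_{x + 1}" for x in df2.columns]

# Append the expanded columns, then drop the original B.
df = pd.concat([df, df2], sort=False, axis=1).drop(columns="B")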

Outputs:

   A  C  B_1  B_2  B_3
0  1  6  5.0  3.0  7.0
1  2  9  2.0  1.0  3.0
2  3  3  3.0  8.0  4.0
3  4  2  1.0  7.0  4.0
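The automation hinted at above (iterating over the dtype to find the sub-array fields) could look roughly like this. It is only a sketch: structured_to_frame is a hypothetical helper name, and it assumes a 1-D structured array whose fields are either scalars or 1-D sub-arrays. Nested structured fields and multi-dimensional sub-arrays are not handled.

```python
import numpy as np
import pandas as pd


def structured_to_frame(arr):
    """Expand each 1-D sub-array field of a 1-D structured array into columns.

    Hypothetical helper for illustration, not part of NumPy or pandas.
    """
    columns = {}
    for name in arr.dtype.names:
        values = arr[name]  # shape (n,) for scalar fields, (n, k) for sub-arrays
        if values.ndim == 1:
            columns[name] = values
        elif values.ndim == 2:
            for i in range(values.shape[1]):
                columns[f"{name}_{i + 1}"] = values[:, i]
        else:
            raise ValueError(f"field {name!r} has unsupported shape {values.shape[1:]}")
    return pd.DataFrame(columns)


data = [
    (1, (5., 3., 7.), 6),
    (2, (2., 1., 3.), 9),
    (3, (3., 8., 4.), 3),
    (4, (1., 7., 4.), 2),
]
dtype = [('A', '<i8'), ('B', '<f8', (3,)), ('C', '<i8')]
arr = np.array(data, dtype=dtype)

df = structured_to_frame(arr)
print(df)
```

Because each field is read directly from the array, the columns keep their original order (A, B_1, B_2, B_3, C) and no intermediate list column has to be dropped afterwards.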

edited Mar 12, 2020 at 23:46

answered Mar 12, 2020 at 23:41


hpaulj · Accepted Answer · 2020-03-12 23:59:12Z

In [56]: data = [ 
    ...:     (1, (5., 3., 7.), 6), 
    ...:     (2, (2., 1., 3.), 9), 
    ...:     (3, (3., 8., 4.), 3), 
    ...:     (4, (1., 7., 4.), 2), 
    ...: ] 
    ...: dtype = [('A', '<i8'), ('B', '<f8', (3,)), ('C', '<i8')] 
    ...: arr = np.array(data, dtype=dtype)                                                     
In [57]: arr                                                                                   
Out[57]: 
array([(1, [5., 3., 7.], 6), (2, [2., 1., 3.], 9), (3, [3., 8., 4.], 3),
       (4, [1., 7., 4.], 2)],
      dtype=[('A', '<i8'), ('B', '<f8', (3,)), ('C', '<i8')])

It looks like the relatively new structured_to_unstructured (in numpy.lib.recfunctions) can handle this dtype:

In [59]: import numpy.lib.recfunctions as rf                                                   
In [60]: rf.structured_to_unstructured(arr)                                                    
Out[60]: 
array([[1., 5., 3., 7., 6.],
       [2., 2., 1., 3., 9.],
       [3., 3., 8., 4., 3.],
       [4., 1., 7., 4., 2.]])

Then make the dataframe in the usual way:

In [63]: pd.DataFrame(_60, columns=['A','B1','B2','B3','C'])                                   
Out[63]: 
     A   B1   B2   B3    C
0  1.0  5.0  3.0  7.0  6.0
1  2.0  2.0  1.0  3.0  9.0
2  3.0  3.0  8.0  4.0  3.0
3  4.0  1.0  7.0  4.0  2.0

and restore the integer dtypes of columns A and C, which structured_to_unstructured converted to float:

In [74]: df = pd.DataFrame(_60, columns=['A','B1','B2','B3','C'])                              
In [75]: df['A']=df['A'].astype(int)                                                           
In [76]: df['C']=df['C'].astype(int)                                                           
In [77]: df                                                                                    
Out[77]: 
   A   B1   B2   B3  C
0  1  5.0  3.0  7.0  6
1  2  2.0  1.0  3.0  9
2  3  3.0  8.0  4.0  3
3  4  1.0  7.0  4.0  2
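The hand-written column list and the per-column astype calls can probably be derived from the dtype itself. The following is a sketch under that assumption; names and dtypes are illustrative variables, and the B1, B2, B3 naming follows the session above. One caveat: structured_to_unstructured casts every field to a common dtype (float64 here), so int64 values larger than 2**53 would not survive the round trip exactly.

```python
import numpy as np
import numpy.lib.recfunctions as rf
import pandas as pd

data = [
    (1, (5., 3., 7.), 6),
    (2, (2., 1., 3.), 9),
    (3, (3., 8., 4.), 3),
    (4, (1., 7., 4.), 2),
]
dtype = [('A', '<i8'), ('B', '<f8', (3,)), ('C', '<i8')]
arr = np.array(data, dtype=dtype)

names, dtypes = [], {}
for field in arr.dtype.names:
    field_dtype = arr.dtype[field]
    if field_dtype.shape == ():
        labels = [field]  # scalar field: one column
    else:
        # Sub-array field: one column per element.
        width = int(np.prod(field_dtype.shape, dtype=int))
        labels = [f"{field}{i + 1}" for i in range(width)]
    names.extend(labels)
    # .base is the element dtype (int64 for A and C, float64 for B).
    dtypes.update({label: field_dtype.base for label in labels})

flat = rf.structured_to_unstructured(arr)  # all values become float64
df = pd.DataFrame(flat, columns=names).astype(dtypes)
print(df)
```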
