Problem

As an example, consider the following structured numpy array (containing sub-arrays):

import numpy as np
import pandas as pd

data = [
    (1, (5., 3., 7.), 6),
    (2, (2., 1., 3.), 9),
    (3, (3., 8., 4.), 3),
    (4, (1., 7., 4.), 2),
]
dtype = [('A', '<i8'), ('B', '<f8', (3,)), ('C', '<i8')]
arr = np.array(data, dtype=dtype)

I would like to convert this array arr into a pandas dataframe that looks like this:

   A  B_1  B_2  B_3  C
0  1  5.0  3.0  7.0  6
1  2  2.0  1.0  3.0  9
2  3  3.0  8.0  4.0  3
3  4  1.0  7.0  4.0  2

Tried thus far

I've tried to use pandas' method from_records to perform the conversion:

df = pd.DataFrame.from_records(arr)

but this throws the error Exception: Data must be 1-dimensional.

Question

What would be a good way to perform such a conversion to pandas dataframe?

3 Answers

This can be flattened with two pd.DataFrame calls:

df=pd.DataFrame(arr.tolist())
df=df.join(pd.DataFrame(df[1].tolist()).add_prefix('B'))
Out[404]: 
   0                1  2   B0   B1   B2
0  1  [5.0, 3.0, 7.0]  6  5.0  3.0  7.0
1  2  [2.0, 1.0, 3.0]  9  2.0  1.0  3.0
2  3  [3.0, 8.0, 4.0]  3  3.0  8.0  4.0
3  4  [1.0, 7.0, 4.0]  2  1.0  7.0  4.0
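
The output above still carries the original list column and the default labels 0, 1, 2. A minimal sketch of the same two-call idea that ends up with the layout asked for in the question might look like this. It assumes B is known to be the only sub-array field, and the B_1..B_3 labels are taken from the question rather than produced by pandas:

```python
import numpy as np
import pandas as pd

data = [
    (1, (5., 3., 7.), 6),
    (2, (2., 1., 3.), 9),
    (3, (3., 8., 4.), 3),
    (4, (1., 7., 4.), 2),
]
dtype = [('A', '<i8'), ('B', '<f8', (3,)), ('C', '<i8')]
arr = np.array(data, dtype=dtype)

# First call: use the field names as column labels instead of 0, 1, 2.
df = pd.DataFrame(arr.tolist(), columns=list(arr.dtype.names))

# Second call: expand the list column. pop() removes 'B' from df and returns it.
b = pd.DataFrame(df.pop('B').tolist(), index=df.index)
b.columns = [f"B_{i + 1}" for i in b.columns]

# Join the expanded columns and restore the requested column order.
df = df.join(b)[['A', 'B_1', 'B_2', 'B_3', 'C']]
print(df)
```

Going through tolist() converts every record to Python objects, so this is probably best kept for small arrays.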

Assuming you know that column B is the one to be expanded, you can do the following. If you need to automate it further, you can iterate over the dtype to find the fields that have a sub-array type.

df=pd.DataFrame.from_records(map(lambda x: list(x), arr), columns=arr.dtype.names)
df2=pd.DataFrame(df["B"].tolist())
df2.columns=map(lambda x: f"B_{x+1}", df2.columns)

df=pd.concat([df, df2], sort=False, axis=1).drop(columns="B")

Outputs:

   A  C  B_1  B_2  B_3
0  1  6  5.0  3.0  7.0
1  2  9  2.0  1.0  3.0
2  3  3  3.0  8.0  4.0
3  4  2  1.0  7.0  4.0
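
The automation mentioned above (iterating over the dtype to find the sub-array fields) could be sketched roughly as follows. The helper name structured_to_frame is hypothetical, and the sketch assumes sub-array fields are one-dimensional, as B is here:

```python
import numpy as np
import pandas as pd


def structured_to_frame(arr):
    """Hypothetical helper: expand 1-D sub-array fields into numbered columns."""
    columns = {}
    for name in arr.dtype.names:
        field = arr[name]  # shape (n,) for scalar fields, (n, k) for sub-arrays
        if field.ndim == 1:
            columns[name] = field
        elif field.ndim == 2:
            # One column per sub-array element, numbered from 1.
            for i in range(field.shape[1]):
                columns[f"{name}_{i + 1}"] = field[:, i]
        else:
            raise ValueError(f"field {name!r} has more than one sub-array dimension")
    # Dict insertion order is kept, so the columns follow the dtype's field order.
    return pd.DataFrame(columns)


data = [
    (1, (5., 3., 7.), 6),
    (2, (2., 1., 3.), 9),
    (3, (3., 8., 4.), 3),
    (4, (1., 7., 4.), 2),
]
dtype = [('A', '<i8'), ('B', '<f8', (3,)), ('C', '<i8')]
arr = np.array(data, dtype=dtype)

df = structured_to_frame(arr)
print(df)
```

Because each column is built from its own field, the integer fields stay integers and the column order matches the dtype, so no drop or reorder step is needed.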

In [56]: data = [ 
    ...:     (1, (5., 3., 7.), 6), 
    ...:     (2, (2., 1., 3.), 9), 
    ...:     (3, (3., 8., 4.), 3), 
    ...:     (4, (1., 7., 4.), 2), 
    ...: ] 
    ...: dtype = [('A', '<i8'), ('B', '<f8', (3,)), ('C', '<i8')] 
    ...: arr = np.array(data, dtype=dtype)                                                     
In [57]: arr                                                                                   
Out[57]: 
array([(1, [5., 3., 7.], 6), (2, [2., 1., 3.], 9), (3, [3., 8., 4.], 3),
       (4, [1., 7., 4.], 2)],
      dtype=[('A', '<i8'), ('B', '<f8', (3,)), ('C', '<i8')])

Looks like the newish structured_to_unstructured can handle this dtype:

In [59]: import numpy.lib.recfunctions as rf                                                   
In [60]: rf.structured_to_unstructured(arr)                                                    
Out[60]: 
array([[1., 5., 3., 7., 6.],
       [2., 2., 1., 3., 9.],
       [3., 3., 8., 4., 3.],
       [4., 1., 7., 4., 2.]])

Then make the dataframe in the usual way:

In [63]: pd.DataFrame(_60, columns=['A','B1','B2','B3','C'])                                   
Out[63]: 
     A   B1   B2   B3    C
0  1.0  5.0  3.0  7.0  6.0
1  2.0  2.0  1.0  3.0  9.0
2  3.0  3.0  8.0  4.0  3.0
3  4.0  1.0  7.0  4.0  2.0

and then restore the integer dtypes on columns A and C:

In [74]: df = pd.DataFrame(_60, columns=['A','B1','B2','B3','C'])                              
In [75]: df['A']=df['A'].astype(int)                                                           
In [76]: df['C']=df['C'].astype(int)                                                           
In [77]: df                                                                                    
Out[77]: 
   A   B1   B2   B3  C
0  1  5.0  3.0  7.0  6
1  2  2.0  1.0  3.0  9
2  3  3.0  8.0  4.0  3
3  4  1.0  7.0  4.0  2
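
The column labels and the dtype fix-up above are typed out by hand. As a rough sketch, both could instead be derived from arr.dtype. This assumes one-dimensional sub-array fields and reuses the B1, B2, B3 naming from this answer; the variable names are illustrative only:

```python
import numpy as np
import numpy.lib.recfunctions as rf
import pandas as pd

data = [
    (1, (5., 3., 7.), 6),
    (2, (2., 1., 3.), 9),
    (3, (3., 8., 4.), 3),
    (4, (1., 7., 4.), 2),
]
dtype = [('A', '<i8'), ('B', '<f8', (3,)), ('C', '<i8')]
arr = np.array(data, dtype=dtype)

names = []        # flat column labels, in field order
col_dtypes = {}   # label -> original element dtype of the field
for name in arr.dtype.names:
    field = arr.dtype[name]
    if field.shape:
        # Sub-array field: one label per element (assumes a 1-D sub-array).
        labels = [f"{name}{i + 1}" for i in range(field.shape[0])]
    else:
        labels = [name]
    names.extend(labels)
    for label in labels:
        col_dtypes[label] = field.base

# structured_to_unstructured promotes everything to a common dtype (float64 here),
# so cast each column back to the dtype its field had in the structured array.
df = pd.DataFrame(rf.structured_to_unstructured(arr), columns=names).astype(col_dtypes)
print(df)
```

One caveat with this route: the integers pass through float64 on the way, so values too large to be represented exactly as float64 would not survive the round trip.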
