Converting pandas dataframe to structured arrays

Question

I have the following pandas dataframe:

import pandas as pd
a = [2.5,3.3]
b = [3.6,3.9]
D = {'A': a, 'B': b}
df = pd.DataFrame(D)

which gives me something like

+---+-----+-----+
|   |  A  |  B  |
+---+-----+-----+
| 0 | 2.5 | 3.6 |
| 1 | 3.3 | 3.9 |
+---+-----+-----+

I want to convert this dataframe to a structured array like

data = np.rec.array([
('A', 2.5),
('A', 3.3),
('B', 3.6),
('B', 3.9),
], dtype = [('Type','|U5'),('Value', '<f8')])

I'm new to pandas and could not find a way to do this. I tried df.to_records(), but the index ends up in the result and I cannot find a way around that.

Any help is appreciated. Thanks.

unutbu · Accepted Answer · 2017-10-19 19:43:41Z

10

Melt the DataFrame to make A and B (the column index) into a column. To get rid of the numeric index, make this new column the index. Then call to_records():

import pandas as pd
a = [2.5,3.3]
b = [3.6,3.9]
D = {'A': a, 'B': b}
df = pd.DataFrame(D)
result = (pd.melt(df, var_name='Type', value_name='Value')
          .set_index('Type').to_records())
print(repr(result))

yields

rec.array([('A',  2.5), ('A',  3.3), ('B',  3.6), ('B',  3.9)], 
          dtype=[('Type', 'O'), ('Value', '<f8')])

This is the key step:

In [167]: df
Out[167]: 
     A    B
0  2.5  3.6
1  3.3  3.9

In [168]: pd.melt(df)
Out[168]: 
  variable  value
0        A    2.5
1        A    3.3
2        B    3.6
3        B    3.9

Once you've melted the DataFrame, to_records returns essentially the desired result; only the extra index field remains:

In [169]: pd.melt(df).to_records()
Out[169]: 
rec.array([(0, 'A',  2.5), (1, 'A',  3.3), (2, 'B',  3.6), (3, 'B',  3.9)], 
          dtype=[('index', '<i8'), ('variable', 'O'), ('value', '<f8')])
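The recarray above still differs from the array in the question in two ways: the label field has dtype 'O' rather than a fixed-width string, and the result is an np.recarray. Here is a minimal sketch of one way to get a plain structured ndarray with explicit field types, assuming '<U5' and '<f8' are the dtypes wanted (the names melted and out are illustrative, not from the answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [2.5, 3.3], 'B': [3.6, 3.9]})
melted = pd.melt(df, var_name='Type', value_name='Value')

# Allocate a structured array with the target dtype, then fill it field by field.
out = np.empty(len(melted), dtype=[('Type', '<U5'), ('Value', '<f8')])
out['Type'] = np.asarray(melted['Type'], dtype='<U5')
out['Value'] = melted['Value'].to_numpy()

print(repr(out))
```

Filling the fields by name avoids depending on how to_records handles the index or string columns.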

edited Oct 19, 2017 at 19:43

answered Oct 19, 2017 at 19:37


1 Comment

Xiaoyu Lu Over a year ago

Worked like a charm. Thanks a lot @unutbu!

AlexEl · Answer · 2023-01-07 11:00:31Z

4

This works for me without melting.

Tested with pandas 1.5.2, numpy 1.23.5, Python 3.10.4:

import numpy as np

records = df.to_records(index=False)
data = np.array(records, dtype = records.dtype.descr)
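Note that this approach keeps the wide layout: each DataFrame column becomes a field, so the result has fields A and B rather than Type and Value. A small sketch, using the df from the question, of what the two lines return:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [2.5, 3.3], 'B': [3.6, 3.9]})

records = df.to_records(index=False)                 # np.recarray
data = np.array(records, dtype=records.dtype.descr)  # plain structured ndarray

print(type(records).__name__, type(data).__name__)
print(data.dtype.names)  # one field per DataFrame column
print(data['A'])         # fields are accessed by name
```

If the long Type/Value layout is needed, the DataFrame would have to be melted first, as in the other answers.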

answered Jan 7, 2023 at 11:00


BENY · Answer · 2017-10-19 19:37:46Z

2

In [531]: np.rec.fromrecords(list(zip(df.melt().variable, df.melt().value)))
Out[531]: 
rec.array([('A',  2.5), ('A',  3.3), ('B',  3.6), ('B',  3.9)], 
          dtype=[('f0', '<U1'), ('f1', '<f8')])
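The fields above get the default names f0 and f1. As a sketch, np.rec.fromrecords also accepts a names argument, so the fields can be labelled Type and Value directly. Melting once and reusing the result (here called melted, a name chosen for illustration) also avoids calling df.melt() twice:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [2.5, 3.3], 'B': [3.6, 3.9]})
melted = df.melt()  # columns are named 'variable' and 'value' by default

rec = np.rec.fromrecords(
    list(zip(melted['variable'], melted['value'])),
    names=['Type', 'Value'],
)
print(rec.dtype.names)
```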

answered Oct 19, 2017 at 19:37


skrubber · Answer · 2017-10-19 19:43:45Z

0

You can melt and call to_records:

pd.melt(df).to_records(index=False)

answered Oct 19, 2017 at 19:43


1 Comment

MRule Over a year ago

This returns an np.recarray, not a structured np.ndarray. I tested converting to np.ndarray after the fact: pd.melt(df).to_records(index=False).view(np.ndarray) doesn't seem to give the desired result. df.to_records(index=False).view(np.ndarray) seems to work, but yields a slightly different type from the other answers.
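The type difference described in this comment can be inspected directly. In this sketch (variable names are illustrative), .view(np.ndarray) changes the array class but keeps the record-flavoured dtype, whereas rebuilding the array from dtype.descr, as in another answer here, gives a plain structured dtype:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [2.5, 3.3], 'B': [3.6, 3.9]})
rec = df.to_records(index=False)

viewed = rec.view(np.ndarray)                  # ndarray class, dtype unchanged
plain = np.array(rec, dtype=rec.dtype.descr)   # ndarray class, plain structured dtype

# dtype.type is the scalar type behind each element of the array.
print(type(viewed).__name__, viewed.dtype.type)
print(type(plain).__name__, plain.dtype.type)
```

Both results are ndarray instances, so any remaining difference comes from the dtype rather than the array class.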

GarrukApex · Answer · 2024-11-04 19:02:01Z

0

None of these worked for me. As soon as I tried to do anything with the ndarray, I got an error like:

Cannot cast array data from dtype((numpy.record, [('14', '<f8'), ('15', '<f8'), ('16', '<f8'), ....

What did work was the pandas built-in method for converting to numpy:

data = df.to_numpy(dtype='float')
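Note that to_numpy returns an ordinary 2-D array rather than a structured array, so the column names are not carried over as fields. A quick sketch with the df from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [2.5, 3.3], 'B': [3.6, 3.9]})
data = df.to_numpy(dtype='float')

print(data.shape)        # (2, 2): rows x columns
print(data.dtype.names)  # None: a plain float array has no named fields
```

This suits purely numeric work, but it does not produce the Type/Value records asked for in the question.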

answered Nov 4, 2024 at 19:02

