6

I have the following pandas DataFrame:

import pandas as pd
a = [2.5,3.3]
b = [3.6,3.9]
D = {'A': a, 'B': b}
df = pd.DataFrame(D)

which gives me something like

+---+-----+-----+
|   |  A  |  B  |
+---+-----+-----+
| 0 | 2.5 | 3.6 |
| 1 | 3.3 | 3.9 |
+---+-----+-----+

I want to convert this DataFrame to a structured array like:

import numpy as np
data = np.rec.array([
('A', 2.5),
('A', 3.3),
('B', 3.6),
('B', 3.9),
], dtype = [('Type','|U5'),('Value', '<f8')])

I'm new to pandas and could not find a way to do this. I tried df.to_records(), but the index ends up in the result and I cannot find a way around that.

Any help is appreciated. Thanks.

0

5 Answers

10

Melt the DataFrame to make A and B (the column index) into a column. To get rid of the numeric index, make this new column the index. Then call to_records():

import pandas as pd
a = [2.5,3.3]
b = [3.6,3.9]
D = {'A': a, 'B': b}
df = pd.DataFrame(D)
result = (pd.melt(df, var_name='Type', value_name='Value')
          .set_index('Type').to_records())
print(repr(result))

yields

rec.array([('A',  2.5), ('A',  3.3), ('B',  3.6), ('B',  3.9)], 
          dtype=[('Type', 'O'), ('Value', '<f8')])

This is the key step:

In [167]: df
Out[167]: 
     A    B
0  2.5  3.6
1  3.3  3.9

In [168]: pd.melt(df)
Out[168]: 
  variable  value
0        A    2.5
1        A    3.3
2        B    3.6
3        B    3.9

Once you've melted the DataFrame, to_records returns nearly the desired result; the only extra is the index field:

In [169]: pd.melt(df).to_records()
Out[169]: 
rec.array([(0, 'A',  2.5), (1, 'A',  3.3), (2, 'B',  3.6), (3, 'B',  3.9)], 
          dtype=[('index', '<i8'), ('variable', 'O'), ('value', '<f8')])
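
If the exact dtype from the question is needed (a fixed-width string field and a float field, with no index field), one option is to allocate the structured array yourself and fill it field by field. This is a minimal sketch rather than part of the answer above; the name `out` is hypothetical, and the 'U5' width is an assumption taken from the dtype in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [2.5, 3.3], 'B': [3.6, 3.9]})

# Long format: one row per (column name, value) pair
melted = pd.melt(df, var_name='Type', value_name='Value')

# Allocate a plain structured ndarray with the dtype the question asked for
out = np.empty(len(melted), dtype=[('Type', 'U5'), ('Value', 'f8')])
out['Type'] = melted['Type'].to_numpy()
out['Value'] = melted['Value'].to_numpy()

print(repr(out))
```

Because the array is created with np.empty, the result is a plain np.ndarray rather than a np.recarray, and the string field is stored as fixed-width Unicode instead of Python objects.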

1 Comment

Worked like a charm. Thanks a lot @unutbu!
4

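This works for me without melting, though it yields one field per column (A and B) rather than Type/Value pairs.

Tested with pandas 1.5.2, NumPy 1.23.5, Python 3.10.4:

import numpy as np
records = df.to_records(index=False)  # recarray with one field per column
data = np.array(records, dtype=records.dtype.descr)  # plain structured ndarray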
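
To see what this approach produces for the DataFrame in the question, here is a small self-contained sketch. It assumes the same `df` as in the question and only inspects the result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [2.5, 3.3], 'B': [3.6, 3.9]})

records = df.to_records(index=False)
data = np.array(records, dtype=records.dtype.descr)

print(data.dtype.names)  # field names come from the column labels
print(data.tolist())     # one tuple per DataFrame row
```

Each element is a whole row, so this is a good fit when the columns should stay as separate fields, but it is not the long Type/Value layout the question asks for.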


2
np.rec.fromrecords(list(zip(df.melt().variable,df.melt().value)))
Out[531]: 
rec.array([('A',  2.5), ('A',  3.3), ('B',  3.6), ('B',  3.9)], 
          dtype=[('f0', '<U1'), ('f1', '<f8')])


0

You can melt and call to_records:

pd.melt(df).to_records(index=False)

1 Comment

This returns an np.recarray, not a plain structured np.ndarray. I tested converting to np.ndarray afterwards: pd.melt(df).to_records(index=False).view(np.ndarray) doesn't seem to give the desired result. df.to_records(index=False).view(np.ndarray) seems to work, but it yields a slightly different type from the other answers.
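
One way to sidestep the view question is to rebuild the array from plain Python tuples. The sketch below is an assumption about what "the desired result" means here (a plain np.ndarray with a string field and a float field); the names `rec` and `plain` are hypothetical, and the 'U5' width is borrowed from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [2.5, 3.3], 'B': [3.6, 3.9]})

rec = pd.melt(df).to_records(index=False)
print(type(rec).__name__)       # container class returned by to_records
print(rec.dtype.type.__name__)  # scalar type of each element

# Rebuild as a plain structured ndarray from a list of Python tuples
plain = np.array(rec.tolist(), dtype=[('variable', 'U5'), ('value', 'f8')])
print(type(plain).__name__, plain.dtype.type.__name__)
```

This copies the data, which is fine for small frames; for large ones a copy-free conversion may be preferable.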
0

None of these worked for me. As soon as I tried to do anything with the ndarray, I got an error like:

Cannot cast array data from dtype((numpy.record, [('14', '<f8'), ('15', '<f8'), ('16', '<f8'), ....

What did work was the built-in pandas method for converting to NumPy. Note that it returns a regular 2-D float array, not a structured array:

data = df.to_numpy(dtype='float')
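
For comparison with the structured-array answers, this short sketch shows the shape and dtype that to_numpy gives for the DataFrame in the question. The name `arr` is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [2.5, 3.3], 'B': [3.6, 3.9]})

arr = df.to_numpy(dtype='float')
print(arr.shape)        # rows x columns
print(arr.dtype.names)  # None: a regular array has no named fields
```

The column labels are not carried over, so keep `df.columns` around if you need to map array columns back to their names.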
