Pandas-Merge Multiple Column Values to a NumPy Array

Question

I need to extract word embeddings for a text dataset. Since ELMo takes a long time on a huge dataset, I parallelized the process by splitting it into batches and storing the values in a CSV file. Now I have a data frame of around 1024 columns that contain the word embeddings.

Example Dataframe:

Col 1	Col 2	Col 3
0.1	0.25	0.4
0.2	0.3	-0.1

What I need to do is combine the values of each row into a single new column, where each value is a NumPy array rather than a list.
This is what I need it to look like:
PS: The values in Col 4 need to be of type NumPy array.

Col 1	Col 2	Col 3	Col 4
0.1	0.25	0.4	[0.1,0.25,0.4]
0.2	0.3	-0.1	[0.2,0.3,-0.1]

What I've tried so far:

np.array(DF.iloc[:,0:1023].values.tolist())

But this throws the following error:

ValueError: Wrong number of items passed 1023, placement implies 1

How do I do this? Any advice would be helpful. Thanks in advance!

Henry Ecker · Accepted Answer · 2021-05-13 11:57:52Z

3

Try apply on axis 1 with to_numpy:

import pandas as pd

df = pd.DataFrame({'Col 1': {0: 0.1, 1: 0.2},
                   'Col 2': {0: 0.25, 1: 0.3},
                   'Col 3': {0: 0.4, 1: -0.1}})

df['Col 4'] = df.apply(lambda s: s.to_numpy(), axis=1)

print(df)

df:

   Col 1  Col 2  Col 3             Col 4
0    0.1   0.25    0.4  [0.1, 0.25, 0.4]
1    0.2   0.30   -0.1  [0.2, 0.3, -0.1]
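
If the frame also holds non-embedding columns, the same idea can be limited to the embedding columns. The following is only a minimal sketch: the text column and the n_dims variable are made up for illustration, and it assumes the embeddings sit in the first n_dims columns. For the question's data n_dims would be 1024; note that iloc[:, 0:1023] selects only 1023 columns, because the end of the slice is exclusive.

```python
import numpy as np
import pandas as pd

n_dims = 3  # hypothetical embedding width; 1024 in the question

df = pd.DataFrame({'Col 1': [0.1, 0.2],
                   'Col 2': [0.25, 0.3],
                   'Col 3': [0.4, -0.1],
                   'text': ['first', 'second']})  # hypothetical extra column

# Select only the embedding columns by position, then build one array per row.
emb = df.iloc[:, :n_dims]
df['Col 4'] = emb.apply(lambda s: s.to_numpy(), axis=1)

# Each cell is a NumPy array holding that row's embedding values.
first = df['Col 4'].iloc[0]
print(type(first), first.shape)
```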

answered May 13, 2021 at 11:57


jezrael · Accepted Answer · 2021-05-13 12:01:38Z

2

You are close; you need to call .tolist() after converting to a NumPy array:

import numpy as np

df['Col 4'] = np.array(df.to_numpy()).tolist()
print(df)
   Col 1  Col 2  Col 3             Col 4
0    0.1   0.25    0.4  [0.1, 0.25, 0.4]
1    0.2   0.30   -0.1  [0.2, 0.3, -0.1]

For your data:

DF['Col 4'] = np.array(DF.iloc[:,0:1023].to_numpy()).tolist()
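
Note that .tolist() turns every row into a plain Python list, so the cells end up as lists rather than NumPy arrays. If arrays are required, the cells can be converted afterwards. A minimal sketch of checking the cell type and converting; using Series.map with np.array is one option chosen here, not part of the answer above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col 1': [0.1, 0.2],
                   'Col 2': [0.25, 0.3],
                   'Col 3': [0.4, -0.1]})

# .tolist() on the 2-D array gives a list of Python lists, one per row.
df['Col 4'] = df.to_numpy().tolist()
type_before = type(df['Col 4'].iloc[0])

# Convert each list cell into a NumPy array.
df['Col 4'] = df['Col 4'].map(np.array)
type_after = type(df['Col 4'].iloc[0])

print(type_before, type_after)
```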

answered May 13, 2021 at 12:01


Anurag Dabas · Accepted Answer · 2021-05-13 12:01:40Z

0

import pandas as pd
import numpy as np

You can use the apply() method together with np.array():

df['Col 4'] = np.array(df.apply(np.array, axis=1))

Output of df:

   Col 1  Col 2  Col 3             Col 4
0    0.1   0.25    0.4  [0.1, 0.25, 0.4]
1    0.2   0.30   -0.1  [0.2, 0.3, -0.1]
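
The outer np.array(...) only turns the resulting Series into a 1-D object array before assignment; the cells should come out the same without it. A quick check, written as a sketch (the variable names wrapped, plain and same are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col 1': [0.1, 0.2],
                   'Col 2': [0.25, 0.3],
                   'Col 3': [0.4, -0.1]})

wrapped = np.array(df.apply(np.array, axis=1))  # object array of row arrays
plain = df.apply(np.array, axis=1)              # Series of row arrays

# Compare the two results cell by cell.
same = all(np.array_equal(a, b) for a, b in zip(wrapped, plain))
print(wrapped.shape, same)
```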

edited May 13, 2021 at 12:01

answered May 13, 2021 at 11:56


SeaBean · Accepted Answer · 2021-05-13 12:28:02Z

0

You can use np.array within .apply(), as follows:

df['Col 4'] = df.apply(np.array, axis=1)

Result:

print(df)

   Col 1  Col 2  Col 3             Col 4
0    0.1   0.25    0.4  [0.1, 0.25, 0.4]
1    0.2   0.30   -0.1  [0.2, 0.3, -0.1]


df['Col 4'].map(type)

0    <class 'numpy.ndarray'>
1    <class 'numpy.ndarray'>
Name: Col 4, dtype: object
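
If the per-row arrays later need to go back into one 2-D matrix, for example to feed a model, they can be stacked again. A sketch, assuming every cell has the same length; np.stack is a choice made here and not something taken from the answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col 1': [0.1, 0.2],
                   'Col 2': [0.25, 0.3],
                   'Col 3': [0.4, -0.1]})
df['Col 4'] = df.apply(np.array, axis=1)

# Stack the row arrays into one (n_rows, n_cols) matrix.
matrix = np.stack(list(df['Col 4']))
print(matrix.shape)  # (2, 3)
```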

answered May 13, 2021 at 12:28


4 Comments

Anurag Dabas Over a year ago

btw it is exact same like stackoverflow.com/a/67518811/14289892 but you just removed upper covering...ie np.array()

SeaBean Over a year ago

Same result, but simplified way of doing the same thing. No need to use 2 np.array(). Just one is enough. This is the subtle difference that we need to notice.

Anurag Dabas Over a year ago

even If you just remove np.array() but It is exact same solution....bruh..btw I agree that it is more simplified there is no need

SeaBean Over a year ago

I think we got to avoid redundant codes. Of course you can say 1 * (2 + 3) is the same as (2 + 3) but which one would you use ?
