
I need to extract word embeddings for a text dataset. Since ELMo takes a long time on a large dataset, I parallelized the process by splitting it into batches and storing the values in a CSV file. Now I have a DataFrame with around 1024 columns that contain the word embeddings.

Example Dataframe:

Col 1 Col 2 Col 3
0.1 0.25 0.4
0.2 0.3 -0.1

I need to combine the values of each row into a single new column, and each cell in that column must be a NumPy array rather than a list.
This is what I need it to look like (PS: the values in Col 4 need to be NumPy arrays):

Col 1 Col 2 Col 3 Col 4
0.1 0.25 0.4 [0.1,0.25,0.4]
0.2 0.3 -0.1 [0.2,0.3,-0.1]

What I've tried so far:

np.array(DF.iloc[:,0:1023].values.tolist())

But this throws the following error:

ValueError: Wrong number of items passed 1023, placement implies 1

How do I do this? Any advice would be helpful. Thanks in advance!

4 Answers

Try apply on axis 1 with to_numpy:

import pandas as pd

df = pd.DataFrame({'Col 1': {0: 0.1, 1: 0.2},
                   'Col 2': {0: 0.25, 1: 0.3},
                   'Col 3': {0: 0.4, 1: -0.1}})

df['Col 4'] = df.apply(lambda s: s.to_numpy(), axis=1)

print(df)

df:

   Col 1  Col 2  Col 3             Col 4
0    0.1   0.25    0.4  [0.1, 0.25, 0.4]
1    0.2   0.30   -0.1  [0.2, 0.3, -0.1]
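
If the frame also holds non-embedding columns, the same `apply` call can be limited to the embedding columns so that each array stays numeric. A minimal sketch, assuming a hypothetical `text` column and an illustrative `emb_cols` list (neither appears in the question):

```python
import pandas as pd

df = pd.DataFrame({'text': ['first', 'second'],  # hypothetical non-embedding column
                   'Col 1': [0.1, 0.2],
                   'Col 2': [0.25, 0.3],
                   'Col 3': [0.4, -0.1]})

# Illustrative name: the columns that hold the embedding values.
emb_cols = ['Col 1', 'Col 2', 'Col 3']

# Select the embedding columns first, then turn each row into a 1-D array.
df['Col 4'] = df[emb_cols].apply(lambda s: s.to_numpy(), axis=1)

print(df['Col 4'].iloc[0].dtype)
```

Selecting the columns before calling `apply` keeps the string column out of each row, so the resulting arrays are float arrays rather than object arrays.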

You are close; you need to call .tolist() after converting to a NumPy array (note that each cell then holds a plain Python list):

df['Col 4'] = np.array(df.to_numpy()).tolist()
print(df)
   Col 1  Col 2  Col 3             Col 4
0    0.1   0.25    0.4  [0.1, 0.25, 0.4]
1    0.2   0.30   -0.1  [0.2, 0.3, -0.1]

For your data:

DF['Col 4'] = np.array(DF.iloc[:,0:1023].to_numpy()).tolist()
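
Since `.tolist()` produces nested Python lists, the cells end up as lists rather than arrays. A minimal sketch, assuming the same toy frame as above, of a variant that wraps the 2-D array in `list()` so that each cell stays a NumPy array (the names `emb_cols`, `as_list` and `as_array` are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col 1': [0.1, 0.2],
                   'Col 2': [0.25, 0.3],
                   'Col 3': [0.4, -0.1]})

# Illustrative name: the columns that hold the embedding values.
emb_cols = ['Col 1', 'Col 2', 'Col 3']

# 2-D array of shape (n_rows, n_cols). pandas rejects a 2-D value like this
# when it is assigned to a single column, which is where the error comes from.
values = df[emb_cols].to_numpy()

# .tolist() converts every level to Python lists, so each cell is a list.
df['as_list'] = values.tolist()

# list(values) splits only along the first axis, so each cell is a 1-D ndarray.
df['as_array'] = list(values)

print(df['as_list'].map(type).unique())
print(df['as_array'].map(type).unique())
```

Both columns print the same way, so checking the cell type as done here is the only reliable way to tell them apart.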

You can use the apply() method together with np.array():

import pandas as pd
import numpy as np

df['Col 4'] = np.array(df.apply(np.array, axis=1))

Output of df:

   Col 1  Col 2  Col 3             Col 4
0    0.1   0.25    0.4  [0.1, 0.25, 0.4]
1    0.2   0.30   -0.1  [0.2, 0.3, -0.1]


You can use np.array within .apply(), as follows:

df['Col 4'] = df.apply(np.array, axis=1)

Result:

print(df)

   Col 1  Col 2  Col 3             Col 4
0    0.1   0.25    0.4  [0.1, 0.25, 0.4]
1    0.2   0.30   -0.1  [0.2, 0.3, -0.1]


df['Col 4'].map(type)

0    <class 'numpy.ndarray'>
1    <class 'numpy.ndarray'>
Name: Col 4, dtype: object
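
If the array column later has to be passed on as a single matrix, the per-row arrays can be stacked back together. A minimal sketch, assuming every cell holds a 1-D array of the same length (`matrix` is an illustrative name):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col 1': [0.1, 0.2],
                   'Col 2': [0.25, 0.3],
                   'Col 3': [0.4, -0.1]})

# Build the array column as in the answer above.
df['Col 4'] = df.apply(np.array, axis=1)

# np.stack joins the 1-D arrays along a new first axis,
# giving one row per DataFrame row.
matrix = np.stack(df['Col 4'].tolist())

print(matrix.shape)  # (2, 3)
```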

4 Comments

By the way, this is exactly the same as stackoverflow.com/a/67518811/14289892; you just removed the outer wrapper, i.e. np.array()
Same result, but a simpler way of doing the same thing. There is no need to call np.array() twice; once is enough. That is the subtle difference worth noticing.
Even if you just remove np.array(), it is exactly the same solution. That said, I agree that it is simpler and the extra call is not needed.
I think we should avoid redundant code. Of course you can say 1 * (2 + 3) is the same as (2 + 3), but which one would you use?
