How to convert vector wrapped as string to numpy array in pandas dataframe?

Question

I have a pandas dataframe with a column of vectors that I would like to perform matrix arithmetic on. However, upon closer inspection the vectors are all wrapped as strings with new line characters seemingly embedded in them:

How do I convert each vector in this column into numpy arrays? I've tried

df['Word Vector'].as_matrix

and

np.array(df['Word Vector'])

as well as

df['Word Vector'] = df['Word Vector'].astype(np.array)

but none produced the desired result. Any pointers would be appreciated!

Provide an example of your data that we can experiment with.
– Mohamed Ali JAMAOUI, Commented Aug 16, 2017 at 8:38
@MedAli what would be the best way to do so? I wasn't sure what process generated this format. How can I upload a sample of the dataframe to Stack Overflow?
– Matt, Commented Aug 16, 2017 at 17:26

White · Accepted Answer · 2017-08-16 09:38:43Z

I hope the following does what you expect:

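import pandas as pd
import numpy as np

# Build sample data: str() of a NumPy array gives '[ 1  2  3 ...]' with embedded newlines
x = str(np.arange(1, 100))
df = pd.DataFrame([x, x, x, x])
df.columns = ['words']
print('sample')
print(df.head())
# Strip newlines and brackets, then parse the space-separated numbers
result = df['words'].apply(lambda x:
                           np.fromstring(
                               x.replace('\n', '')
                                .replace('[', '')
                                .replace(']', '')
                                .replace('  ', ' '), sep=' '))
print('result')
print(result)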

Output:

sample
                                               words
0  [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 ...
1  [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 ...
2  [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 ...
3  [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 ...
result
0    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, ...
1    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, ...
2    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, ...
3    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, ...

Calling replace this many times is not elegant, but I did not find a better approach. In any case, it should let you convert the strings to vectors.

A side note: since your data is shown only as a picture, check whether the values are separated by spaces or tabs. If they are tab-separated, change sep=' ' to sep='\t'.
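
As a possible alternative to the chain of replace calls, here is a minimal sketch that strips the brackets and splits on whitespace. It assumes each string looks like the printed form of a 1-D NumPy array (numbers separated by spaces, tabs, or newlines, wrapped in one pair of square brackets). The helper name parse_vector and the sample data are hypothetical, not part of the original post.

```python
import numpy as np
import pandas as pd


def parse_vector(text):
    """Parse a string such as '[ 1  2\\n  3]' into a 1-D float array."""
    # Drop surrounding whitespace and brackets; split() with no argument
    # splits on any run of whitespace (spaces, tabs, newlines).
    return np.array(text.strip().strip('[]').split(), dtype=float)


# Hypothetical sample data mixing spaces, a newline, and a tab
df = pd.DataFrame({'Word Vector': ['[ 1.  2.\n  3.]', '[4.\t5. 6.]']})
vectors = df['Word Vector'].apply(parse_vector)

# Stack the per-row arrays into one 2-D matrix for matrix arithmetic.
# This only works if every row has the same number of elements.
matrix = np.vstack(vectors.tolist())
print(matrix.shape)
```

Because split() treats any whitespace as a separator, this version should not need a different setting for tab-separated data. Stacking into a single 2-D array is usually more convenient for matrix arithmetic than a column of separate arrays.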

Nic Scozzaro · Accepted Answer · 2021-04-06 05:10:28Z

This worked for me for string lists in a Pandas column:

df['Numpy Word Vector'] = df['Word Vector'].apply(eval).apply(np.array)
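
One caveat: eval executes whatever Python expression the string contains, which is risky if the data comes from an untrusted source. A hedged sketch of the same idea using ast.literal_eval from the standard library, which only accepts Python literals, is shown below. It assumes the strings are comma-separated Python lists such as '[0.0, 1.0, 2.0]'; the sample data is hypothetical.

```python
import ast

import numpy as np
import pandas as pd

# Hypothetical sample data: each cell is a Python-style list literal
df = pd.DataFrame({'Word Vector': ['[0.0, 1.0, 2.0]', '[3.0, 4.0, 5.0]']})

# literal_eval accepts only Python literals (lists, numbers, strings, ...),
# so other expressions in the data raise an error instead of being executed.
df['Numpy Word Vector'] = df['Word Vector'].apply(
    lambda s: np.array(ast.literal_eval(s)))

print(df['Numpy Word Vector'][0])
```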

Vlad · Accepted Answer · 2020-09-21 08:12:56Z

A shorter solution:

df[col_name] = df[col_name].apply(lambda x: np.array(eval(x)))

Example:

df = pd.DataFrame(['[0., 1., 2., 3.]', '[1., 2., 3., 4.]'], columns=['Word Vector'])
df['Word Vector'][0] # '[0., 1., 2., 3.]'

df['Word Vector'] = df['Word Vector'].apply(lambda x: np.array(eval(x)))
df['Word Vector'][0] # array([0., 1., 2., 3.])
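
Note that evaluating the string as Python only works when the elements are separated by commas, as in the example above. A string produced by printing a NumPy array, such as '[0. 1. 2.]', has no commas and is not valid Python syntax. The sketch below, with a hypothetical helper named to_array, tries the literal form first and falls back to whitespace splitting. It assumes 1-D vectors wrapped in a single pair of brackets.

```python
import ast

import numpy as np


def to_array(text):
    """Hypothetical helper: parse '[1., 2.]' or '[1. 2.]' into a float array."""
    try:
        # Comma-separated form, e.g. '[0., 1., 2.]'
        return np.array(ast.literal_eval(text), dtype=float)
    except (SyntaxError, ValueError):
        # NumPy-printed form, e.g. '[0. 1. 2.]': split on whitespace instead
        return np.array(text.strip().strip('[]').split(), dtype=float)


print(to_array('[0., 1., 2.]'))
print(to_array('[0. 1.\n 2.]'))
```

It could be applied to a column with df[col_name].apply(to_array).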
