Converting columns to a numpy array

Question

Let's assume I have a DataFrame like this:

Type   Vector
A      [0.2340, 0.5463, 0.5652, 0.3243, 0.3243]
B      [0.3244, 0.5566, 0.2344, 0.1213, 0.9821]
C      [0,5652,  0.3453, 0.3454, 0.5656, 0.6766]
D      [0,5125,  0.3345, 0.1112, 0.4545, 0.6324]

I want to calculate the distances between these vectors using np.linalg.norm. What I want to get is

Type   Vector                                     distance1   distance2   distance3
A      [0.2340, 0.5463, 0.5652, 0.3243, 0.3243]   A-B         A-C         A-D

as new columns. Edit: I have also done this:

df['vector'] = df['vector'].apply(lambda x: np.array(x)) 
print(type(df['vector'].iloc[0]))

The result is:

<class 'numpy.ndarray'>

When I simply run:

print(np.linalg.norm(df['vector'].iloc[0] -df['vector'].iloc[1]))

I get a float value.

However, when I iterate over the rows I get:

ValueError: Wrong number of items passed 544, placement implies 1

How could I solve it? Note: the vectors are indeed 544 characters long.

Actually, I saved them as arrays, but then I merged them with a different file and the column became an object.
– Dogukan Yılmaz, Commented Nov 21, 2017 at 13:27

Markus · Accepted Answer · 2017-11-21 15:02:47Z

1

If you are working with pickle files, use the pandas pickle import:

import pandas as pd

df = pd.read_pickle('your_file_name')

Since pandas is built upon numpy, you can now get your desired column as a numpy array:

import numpy as np

np.linalg.norm(x=df['your column'])
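
If the column holds one vector per row (object dtype), as in the question, one way to get a regular 2-D numpy array is np.stack. This is only a minimal sketch: it assumes a column named 'Vector' whose entries all have the same length, and it uses the sample values from the question with the leading "0,5652" and "0,5125" read as 0.5652 and 0.5125. The names vectors and dist_ab are just for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question; 'Vector' holds one list per row (object dtype).
df = pd.DataFrame({
    'Type': ['A', 'B', 'C', 'D'],
    'Vector': [
        [0.2340, 0.5463, 0.5652, 0.3243, 0.3243],
        [0.3244, 0.5566, 0.2344, 0.1213, 0.9821],
        [0.5652, 0.3453, 0.3454, 0.5656, 0.6766],
        [0.5125, 0.3345, 0.1112, 0.4545, 0.6324],
    ],
})

# Stack the per-row vectors into one float array of shape (n_rows, vector_length).
# np.stack raises a ValueError if the vectors do not all have the same length.
vectors = np.stack(df['Vector'].to_numpy())

# Euclidean distance between the first two rows (A and B).
dist_ab = np.linalg.norm(vectors[0] - vectors[1])
```

Once the data is a plain float array, any pair of rows can be subtracted directly, without going through apply.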

Please check your vectors: they do not all have the same size! For example, C and D have length 6. I assume the comma in the first value was meant to be a decimal point.

Edit:

A full example would be:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A':[0.2340, 0.5463, 0.5652, 0.3243, 0.3243],
    'B':[0.3244, 0.5566, 0.2344, 0.1213, 0.9821],
    'C':[0.5652,  0.3453, 0.3454, 0.5656, 0.6766],
    'D':[0.5125,  0.3345, 0.1112, 0.4545, 0.6324]
})

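df_distances = df.transpose()  # Transpose so that each vector (A-D) becomes a row

# Add one column per ordered pair; the scalar distance is repeated in every row
for col in df:
    for col2 in df:
        df_distances["{}_{}".format(col, col2)] = np.linalg.norm(df[col] - df[col2])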

Edit 2 (related to my comment):

I would rather recommend generating a list or dict with the values you want, since appending everything to the table might make it very large. The code would then look like:

dic = {}

for col in df:
    for col2 in df:
        dic["{}_{}".format(col, col2)] = np.linalg.norm(df[col] - df[col2])
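
As an alternative to the nested loops, all pairwise distances could also be computed in one step with numpy broadcasting. This is a sketch under assumptions, not a drop-in solution: the names types, vectors and distances are mine, and the vectors are assumed to be stacked into a 2-D float array already. Note that the intermediate array has shape (n, n, vector_length), so it can get large for many rows.

```python
import numpy as np
import pandas as pd

types = ['A', 'B', 'C', 'D']
vectors = np.array([
    [0.2340, 0.5463, 0.5652, 0.3243, 0.3243],
    [0.3244, 0.5566, 0.2344, 0.1213, 0.9821],
    [0.5652, 0.3453, 0.3454, 0.5656, 0.6766],
    [0.5125, 0.3345, 0.1112, 0.4545, 0.6324],
])

# diff[i, j] is vectors[i] - vectors[j]; shape (n, n, vector_length).
diff = vectors[:, None, :] - vectors[None, :, :]

# Taking the norm over the last axis gives the (n, n) matrix of Euclidean distances.
dist = np.linalg.norm(diff, axis=-1)

# Label rows and columns by type, so distances.loc['A', 'B'] is the A-B distance.
distances = pd.DataFrame(dist, index=types, columns=types)
```

The row distances.loc['A'] then holds the A-A, A-B, A-C and A-D values, which is close to the layout asked for in the question.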

edited Nov 21, 2017 at 15:02

answered Nov 21, 2017 at 11:49


3 Comments

Dogukan Yılmaz Over a year ago

Yeah, but then how will I get the desired output?

Markus Over a year ago

I updated the answer. Sorry, I misunderstood you there. The code will generate and append columns with all possible distances. Please be aware that this might result in a large table. It would be better to just extract the values and not append them to your table.

Dogukan Yılmaz Over a year ago

I think the problem is probably related to the dataset, so somewhere I have to skip entries that are not numpy arrays.
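
The filtering idea from the last comment could be sketched roughly as follows. Everything here is hypothetical: the sample data, the column name 'Vector', the helper is_valid_vector and the expected_length of 3 are made up for illustration (the question mentions 544 for the real data).

```python
import numpy as np
import pandas as pd

# Hypothetical data: two valid vectors, one missing value, one wrong length.
df = pd.DataFrame({
    'Type': ['A', 'B', 'C', 'D'],
    'Vector': [
        np.array([0.2340, 0.5463, 0.5652]),
        np.array([0.3244, 0.5566, 0.2344]),
        np.nan,
        np.array([0.5125, 0.3345]),
    ],
})

expected_length = 3  # assumed value for this toy example


def is_valid_vector(value):
    """Return True only for 1-D numpy arrays of the expected length."""
    return isinstance(value, np.ndarray) and value.shape == (expected_length,)


# Boolean mask marking the rows that hold a usable vector.
mask = df['Vector'].apply(is_valid_vector)

# Keep only the valid rows before computing any distances.
clean = df[mask].reset_index(drop=True)
```

Inspecting df[~mask] shows which rows were dropped, which may help to track down where the merge produced unexpected values.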
