
Let's assume that I have a DataFrame like this:

Type   Vector
A      [0.2340, 0.5463, 0.5652, 0.3243, 0.3243]
B      [0.3244, 0.5566, 0.2344, 0.1213, 0.9821]
C      [0,5652,  0.3453, 0.3454, 0.5656, 0.6766]
D      [0,5125,  0.3345, 0.1112, 0.4545, 0.6324]

I want to calculate the distances of these vectors by using np.linalg.norm. What I want to get is

Type   Vector                                     distance1   distance2   distance3
A      [0.2340, 0.5463, 0.5652, 0.3243, 0.3243]   A-B         A-C         A-D

as new columns. Edit: I have done this also:

df['vector'] = df['vector'].apply(lambda x: np.array(x)) 
print(type(df['vector'].iloc[0]))

Result is :

<class 'numpy.ndarray'>

When I simply say :

print(np.linalg.norm(df['vector'].iloc[0] -df['vector'].iloc[1]))

I get a float value

However, when I iterate over the rows, I get:

ValueError: Wrong number of items passed 544, placement implies 1

How could I solve it? Note: the vectors are indeed 544 elements long.

  • Can you read those as an array? Commented Nov 21, 2017 at 11:28
  • I am reading them from a pickle file. Commented Nov 21, 2017 at 11:28
  • Can you save them into an array? Commented Nov 21, 2017 at 11:29
  • Actually I saved them as an array, but then I merged it with a different file and it became an object. Commented Nov 21, 2017 at 13:27
  • Can you upload the file with the data? Commented Nov 21, 2017 at 13:57

1 Answer

If you are working with a pickle file, use the pandas pickle reader:

import pandas as pd

df = pd.read_pickle('your_file_name')

Since pandas is built on NumPy, you can now pass the column you want to NumPy directly:

import numpy as np

np.linalg.norm(x=df['your column'])
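If the column holds one array per row, as in the question, calling the norm on the column itself may not give what is wanted. A minimal sketch, assuming a column named `vector` and hypothetical toy data in which every array has the same length, stacks the rows into a 2-D array first and then takes one norm per row:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one equal-length NumPy array per row (object-dtype column).
df = pd.DataFrame({
    "Type": ["A", "B"],
    "vector": [np.array([3.0, 4.0]), np.array([6.0, 8.0])],
})

# Stack the per-row arrays into a single (n_rows, n_dims) float array.
matrix = np.stack(df["vector"].to_numpy())

# axis=1 yields one Euclidean norm per row rather than one norm for the whole matrix.
row_norms = np.linalg.norm(matrix, axis=1)
print(row_norms)
```

`np.stack` raises an error if the arrays differ in length, which also makes it a quick sanity check on the data.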

Check your vectors: they do not all have the same size. For example, C and D have length 6 as written. I assume the comma in their first value was meant to be a decimal point.
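A quick way to spot such rows before computing anything is to compare the vector lengths. The sketch below uses hypothetical toy data and assumes a column named `vector`; it flags every row whose length differs from the most common one:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: row C has a stray extra element,
# as happens when a decimal point is mistyped as a comma.
df = pd.DataFrame({
    "Type": ["A", "B", "C"],
    "vector": [
        np.array([0.2340, 0.5463, 0.5652]),
        np.array([0.3244, 0.5566, 0.2344]),
        np.array([0, 5652, 0.3453, 0.3454]),
    ],
})

# Length of each row's vector.
lengths = df["vector"].map(len)

# Treat the most common length as the expected one and list the rows that deviate.
expected = lengths.mode().iloc[0]
bad_rows = df.loc[lengths != expected, "Type"].tolist()
print(bad_rows)
```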

Edit:

A full example would be:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A':[0.2340, 0.5463, 0.5652, 0.3243, 0.3243],
    'B':[0.3244, 0.5566, 0.2344, 0.1213, 0.9821],
    'C':[0.5652,  0.3453, 0.3454, 0.5656, 0.6766],
    'D':[0.5125,  0.3345, 0.1112, 0.4545, 0.6324]
})

df_distances = df.transpose()           #Transpose columns to rows

for col in df:
    for col2 in df:
        df_distances["{}_{}".format(col, col2)] = np.linalg.norm(df[col] - df[col2])

Edit 2 (related to my comment):

I recommend generating a list or dict with the values you want instead, since appending everything to a table might result in a very large table. The code would then look like:

dic = {}

for col in df:
    for col2 in df:
        dic["{}_{}".format(col, col2)] = np.linalg.norm(df[col] - df[col2])
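If the goal is the layout from the question, with one row per type and one distance column per other type, the distances can also be computed in one step. This is only a sketch: it assumes a `vector` column holding equal-length arrays, and the `dist_to_` column names are made up for illustration. It builds the full pairwise matrix with broadcasting and attaches it as new columns, so that each cell receives a single float rather than a whole array:

```python
import numpy as np
import pandas as pd

# Data from the question, with the first values of C and D read as 0.5652 and 0.5125.
df = pd.DataFrame({
    "Type": ["A", "B", "C", "D"],
    "vector": [
        np.array([0.2340, 0.5463, 0.5652, 0.3243, 0.3243]),
        np.array([0.3244, 0.5566, 0.2344, 0.1213, 0.9821]),
        np.array([0.5652, 0.3453, 0.3454, 0.5656, 0.6766]),
        np.array([0.5125, 0.3345, 0.1112, 0.4545, 0.6324]),
    ],
})

# Stack the rows into an (n, d) array, then broadcast to all pairwise differences.
matrix = np.stack(df["vector"].to_numpy())        # shape (4, 5)
diffs = matrix[:, None, :] - matrix[None, :, :]   # shape (4, 4, 5)
dist = np.linalg.norm(diffs, axis=-1)             # shape (4, 4), dist[i, j] = |v_i - v_j|

# One column per target type (hypothetical naming scheme).
dist_df = pd.DataFrame(
    dist, index=df.index, columns=["dist_to_" + t for t in df["Type"]]
)
result = pd.concat([df, dist_df], axis=1)
print(result.drop(columns="vector"))
```

The intermediate `diffs` array has n × n × d entries, so for many long vectors a dedicated routine such as `scipy.spatial.distance.cdist` may be a better fit.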

3 Comments

Yeah, but then how will I get the desired output?
I updated the answer. Sorry, I misunderstood you there. The code will generate and append columns with all possible distances. Please be aware that this might result in a large table. It is better to just extract the values rather than append them to your table.
I think the problem is probably related to the dataset, so somewhere I have to skip entries that are not NumPy arrays.
