I am having a hard time reading an Excel file into a pandas DataFrame and converting a stored matrix to a NumPy array. I think part of the issue is that the matrix is improperly stored. I have no control over the spreadsheet, however; this is how it was sent to me.

For instance, this is the string stored in a cell:

[[[ 0.        0.        0.107851]
  [ 0.        0.       -0.862809]]]

I read in the row with DataFrame and save each cell to a variable. I then try to convert this particular variable to an np.array, since those numbers represent two sets of x, y, z coordinates.

I have tried np.fromstring and np.asarray to no avail. It will convert the string to a NumPy array, but the result is a mess with the brackets still inside as characters. I have tried using np.squeeze to get rid of the brackets, but it says the dimension is not 1.

If I use np.asarray(item._coord, dtype=float), it fails, saying it cannot convert the string to float:

ValueError: could not convert string to float: '[[[ 0. 0. 0.107851] [ 0. 0. -0.862809]]]'

There is a '\n' that shows up in the middle of it, between the two lists. I use df = df.replace(r'\n', ' ', regex=True) to clean out the '\n's prior to the data conversion attempts.

I am stuck.

1 Answer

Use a custom function to convert the strings to NumPy arrays after read_excel. First, sample data that reproduces the problem:

import numpy as np
import pandas as pd

a = np.array([[[ 0.,        0.,        0.107851],
               [ 0.,        0.,       -0.862809]]])
print (a)
[[[ 0.        0.        0.107851]
  [ 0.        0.       -0.862809]]]

df = pd.DataFrame({'col':[a,a,a]})
print (df)
                                               col
0  [[[0.0, 0.0, 0.107851], [0.0, 0.0, -0.862809]]]
1  [[[0.0, 0.0, 0.107851], [0.0, 0.0, -0.862809]]]
2  [[[0.0, 0.0, 0.107851], [0.0, 0.0, -0.862809]]]

df.to_excel('test.xlsx', index=False)

import re
import ast
import numpy as np

#https://stackoverflow.com/a/44323021
def str2array(s):
    # Remove spaces after each opening bracket
    s = re.sub(r'\[ +', '[', s.strip())
    # Replace runs of commas and whitespace (including newlines) with ', '
    s = re.sub(r'[,\s]+', ', ', s)
    return np.array(ast.literal_eval(s))

df = pd.read_excel('test.xlsx')

df['col'] = df['col'].apply(str2array)
print (df)
                                               col
0  [[[0.0, 0.0, 0.107851], [0.0, 0.0, -0.862809]]]
1  [[[0.0, 0.0, 0.107851], [0.0, 0.0, -0.862809]]]
2  [[[0.0, 0.0, 0.107851], [0.0, 0.0, -0.862809]]]
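
As an alternative sketch (not part of the original answer), if every cell is known to hold rows of three coordinates, the brackets can simply be discarded and the numbers reshaped. The function name coords_from_cell and its width parameter are hypothetical, and the approach assumes the nesting carries no information beyond "rows of x, y, z"; it returns a 2-D array rather than preserving the original 3-D shape.

```python
import numpy as np

def coords_from_cell(cell, width=3):
    """Parse a bracketed numeric string into an (n, width) float array.

    Assumption: the cell contains only brackets, optional commas,
    whitespace (including newlines) and numbers.
    """
    # Turn every bracket and comma into a space so only numbers remain.
    cleaned = cell.replace('[', ' ').replace(']', ' ').replace(',', ' ')
    # str.split() with no argument splits on any whitespace, so the
    # embedded '\n' needs no separate cleanup step.
    values = np.array(cleaned.split(), dtype=float)
    # Group the flat values into rows of `width` coordinates.
    return values.reshape(-1, width)

# The exact string from the question, including the embedded newline.
cell = "[[[ 0.        0.        0.107851]\n  [ 0.        0.       -0.862809]]]"
coords = coords_from_cell(cell)
print(coords.shape)
```

It could be applied the same way as str2array, for example df['col'].apply(coords_from_cell). If the original nesting matters, str2array above is the safer choice because it keeps the bracket structure.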

4 Comments

Thanks, just trying to get this working. In making a MWE I had to leave out some things. I think this is going to work though. It will be a few more minutes before my incompetent self gets to a conclusion.
Awwwww Yeahhhhh, it runs like a well oiled machine!
Final comment - I used the very last option df['col'] = df['col'].apply(str2array)
@CharlieCrown - Mea culpa, I later realised it is not csv, but excel, so converter cannot be used.
