Pandas dataframe reading numpy array column as str

Question

I have two Python scripts, one that creates a .csv file and the other one that reads it.

This is how I save the dataframe in the first file:

df['matrix'] = df['matrix'].apply(lambda x: np.array(x))
df.to_csv("Matrices.csv", sep=",", index=False)

The type and shape of df['matrix'].iloc[0] are <class 'numpy.ndarray'> and (24, 60), respectively.

In the second script, when I try

print ("type of df['matrix'].iloc[0]", type(df['matrix'].iloc[0]))

The output is type of df['matrix'].iloc[0] <class 'str'>

How can I make sure that df['matrix'] doesn't lose its type?

What does the csv look like? How did it render the array object? My guess is that it included [], as might be produced by str(df['matrix'][0]).
– hpaulj, Commented Feb 10, 2019 at 19:07

jezrael · Accepted Answer · 2019-02-10 09:08:14Z

2

If you want to save and read only a numpy array, use np.savetxt and np.genfromtxt.
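
A minimal sketch of that round trip, using the small 3x3 array from further down in this answer. The temporary file path and the fmt="%d" / dtype=int choices are assumptions for illustration, not part of the original question:

```python
import os
import tempfile

import numpy as np

a = np.array([[219, 220, 221],
              [154, 152, 14],
              [205, 202, 192]])

# Hypothetical throwaway location, only for this example.
path = os.path.join(tempfile.mkdtemp(), "matrix.csv")

# fmt="%d" writes plain integers instead of scientific notation.
np.savetxt(path, a, delimiter=",", fmt="%d")

# genfromtxt returns floats by default, so ask for integers explicitly.
b = np.genfromtxt(path, delimiter=",", dtype=int)

print(type(b), b.shape)
```

This only covers a single 2-D array per file; it does not store other DataFrame columns alongside it.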

If the DataFrame has multiple columns, use one of the following options.

Use pickle:

df.to_pickle('file.pkl')
df = pd.read_pickle('file.pkl')
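
A small self-contained sketch of the pickle round trip, assuming a column of (24, 60) arrays like the one in the question. The data and the temporary path are made up for illustration:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Two hypothetical (24, 60) matrices, one per row.
matrices = [np.arange(24 * 60).reshape(24, 60), np.ones((24, 60))]
df = pd.DataFrame({"matrix": matrices, "b": [0, 1]})

# Pickle is a binary format, so a .pkl name is used rather than .csv.
path = os.path.join(tempfile.mkdtemp(), "matrices.pkl")
df.to_pickle(path)

restored = pd.read_pickle(path)
first = restored["matrix"].iloc[0]

print(type(first), first.shape)
```

Because pickle stores the Python objects themselves rather than their text form, the cells should come back as arrays with their original shape. Only unpickle files you trust.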

Or convert the arrays to multiple columns and then write to a file:

a = np.array(
[[219,220,221],
 [154,152,14],
 [205,202,192]])

df = pd.DataFrame({'matrix':a.tolist(), 'b':np.arange(len(a))})
print (df)
            matrix  b
0  [219, 220, 221]  0
1   [154, 152, 14]  1
2  [205, 202, 192]  2

df1 = pd.DataFrame(df.pop('matrix').values.tolist(), index=df.index).add_prefix('mat_')
print (df1)
   mat_0  mat_1  mat_2
0    219    220    221
1    154    152     14
2    205    202    192

df = df.join(df1)
print (df)
   b  mat_0  mat_1  mat_2
0  0    219    220    221
1  1    154    152     14
2  2    205    202    192
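
To get the values back as an array after reading such a file, one option is to select the mat_ columns again. This is a sketch that writes to an in-memory buffer instead of a real file; the names wide, loaded and rebuilt are made up for the example:

```python
import io

import numpy as np
import pandas as pd

a = np.array([[219, 220, 221],
              [154, 152, 14],
              [205, 202, 192]])

# Same layout as above: one mat_ column per array element, plus column b.
wide = pd.DataFrame(a).add_prefix("mat_")
wide.insert(0, "b", np.arange(len(a)))

# Round trip through CSV text in memory.
buf = io.StringIO()
wide.to_csv(buf, index=False)
buf.seek(0)
loaded = pd.read_csv(buf)

# Collect the mat_ columns back into one 2-D array; row i is rebuilt[i].
rebuilt = loaded.filter(like="mat_").to_numpy()

print(rebuilt.shape)
```

For a 2-D matrix per row, as in the question, the same idea should work if each matrix is flattened with ravel() before writing and restored with reshape(24, 60) after reading.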

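But if you really need to read the values back from CSV as arrays, pass a converter that uses ast.literal_eval (this assumes the DataFrame still has its list-valued matrix column, i.e. before the pop above):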

import ast

df.to_csv('testing.csv', index=False)

df = pd.read_csv('testing.csv', converters={'matrix':lambda x: np.array(ast.literal_eval(x))})
print (type(df.loc[0, 'matrix']))

<class 'numpy.ndarray'>
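
One caveat: this works here because the cells are Python lists, which are written with commas. If the cells are numpy arrays, as in the question, to_csv stores NumPy's print form (space-separated, no commas), which ast.literal_eval cannot parse, and with default print options arrays with more than 1000 elements are abbreviated with "...", so a (24, 60) matrix would not survive. A sketch of one workaround, converting to nested lists before writing (the sample data and the in-memory buffer are assumptions for illustration):

```python
import ast
import io

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "matrix": [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]],
    "b": [0, 1],
})
# Cells become ndarrays, as in the question.
df["matrix"] = df["matrix"].apply(lambda x: np.array(x))

# Convert each array to nested lists so the CSV text contains commas.
as_lists = df.assign(matrix=df["matrix"].apply(lambda m: m.tolist()))

buf = io.StringIO()
as_lists.to_csv(buf, index=False)
buf.seek(0)

# Parse the list text back and wrap it in an array while reading.
loaded = pd.read_csv(
    buf,
    converters={"matrix": lambda s: np.array(ast.literal_eval(s))},
)
first = loaded.loc[0, "matrix"]

print(type(first), first.shape)
```

Parsing every cell with literal_eval can be slow and memory-hungry on a large dataset, so pickle is probably the simpler choice when CSV is not a hard requirement.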

edited Feb 10, 2019 at 9:08

answered Feb 10, 2019 at 8:28

jezrael


4 Comments

yamini goel Over a year ago

I was initially using np.array(list(map(literal_eval, df['matrix']))) but my Python interpreter collapses while working on full dataset. Is there any other alternative?

jezrael Over a year ago

@yaminigoel - What about df.to_pickle(file) and df = pd.read_pickle(file) ?

yamini goel Over a year ago

I didn't know about pickle functionality. Does it work fine with .csv? My script crashes when I try df.to_pickle("Matrices.csv")

jezrael Over a year ago

@yaminigoel - What is the error? to_csv always loses the data types, because everything is written as text. read_csv then infers only basic types such as integers and floats; everything else is read back as strings.

anky · Accepted Answer · 2019-02-10 08:51:42Z

1

For saving arrays directly to csv as multiple columns use:

np.savetxt(r'C:\path\file.csv',a,delimiter=',')

If you need to read it back as a Python object, ast.literal_eval() is your saviour, as pointed out by @jezrael.

answered Feb 10, 2019 at 8:51

anky


1 Comment

yamini goel Over a year ago

I was initially using np.array(list(map(literal_eval, df['matrix']))) but my Python interpreter collapses while working on full dataset. Is there any other alternative?
