I have two Python scripts, one that creates a .csv file and the other one that reads it.

This is how I save the dataframe in the first file:

df['matrix'] = df['matrix'].apply(lambda x: np.array(x))
df.to_csv("Matrices.csv", sep=",", index=False)

The type and shape of df['matrix'].iloc[0] are <class 'numpy.ndarray'> and (24, 60), respectively.

In the second script when I try

print ("type of df['matrix'].iloc[0]", type(df['matrix'].iloc[0]))

The output is type of df['matrix'].iloc[0] <class 'str'>

How can I make sure that df['matrix'] doesn't lose its array type when it is read back?

  • What does the CSV look like? How did it render the array object? My guess is that it included [], as might be produced by str(df['matrix'][0]). Commented Feb 10, 2019 at 19:07

2 Answers

If you only want to save and read back a NumPy array, use np.savetxt and np.genfromtxt.
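A minimal sketch of that round trip, assuming a small integer 2-D array and a temporary file path (both are made up for illustration, not taken from the question):

```python
import os
import tempfile

import numpy as np

# Hypothetical 2-D array standing in for one matrix.
a = np.array([[219, 220, 221],
              [154, 152, 14],
              [205, 202, 192]])

# Temporary path used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "matrix.csv")

# savetxt writes one array row per line; fmt="%d" stores the integers without decimals.
np.savetxt(path, a, delimiter=",", fmt="%d")

# genfromtxt parses the text back into an ndarray; dtype=int overrides the float default.
b = np.genfromtxt(path, delimiter=",", dtype=int)

print(type(b), b.shape)
```

Note that np.savetxt only handles 1-D and 2-D arrays, so this fits a single matrix per file rather than a whole DataFrame column.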


If the DataFrame has multiple columns, there are a few options.

Use pickle:

df.to_pickle('file.pkl')
df = pd.read_pickle('file.pkl')
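As a rough check that pickle keeps the arrays intact, here is a sketch with a made-up two-row frame and a temporary path (both are assumptions for illustration). The column is built the same way as in the question, with apply and np.array:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical data: two 24x60 matrices stored as nested lists.
df = pd.DataFrame({
    "matrix": [np.zeros((24, 60)).tolist(), np.ones((24, 60)).tolist()],
    "b": [0, 1],
})
# Same conversion as in the question: each cell becomes an ndarray.
df["matrix"] = df["matrix"].apply(lambda x: np.array(x))

# Temporary path used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "file.pkl")

df.to_pickle(path)               # binary file, not CSV text
restored = pd.read_pickle(path)

first = restored["matrix"].iloc[0]
print(type(first), first.shape)
```

The pickle file is a binary format, so it has to be read with pd.read_pickle rather than pd.read_csv, whatever extension the file has. Only unpickle files from a source you trust.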

Convert arrays to multiple columns and then write to file:

a = np.array(
[[219,220,221],
 [154,152,14],
 [205,202,192]])

df = pd.DataFrame({'matrix':a.tolist(), 'b':np.arange(len(a))})
print (df)
            matrix  b
0  [219, 220, 221]  0
1   [154, 152, 14]  1
2  [205, 202, 192]  2

df1 = pd.DataFrame(df.pop('matrix').values.tolist(), index=df.index).add_prefix('mat_')
print (df1)
   mat_0  mat_1  mat_2
0    219    220    221
1    154    152     14
2    205    202    192

df = df.join(df1)
print (df)
   b  mat_0  mat_1  mat_2
0  0    219    220    221
1  1    154    152     14
2  2    205    202    192
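The expanded layout survives a CSV round trip because every cell is a plain number. A sketch of writing it out and rebuilding the array column afterwards, assuming the mat_ prefix from above and an in-memory buffer in place of a real file:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical frame already in the expanded layout.
df = pd.DataFrame({
    "b": [0, 1, 2],
    "mat_0": [219, 154, 205],
    "mat_1": [220, 152, 202],
    "mat_2": [221, 14, 192],
})

# An in-memory buffer keeps the sketch self-contained.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

restored = pd.read_csv(buffer)

# Gather the mat_* columns and rebuild one 1-D array per row.
mat_cols = [c for c in restored.columns if c.startswith("mat_")]
restored["matrix"] = list(restored[mat_cols].to_numpy())
restored = restored.drop(columns=mat_cols)

print(type(restored.loc[0, "matrix"]), restored.loc[0, "matrix"])
```

For 2-D matrices like the (24, 60) ones in the question, each matrix would have to be flattened to 1440 columns and reshaped after reading, which is workable but clumsy.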

But if you really need to convert the values back to arrays while reading the CSV, pass a converter that uses ast.literal_eval:

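import ast
import numpy as np
import pandas as pd

# df here is the frame that still has the list-valued 'matrix' column (before pop)
df.to_csv('testing.csv', index=False)

# parse each cell's text back into a list, then wrap it in an array
df = pd.read_csv('testing.csv', converters={'matrix':lambda x: np.array(ast.literal_eval(x))})
print (type(df.loc[0, 'matrix']))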

<class 'numpy.ndarray'>
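One caveat: the converter works here because the column holds lists, whose text form has commas. The text form of an ndarray has no commas, and with NumPy's default print options arrays of more than 1000 elements are abbreviated with "...", so a 24x60 array written as-is probably cannot be recovered from the CSV at all. A sketch of converting to nested lists before writing, using made-up data and an in-memory buffer:

```python
import ast
import io

import numpy as np
import pandas as pd

a = np.arange(6).reshape(2, 3)
big = np.zeros((24, 60))

print(str(a))                # no commas: ast.literal_eval cannot parse this
print("..." in str(big))     # True: large arrays are abbreviated when printed

# Hypothetical frame; each cell is stored as nested lists, not as an ndarray.
matrices = [a, a + 10]
df = pd.DataFrame({"matrix": [m.tolist() for m in matrices]})

buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# literal_eval parses the nested-list text; np.array restores the 2-D shape.
restored = pd.read_csv(
    buffer,
    converters={"matrix": lambda s: np.array(ast.literal_eval(s))},
)
first = restored.loc[0, "matrix"]
print(type(first), first.shape)
```

Parsing every cell with literal_eval is slow and memory-hungry on large data, so pickle is likely the better fit when the file does not have to be human-readable.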

4 Comments

I was initially using np.array(list(map(literal_eval, df['matrix']))), but my Python interpreter crashes when working on the full dataset. Is there any other alternative?
@yaminigoel - What about df.to_pickle(file) and df = pd.read_pickle(file) ?
I didn't know about pickle functionality. Does it work fine with .csv? My script crashes when I try df.to_pickle("Matrices.csv")
@yaminigoel - What is the error? to_csv always loses the data types, because every value is written out as text. read_csv then recognizes only simple types such as int and float columns; everything else is read back as strings.

To save an array directly to CSV as multiple columns, use:

np.savetxt(r'C:\path\file.csv',a,delimiter=',')

If you need to read it back as a Python object, ast.literal_eval() is your saviour, as pointed out by @jezrael.

1 Comment

I was initially using np.array(list(map(literal_eval, df['matrix']))), but my Python interpreter crashes when working on the full dataset. Is there any other alternative?
