The following is a simplified version of the issue.

import pandas as pd

df = pd.DataFrame(data={'key': [1, 1, 2, 2], 'val': [3, 4, 5, 5]})
df['val'] = df['val'].astype('Int64')  # read_csv can't read Int64 array properly by default
df = df.groupby('key')['val'].agg(['unique'])  # each cell now holds an array of unique values
display(df)
df.to_csv('test')
df = pd.read_csv('test', index_col=0)
display(df)

And this is what I got

How can I read the unique column data correctly? Thanks


Thanks to @hide1nbush for the pointer. I resolved it using a converter.

import ast

def convert_int64_array(array_string):
    # The second line of the stored string holds the list literal, e.g. "[3, 4]"
    return pd.array(ast.literal_eval(array_string.split("\n")[1]), dtype=pd.Int64Dtype())

df = pd.read_csv('test', index_col=0, converters={'unique': convert_int64_array})
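
For anyone wondering why the converter takes the second line: as far as I can tell, to_csv writes the string form of each array into the cell, and that string spans several lines. Here is a small sketch of what the converter is parsing (the `cell` variable is just my stand-in for one CSV cell, built by hand rather than read from the file):

```python
import ast

import pandas as pd

# Stand-in for one cell of the 'unique' column as it ends up in the CSV:
# the multi-line string form of an Int64 array.
cell = str(pd.array([3, 4], dtype='Int64'))
lines = cell.split("\n")
print(lines)

# The list literal sits on the second line, so that is the part to parse.
values = ast.literal_eval(lines[1])
restored = pd.array(values, dtype=pd.Int64Dtype())
print(restored)
```

One caveat: this only works while the arrays have no missing values, since a missing entry is printed as <NA> and ast.literal_eval cannot parse that.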

But I wonder if there is an easier way to do this.


I found that the pickle format is the easiest way to round-trip the DataFrame through a file. I don't need to worry about the index, Int64, etc. See this to understand the difference between some major formats.
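
A minimal sketch of the pickle round trip, using the same example data. The file name test.pkl and the temporary directory are only my choices for illustration:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame(data={'key': [1, 1, 2, 2], 'val': [3, 4, 5, 5]})
df['val'] = df['val'].astype('Int64')
df = df.groupby('key')['val'].agg(['unique'])

# Write to a temporary location (hypothetical file name) and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'test.pkl')
    df.to_pickle(path)
    restored = pd.read_pickle(path)

# The cells come back as arrays, not strings, and the index is preserved.
first = restored.loc[1, 'unique']
print(type(first).__name__, first.dtype)
print(restored.index.name)
```

No converters or index_col are needed here. Keep in mind that pickle files should only be loaded from sources you trust, because unpickling can execute arbitrary code.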

  • Have you tried the method mentioned in this thread: stackoverflow.com/questions/42755214/… ? Commented Mar 2, 2023 at 3:56
  • Since your data is structured (a numpy array in cells), I recommend saving in a binary format (pickle, feather, etc.), not a text CSV file. Commented Mar 2, 2023 at 4:39
  • @hide1nbush Thanks for the pointer! Updated my question. Is there an easy way to do it? Commented Mar 2, 2023 at 6:17
  • @QuangHoang thanks for the suggestion. I'm new to pandas and started with csv. I read this and feather seems to be a good option for me. Commented Mar 2, 2023 at 6:25
