The following is a simplified version of the issue.
import pandas as pd
df = pd.DataFrame(data={'key': [1, 1, 2, 2], 'val': [3, 4, 5, 5]})
df['val'] = df['val'].astype('Int64') # read_csv can't read Int64 array properly by default
df = df.groupby('key')['val'].agg(['unique'])
display(df)
df.to_csv('test')
df = pd.read_csv('test', index_col=0)
display(df)
And this is what I got
How can I read the unique column data correctly? Thanks
Thanks to @hide1nbush for the pointer. I resolved it using a converter.
import ast
def convert_int64_array(array_string):
    # The second line of the stored repr holds the list of values, e.g. "[3, 4]"
    return pd.array(ast.literal_eval(array_string.split("\n")[1]), dtype=pd.Int64Dtype())
df = pd.read_csv('test', index_col=0, converters={'unique': convert_int64_array})
But I wonder if there is an easier way to do this.
I found that the pickle format is the easiest way to round-trip the DataFrame through a file. I don't need to worry about the index, the Int64 dtype, etc. See this to understand the difference between some major formats.
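For reference, here is a minimal sketch of the pickle round trip, reusing the example from the question. The temporary directory and the `test.pkl` file name are assumptions made for illustration; any writable path should work the same way.

```python
import os
import tempfile

import pandas as pd

# Same setup as in the question
df = pd.DataFrame(data={'key': [1, 1, 2, 2], 'val': [3, 4, 5, 5]})
df['val'] = df['val'].astype('Int64')
df = df.groupby('key')['val'].agg(['unique'])

# Hypothetical file location, chosen only for this example
path = os.path.join(tempfile.mkdtemp(), 'test.pkl')

# Pickle stores the Python objects themselves, so the index and the
# arrays in the 'unique' column come back without any converter
df.to_pickle(path)
restored = pd.read_pickle(path)

first = restored.loc[1, 'unique']
print(type(first).__name__, first.dtype)
```

One caveat: unpickling can execute arbitrary code, so `pd.read_pickle` should only be used on files from a source you trust.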

