How to read_csv correctly for a DataFrame with an Int64 array?

The following is a simplified version of the issue.

import pandas as pd

df = pd.DataFrame(data={'key': [1,1,2,2], 'val': [3,4,5,5]})
df['val'] = df['val'].astype('Int64') # read_csv can't read Int64 array properly by default
df = df.groupby('key')['val'].agg(['unique'])
display(df)
df.to_csv('test')
df = pd.read_csv('test', index_col=0)
display(df)

And this is what I got

How can I read the unique column data correctly? Thanks
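For context, here is a minimal sketch of what the round trip actually stores. It assumes `to_csv` writes each object cell as its string form, so the cell comes back as the text of the array's repr rather than as an array. An in-memory `io.StringIO` buffer stands in for the `test` file; that substitution is only for illustration.

```python
import io

import pandas as pd

# Rebuild the frame from the question.
df = pd.DataFrame(data={'key': [1, 1, 2, 2], 'val': [3, 4, 5, 5]})
df['val'] = df['val'].astype('Int64')
agg = df.groupby('key')['val'].agg(['unique'])

# Round-trip through CSV text (a buffer is used here instead of a file).
buf = io.StringIO()
agg.to_csv(buf)
buf.seek(0)
back = pd.read_csv(buf, index_col=0)

# The cell is now a plain string containing the repr of the original array.
cell = back.loc[1, 'unique']
print(type(cell))
print(cell)
```

CSV has no notion of nested or typed values, so anything richer than a scalar has to be parsed back explicitly on read.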

Thanks to @hide1nbush for the pointer. I resolved it using a converter.

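import ast
def convert_int64_array(array_string):
    # Each cell holds the multi-line text form of the array;
    # its second line (index 1) is the list literal, e.g. "[3, 4]".
    return pd.array(ast.literal_eval(array_string.split("\n")[1]), dtype=pd.Int64Dtype())
df = pd.read_csv('test', index_col=0, converters={'unique': convert_int64_array})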

But I wonder if there is an easier way to do this.
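One possible simplification, sketched under assumptions: if the arrays are turned into plain Python lists before writing, each cell is stored as a list literal such as `[3, 4]`, and the converter no longer has to pick a line out of the array's repr. The helper name `parse_int64_list` is hypothetical, and the sketch assumes the arrays contain no missing values (`int(v)` would fail on `pd.NA`).

```python
import ast
import io

import pandas as pd


def parse_int64_list(text):
    # Hypothetical helper: parse a list literal such as "[3, 4]" into an Int64 array.
    return pd.array(ast.literal_eval(text), dtype='Int64')


df = pd.DataFrame(data={'key': [1, 1, 2, 2], 'val': [3, 4, 5, 5]})
df['val'] = df['val'].astype('Int64')
agg = df.groupby('key')['val'].agg(['unique'])

# Store plain lists of Python ints so the CSV cell is a simple list literal.
# Assumption: no missing values in the arrays.
agg['unique'] = agg['unique'].map(lambda arr: [int(v) for v in arr])

buf = io.StringIO()  # stands in for the 'test' file
agg.to_csv(buf)
buf.seek(0)

restored = pd.read_csv(buf, index_col=0, converters={'unique': parse_int64_list})
cell = restored.loc[1, 'unique']
print(cell)
```

This still needs a converter on read, so it trims the parsing rather than removing it.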

I found that the pickle format is the easiest way to round-trip the DataFrame through a file. I don't need to worry about the index, Int64 dtypes, etc. See this to understand the difference between some major formats.
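A minimal sketch of that pickle round trip, using `DataFrame.to_pickle` and `pd.read_pickle`. The temporary directory and the file name `test.pkl` are assumptions made only to keep the example self-contained.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame(data={'key': [1, 1, 2, 2], 'val': [3, 4, 5, 5]})
df['val'] = df['val'].astype('Int64')
agg = df.groupby('key')['val'].agg(['unique'])

# Write and read back inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.pkl')
    agg.to_pickle(path)
    restored = pd.read_pickle(path)

# The cell is restored as an array with its Int64 dtype, not as a string.
cell = restored.loc[1, 'unique']
print(type(cell).__name__, cell.dtype)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.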

edited Mar 2, 2023 at 6:44

asked Mar 2, 2023 at 3:50

lzl124631x


Have you tried the method mentioned in this thread: stackoverflow.com/questions/42755214/… ?

– hide1nbush
Commented Mar 2, 2023 at 3:56

Since your data is structured (a numpy array in cells), I recommend saving in a binary format (pickle, feather, etc.) rather than a text CSV file.

– Quang Hoang
Commented Mar 2, 2023 at 4:39

@hide1nbush Thanks for the pointer! Updated my question. Is there an easy way to do it?

– lzl124631x
Commented Mar 2, 2023 at 6:17

@QuangHoang Thanks for the suggestion. I'm new to pandas and started with CSV. I read this and feather seems to be a good option for me.

– lzl124631x
Commented Mar 2, 2023 at 6:25
