How to groupby multiple columns to list in pandas DataFrame

Question

I have a DataFrame df:

    A   B   C        date
0   4   5   5   2019-06-2
1   3   5   2   2019-06-2
2   3   2   1   2019-06-2
3   4   4   3   2019-06-3
4   5   4   6   2019-06-3
5   2   3   7   2019-06-3

I can group by date and collect one column into lists with the following code:

df.groupby('date')['A'].apply(list)


         A         date
0   [4,3,3]   2019-06-2
1   [4,5,2]   2019-06-3

But what if I want to collect multiple columns into lists? I've tried something like this, but it doesn't seem to work:

df.groupby('date')[['A','B','C']].apply(list)

The final DataFrame should look like this:

    A         B         C         date
0   [4,3,3]   [5,5,2]   [5,2,1]   2019-06-2
1   [4,5,2]   [4,4,3]   [3,6,7]   2019-06-3

Henry Ecker · Accepted Answer · 2021-12-15 01:53:01Z

6

Use GroupBy.agg instead of GroupBy.apply:

df1 = df.groupby('date')[['A','B','C']].agg(list).reset_index()
print (df1)
        date          A          B          C
0  2019-06-2  [4, 3, 3]  [5, 5, 2]  [5, 2, 1]
1  2019-06-3  [4, 5, 2]  [4, 4, 3]  [3, 6, 7]
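As a minimal, self-contained sketch of why agg works here where apply did not: the DataFrame below is rebuilt from the table in the question, and the variable names group and labels are only illustrative. Calling list on a DataFrame iterates over its column labels, so apply(list) receives each group as a DataFrame and returns the labels, while agg(list) calls list on each column of each group.

```python
import pandas as pd

# Data rebuilt from the question's table
df = pd.DataFrame({
    "A": [4, 3, 3, 4, 5, 2],
    "B": [5, 5, 2, 4, 4, 3],
    "C": [5, 2, 1, 3, 6, 7],
    "date": ["2019-06-2"] * 3 + ["2019-06-3"] * 3,
})

# What apply(list) sees for one group: a DataFrame.
# list() on a DataFrame yields its column labels, not its values.
group = df.loc[df["date"] == "2019-06-2", ["A", "B", "C"]]
labels = list(group)
print(labels)

# agg(list) applies list to each column of each group separately
df1 = df.groupby("date")[["A", "B", "C"]].agg(list).reset_index()
print(df1)
```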

EDIT: To compute several aggregations at once, pass them in a list:

df2 = df.groupby('date')[['A','B','C']].agg(['mean','min','max', list])
print (df2)
                  A                            B                            C  \
               mean min max       list      mean min max       list      mean   
date                                                                            
2019-06-2  3.333333   3   4  [4, 3, 3]  4.000000   2   5  [5, 5, 2]  2.666667   
2019-06-3  3.666667   2   5  [4, 5, 2]  3.666667   3   4  [4, 4, 3]  5.333333   

                              
          min max       list  
date                          
2019-06-2   1   5  [5, 2, 1]  
2019-06-3   3   7  [3, 6, 7]

Then the MultiIndex columns can be flattened:

df2 = df.groupby('date')[['A','B','C']].agg(['mean','min','max', list])
df2.columns = df2.columns.map(lambda x: f'{x[0]}_{x[1]}')
df2 = df2.reset_index()
print (df2)
        date    A_mean  A_min  A_max     A_list    B_mean  B_min  B_max  \
0  2019-06-2  3.333333      3      4  [4, 3, 3]  4.000000      2      5   
1  2019-06-3  3.666667      2      5  [4, 5, 2]  3.666667      3      4   

      B_list    C_mean  C_min  C_max     C_list  
0  [5, 5, 2]  2.666667      1      5  [5, 2, 1]  
1  [4, 4, 3]  5.333333      3      7  [3, 6, 7]
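A possible alternative, not part of the original answer, is pandas named aggregation, which produces flat column names directly so no MultiIndex has to be flattened afterwards. The sketch below assumes a pandas version that supports keyword-style named aggregation (0.25 or later); the names funcs, named and df3 are illustrative only.

```python
import pandas as pd

# Data rebuilt from the question's table
df = pd.DataFrame({
    "A": [4, 3, 3, 4, 5, 2],
    "B": [5, 5, 2, 4, 4, 3],
    "C": [5, 2, 1, 3, 6, 7],
    "date": ["2019-06-2"] * 3 + ["2019-06-3"] * 3,
})

# Suffix -> aggregation function; list is passed as a plain callable
funcs = {"mean": "mean", "min": "min", "max": "max", "list": list}

# Build output_name=(input_column, function) pairs such as A_mean=("A", "mean")
named = {
    f"{col}_{suffix}": (col, func)
    for col in ["A", "B", "C"]
    for suffix, func in funcs.items()
}

df3 = df.groupby("date").agg(**named).reset_index()
print(df3)
```

The output columns follow the order of the keyword arguments, so this should give the same layout as the flattened df2 above.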

edited Dec 15, 2021 at 1:53

Henry Ecker♦


answered Jan 9, 2020 at 14:18

jezrael



5 Comments

astroluv Over a year ago

Now what if I want to get the mean, min and max of each of these columns as separate columns? How can I get that?

astroluv Over a year ago

Need a bit of your help here @jezrael stackoverflow.com/questions/59699910/…

jezrael Over a year ago

@astroluv - Sorry, I forgot to post a comment. My problem is that I don't understand the question :(

astroluv Over a year ago

My question is: I have multiple columns with names like "x_mean", "y_mean". How can I add another column that is computed from the other columns? "x_new = df.x_min_max_val / ( df.x_max - df.x_min ) * (df.x_mean - df.x_min) + df.x_min_max_val". Similarly, "y_new = df.y_min_max_val / ( df.y_max - df.y_min ) * (df.y_mean - df.y_min) + df.y_min_max_val". How can I achieve this with a one-liner?

jezrael Over a year ago

@astroluv Why does it need to be one line? Do you need to simplify the code? I am offline now, on my phone only, but does this work?

for c in ['x', 'y']:
    df[f'{c}_new'] = df[f'{c}_min_max_val'] / ( df[f'{c}_max'] - df[f'{c}_min'] ) * (df[f'{c}_mean'] - df[f'{c}_min']) + df[f'{c}_min_max_val']
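To make the loop above easier to try out, here is a small runnable sketch. The column names come from the comment, but the numeric values are invented purely for illustration, and the intermediate variable span is an addition for readability.

```python
import pandas as pd

# Hypothetical values; only the column names come from the comment thread
df = pd.DataFrame({
    "x_min": [1.0, 2.0], "x_max": [5.0, 6.0],
    "x_mean": [3.0, 4.0], "x_min_max_val": [2.0, 4.0],
    "y_min": [0.0, 1.0], "y_max": [4.0, 3.0],
    "y_mean": [1.0, 2.0], "y_min_max_val": [8.0, 2.0],
})

# Apply the same formula to every prefix instead of repeating it per column
for c in ["x", "y"]:
    span = df[f"{c}_max"] - df[f"{c}_min"]
    df[f"{c}_new"] = (
        df[f"{c}_min_max_val"] / span * (df[f"{c}_mean"] - df[f"{c}_min"])
        + df[f"{c}_min_max_val"]
    )

print(df[["x_new", "y_new"]])
```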
