I have a DataFrame df:

    A   B   C        date
0   4   5   5   2019-06-2
1   3   5   2   2019-06-2
2   3   2   1   2019-06-2
3   4   4   3   2019-06-3
4   5   4   6   2019-06-3
5   2   3   7   2019-06-3

I can group by date and collect a single column into a list per group with the following code:

df.groupby('date')['A'].apply(list)


         A         date
0   [4,3,3]   2019-06-2
1   [4,5,2]   2019-06-3

But what if I want to do this for multiple columns at once? I've tried something like this, but it doesn't seem to work:

df.groupby('date')[['A','B','C']].apply(list)

The final DataFrame should look like this:

    A         B         C         date
0   [4,3,3]   [5,5,2]   [5,2,1]   2019-06-2
1   [4,5,2]   [4,4,3]   [3,6,7]   2019-06-3

1 Answer

Use GroupBy.agg instead of GroupBy.apply:

df1 = df.groupby('date')[['A','B','C']].agg(list).reset_index()
print (df1)
        date          A          B          C
0  2019-06-2  [4, 3, 3]  [5, 5, 2]  [5, 2, 1]
1  2019-06-3  [4, 5, 2]  [4, 4, 3]  [3, 6, 7]
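
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt from the table in the question; the variable name `labels` is only illustrative. One likely reason the `apply(list)` attempt disappoints is that iterating over a DataFrame yields its column labels rather than its values, whereas `agg(list)` is applied to each column separately.

```python
import pandas as pd

# Rebuild the DataFrame shown in the question
df = pd.DataFrame({
    "A": [4, 3, 3, 4, 5, 2],
    "B": [5, 5, 2, 4, 4, 3],
    "C": [5, 2, 1, 3, 6, 7],
    "date": ["2019-06-2"] * 3 + ["2019-06-3"] * 3,
})

# list() over a DataFrame iterates its column labels, not its values
labels = list(df[["A", "B", "C"]])

# agg calls list on every selected column within each group
df1 = df.groupby("date")[["A", "B", "C"]].agg(list).reset_index()
print(df1)
```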

EDIT: To compute several aggregations at once, pass them to agg as a list:

df2 = df.groupby('date')[['A','B','C']].agg(['mean','min','max', list])
print (df2)
                  A                            B                            C  \
               mean min max       list      mean min max       list      mean   
date                                                                            
2019-06-2  3.333333   3   4  [4, 3, 3]  4.000000   2   5  [5, 5, 2]  2.666667   
2019-06-3  3.666667   2   5  [4, 5, 2]  3.666667   3   4  [4, 4, 3]  5.333333   

                              
          min max       list  
date                          
2019-06-2   1   5  [5, 2, 1]  
2019-06-3   3   7  [3, 6, 7]  

The MultiIndex columns can then be flattened:

df2 = df.groupby('date')[['A','B','C']].agg(['mean','min','max', list])
df2.columns = df2.columns.map(lambda x: f'{x[0]}_{x[1]}')
df2 = df2.reset_index()
print (df2)
        date    A_mean  A_min  A_max     A_list    B_mean  B_min  B_max  \
0  2019-06-2  3.333333      3      4  [4, 3, 3]  4.000000      2      5   
1  2019-06-3  3.666667      2      5  [4, 5, 2]  3.666667      3      4   

      B_list    C_mean  C_min  C_max     C_list  
0  [5, 5, 2]  2.666667      1      5  [5, 2, 1]  
1  [4, 4, 3]  5.333333      3      7  [3, 6, 7]  
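
The same flattening can also be sketched with `"_".join` instead of an f-string, which does not depend on the columns having exactly two levels. This is only a variant of the step above, run on the question's data rebuilt inline; the helper name `flatten_columns` is made up for illustration and is not part of pandas.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "A": [4, 3, 3, 4, 5, 2],
    "B": [5, 5, 2, 4, 4, 3],
    "C": [5, 2, 1, 3, 6, 7],
    "date": ["2019-06-2"] * 3 + ["2019-06-3"] * 3,
})

def flatten_columns(frame):
    """Hypothetical helper: join each MultiIndex column tuple with '_'."""
    out = frame.copy()
    out.columns = ["_".join(map(str, col)) for col in out.columns]
    return out

# Aggregate first, then flatten the two-level column index
agg = df.groupby("date")[["A", "B", "C"]].agg(["mean", "min", "max", list])
df2 = flatten_columns(agg).reset_index()
print(df2.columns.tolist())
```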

5 Comments

What if I now want the mean, min and max of each of these columns as separate columns? How can I get that?
I need a bit of your help here, @jezrael: stackoverflow.com/questions/59699910/…
@astroluv - Sorry, I forgot to post a comment; my problem is that I don't understand the question :(
My question is: I have multiple columns with names like "x_mean", "y_mean". How can I add another column that is computed from the other columns? "x_new = df.x_min_max_val / (df.x_max - df.x_min) * (df.x_mean - df.x_min) + df.x_min_max_val". Similarly, "y_new = df.y_min_max_val / (df.y_max - df.y_min) * (df.y_mean - df.y_min) + df.y_min_max_val". How can I achieve this with a one-liner?
@astroluv What is the reason for one-line code? Do you need to simplify the code? I am offline now, on my phone only, but how does this work: for c in ['x', 'y']: df[f'{c}_new'] = df[f'{c}_min_max_val'] / (df[f'{c}_max'] - df[f'{c}_min']) * (df[f'{c}_mean'] - df[f'{c}_min']) + df[f'{c}_min_max_val']?
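
The loop suggested in the last comment can be tried out as follows. The column names come from the comments, but the numbers are made up purely for illustration, since the thread gives no data for these columns.

```python
import pandas as pd

# Hypothetical values; only the column names are taken from the comments
df = pd.DataFrame({
    "x_min": [1.0], "x_max": [5.0], "x_mean": [3.0], "x_min_max_val": [2.0],
    "y_min": [0.0], "y_max": [10.0], "y_mean": [5.0], "y_min_max_val": [4.0],
})

# Apply the same formula to every prefix instead of repeating it per column
for c in ["x", "y"]:
    df[f"{c}_new"] = (
        df[f"{c}_min_max_val"]
        / (df[f"{c}_max"] - df[f"{c}_min"])
        * (df[f"{c}_mean"] - df[f"{c}_min"])
        + df[f"{c}_min_max_val"]
    )

print(df[["x_new", "y_new"]])
```

Note that a row where the max equals the min would divide by zero, so real data may need a guard for that case.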
