I have a DataFrame in this form:

 Type  Major   GPA   
  F      A     2.6   
  T      B     3.4   
  T      C     2.9   
  F      A     1.8   
  T      B     2.8   
  F      C     3.5 
 ...

I'd like to group the DataFrame ("students") by Type and Major, count the number of rows in each group, sort the majors from most to least popular within each Type, and finally create a new DataFrame that includes the 20 most popular majors.

I'd like the output to look like this:

F   
A 21  
B 19  
C 15
...
T  
A 14  
B 7  
C 3   

This is what I did:

most_popular = students.groupby(['Type', 'Major']).size().sort_values(ascending=False)[:20]

But this sorts the counts across both Types rather than sorting separately within each Type.

Thank you for your help.

2 Answers

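most_popular = students.groupby(['Type', 'Major']).size().reset_index(name='count').sort_values(['Type', 'count'], ascending=[True, False])[:20]

The key is to sort Type in ascending order and the count in descending order. Sorting on Major instead would order the majors alphabetically rather than by popularity. You can use:

.sort_values(['Type', 'count'], ascending=[True, False])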
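
To make the per-Type ranking concrete, here is a minimal sketch. It assumes the column names from the question; the sample rows, the count column name, and the top_n variable are made up for illustration. It counts rows per (Type, Major), sorts the counts in descending order within each Type, and keeps up to top_n majors for each Type rather than the first 20 rows overall.

```python
import pandas as pd

# Made-up sample rows using the question's column names (Type, Major, GPA).
students = pd.DataFrame({
    "Type":  ["F", "T", "T", "F", "T", "F", "F", "T"],
    "Major": ["A", "B", "C", "A", "B", "C", "B", "B"],
    "GPA":   [2.6, 3.4, 2.9, 1.8, 2.8, 3.5, 3.0, 3.1],
})

# Count the rows in each (Type, Major) group and name the count column.
counts = (
    students.groupby(["Type", "Major"])
    .size()
    .reset_index(name="count")
)

# Type ascending, count descending: most popular majors first within each Type.
ranked = counts.sort_values(["Type", "count"], ascending=[True, False])

# Keep at most top_n rows per Type (hypothetical parameter; the question asks for 20).
top_n = 20
most_popular = ranked.groupby("Type").head(top_n).reset_index(drop=True)
print(most_popular)
```

Using groupby("Type").head(top_n) after sorting applies the limit separately to each Type; a plain [:top_n] slice would instead cut the combined table and could drop an entire Type.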

By default, groupby sorts the results by the group keys (sort=True). Is this the desired output?

>>> df.groupby(['Type', 'Major'], as_index=False).GPA.count().sort_values(['Major', 'GPA'])
  Type Major  GPA
0    F     A    2
2    T     B    2
1    F     C    1
3    T     C    1

1 Comment

Unfortunately, no. It sorts the results by Major in alphabetical order rather than by the count (shown in the GPA column).
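
One possible way to get counts ordered within each Type, sketched under the assumption that the question's column names apply (the sample rows and the top_n variable are made up): group by Type only and call value_counts on Major, which by default sorts the counts in descending order inside each group.

```python
import pandas as pd

# Made-up sample rows using the question's column names.
students = pd.DataFrame({
    "Type":  ["F", "T", "T", "F", "T", "F", "F", "T"],
    "Major": ["A", "B", "C", "A", "B", "C", "B", "B"],
})

# value_counts on a grouped column counts each Major per Type and
# orders the counts from largest to smallest within each Type.
per_type = students.groupby("Type")["Major"].value_counts()

# Keep at most top_n majors per Type (hypothetical limit; the question asks for 20).
top_n = 20
result = per_type.groupby(level="Type").head(top_n)
print(result)
```

The result is a Series indexed by (Type, Major), which is close to the layout shown in the question.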
