Python Groupby with sorting

Question

I have a DataFrame like this:

         age    gender  occupation     zip_code
user_id             
1         24    M       technician      85711
2         53    F       other           94043
3         23    M       writer          32067
4         24    M       technician      43537
5         33    F       other           15213
6         42    M       executive       98101
7         57    M       administrator   91344
8         36    M       administrator   05201
9         29    M       student         01002
10        53    M       lawyer          90703

I need to compute the male ratio per occupation and sort it from highest to lowest.

I tried this, but I could not work out how to proceed from there:

users.groupby(['occupation','gender']).gender.count()

cs95 · Accepted Answer · 2018-03-14 19:52:20Z

2

Divide the counts per <occupation, gender> pair by the counts per <gender>. Each ratio is then that occupation's share of all users of the given gender:

i = df.groupby(['occupation', 'gender']).gender.count()
j = df.groupby('gender').gender.count()

(i / j).sort_values(ascending=False)

occupation     gender
other          F         1.000
technician     M         0.250
administrator  M         0.250
writer         M         0.125
student        M         0.125
lawyer         M         0.125
executive      M         0.125
Name: gender, dtype: float64

You can filter using xs to get ratios for just men:

(i / j).sort_values(ascending=False).xs('M', level=1)

occupation
technician       0.250
administrator    0.250
writer           0.125
student          0.125
lawyer           0.125
executive        0.125
Name: gender, dtype: float64
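For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. It rebuilds the ten sample rows from the question by hand (storing zip_code as strings is an assumption, made so that leading zeros survive) and names the index level in xs instead of using its position.

```python
import pandas as pd

# Sample rows copied from the question; zip codes are stored as strings
# (an assumption) so that leading zeros such as "05201" are preserved.
df = pd.DataFrame(
    {
        "age": [24, 53, 23, 24, 33, 42, 57, 36, 29, 53],
        "gender": ["M", "F", "M", "M", "F", "M", "M", "M", "M", "M"],
        "occupation": [
            "technician", "other", "writer", "technician", "other",
            "executive", "administrator", "administrator", "student", "lawyer",
        ],
        "zip_code": [
            "85711", "94043", "32067", "43537", "15213",
            "98101", "91344", "05201", "01002", "90703",
        ],
    },
    index=pd.Index(range(1, 11), name="user_id"),
)

# Number of users per (occupation, gender) pair, and per gender.
i = df.groupby(["occupation", "gender"]).gender.count()
j = df.groupby("gender").gender.count()

# The division aligns on the shared "gender" index level.
ratios = (i / j).sort_values(ascending=False)

# Keep only the rows for men; xs drops the gender level from the result.
male_ratios = ratios.xs("M", level="gender")
print(male_ratios)
```

The order among occupations with equal ratios is not guaranteed, so it may differ from the listing above.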

edited Mar 14, 2018 at 19:52

answered Mar 14, 2018 at 19:45


7 Comments

subodh agrawal Over a year ago

What is i / j doing? Could you please explain?

cs95 Over a year ago

@subodhagrawal Division is aligned on the index. The gender level of i's index is matched against j's index, so each count is divided by the total for its own gender.

subodh agrawal Over a year ago

But I am not getting the correct answer. I want the sorted ratios for gender == 'M' only.

cs95 Over a year ago

@subodhagrawal Oh, oops. Check again.

subodh agrawal Over a year ago

Yup, your answer is easy to understand. The second one is a little complex for me, as I am a beginner in Python.
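The index alignment discussed in these comments can be made visible with a small sketch. The counts below are typed in by hand from the question's ten rows, and the variable names manual and explicit are only illustrative. The sketch looks up each row's gender total in j and divides, which is roughly what i / j does implicitly. It then repeats the division with Series.div and the level named outright.

```python
import pandas as pd

# Counts per (occupation, gender), typed in from the question's sample rows.
i = pd.Series(
    [2, 2, 1, 1, 1, 1, 2],
    index=pd.MultiIndex.from_tuples(
        [
            ("technician", "M"),
            ("administrator", "M"),
            ("writer", "M"),
            ("student", "M"),
            ("lawyer", "M"),
            ("executive", "M"),
            ("other", "F"),
        ],
        names=["occupation", "gender"],
    ),
    name="gender",
)
# Counts per gender: 2 women and 8 men.
j = pd.Series([2, 8], index=pd.Index(["F", "M"], name="gender"), name="gender")

# Look up the gender total that belongs to each row of i ...
genders = i.index.get_level_values("gender")
denominators = j.loc[genders].to_numpy()
# ... and divide position by position.
manual = i / denominators

# The same broadcast, with the matching level named explicitly.
explicit = i.div(j, level="gender")

print(manual)
```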


Scott Boston · Accepted Answer · 2018-03-14 19:53:28Z

2

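You can try this. The per-gender totals come from groupby(level=0).sum(), which replaces the Series.sum(level=0) spelling that was removed in pandas 2.0:

df_out = df.groupby(['gender', 'occupation'])['gender'].count()

(df_out / df_out.groupby(level=0).sum()).loc['M'].sort_values(ascending=False)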

Output:

occupation
technician       0.250
administrator    0.250
writer           0.125
student          0.125
lawyer           0.125
executive        0.125
Name: gender, dtype: float64
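Both answers report each occupation's share of all men. If "male ratio per occupation" is instead read as the fraction of each occupation's users who are male, a different calculation is needed. The sketch below is one way to do that under this assumed reading. It is not taken from either answer, and male_share is an illustrative name.

```python
import pandas as pd

# Only the two columns needed here, copied from the question's sample rows.
df = pd.DataFrame(
    {
        "gender": ["M", "F", "M", "M", "F", "M", "M", "M", "M", "M"],
        "occupation": [
            "technician", "other", "writer", "technician", "other",
            "executive", "administrator", "administrator", "student", "lawyer",
        ],
    },
    index=pd.Index(range(1, 11), name="user_id"),
)

# The mean of a boolean column is the share of True values, so grouping
# "is male" by occupation gives the male fraction within each occupation.
male_share = (
    df["gender"].eq("M")
    .groupby(df["occupation"])
    .mean()
    .sort_values(ascending=False)
)
print(male_share)
```

With only ten sample rows every occupation is entirely male or entirely female, so the result is 1.0 or 0.0 throughout. A larger dataset would give intermediate values.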

answered Mar 14, 2018 at 19:53
