I have a DataFrame like this:

         age    gender  occupation     zip_code
user_id             
1         24    M       technician      85711
2         53    F       other           94043
3         23    M       writer          32067
4         24    M       technician      43537
5         33    F       other           15213
6         42    M       executive       98101
7         57    M       administrator   91344
8         36    M       administrator   05201
9         29    M       student         01002
10        53    M       lawyer          90703

I need to get the male ratio per occupation and sort it from highest to lowest.

I tried this, but I can't work out how to proceed from here:

users.groupby(['occupation','gender']).gender.count()

2 Answers

Divide counts of <occupation, gender> by counts of <gender>:

i = df.groupby(['occupation', 'gender']).gender.count()
j = df.groupby('gender').gender.count()

(i / j).sort_values(ascending=False)

occupation     gender
other          F         1.000
technician     M         0.250
administrator  M         0.250
writer         M         0.125
student        M         0.125
lawyer         M         0.125
executive      M         0.125
Name: gender, dtype: float64

You can filter using xs to get ratios for just men:

(i / j).sort_values(ascending=False).xs('M', level=1)

occupation
technician       0.250
administrator    0.250
writer           0.125
student          0.125
lawyer           0.125
executive        0.125
Name: gender, dtype: float64
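
For anyone who wants to run this end to end, here is a minimal sketch. The question does not show how `users`/`df` was built, so the constructor below is an assumption that simply retypes the ten sample rows; `zip_code` is kept as strings so the leading zeros survive.

```python
import pandas as pd

# Assumed reconstruction of the sample data shown in the question.
df = pd.DataFrame(
    {
        "age": [24, 53, 23, 24, 33, 42, 57, 36, 29, 53],
        "gender": ["M", "F", "M", "M", "F", "M", "M", "M", "M", "M"],
        "occupation": [
            "technician", "other", "writer", "technician", "other",
            "executive", "administrator", "administrator", "student", "lawyer",
        ],
        "zip_code": [
            "85711", "94043", "32067", "43537", "15213",
            "98101", "91344", "05201", "01002", "90703",
        ],
    },
    index=pd.Index(range(1, 11), name="user_id"),
)

# Counts per (occupation, gender) pair and per gender.
i = df.groupby(["occupation", "gender"]).gender.count()
j = df.groupby("gender").gender.count()

# The division aligns on the shared "gender" level; keep only the men and sort.
male_ratio = (i / j).xs("M", level="gender").sort_values(ascending=False)
print(male_ratio)
```

Selecting the level by name (`level="gender"`) rather than by position is a small design choice here: it keeps working if the order of the groupby keys is changed.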

7 Comments

Could you please explain what i / j is doing?
@subodhagrawal Division is aligned on the index. The gender level of i's index and j's index are aligned, so each <occupation, gender> count is divided by the total count for that gender.
But I am not getting the correct answer. I want the ratios sorted for the group where gender is 'M'.
@subodhagrawal Oh, oops. Check again.
Yup, your answer is easy to understand; the second one is a little complex for me, as I am a beginner in Python.
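
The index alignment discussed in the comments above can be checked on a tiny example. This is only a sketch: the two Series are hand-built (hypothetical names `counts` and `totals`) with a few of the counts from the sample data, not produced by `groupby`.

```python
import pandas as pd

# Counts per (occupation, gender), indexed by a two-level MultiIndex.
counts = pd.Series(
    [2, 1, 2],
    index=pd.MultiIndex.from_tuples(
        [("other", "F"), ("writer", "M"), ("technician", "M")],
        names=["occupation", "gender"],
    ),
)

# Totals per gender, indexed by a plain Index that is also named "gender".
totals = pd.Series([2, 8], index=pd.Index(["F", "M"], name="gender"))

# pandas matches the "gender" level of counts against totals' index,
# so every M row is divided by 8 and every F row by 2.
ratio = counts / totals
print(ratio)
```

The match works because the level name `gender` appears in both indexes; that shared name is what lets the totals be broadcast across occupations.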

You can try this:

df_out = df.groupby(['gender','occupation'])['gender'].count()

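# Series.sum(level=...) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent.
(df_out / df_out.groupby(level=0).sum()).loc['M'].sort_values(ascending=False)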

Output:

occupation
technician       0.250
administrator    0.250
writer           0.125
student          0.125
lawyer           0.125
executive        0.125
Name: gender, dtype: float64
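
As a cross-check, the same numbers can probably be reached without any groupby by filtering to the men first and letting `value_counts(normalize=True)` do the division. This is a sketch of an alternative, not what either answer uses, and the reduced two-column frame below is an assumption retyped from the question's sample.

```python
import pandas as pd

# Assumed reduced frame: only the two columns the calculation needs.
df = pd.DataFrame(
    {
        "gender": ["M", "F", "M", "M", "F", "M", "M", "M", "M", "M"],
        "occupation": [
            "technician", "other", "writer", "technician", "other",
            "executive", "administrator", "administrator", "student", "lawyer",
        ],
    }
)

# Occupations of the men only.
men = df.loc[df["gender"] == "M", "occupation"]

# normalize=True divides each count by the number of men;
# value_counts sorts in descending order by default.
male_share = men.value_counts(normalize=True)
print(male_share)
```

Like both answers, this measures each occupation's share among all men, so the values sum to 1.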
