
I have a pandas DataFrame with two columns, Department and Employee_Count. I need to sort by Employee_Count in descending order, but when two rows have the same Employee_Count, they should be ordered alphabetically by Department.

   Department Employee_Count
0    abc          10
1    adc          10
2    bca          11
3    cde          9
4    xyz          15

Required output:

   Department Employee_Count
0    xyz          15
1    bca          11
2    abc          10
3    adc          10
4    cde          9

This is what I've tried.

df = df.sort_values(['Department','Employee_Count'],ascending=[True,False])

But this just sorts the departments alphabetically.

I've also tried to sort by Department first and then by Employee_Count. Like this:

df = df.sort_values(['Department'],ascending=[True])
df = df.sort_values(['Employee_Count'],ascending=[False])

This doesn't give me the correct output either:

   Department Employee_Count
4    xyz          15
2    bca          11
1    adc          10
0    abc          10
3    cde          9

It gives 'adc' first and then 'abc'.

2 Answers


Swap the order of the columns in the list, and swap the values in the ascending parameter to match:

Explanation:

The order of the column names is the order of sorting priority. The rows are first sorted by Employee_Count in descending order; only rows with duplicate Employee_Count values are then ordered among themselves by Department, ascending.

df1 = df.sort_values(['Employee_Count', 'Department'], ascending=[False, True])
print (df1)
  Department  Employee_Count
4        xyz              15
2        bca              11
0        abc              10 <-
1        adc              10 <-
3        cde               9

Or, as a test, pass False for the second column as well; the rows with duplicate Employee_Count values are then sorted by Department in descending order:

df2 = df.sort_values(['Employee_Count', 'Department'], ascending=[False, False])
print (df2)
  Department  Employee_Count
4        xyz              15
2        bca              11
1        adc              10 <-
0        abc              10 <-
3        cde               9
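
For a self-contained version, here is a minimal sketch. It assumes the sample frame can be rebuilt from the table in the question as shown below, and it passes ignore_index=True (available in pandas 1.0 and later) so the result gets a fresh 0..n-1 index, as in the required output:

```python
import pandas as pd

# Sample data reconstructed from the table in the question (an assumption).
df = pd.DataFrame({
    'Department': ['abc', 'adc', 'bca', 'cde', 'xyz'],
    'Employee_Count': [10, 10, 11, 9, 15],
})

# Primary key: Employee_Count descending; tie-breaker: Department ascending.
# ignore_index=True relabels the sorted rows 0..n-1.
result = df.sort_values(
    ['Employee_Count', 'Department'],
    ascending=[False, True],
    ignore_index=True,
)
print(result)
```

Without ignore_index=True the rows keep their original index labels, as in the outputs above.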

2 Comments

Thanks a ton. Exactly the output I wanted. Can you please explain the rationale behind swapping the columns?
It sorts by the columns in the order they appear in the list, so the first column is the primary sort key.
0

It's a good 4 years too late, but the OP's two-step attempt would work if the sorting were stable. pandas sort_values() uses 'quicksort' by default, which is not guaranteed to be stable. However, if the second sort_values() call uses a 'stable' sort, it produces the expected output.

df = df.sort_values('Department', ascending=True)
df = df.sort_values('Employee_Count', kind='stable', ascending=False, ignore_index=True)
#                                     ^^^^^^^^^^^^^  <--- stable sort here

You can verify that for any dataframe, it produces the same result as sorting by a list of two columns:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(10, size=(1000, 2)), columns=['A', 'B'])
a = df.sort_values('A', ascending=True).sort_values('B', kind='stable', ascending=False, ignore_index=True)
b = df.sort_values(['B', 'A'], ascending=[False, True], ignore_index=True)
a.equals(b)   # True
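
The same check can be run on the data from the question. This is only a sketch: the frame is reconstructed from the table in the question, and the names by_department, two_pass and one_pass are made up for illustration.

```python
import pandas as pd

# Sample data reconstructed from the table in the question (an assumption).
df = pd.DataFrame({
    'Department': ['abc', 'adc', 'bca', 'cde', 'xyz'],
    'Employee_Count': [10, 10, 11, 9, 15],
})

# Pass 1: order by the tie-breaker (Department, ascending).
by_department = df.sort_values('Department', ascending=True)

# Pass 2: order by the primary key with a stable sort, so rows with equal
# Employee_Count keep the Department order established in pass 1.
two_pass = by_department.sort_values(
    'Employee_Count', kind='stable', ascending=False, ignore_index=True
)

# Single call with both keys, for comparison.
one_pass = df.sort_values(
    ['Employee_Count', 'Department'], ascending=[False, True], ignore_index=True
)

print(two_pass)
print(two_pass.equals(one_pass))
```

The two-pass form sorts by the least significant key first and the most significant key last; it relies entirely on the last pass being stable.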
