Pandas: Sort a dataframe based on multiple columns

Question

I have a pandas dataframe with two columns, Department and Employee_Count. I need to sort it by Employee_Count in descending order, but when two rows have the same Employee_Count, they should be sorted alphabetically by Department.

   Department Employee_Count
0    abc          10
1    adc          10
2    bca          11
3    cde          9
4    xyz          15

Required output:

   Department Employee_Count
0    xyz          15
1    bca          11
2    abc          10
3    adc          10
4    cde          9

This is what I've tried.

df = df.sort_values(['Department','Employee_Count'],ascending=[True,False])

But this just sorts the departments alphabetically.

I've also tried to sort by Department first and then by Employee_Count. Like this:

df = df.sort_values(['Department'],ascending=[True])
df = df.sort_values(['Employee_Count'],ascending=[False])

This doesn't give me correct output either:

   Department Employee_Count
4    xyz          15
2    bca          11
1    adc          10
0    abc          10
3    cde          9

It gives 'adc' first and then 'abc'.

jezrael · Accepted Answer · 2019-11-04 09:11:43Z


Swap the order of the columns in the list, and swap the values in the ascending parameter to match.

Explanation:

The order of the column names is the order of sorting priority: rows are first sorted by Employee_Count in descending order, and only the rows that share the same Employee_Count are then sorted by Department in ascending order:

df1 = df.sort_values(['Employee_Count', 'Department'], ascending=[False, True])
print(df1)
  Department  Employee_Count
4        xyz              15
2        bca              11
0        abc              10 <-
1        adc              10 <-
3        cde               9

To check this, pass False as the second value as well; the rows that share the same Employee_Count are then sorted by Department in descending order:

df2 = df.sort_values(['Employee_Count', 'Department'], ascending=[False, False])
print(df2)
  Department  Employee_Count
4        xyz              15
2        bca              11
1        adc              10 <-
0        abc              10 <-
3        cde               9
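For anyone who wants to reproduce this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the two-column sort. The `ignore_index=True` argument is an addition not used in the answer above; it assumes a pandas version that supports it (1.0 or later) and renumbers the rows so the index matches the required output.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Department": ["abc", "adc", "bca", "cde", "xyz"],
    "Employee_Count": [10, 10, 11, 9, 15],
})

# Primary key first (descending), tie-breaker second (ascending).
# ignore_index=True renumbers the rows 0..n-1, as in the required output.
result = df.sort_values(
    ["Employee_Count", "Department"],
    ascending=[False, True],
    ignore_index=True,
)
print(result)
```

Without `ignore_index=True` the original row labels (4, 2, 0, 1, 3) are kept, as in the printed output above.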

edited Nov 4, 2019 at 9:11

answered Nov 4, 2019 at 8:56

jezrael


2 Comments

Impromptu_Coder Over a year ago

Thanks a ton. Exactly the output I desired. Can you please tell me the rationale behind swapping the columns ?

molybdenum42 Over a year ago

It sorts the rows by the columns in the order they appear in the list.
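To illustrate the rationale, here is a rough plain-Python analogy (not how pandas implements it internally): sorting by a list of columns behaves like comparing tuples, where a later element is only consulted when all earlier ones are equal. The `rows` list and variable names are made up for this sketch.

```python
# (Department, Employee_Count) pairs from the question
rows = [("abc", 10), ("adc", 10), ("bca", 11), ("cde", 9), ("xyz", 15)]

# Count is the primary key (negated to sort descending); Department breaks ties.
by_count_then_dept = sorted(rows, key=lambda r: (-r[1], r[0]))

# Department is the primary key. Every name is unique, so the count
# is never consulted and the result is simply alphabetical.
by_dept_then_count = sorted(rows, key=lambda r: (r[0], -r[1]))

print(by_count_then_dept)
print(by_dept_then_count)
```

The second ordering mirrors the first attempt in the question: with Department listed first, Employee_Count only matters for rows with the same Department, and there are none.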

cottontail · Accepted Answer · 2024-02-17 10:34:07Z

It's a good 4 years too late, but OP's second attempt (two successive sort_values() calls) would work if the second sort were stable. By default, pandas sort_values() uses 'quicksort', which is not guaranteed to be stable. However, if the second sort_values() call uses a 'stable' sort, it produces the expected output.

df = df.sort_values('Department', ascending=True)
df = df.sort_values('Employee_Count', kind='stable', ascending=False, ignore_index=True)
#                                     ^^^^^^^^^^^^^  <--- stable sort here

You can check on a random dataframe that it produces the same result as sorting by a list of two columns:

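import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(10, size=(1000, 2)), columns=['A', 'B'])
# two passes: tie-breaker first, then a stable sort on the primary key
a = df.sort_values('A', ascending=True).sort_values('B', kind='stable', ascending=False, ignore_index=True)
# one pass: both keys at once
b = df.sort_values(['B', 'A'], ascending=[False, True], ignore_index=True)
a.equals(b)   # True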
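Applied to the sample data from the question, the same check can be sketched as below. The frame is rebuilt by hand here, and the variable names `two_pass` and `one_pass` are made up for the illustration; it assumes a pandas version that accepts `kind='stable'` and `ignore_index`.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Department": ["abc", "adc", "bca", "cde", "xyz"],
    "Employee_Count": [10, 10, 11, 9, 15],
})

# Pass 1: sort by the tie-breaker. Pass 2: stable sort by the primary key,
# so rows with equal Employee_Count keep their alphabetical order from pass 1.
two_pass = (
    df.sort_values("Department", ascending=True)
      .sort_values("Employee_Count", kind="stable", ascending=False, ignore_index=True)
)

# Single call with both keys, for comparison
one_pass = df.sort_values(
    ["Employee_Count", "Department"], ascending=[False, True], ignore_index=True
)

print(two_pass)
print(two_pass.equals(one_pass))
```

The single call is usually easier to read; the two-pass form is mainly useful for seeing why stability matters when sorts are chained.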
