How to sort Pandas DataFrame both by MultiIndex and by value?

Question

Sample data:

import pandas as pd

mdf = pd.DataFrame([[1,2,50],[1,2,20],
                    [1,5,10],[2,8,80],
                    [2,5,65],[2,8,10]
                   ], columns=['src','dst','n']); mdf

    src dst n
0   1   2   50
1   1   2   20
2   1   5   10
3   2   8   80
4   2   5   65
5   2   8   10

Grouping by both columns with groupby() gives a two-level MultiIndex:

test = mdf.groupby(['src','dst'])['n'].agg(['sum','count']); test

        sum count
src dst 
1   2   70  2
    5   10  1
2   5   65  1
    8   90  2
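A small sketch to confirm that structure (it simply rebuilds `test` from the sample data above): the grouping keys become the two index levels, while `sum` and `count` are ordinary columns, which is why the sort has to reach both the index and the values.

```python
import pandas as pd

mdf = pd.DataFrame([[1, 2, 50], [1, 2, 20],
                    [1, 5, 10], [2, 8, 80],
                    [2, 5, 65], [2, 8, 10]],
                   columns=['src', 'dst', 'n'])

# Grouping by two columns turns them into the two levels of a MultiIndex;
# the aggregated 'sum' and 'count' become ordinary columns.
test = mdf.groupby(['src', 'dst'])['n'].agg(['sum', 'count'])

print(test.index.nlevels)      # 2
print(list(test.index.names))  # ['src', 'dst']
print(list(test.columns))      # ['sum', 'count']
```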

Question: how can I sort this DataFrame by src ascending and then by sum descending?

I'm a beginner with pandas. I've learned about sort_index() and sort_values(), but this task seems to need both at once.

Expected result: within each "src" group, the rows are ordered by "sum" descending:

        sum count
src dst 
1   2   70  2
    5   10  1
2   8   90  2
    5   65  1

Pol · Accepted Answer · 2019-07-30 06:57:37Z

9

In case anyone else comes across this via Google as well: since pandas 0.23, you can pass the name of an index level to sort_values(), so index levels and columns can be combined in one sort:

test.sort_values(['src','sum'], ascending=[True, False])

Result:
         sum  count
src dst            
1   2     70      2
    5     10      1
2   8     90      2
    5     65      1
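A self-contained version of this answer, assuming pandas 0.23 or later. In the `by` list, `'src'` resolves to an index level and `'sum'` to a column, and `ascending` takes one flag per key.

```python
import pandas as pd

mdf = pd.DataFrame([[1, 2, 50], [1, 2, 20],
                    [1, 5, 10], [2, 8, 80],
                    [2, 5, 65], [2, 8, 10]],
                   columns=['src', 'dst', 'n'])
test = mdf.groupby(['src', 'dst'])['n'].agg(['sum', 'count'])

# 'src' is an index level name and 'sum' is a column name;
# sort_values() resolves either kind of label in `by`.
result = test.sort_values(['src', 'sum'], ascending=[True, False])
print(result)
```

The MultiIndex is left in place, so no reset_index()/set_index() round trip is needed.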

answered Jul 30, 2019 at 6:57

Pol


MaxU - stand with Ukraine · Accepted Answer · 2018-03-13 22:08:56Z

7

IIUC:

In [29]: test.sort_values('sum', ascending=False).sort_index(level=0)
Out[29]:
         sum  count
src dst
1   2     80      2
    5     10      1
2   8     80      1
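A minimal sketch, run against the question's current sample data, of why this first attempt falls back to plain index order (as noted in the comments on this answer): `sort_index(level=0)` defaults to `sort_remaining=True`, so after sorting on `src` it also sorts on `dst`, discarding the order produced by `sort_values('sum')`.

```python
import pandas as pd

mdf = pd.DataFrame([[1, 2, 50], [1, 2, 20],
                    [1, 5, 10], [2, 8, 80],
                    [2, 5, 65], [2, 8, 10]],
                   columns=['src', 'dst', 'n'])
test = mdf.groupby(['src', 'dst'])['n'].agg(['sum', 'count'])

by_sum = test.sort_values('sum', ascending=False)
# With the default sort_remaining=True, sorting on level 0 also sorts on
# the remaining level ('dst'), so the 'sum' ordering is lost.
resorted = by_sum.sort_index(level=0)

print(resorted.equals(test.sort_index()))  # True
```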

UPDATE: very similar to @anonyXmous's solution:

In [47]: (test.reset_index()
              .sort_values(['src','sum'], ascending=[True, False])
              .set_index(['src','dst']))
Out[47]:
         sum  count
src dst
1   2     70      2
    5     10      1
2   8     90      2
    5     65      1
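A self-contained sketch of this round trip, rebuilding `test` from the sample data: `reset_index()` turns the levels back into columns, `sort_values()` sorts on them together with `sum`, and `set_index()` restores the original two-level index.

```python
import pandas as pd

mdf = pd.DataFrame([[1, 2, 50], [1, 2, 20],
                    [1, 5, 10], [2, 8, 80],
                    [2, 5, 65], [2, 8, 10]],
                   columns=['src', 'dst', 'n'])
test = mdf.groupby(['src', 'dst'])['n'].agg(['sum', 'count'])

result = (test.reset_index()                                   # 'src', 'dst' become columns
              .sort_values(['src', 'sum'], ascending=[True, False])
              .set_index(['src', 'dst']))                      # rebuild the MultiIndex
print(result)
```

Because it only ever sorts on columns, this approach should also work on pandas versions before 0.23, where sort_values() cannot see index level names.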

edited Mar 13, 2018 at 22:08

answered Mar 13, 2018 at 19:38

MaxU - stand with Ukraine


3 Comments

Serge Over a year ago

Thanks, that solution demonstrates my problem exactly: the later sort_index() overrides the earlier value sorting. I updated the question with more data and the expected outcome. The suggested solution gives exactly the same result as before any sorting.

Serge Over a year ago

Updated the question.

Serge Over a year ago

Thanks for the updated answer! Now I see how it works. Much appreciated.

jose_bacoy · Accepted Answer · 2018-03-13 21:03:44Z

5

You can reset the index and then sort by the chosen columns. Hope this helps.

import pandas as pd

mdf = pd.DataFrame([[1,2,50],[1,2,20],
                    [1,5,10],[2,8,80],
                    [2,5,65],[2,8,10]
                   ], columns=['src','dst','n'])
mdf = mdf.groupby(['src','dst'])['n'].agg(['sum','count'])
mdf.reset_index(inplace=True)
mdf.sort_values(['src', 'sum'], ascending=[True, False], inplace=True)
print(mdf)

Result:
   src  dst  sum  count
0    1    2   70      2
1    1    5   10      1
3    2    8   90      2
2    2    5   65      1
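A sketch of the same approach written as one chain instead of `inplace=True` calls, assuming pandas 1.0 or later for the `ignore_index` parameter of `sort_values()`, which renumbers the rows 0..n-1 instead of keeping the shuffled labels (0, 1, 3, 2) shown above.

```python
import pandas as pd

mdf = pd.DataFrame([[1, 2, 50], [1, 2, 20],
                    [1, 5, 10], [2, 8, 80],
                    [2, 5, 65], [2, 8, 10]],
                   columns=['src', 'dst', 'n'])

flat = (mdf.groupby(['src', 'dst'])['n'].agg(['sum', 'count'])
           .reset_index()
           # ignore_index=True relabels the sorted rows as 0..n-1
           .sort_values(['src', 'sum'], ascending=[True, False], ignore_index=True))
print(flat)
```

Chaining avoids mutating `mdf` in place, so the original frame stays available.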

answered Mar 13, 2018 at 21:03

jose_bacoy


2 Comments

Serge Over a year ago

Thank you for the answer and for drawing my attention to the inplace parameter and reset_index(). I accepted the other answer because it also restored the initial MultiIndex.

Eric Duminil Over a year ago

You can undelete stackoverflow.com/questions/49296346/…; you just need to run the .py files with IPython, not plain Python.
