Sample data:

import pandas as pd

mdf = pd.DataFrame([[1,2,50],[1,2,20],
                    [1,5,10],[2,8,80],
                    [2,5,65],[2,8,10]
                   ], columns=['src','dst','n'])
mdf

   src  dst   n
0    1    2  50
1    1    2  20
2    1    5  10
3    2    8  80
4    2    5  65
5    2    8  10

groupby() followed by agg() gives a DataFrame with a two-level MultiIndex:

test = mdf.groupby(['src','dst'])['n'].agg(['sum','count'])
test

         sum  count
src dst
1   2     70      2
    5     10      1
2   5     65      1
    8     90      2

Question: how do I sort this DataFrame by src ascending and then by sum descending?

I'm a beginner with pandas and have learned about sort_index() and sort_values(), but this task seems to need both at once.

Expected result, where the order within each "src" is determined by "sum":

         sum  count
src dst
1   2     70      2
    5     10      1
2   8     90      2
    5     65      1

3 Answers

9

In case anyone else finds this via Google: since pandas 0.23, sort_values() accepts index level names as well as column names, so you can pass the level name directly:

test.sort_values(['src','sum'], ascending=[True,False])

Result:
         sum  count
src dst            
1   2     70      2
    5     10      1
2   8     90      2
    5     65      1
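As a self-contained check of the call above, the following sketch rebuilds the question's sample data and sorts by the src level and the sum column in one step. It assumes pandas 0.23 or later; the variable name result is only illustrative.

```python
import pandas as pd

# Sample data from the question.
mdf = pd.DataFrame(
    [[1, 2, 50], [1, 2, 20], [1, 5, 10], [2, 8, 80], [2, 5, 65], [2, 8, 10]],
    columns=["src", "dst", "n"],
)
test = mdf.groupby(["src", "dst"])["n"].agg(["sum", "count"])

# "src" is an index level name and "sum" is a column name;
# sort_values() resolves both (assumes pandas >= 0.23).
result = test.sort_values(["src", "sum"], ascending=[True, False])

print(result["sum"].tolist())    # [70, 10, 90, 65]
print(list(result.index.names))  # ['src', 'dst']
```

The MultiIndex is kept as is, so no reset_index()/set_index() round trip is needed.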

7

IIUC:

In [29]: test.sort_values('sum', ascending=False).sort_index(level=0)
Out[29]:
         sum  count
src dst
1   2     80      2
    5     10      1
2   8     80      1

(This output comes from the question's original data, before it was updated with more rows.)

UPDATE: very similar to @anonyXmous's solution:

In [47]: (test.reset_index()
              .sort_values(['src','sum'], ascending=[True,False])
              .set_index(['src','dst']))
Out[47]:
         sum  count
src dst
1   2     70      2
    5     10      1
2   8     90      2
    5     65      1

3 Comments

Thanks, that solution demonstrates my problem exactly: the later sort_index() overrides the earlier sort by value. I updated the question with more data and the expected outcome. The suggested solution gives exactly the same result as before any sorting.
Updated the question.
Thanks for the updated answer! Now I see how it works. Much appreciated.
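
The override described in the first comment can be reproduced with the updated sample data. This is a minimal sketch; by_sum and resorted are illustrative names, and it relies on sort_index() defaulting to sort_remaining=True, which sorts by the remaining index levels as well:

```python
import pandas as pd

# Sample data from the question.
mdf = pd.DataFrame(
    [[1, 2, 50], [1, 2, 20], [1, 5, 10], [2, 8, 80], [2, 5, 65], [2, 8, 10]],
    columns=["src", "dst", "n"],
)
test = mdf.groupby(["src", "dst"])["n"].agg(["sum", "count"])

# Step 1: sort by value only.
by_sum = test.sort_values("sum", ascending=False)
print(by_sum["sum"].tolist())    # [90, 70, 65, 10]

# Step 2: sort_index(level=0) also sorts by "dst" (sort_remaining=True),
# so the value ordering from step 1 is lost.
resorted = by_sum.sort_index(level=0)
print(resorted["sum"].tolist())  # [70, 10, 65, 90]

# The index is back in the order groupby() produced.
print(resorted.index.equals(test.index))  # True
```
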

5

You can reset the index and then sort by the chosen columns. Hope this helps.

import pandas as pd

mdf = pd.DataFrame([[1,2,50],[1,2,20],
                [1,5,10],[2,8,80],
                [2,5,65],[2,8,10]
               ], columns=['src','dst','n'])
mdf = mdf.groupby(['src','dst'])['n'].agg(['sum','count'])
mdf.reset_index(inplace=True)
mdf.sort_values(['src', 'sum'], ascending=[True, False], inplace=True)
print(mdf)

Result:
   src  dst  sum  count
0    1    2   70      2
1    1    5   10      1
3    2    8   90      2
2    2    5   65      1
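
The row labels 0, 1, 3, 2 in the result above are left over from before the sort. A possible variant, assuming pandas 1.0 or later (where sort_values() gained the ignore_index parameter), renumbers the rows and avoids inplace. The names flat and restored are illustrative, and the final set_index() step that brings back the MultiIndex is optional:

```python
import pandas as pd

# Sample data from the question.
mdf = pd.DataFrame(
    [[1, 2, 50], [1, 2, 20], [1, 5, 10], [2, 8, 80], [2, 5, 65], [2, 8, 10]],
    columns=["src", "dst", "n"],
)
test = mdf.groupby(["src", "dst"])["n"].agg(["sum", "count"])

# Flatten the MultiIndex into columns, sort, and renumber the rows 0..n-1
# (ignore_index assumes pandas >= 1.0).
flat = test.reset_index().sort_values(
    ["src", "sum"], ascending=[True, False], ignore_index=True
)
print(flat.index.tolist())   # [0, 1, 2, 3]
print(flat["dst"].tolist())  # [2, 5, 8, 5]

# Optional: restore the original two-level index, keeping the new row order.
restored = flat.set_index(["src", "dst"])
print(restored["sum"].tolist())  # [70, 10, 90, 65]
```
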

2 Comments

Thank you for the answer and drawing my attention to the inplace parameter and reset_index(). I accepted the other answer because it also restored the initial multiindex.
You can undelete stackoverflow.com/questions/49296346/…, you just need to run the .py files with ipython, not just plain python.
