pandas custom sorting multilevel index

Question

I have the following example dataset, and I'd like to sort the index levels by a custom order that is not contained in the DataFrame. So far, searching SO hasn't turned up a solution. Example:

import pandas as pd

data = {'s':[1,1,1,1], 
        'am':['cap', 'cap', 'sea', 'sea'], 
        'cat':['i', 'o', 'i', 'o'],
        'col1':[.55, .44, .33, .22],
        'col2':[.77, .66, .55, .44]}

df = pd.DataFrame(data=data)
df.set_index(['s', 'am', 'cat'], inplace=True)

Out[1]: 
           col1  col2
s am  cat            
1 cap i    0.55  0.77
      o    0.44  0.66
  sea i    0.33  0.55
      o    0.22  0.44

What I would like is the following:

Out[2]: 
           col1  col2
s am  cat            
1 sea i    0.33  0.55
      o    0.22  0.44
  cap i    0.55  0.77
      o    0.44  0.66

I might also want to sort 'cat' in the order ['o', 'i'].

Abhi · Accepted Answer · 2018-10-13 08:52:31Z

5

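Use sort_values and sort_index. This works here because the desired order for 'am' (sea, cap) happens to be reverse alphabetical, so a descending sort on level 1 produces it: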

df.sort_values(df.columns.tolist()).sort_index(level=1, ascending=False, 
                                                        sort_remaining=False)

           col1  col2
s am  cat            
1 sea i    0.33  0.55
      o    0.22  0.44
  cap i    0.55  0.77
      o    0.44  0.66

Convert the 'cat' index level to an ordered categorical to get a custom order:

data = {'s':[1,1,1,1], 
        'am':['cap', 'cap', 'sea', 'sea'], 
        'cat':['i', 'j', 'k', 'l'],
        'col1':[.55, .44, .33, .22],
        'col2':[.77, .66, .55, .44]}

df = pd.DataFrame(data=data)
df.set_index(['s', 'am', 'cat'], inplace=True)

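# set_levels replaces level labels by position, so build the new level from
# df.index.levels[2] (same labels, same positions) as an ordered CategoricalIndex
idx = pd.CategoricalIndex(df.index.levels[2],
                          categories=['j', 'i', 'k', 'l'],
                          ordered=True)

# set_levels returns a new index; its inplace argument was removed in pandas 2.0
df.index = df.index.set_levels(idx, level='cat')

# 'cat' keeps its categorical dtype after reset_index, so sort_values uses the category order
df.reset_index().sort_values('cat').set_index(['s', 'am', 'cat'])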

           col1  col2
s am  cat            
1 cap j    0.44  0.66
      i    0.55  0.77
  sea k    0.33  0.55
      l    0.22  0.44

edited Oct 13, 2018 at 8:52

answered Oct 13, 2018 at 7:34

Abhi

4,243 reputation · 1 gold badge · 18 silver badges · 29 bronze badges


4 Comments

fffrost Over a year ago

Thank you, this does work for this specific case, but is there a way to actually specify which columns of the index to sort by and also maybe input a list to specify the sort order?

Abhi Over a year ago

@fffrost You can change the level to sort by specific index level. If you want to sort by cat then use sort_index(level=2).

fffrost Over a year ago

Right, and if I had 4 levels in 'cat', like ['i', 'j', 'k', 'l'], and wanted to custom sort them into the order ['j', 'l', 'k', 'i'], how would this work?

Abhi Over a year ago

@fffrost You can convert the cat index to categorical and specify the order you want to sort. I have updated the answer.
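For the exact order asked about in the comments, ['j', 'l', 'k', 'i'], a categorical is not strictly required. A minimal sketch of another option (not from the answer): map each 'cat' label to its position in the desired list and reorder the rows with a stable argsort. Like the categorical version, it sorts by 'cat' alone, so rows from different 'am' groups can interleave. The names order and rank are illustrative; labels missing from order map to NaN, which np.argsort places last.

```python
import numpy as np
import pandas as pd

data = {'s': [1, 1, 1, 1],
        'am': ['cap', 'cap', 'sea', 'sea'],
        'cat': ['i', 'j', 'k', 'l'],
        'col1': [.55, .44, .33, .22],
        'col2': [.77, .66, .55, .44]}
df = pd.DataFrame(data).set_index(['s', 'am', 'cat'])

order = ['j', 'l', 'k', 'i']                    # desired order for the 'cat' level
rank = {label: pos for pos, label in enumerate(order)}

# Position of each row's 'cat' label within `order`
pos = df.index.get_level_values('cat').map(rank)

# A stable argsort keeps the original relative order of rows with equal positions
out = df.iloc[np.argsort(pos.to_numpy(), kind='stable')]
print(out)
```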

totalhack · Accepted Answer · 2020-09-17 11:58:52Z

As of pandas 1.1, there is another option: the key parameter of sort_values.

SORT_VALS = {"am": ["sea", "cap"]}

def sorter(column):
    if column.name not in SORT_VALS:
        return column
    mapper = {val: order for order, val in enumerate(SORT_VALS[column.name])}
    return column.map(mapper)

new_df = df.sort_values(by=["s", "am", "cat"], key=sorter)

#            col1  col2
# s am  cat            
# 1 sea i    0.33  0.55
#       o    0.22  0.44
#   cap i    0.55  0.77
#       o    0.44  0.66
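The same sorter extends to several levels at once: every name listed in SORT_VALS gets its custom order, and any other level sorts normally. A sketch, assuming pandas 1.1 or later, that also covers the 'o' before 'i' order from the question:

```python
import pandas as pd

data = {'s': [1, 1, 1, 1],
        'am': ['cap', 'cap', 'sea', 'sea'],
        'cat': ['i', 'o', 'i', 'o'],
        'col1': [.55, .44, .33, .22],
        'col2': [.77, .66, .55, .44]}
df = pd.DataFrame(data).set_index(['s', 'am', 'cat'])

# Custom orders per index level; levels not listed sort in their natural order
SORT_VALS = {"am": ["sea", "cap"], "cat": ["o", "i"]}

def sorter(column):
    # sort_values applies `key` to each entry in `by` separately,
    # passing a Series whose name is that entry
    if column.name not in SORT_VALS:
        return column
    mapper = {val: order for order, val in enumerate(SORT_VALS[column.name])}
    return column.map(mapper)

new_df = df.sort_values(by=["s", "am", "cat"], key=sorter)
print(new_df)
```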

You can also use pd.Categorical in the sorter and return a categorical Series for the custom-sort columns, which may perform differently depending on your scenario. Note, however, that at the time of writing there was a pandas bug (expected to be fixed soon) that could prevent multi-column sorts when the key returns a Categorical.
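A sketch of that Categorical variant, sorting on the 'am' level alone to stay clear of the multi-column issue; categorical_sorter is a hypothetical name. kind="mergesort" is a stable sort, so rows that share an 'am' label keep their existing order.

```python
import pandas as pd

data = {'s': [1, 1, 1, 1],
        'am': ['cap', 'cap', 'sea', 'sea'],
        'cat': ['i', 'o', 'i', 'o'],
        'col1': [.55, .44, .33, .22],
        'col2': [.77, .66, .55, .44]}
df = pd.DataFrame(data).set_index(['s', 'am', 'cat'])

SORT_VALS = {"am": ["sea", "cap"]}

def categorical_sorter(column):
    # Hypothetical helper: wrap listed columns in an ordered Categorical so the
    # sort follows the category order instead of alphabetical order
    if column.name not in SORT_VALS:
        return column
    cats = pd.Categorical(column, categories=SORT_VALS[column.name], ordered=True)
    return pd.Series(cats, index=column.index, name=column.name)

# Single-level sort; the stable algorithm keeps 'i' before 'o' within each group
new_df = df.sort_values(by="am", key=categorical_sorter, kind="mergesort")
print(new_df)
```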
