4

Say I have a dataframe with a MultiIndex like this:

import pandas as pd
import numpy as np

my_index = pd.MultiIndex.from_product(
    [(3,1,2), ("small", "tall", "medium"), ("B", "A", "C")],
    names=["number", "size", "letter"]
)

df_0 = pd.DataFrame(np.random.rand(27, 2), columns=["x", "y"], index=my_index)
                             x         y
number size   letter
3      small  B       0.950073  0.599918
              A       0.014450  0.472736
              C       0.208064  0.778538
       tall   B       0.979631  0.367234
              A       0.832459  0.449875
              C       0.761929  0.053144
       medium B       0.460764  0.800131
              A       0.355746  0.573813
              C       0.078924  0.058865
1      small  B       0.405209  0.354636
              A       0.536242  0.012904
              C       0.458910  0.723627
       tall   B       0.859898  0.442954
              A       0.109729  0.885598
              C       0.378363  0.220695
       medium B       0.652191  0.685181
              A       0.503525  0.400973
              C       0.454671  0.188798
2      small  B       0.407654  0.168782
              A       0.393451  0.083023
              C       0.073432  0.165209
       tall   B       0.678226  0.108497
              A       0.718348  0.077935
              C       0.595500  0.146271
       medium B       0.719985  0.422167
              A       0.950950  0.532390
              C       0.687721  0.920229

Now I want to sort the index by the different levels, first number, then size, and finally letter.

If I do this...

df_1 = df_0.sort_index(level=["number", "size", "letter"], inplace=False)

... the size of course gets sorted in alphabetical order.

                             x         y
number size   letter
1      medium A       0.503525  0.400973
              B       0.652191  0.685181
              C       0.454671  0.188798
       small  A       0.536242  0.012904
              B       0.405209  0.354636
              C       0.458910  0.723627
       tall   A       0.109729  0.885598
              B       0.859898  0.442954
              C       0.378363  0.220695
2      medium A       0.950950  0.532390
              B       0.719985  0.422167
              C       0.687721  0.920229
       small  A       0.393451  0.083023
              B       0.407654  0.168782
              C       0.073432  0.165209
       tall   A       0.718348  0.077935
              B       0.678226  0.108497
              C       0.595500  0.146271
3      medium A       0.355746  0.573813
              B       0.460764  0.800131
              C       0.078924  0.058865
       small  A       0.014450  0.472736
              B       0.950073  0.599918
              C       0.208064  0.778538
       tall   A       0.832459  0.449875
              B       0.979631  0.367234
              C       0.761929  0.053144

But I want it to be sorted by a custom key. I know I can sort the size level with a custom sort function like this:

custom_key = np.vectorize(lambda x: {"small": 0, "medium": 1, "tall": 2}[x])

df_2 = df_0.sort_index(level=1, key=custom_key, inplace=False)
                             x         y
number size   letter
1      small  A       0.536242  0.012904
              B       0.405209  0.354636
              C       0.458910  0.723627
2      small  A       0.393451  0.083023
              B       0.407654  0.168782
              C       0.073432  0.165209
3      small  A       0.014450  0.472736
              B       0.950073  0.599918
              C       0.208064  0.778538
1      medium A       0.503525  0.400973
              B       0.652191  0.685181
              C       0.454671  0.188798
2      medium A       0.950950  0.532390
              B       0.719985  0.422167
              C       0.687721  0.920229
3      medium A       0.355746  0.573813
              B       0.460764  0.800131
              C       0.078924  0.058865
1      tall   A       0.109729  0.885598
              B       0.859898  0.442954
              C       0.378363  0.220695
2      tall   A       0.718348  0.077935
              B       0.678226  0.108497
              C       0.595500  0.146271
3      tall   A       0.832459  0.449875
              B       0.979631  0.367234
              C       0.761929  0.053144

But how can I sort by all levels like for df_1 and use the custom key on the second level? Expected output:

                             x         y
number size   letter
1      small  A       0.536242  0.012904
              B       0.405209  0.354636
              C       0.458910  0.723627
       medium A       0.503525  0.400973
              B       0.652191  0.685181
              C       0.454671  0.188798
       tall   A       0.109729  0.885598
              B       0.859898  0.442954
              C       0.378363  0.220695
2      small  A       0.393451  0.083023
              B       0.407654  0.168782
              C       0.073432  0.165209
       medium A       0.950950  0.532390
              B       0.719985  0.422167
              C       0.687721  0.920229
       tall   A       0.718348  0.077935
              B       0.678226  0.108497
              C       0.595500  0.146271
3      small  A       0.014450  0.472736
              B       0.950073  0.599918
              C       0.208064  0.778538
       medium A       0.355746  0.573813
              B       0.460764  0.800131
              C       0.078924  0.058865
       tall   A       0.832459  0.449875
              B       0.979631  0.367234
              C       0.761929  0.053144

And how should I define the custom key function, so that I also can access the level in sort_index by name like this?

df_3 = df_0.sort_index(level="size", key=custom_key, inplace=False)

Here, it gives a KeyError: 'Level size not found'

2 Answers 2

3

A solution with pandas.CategoricalIndex :

categories_order = ['small', 'medium', 'tall']
categories = pd.CategoricalIndex(df_0.index.levels[1].values,
                                 categories=categories_order,
                                 ordered=True)

df_0.index = df_0.index.set_levels(categories, level=1)

out = df_0.sort_index()

# Output :

print(out)

                             x         y
number size   letter                    
1      small  A       0.994537  0.984409
              B       0.153568  0.529762
              C       0.025857  0.750508
       medium A       0.977558  0.488506
              B       0.727121  0.189481
              C       0.465571  0.606248
       tall   A       0.638590  0.530700
              B       0.181290  0.963911
              C       0.068123  0.911834
2      small  A       0.183484  0.459770
              B       0.587073  0.935522
              C       0.068233  0.135137
       medium A       0.315199  0.047945
              B       0.568500  0.756994
              C       0.581868  0.725463
       tall   A       0.881304  0.739188
              B       0.566962  0.855521
              C       0.677635  0.851877
3      small  A       0.497159  0.804575
              B       0.290587  0.104613
              C       0.343545  0.468724
       medium A       0.868754  0.526125
              B       0.759151  0.439089
              C       0.515183  0.266913
       tall   A       0.486962  0.178156
              B       0.902421  0.133138
              C       0.613683  0.827321
Sign up to request clarification or add additional context in comments.

1 Comment

upvoted your answer but do you think it's better to use index.levels[1].values than index.get_level_values(1)?
2

The ideal would be to use ordered Categorical data.

Else, you can use a custom mapper based on the level name:

# define here custom sorters
# all other levels will be sorted by default order
order = {'size': ['small', 'medium', 'tall']}

def sorter(s):
    if s.name in order:
        return s.map({k:v for v,k in enumerate(order[s.name])})
    return s

out = df_0.sort_index(level=["number", "size", "letter"], key=sorter)

Output:

                             x         y
number size   letter                    
1      small  A       0.530753  0.687982
              B       0.722848  0.974920
              C       0.174058  0.695016
       medium A       0.397016  0.550404
              B       0.426989  0.843007
              C       0.929218  0.497728
       tall   A       0.159078  0.005675
              B       0.917871  0.384265
              C       0.685435  0.585242
2      small  A       0.423254  0.838356
              B       0.342158  0.209632
...

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.