My question is basically the same as the one here: Sorting a pandas DataFrame by one level of a MultiIndex

That is, I want to sort a MultiIndex DataFrame along one level, but the index ["foo2","foo1","foo10"] is sorted lexicographically into ["foo1","foo10","foo2"] instead of the natural order ["foo1","foo2","foo10"], and I cannot pass a "key" argument as I would to list.sort() (see the example below). How should I handle this? Should I call reset_index, sort the column, and then set the index again?

import pandas as pd
import re

def atoi(text):
    return int(text) if text.isdigit() else text

def natural_keys(text):
    # raw string so that \d is not treated as an (invalid) string escape
    return [atoi(c) for c in re.split(r'(\d+)',text)]

# example on a list
L1=["foo2","foo1","foo10"]
print(sorted(L1))
print(sorted(L1,key=natural_keys))
print()

df = pd.DataFrame([{'I1':'foo2','I2':'b','val':2},{'I1':'foo1','I2':'a','val':1},{'I1':'foo10','I2':'c','val':3}])
df = df.set_index(['I1','I2'])
sorted_df = df.sort_index(level=0)
print(sorted_df)
print()

expected_df = pd.DataFrame([{'I1':'foo1','I2':'a','val':1},{'I1':'foo2','I2':'b','val':2},{'I1':'foo10','I2':'c','val':3}])
expected_df = expected_df.set_index(['I1','I2'])
print(expected_df)

ACTUAL DF (sorted_df):
          val
I1    I2
foo1  a     1
foo10 c     3
foo2  b     2

EXPECTED DF:
          val
I1    I2
foo1  a     1
foo2  b     2
foo10 c     3

Thanks

  • Which version of pandas are you using? In more recent versions it's possible to supply a key= but it doesn't work quite the same way as the builtin list.sort/sorted does. Commented Apr 26, 2022 at 13:00
  • I am on an older version of pandas (0.24) and I cannot upgrade it. However, for the sake of argument, I tried on 1.3.4 with the key argument and the function def sort_index(index): return sorted(index,key=natural_keys), but I still don't get the expected result. How do I write code in a comment? Commented Apr 26, 2022 at 14:05

1 Answer

As explained by Jon Clements, if you are on pandas >= 1.1.0 you can use the key argument of sort_index. The key function receives the whole index level and must return one sort key per label, in the same order, so it cannot simply return the sorted labels. If you also want to order labels that contain several numbers (for example foo_1_bar_2 before foo_2_bar_1), you need to combine several functions:

import pandas as pd
import re

def atoi(text):
    return int(text) if text.isdigit() else text

def natural_keys(text):
    # raw string so that \d is not treated as an (invalid) string escape
    return [atoi(c) for c in re.split(r'(\d+)',text)]

def sort_index(index):
    # pandas passes the whole level; return, for each label, its rank in natural order
    return [sorted(index,key=natural_keys,reverse=False).index(val) for val in index]

df = pd.DataFrame([{'I1':'foo2','I2':'b','val':2},{'I1':'foo1','I2':'a','val':1},{'I1':'foo10','I2':'c','val':3}])
df = df.set_index(['I1','I2'])
sorted_df=df.sort_index(level=0,key=sort_index)
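
The sort_index helper above re-sorts the labels and scans the list for every label, which gets slow on a large index. A possible variant, sketched here under the assumption that the installed pandas supports the key argument, sorts the distinct labels once and looks ranks up in a dict. The name natural_rank is hypothetical and only used for illustration.

```python
import re
import pandas as pd

def natural_keys(text):
    # "foo10" -> ["foo", 10, ""], so numeric parts compare as integers
    return [int(c) if c.isdigit() else c for c in re.split(r'(\d+)', text)]

def natural_rank(index):
    # Hypothetical helper: pandas passes the whole level as an Index
    # and expects one sort key per label, in the same order.
    ordered = sorted(set(index), key=natural_keys)
    rank = {label: pos for pos, label in enumerate(ordered)}
    return [rank[label] for label in index]

df = pd.DataFrame([{'I1':'foo2','I2':'b','val':2},{'I1':'foo1','I2':'a','val':1},{'I1':'foo10','I2':'c','val':3}])
df = df.set_index(['I1','I2'])

# Only level 0 goes through the key function
result = df.sort_index(level=0, key=natural_rank)
print(result)
```

Duplicate labels get the same rank, which matches what the list-based version does.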

I have not found a simple solution for earlier versions of pandas.
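
For versions without the key argument (such as the 0.24 mentioned in the comments), one possible workaround is to compute the row order in plain Python and apply it with iloc, which avoids the reset_index / set_index round trip. This is only a sketch and has not been checked against 0.24 itself; natural_order is a hypothetical helper name.

```python
import re
import pandas as pd

def natural_keys(text):
    # "foo10" -> ["foo", 10, ""], so numeric parts compare as integers
    return [int(c) if c.isdigit() else c for c in re.split(r'(\d+)', text)]

def natural_order(df, level=0):
    # Hypothetical helper: row positions arranged in natural order of one index level
    labels = df.index.get_level_values(level)
    return sorted(range(len(labels)), key=lambda i: natural_keys(labels[i]))

df = pd.DataFrame([{'I1':'foo2','I2':'b','val':2},{'I1':'foo1','I2':'a','val':1},{'I1':'foo10','I2':'c','val':3}])
df = df.set_index(['I1','I2'])

# Reorder the rows by position; the MultiIndex itself is left untouched
sorted_df = df.iloc[natural_order(df)]
print(sorted_df)
```

Because sorted() is stable, rows sharing the same level-0 label keep their original relative order. Unlike sort_index(level=0), the remaining levels are not sorted.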
