57

I want to sort a DataFrame by the length of the strings in its name column. There doesn't appear to be a key parameter for sort_values, so I'm not sure how to accomplish this. Here is a test df:

import pandas as pd
df = pd.DataFrame({'name': ['Steve', 'Al', 'Markus', 'Greg'], 'score': [2, 4, 2, 3]})
3
  • Possible duplicate of sort dataframe by length of string in a column Commented Sep 12, 2017 at 13:34
  • @jezrael Please read my reason. I mentioned it explicitly: stackoverflow.com/questions/46177362/… Commented Sep 12, 2017 at 14:01
  • There are more options there. If not, you can edit this answer and include all those other solutions. Commented Sep 12, 2017 at 14:02

6 Answers

53

You can compute the string lengths with .str.len(), sort them with sort_values, and pass the resulting index to reindex:

print (df.name.str.len())
0    5
1    2
2    6
3    4
Name: name, dtype: int64

print (df.name.str.len().sort_values())
1    2
3    4
0    5
2    6
Name: name, dtype: int64

s = df.name.str.len().sort_values().index
print (s)
Int64Index([1, 3, 0, 2], dtype='int64')

print (df.reindex(s))
     name  score
1      Al      4
3    Greg      3
0   Steve      2
2  Markus      2

df1 = df.reindex(s)
df1 = df1.reset_index(drop=True)
print (df1)
     name  score
0      Al      4
1    Greg      3
2   Steve      2
3  Markus      2
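
If the original index labels need to be kept, or the index contains duplicate labels (where reindex raises an error), a positional variant may be safer. The following is only a sketch: sort_by_length is a hypothetical helper name, not a pandas function, and the duplicate index labels are made up for illustration. It sorts the lengths with NumPy's argsort and selects rows with iloc, so each row keeps its label.

```python
import numpy as np
import pandas as pd


def sort_by_length(df, column):
    """Return df sorted by the string length of `column`, keeping index labels.

    Hypothetical helper for illustration; not part of pandas.
    """
    lengths = df[column].str.len().to_numpy()
    # A stable sort keeps the original order among rows of equal length.
    order = np.argsort(lengths, kind="stable")
    # iloc selects by position, so duplicate or non-integer labels are fine.
    return df.iloc[order]


df = pd.DataFrame(
    {"name": ["Steve", "Al", "Markus", "Greg"], "score": [2, 4, 2, 3]},
    index=["a", "b", "a", "c"],  # duplicate labels on purpose
)
result = sort_by_length(df, "name")
print(result)
```

Calling reset_index(drop=True) on the result is then optional rather than required.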

2 Comments

Great answer. I tried this approach with lists too (see "Sorting a DataFrame by list length"), since .str.len() also works on lists, as mentioned in the question "Pythonic way for calculating length of lists in pandas dataframe column".
This is clever, but you should note it's only safe to do when it's ok to trash the existing index.
52

With DataFrame.sort_values (pandas 1.1.0 and later) you can pass a lambda that computes the string length, via the .str.len() Series method, as the key argument:

df = pd.DataFrame({
    'name': ['Steve', 'Al', 'Markus', 'Greg'], 
    'score': [2, 4, 2, 3]
})
print(df)

     name  score
0   Steve      2
1      Al      4
2  Markus      2
3    Greg      3
df.sort_values(by="name", key=lambda x: x.str.len())

     name  score
1      Al      4
3    Greg      3
0   Steve      2
2  Markus      2
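
When several names have the same length, a second column can break the ties. A minimal sketch, assuming a reasonably recent pandas: sort_values calls the key function once for each column listed in by, so the function can branch on the column's name. The extra rows ("Bo", "Anna") and the helper name length_then_score are invented for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Steve", "Al", "Markus", "Greg", "Bo", "Anna"],
    "score": [2, 4, 2, 3, 1, 5],
})


def length_then_score(col):
    # The key receives each `by` column as a Series, one at a time.
    if col.name == "name":
        return col.str.len()  # sort names by their length
    return col                # leave other columns (score) unchanged


result = df.sort_values(by=["name", "score"], key=length_then_score)
print(result)
```

Rows are ordered by name length first, and rows with equal length are ordered by score.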

1 Comment

Thanks. In case someone needs to lowercase first and sort by the index: df.sort_index(key=lambda x: x.str.lower().str.len())
18

I find this solution more intuitive, especially if you want to do something that depends on the string length later on.

df['length'] = df['name'].str.len()
df.sort_values('length', ascending=False, inplace=True)

Your dataframe now has a column named length that holds the string length of each value in the name column, and the whole dataframe is sorted by it in descending order.
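
If the helper column is only needed for the sort itself, a variant that leaves df untouched might look like the sketch below. It uses assign to add the column to a temporary copy and drops it again after sorting; the variable name result is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Steve", "Al", "Markus", "Greg"], "score": [2, 4, 2, 3]})

result = (
    df.assign(length=df["name"].str.len())    # temporary helper column on a copy
      .sort_values("length", ascending=False)  # longest names first
      .drop(columns="length")                  # remove the helper again
)
print(result)
```

The original df keeps its row order and gains no extra column.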

1 Comment

This should be the accepted answer. Much simpler and easily reused.
3

@jezrael's answer is great and explains the steps well. Here is the condensed version:

index_sorted = df.name.str.len().sort_values(ascending=True).index
df_sorted = df.reindex(index_sorted)
df_sorted = df_sorted.reset_index(drop=True)


3

A fancy and minimal solution:

df.loc[df.agg({"name":len}).sort_values('name').index]

     name  score
1      Al      4
3    Greg      3
0   Steve      2
2  Markus      2

1 Comment

Nice one! thanxx !!
1

It's worth using the key argument to avoid creating unnecessary columns:

df.sort_values("column_name", ascending=True, key=lambda col: col.str.len())
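
One case to watch with the key approach is missing values: .str.len() returns NaN for them, and na_position controls where those rows end up. A small sketch, with a None entry added to the test data purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Steve", None, "Markus", "Al"], "score": [2, 4, 2, 3]})

# Rows whose key is NaN (missing names) are placed at the end.
result = df.sort_values("name", key=lambda col: col.str.len(), na_position="last")
print(result)
```

Passing na_position="first" would put the rows with missing names at the top instead.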

