Sort dataframe by string length

Question

I want to sort by name length. There doesn't appear to be a key parameter for sort_values so I'm not sure how to accomplish this. Here is a test df:

import pandas as pd
df = pd.DataFrame({'name': ['Steve', 'Al', 'Markus', 'Greg'], 'score': [2, 4, 2, 3]})

Possible duplicate of sort dataframe by length of string in a column
– cs95, Commented Sep 12, 2017 at 13:34
@jezrael Please read my reason. I mentioned it explicitly: stackoverflow.com/questions/46177362/…
– cs95, Commented Sep 12, 2017 at 14:01
There are more options there. If not, you can edit this answer and include all those other solutions.
– cs95, Commented Sep 12, 2017 at 14:02

jezrael · Accepted Answer · 2017-02-28 19:35:12Z

53

You can compute the length of each name with str.len, sort those lengths with sort_values, and pass the resulting index to reindex:

print (df.name.str.len())
0    5
1    2
2    6
3    4
Name: name, dtype: int64

print (df.name.str.len().sort_values())
1    2
3    4
0    5
2    6
Name: name, dtype: int64

s = df.name.str.len().sort_values().index
print (s)
Int64Index([1, 3, 0, 2], dtype='int64')

print (df.reindex(s))
     name  score
1      Al      4
3    Greg      3
0   Steve      2
2  Markus      2

df1 = df.reindex(s)
df1 = df1.reset_index(drop=True)
print (df1)
     name  score
0      Al      4
1    Greg      3
2   Steve      2
3  Markus      2

edited Feb 28, 2017 at 19:35

answered Feb 28, 2017 at 18:56

jezrael


2 Comments

otayeby Over a year ago

Great answer. I tried this approach with lists too (Sorting a DataFrame by list length), since .str.len() also works on lists, as mentioned in the question "Pythonic way for calculating length of lists in pandas dataframe column".

smci Over a year ago

This is clever, but you should note it's only safe to do when it's ok to trash the existing index.
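
On the point about the existing index, here is a minimal sketch of a positional variant. It assumes a hypothetical non-unique index (not part of the question), where reindex would fail because of the duplicate labels. Sorting positions with NumPy's argsort and selecting with iloc keeps the original labels attached to their rows:

```python
import numpy as np
import pandas as pd

# Hypothetical index with a repeated label, for illustration only.
df = pd.DataFrame(
    {'name': ['Steve', 'Al', 'Markus', 'Greg'], 'score': [2, 4, 2, 3]},
    index=['a', 'b', 'a', 'c'],
)

# Lengths as a plain NumPy array, so the index plays no role in the sort.
lengths = df['name'].str.len().to_numpy()

# Row positions ordered by length; 'stable' keeps ties in original order.
order = np.argsort(lengths, kind='stable')

# iloc selects by position, so the labels travel with their rows.
df_sorted = df.iloc[order]
print(df_sorted)
```

Because nothing is reindexed or reset, this works for any index, at the cost of one extra NumPy call.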

mirekphd · Accepted Answer · 2022-12-20 19:22:37Z

52

Since pandas 1.1.0, DataFrame.sort_values accepts a key argument. We can pass it an anonymous (lambda) function that computes the string length with the .str.len() Series method:

df = pd.DataFrame({
    'name': ['Steve', 'Al', 'Markus', 'Greg'], 
    'score': [2, 4, 2, 3]
})
print(df)

     name  score
0   Steve      2
1      Al      4
2  Markus      2
3    Greg      3

df.sort_values(by="name", key=lambda x: x.str.len())

     name  score
1      Al      4
3    Greg      3
0   Steve      2
2  Markus      2
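
If several names share a length, the key function can also be used with more than one sort column. This is a sketch under assumptions: the extra rows and the helper name length_key are made up for illustration. It relies on the fact that key is applied to each column in by separately, so the function can check col.name and only transform the name column:

```python
import pandas as pd

# Hypothetical data with repeated name lengths, to make the tie-break visible.
df = pd.DataFrame({
    'name': ['Steve', 'Al', 'Markus', 'Greg', 'Bo', 'Anna'],
    'score': [2, 4, 2, 3, 1, 5],
})

def length_key(col):
    # `key` receives each sort column as a Series, one at a time.
    # Map only 'name' to string lengths; leave 'score' unchanged.
    return col.str.len() if col.name == 'name' else col

# Sort by name length first, then by score within equal lengths.
result = df.sort_values(by=['name', 'score'], key=length_key)
print(result)
```

Here Bo comes before Al (both length 2) because of its lower score, and Greg comes before Anna for the same reason.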

edited Dec 20, 2022 at 19:22

mirekphd


answered Sep 20, 2020 at 19:48

Erfan


1 Comment

Shovra Over a year ago

Thanks. Just in case someone needs to lowercase and sort: df.sort_index(key=lambda x: x.str.lower().str.len())

moshfiqur · Accepted Answer · 2019-12-31 13:00:29Z

18

I found this solution more intuitive, especially if you want to do something that depends on the string length later on.

df['length'] = df['name'].str.len()
df.sort_values('length', ascending=False, inplace=True)

Your dataframe now has a column named length that holds the string length of each value in the name column, and the whole dataframe is sorted by that length in descending order.
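
If the helper column is only needed for the sort, a non-mutating variant is possible. This is a sketch, not the answer's own code: it uses assign to add the column on a copy, sorts, and drops the column again, so the original df keeps its two columns:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Steve', 'Al', 'Markus', 'Greg'], 'score': [2, 4, 2, 3]})

result = (
    df.assign(length=df['name'].str.len())    # temporary column on a copy
      .sort_values('length', ascending=False)  # longest names first
      .drop(columns='length')                  # remove the helper column
)
print(result)
```

Keeping the length column, as in the answer above, is still the better choice when the lengths are reused later.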

edited Dec 31, 2019 at 13:00

answered Oct 3, 2019 at 19:01

moshfiqur


1 Comment

Display name Over a year ago

This should be the accepted answer. Much simpler and easily reused.

Thierry G. · Accepted Answer · 2020-02-05 14:03:01Z

3

@jezrael's answer is great and explains the approach well. Here is the final result:

index_sorted = df.name.str.len().sort_values(ascending=True).index
df_sorted = df.reindex(index_sorted)
df_sorted = df_sorted.reset_index(drop=True)
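
For comparison, the same three steps can be collapsed into one call on newer pandas. This sketch assumes pandas 1.1.0 or later, where sort_values has both the key argument and ignore_index (the latter was added in 1.0.0):

```python
import pandas as pd

df = pd.DataFrame({'name': ['Steve', 'Al', 'Markus', 'Greg'], 'score': [2, 4, 2, 3]})

# key maps the column to string lengths for sorting only;
# ignore_index=True renumbers the result 0..n-1, like reset_index(drop=True).
df_sorted = df.sort_values('name', key=lambda col: col.str.len(), ignore_index=True)
print(df_sorted)
```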

answered Feb 5, 2020 at 14:03

Thierry G.


Billy Bonaros · Accepted Answer · 2020-07-10 14:31:07Z

3

A fancy and minimal solution. Note that it passes index labels to iloc, so it relies on the default integer index:

df.iloc[df.agg({"name":len}).sort_values('name').index]



     name  score
1      Al      4
3    Greg      3
0   Steve      2
2  Markus      2
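
The expression above takes .index (labels) and hands it to iloc (positions), which lines up on the default 0..n-1 index. Here is a sketch of a label-based variant, assuming a hypothetical string index that is not in the question:

```python
import pandas as pd

# Hypothetical string index, for illustration only.
df = pd.DataFrame(
    {'name': ['Steve', 'Al', 'Markus', 'Greg'], 'score': [2, 4, 2, 3]},
    index=['w', 'x', 'y', 'z'],
)

# Index labels ordered by the length of the corresponding name.
labels = df['name'].str.len().sort_values().index

# loc selects by label, matching what .index returns.
result = df.loc[labels]
print(result)
```

Like reindex, this selection by label is only unambiguous when the index labels are unique.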

answered Jul 10, 2020 at 14:31

Billy Bonaros


1 Comment

luckyCasualGuy Over a year ago

Nice one! thanxx !!

matt91t · Accepted Answer · 2023-09-22 13:55:16Z

1

It's worth using the key argument to avoid creating unnecessary columns:

df.sort_values("column_name", ascending=True, key=lambda col: col.str.len())
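
A small sketch of how the same call behaves with missing names. The None entry is an assumption added for illustration. .str.len() yields a missing value for that row, and na_position controls where the row ends up:

```python
import pandas as pd

# Hypothetical data with one missing name.
df = pd.DataFrame({'name': ['Steve', None, 'Markus', 'Al'], 'score': [2, 4, 2, 3]})

# The missing name has no length, so its key is NaN;
# na_position='last' (the default) places that row at the end.
result = df.sort_values('name', key=lambda col: col.str.len(), na_position='last')
print(result)
```

Passing na_position='first' instead would move the row with the missing name to the top.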

answered Sep 22, 2023 at 13:55

matt91t

