
I want to sort one column of a dataframe using the following logic:

  1. Numeric goes first - by ascending order;
  2. Alphabet follows - by ascending order;
  3. Lastly, string length - by descending order.

Example dataframe (I sort on the name column and then add an 'Order' column):

import pandas as pd
df_1 = pd.DataFrame({'name': ['3D', '3DD', 'AC', 'AC-', 'BE', '2C','BED'], 'score': [2, 4, 2, 3, 10, 8, 2]})

I tried sort_values() as shown below:

df_1['Len'] = df_1['name'].str.len()
df_1.sort_values(by=['name', 'Len'], ascending=[True, False], inplace=True, ignore_index=True)
df_1.drop(columns=['Len'], inplace=True)
df_1['Order'] = df_1.index+1

However, this gives the result below; the descending sort by string length had no effect:

  name  score  Order
0   2C      8      1
1   3D      2      2
2  3DD      4      3
3   AC      2      4
4  AC-      3      5
5   BE     10      6
6  BED      2      7

Based on the sorting logic above, this is the desired result:

  name  score  Order
0   2C      8      1
1  3DD      4      2
2   3D      2      3
3  AC-      3      4
4   AC      2      5
5  BED      2      6
6   BE     10      7

Thank you!

  • You could write a scoring function to calculate an order from the name column and provide it to key of sort_values. Commented Mar 11, 2021 at 7:44
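
The suggestion above could be sketched with the key parameter of sort_values (available in pandas 1.1 and later). This is only an illustration, not the commenter's code: pad_key is a hypothetical helper name, and the sketch assumes that no name contains a character that sorts after '~'.

```python
import pandas as pd

df_1 = pd.DataFrame({
    'name': ['3D', '3DD', 'AC', 'AC-', 'BE', '2C', 'BED'],
    'score': [2, 4, 2, 3, 10, 8, 2],
})

def pad_key(names: pd.Series) -> pd.Series:
    # Hypothetical helper: right-pad every name with '~' up to the longest
    # name, so a shorter name sorts after a longer one with the same prefix.
    # Assumption: no name contains a character that sorts after '~'.
    width = int(names.str.len().max())
    return names.str.pad(width, side='right', fillchar='~')

# The key is only used for comparison; the 'name' column itself is unchanged.
result = df_1.sort_values('name', key=pad_key, ignore_index=True)
result['Order'] = result.index + 1
print(result)
```

Because the padded strings exist only inside the key function, no temporary column has to be created or dropped.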

1 Answer

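You can pad the names to the same length with '~', the last printable ASCII character, so that a plain sort in pandas gives the order you want. Once padded, a shorter name sorts after any longer name that starts with the same characters, because '~' compares greater than digits, letters and '-'.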

          name
0           2C
1           3D
2          3DD
3           AC
4          AC-
5           BE
6          BED

max_length = int(df['name'].str.len().max())

df['sort_name'] = df['name'].str.pad(max_length, side='right', fillchar='~')

df.sort_values('sort_name', inplace=True, ignore_index=True)
  name sort_name
0   2C       2C~
1  3DD       3DD
2   3D       3D~
3  AC-       AC-
4   AC       AC~
5  BED       BED
6   BE       BE~

max_length is the length of the longest name in the column, so every name is padded to that width and you do not need to know it beforehand.

After you have sorted the dataframe, you can delete the helper column with:

df = df.drop('sort_name', axis=1)
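
One caveat: the padding trick relies on no name containing a character that sorts after '~' (an accented letter, for example). If that cannot be guaranteed, a possible alternative is to sort on a key that marks the end of each string explicitly. The following is a rough sketch under that assumption, not part of the original approach; prefix_last_key and sorted_df are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({'name': ['3D', '3DD', 'AC', 'AC-', 'BE', '2C', 'BED']})

def prefix_last_key(name):
    # Hypothetical helper: every character becomes (0, ch) and the end of
    # the string becomes (1, ''), which compares greater than any (0, ch).
    # A name therefore sorts after every longer name that starts with it.
    return [(0, ch) for ch in name] + [(1, '')]

# Sort the index labels by the key of their name, then reorder the rows.
order = sorted(df.index, key=lambda i: prefix_last_key(df.at[i, 'name']))
sorted_df = df.loc[order].reset_index(drop=True)
sorted_df['Order'] = sorted_df.index + 1
print(sorted_df)
```

This version needs neither a fill character nor the maximum length, at the cost of sorting in plain Python rather than inside pandas.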

6 Comments

This is brilliant!
Hi @sergiomahi: thank you for that. However, the maximum length is not always just 3, sometimes more than that, depending on the values provided by users, so we won’t know the maximum length.
@XavierSun see my edit. With that line added, the code calculates the maximum length of the column automatically and uses it, so you won't have to know it beforehand.
Hi @sergiomahi, thank you so much! I changed max_length to an int, as max_length = int(max(df['name'].str.len())), to make it work. It works on my example data but not on my real data, maybe because other columns in the same dataframe affect the sorting by string length? Are there other ways to achieve this?
@sergiomahi, actually, once I changed it to df.sort_values('sort_name', inplace=True, ignore_index=True), it worked. It would be great if you could edit your answer too, so that other users with the same need can apply the same method. I can mark it as the answer once you update it. Thank you again!