Sort strings in pandas

Question

I have a pandas DataFrame that I want to sort by a column. If I use .sort_values() as in the code below:

df.sort_values(by='id')

I get the output in the 'id' column as:

1075_2016-06-01_0_1
1075_2016-06-01_10_1
1075_2016-06-01_10_2
1075_2016-06-01_11_1
1075_2016-06-01_11_2
1075_2016-06-01_1_1
1075_2016-06-01_1_2

I expected:

1075_2016-06-01_0_1
1075_2016-06-01_1_1
1075_2016-06-01_1_2
1075_2016-06-01_10_1
1075_2016-06-01_10_2
1075_2016-06-01_11_1
1075_2016-06-01_11_2

What is the best way to do this in pandas?

jezrael · Accepted Answer · 2019-08-17 10:56:25Z

1

One possible solution uses natsort to get the sorted positions of the values and then reorders the original DataFrame with loc:

from natsort import index_natsorted, order_by_index

df2 = df.loc[order_by_index(df.index, index_natsorted(df['id']))]
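If natsort is not available, natural ordering can be approximated in plain Python. The following is only a sketch, not the natsort implementation: natural_key is a hypothetical helper, and the sample frame is assumed to mirror the ids from the question.

```python
import re
import pandas as pd

def natural_key(text):
    # re.split with a capturing group alternates non-digit and digit tokens,
    # so odd positions always hold digit runs and can be compared as ints.
    tokens = re.split(r'(\d+)', text)
    return [int(tok) if i % 2 else tok for i, tok in enumerate(tokens)]

# Sample frame mirroring the ids in the question (an assumption).
df = pd.DataFrame({'id': [
    '1075_2016-06-01_0_1',
    '1075_2016-06-01_10_1',
    '1075_2016-06-01_10_2',
    '1075_2016-06-01_11_1',
    '1075_2016-06-01_11_2',
    '1075_2016-06-01_1_1',
    '1075_2016-06-01_1_2',
]})

# Sort row positions by the natural key, then reorder by position with iloc.
order = sorted(range(len(df)), key=lambda pos: natural_key(df['id'].iloc[pos]))
df_natural = df.iloc[order]
print(df_natural)
```

Because the reordering is positional, this sketch does not depend on the index labels being unique or sorted.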

Or split all values by _, convert the resulting columns to integers (and optionally to datetimes), sort to get the index order, and finally select from the original DataFrame with loc:

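import pandas as pd

# Split each id into its four parts, one column per part.
df1 = df['id'].str.split('_', expand=True)
df1[[0,2,3]] = df1[[0,2,3]].astype(int)
df1[1] = pd.to_datetime(df1[1])

# Sort the helper frame, then reorder the original rows by its index.
df2 = df.loc[df1.sort_values([0,1,2,3]).index]
print(df2)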
                     id
0   1075_2016-06-01_0_1
5   1075_2016-06-01_1_1
6   1075_2016-06-01_1_2
1  1075_2016-06-01_10_1
2  1075_2016-06-01_10_2
3  1075_2016-06-01_11_1
4  1075_2016-06-01_11_2

Another solution uses argsort for the ordering and indexes by position with iloc, so it works with any index values:

f = lambda x: [int(x[0]), pd.to_datetime(x[1]), int(x[2]), int(x[3])]
df2 = df.iloc[df['id'].str.split('_').map(f).argsort()]
print (df2)
                     id
0   1075_2016-06-01_0_1
5   1075_2016-06-01_1_1
6   1075_2016-06-01_1_2
1  1075_2016-06-01_10_1
2  1075_2016-06-01_10_2
3  1075_2016-06-01_11_1
4  1075_2016-06-01_11_2
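On pandas 1.1 or later, the same ordering can also be expressed through the key parameter of sort_values, which avoids a helper column. This is a sketch under assumptions: parse_id and id_rank are hypothetical helpers, and every id is assumed to follow the four-part layout shown in the question.

```python
import numpy as np
import pandas as pd

def parse_id(value):
    # Assumed layout: <int>_<YYYY-MM-DD>_<int>_<int>, as in the question.
    a, date, b, c = value.split('_')
    return (int(a), pd.Timestamp(date), int(b), int(c))

def id_rank(col):
    # sort_values(key=...) passes the whole column as a Series and expects
    # a Series of the same length back, so return one integer rank per row.
    keys = [parse_id(v) for v in col]
    order = sorted(range(len(keys)), key=keys.__getitem__)
    ranks = np.empty(len(keys), dtype=int)
    ranks[order] = np.arange(len(keys))
    return pd.Series(ranks, index=col.index)

# Sample frame mirroring the ids in the question (an assumption).
df = pd.DataFrame({'id': [
    '1075_2016-06-01_0_1',
    '1075_2016-06-01_10_1',
    '1075_2016-06-01_10_2',
    '1075_2016-06-01_11_1',
    '1075_2016-06-01_11_2',
    '1075_2016-06-01_1_1',
    '1075_2016-06-01_1_2',
]})

df_sorted = df.sort_values(by='id', key=id_rank)
print(df_sorted)
```

Returning integer ranks rather than the parsed tuples keeps the key column a plain numeric Series, so the sort does not rely on how pandas compares Python objects.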

edited Aug 17, 2019 at 10:56

answered Aug 17, 2019 at 9:40

jezrael


2 Comments

hyper Over a year ago

Thanks for the answer! 'id' is not a pandas index it's just a column

jezrael Over a year ago

@hyper - Yes, both solutions work with the id column.

hyper · Accepted Answer · 2019-08-17 10:43:48Z

1

Guys thank you very much! The combination of the two solutions worked:

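# Split each id into its parts, then convert them to comparable types.
df['sort_val'] = df['id'].str.split('_')
f = lambda x: [int(x[0]), pd.to_datetime(x[1]), int(x[2]), int(x[3])]
df['sort_val'] = df['sort_val'].map(f)
# drop(columns=...) replaces the positional axis argument, which pandas 2.0 removed.
df.sort_values(by='sort_val').drop(columns='sort_val')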

answered Aug 17, 2019 at 10:43

hyper


willeM_ Van Onsem · Accepted Answer · 2019-08-17 09:49:12Z

0

You can first split the values per underscores, and then sort these, like:

df['sort_val'] = df['id'].str.split('_')
df = df.sort_values('sort_val').drop(columns='sort_val')

The above works with an arbitrary number of underscore-separated values, but the parts are compared as strings, so a part such as 10 sorts before 2.

This gives us:

>>> df
                     id
0   1075_2016-06-01_0_1
1  1075_2016-06-01_10_1
2  1075_2016-06-01_10_2
3  1075_2016-06-01_11_1
4  1075_2016-06-01_11_2
5   1075_2016-06-01_1_1
6   1075_2016-06-01_1_2
>>> df['sort_val'] = df['id'].str.split('_')
>>> df = df.sort_values('sort_val').drop(columns='sort_val')
>>> df
                     id
0   1075_2016-06-01_0_1
5   1075_2016-06-01_1_1
6   1075_2016-06-01_1_2
1  1075_2016-06-01_10_1
2  1075_2016-06-01_10_2
3  1075_2016-06-01_11_1
4  1075_2016-06-01_11_2

answered Aug 17, 2019 at 9:49

willeM_ Van Onsem


2 Comments

hyper Over a year ago

for some reason this does not work if there are also other elements, for example: 1075_2016-06-01_1_3 1075_2016-06-01_2_0 1075_2016-06-01_6_0

willeM_ Van Onsem Over a year ago

@hyper: for the given sample data, I get the items sorted in the order you provide these, which seems to be the correct sorting?
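A likely explanation for the behaviour hyper reports is that str.split leaves every part as a string, so the lists are compared character by character. The sketch below uses plain Python lists to show the difference; the ids combine hyper's examples with one from the question, and numeric_parts is a hypothetical helper.

```python
ids = [
    '1075_2016-06-01_10_1',
    '1075_2016-06-01_2_0',
    '1075_2016-06-01_1_3',
    '1075_2016-06-01_6_0',
]

# Comparing the split parts as strings: '10' sorts before '2'.
by_strings = sorted(ids, key=lambda s: s.split('_'))

def numeric_parts(s):
    a, date, b, c = s.split('_')
    # ISO dates already sort correctly as text, so only the counters are converted.
    return (int(a), date, int(b), int(c))

# Comparing the counters as integers: 2 sorts before 10.
by_numbers = sorted(ids, key=numeric_parts)

print(by_strings)
print(by_numbers)
```

With string parts the order is 1_3, 10_1, 2_0, 6_0; with integer parts it is 1_3, 2_0, 6_0, 10_1.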
