I have a DataFrame in which one column holds strings made of a software name and a version number. When I sort by this column, the ordering does not respect the version numbers.

The column to sort looks like this:

>>> import pandas as pd
>>> df = pd.DataFrame({'versions': ['cd-2.8.10', 'cd-3.10.3', 'cd-3.3.1', 'cd-3.10.10', 'cd-3.12.0', 'ab-5.2.1', 'cd-3.1.3', 'cd-3.5.2', 'ab-3.0.2', 'cd-3.10.1', 'cd-3.20.1', 'cd-3.11.4']})
>>> df
      versions
0    cd-2.8.10
1    cd-3.10.3
2     cd-3.3.1
3   cd-3.10.10
4    cd-3.12.0
5     ab-5.2.1
6     cd-3.1.3
7     cd-3.5.2
8     ab-3.0.2
9    cd-3.10.1
10   cd-3.20.1
11   cd-3.11.4

With sort_values(), the part before the dash is sorted alphabetically as expected, but within a given software the version numbers are ordered incorrectly: 3.10.1 is treated as smaller than 3.3.1, and 3.10.10 as smaller than 3.10.3.

>>> df.sort_values('versions')
      versions
8     ab-3.0.2
5     ab-5.2.1
0    cd-2.8.10
6     cd-3.1.3
9    cd-3.10.1
3   cd-3.10.10
1    cd-3.10.3
11   cd-3.11.4
4    cd-3.12.0
10   cd-3.20.1
2     cd-3.3.1
7     cd-3.5.2

I would like to get the correct version ordering:

      versions
8     ab-3.0.2
5     ab-5.2.1
0    cd-2.8.10
6     cd-3.1.3
2     cd-3.3.1
7     cd-3.5.2
9    cd-3.10.1
1    cd-3.10.3
3   cd-3.10.10
11   cd-3.11.4
4    cd-3.12.0
10   cd-3.20.1

1 Answer

pandas has no built-in natural sort, so sort_values() on its own compares these strings character by character. Thankfully, the natsort module makes this easy and also handles most version formats.

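from natsort import natsorted
# natsorted returns index labels here, so select with .loc rather than .iloc
# (.iloc only gives the same result while the index is the default 0..n-1)
df.loc[natsorted(df.index, key=lambda x: df.loc[x, 'versions'])]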

      versions
8     ab-3.0.2
5     ab-5.2.1
0    cd-2.8.10
6     cd-3.1.3
2     cd-3.3.1
7     cd-3.5.2
9    cd-3.10.1
1    cd-3.10.3
3   cd-3.10.10
11   cd-3.11.4
4    cd-3.12.0
10   cd-3.20.1
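
For context, the plain sort goes wrong because Python (and pandas) compare strings one character at a time. The short sketch below illustrates this with the values from the question; `numeric_parts` is a hypothetical helper introduced only for this illustration.

```python
# Strings are compared character by character, so the "1" in "10"
# is compared against "3" and decides the result on its own.
print("cd-3.10.1" < "cd-3.3.1")  # True: lexicographic order


def numeric_parts(version):
    """Hypothetical helper: turn '3.10.1' into the tuple (3, 10, 1)."""
    return tuple(int(part) for part in version.split("."))


# Comparing tuples of integers gives the ordering a human expects.
print(numeric_parts("3.10.1") < numeric_parts("3.3.1"))  # False
```

Natural sorting, as done by natsort, amounts to comparing the numeric pieces as numbers in this way.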

Here's another way of sorting this data. It should be slightly faster, because the key is a plain dictionary lookup instead of a lambda that calls df.loc for every row:

# map index label -> version string
d = df.versions.to_dict()
# iterating d yields the index labels, so select with .loc
df.loc[natsorted(d, key=d.get)]

      versions
8     ab-3.0.2
5     ab-5.2.1
0    cd-2.8.10
6     cd-3.1.3
2     cd-3.3.1
7     cd-3.5.2
9    cd-3.10.1
1    cd-3.10.3
3   cd-3.10.10
11   cd-3.11.4
4    cd-3.12.0
10   cd-3.20.1
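
If installing natsort is not an option, a small key function from the standard library may be enough for this particular format. This is only a sketch: it assumes every entry looks like `name-X.Y.Z` with purely numeric version parts, and `version_key` is a hypothetical helper, not part of pandas or natsort.

```python
versions = ['cd-2.8.10', 'cd-3.10.3', 'cd-3.3.1', 'cd-3.10.10',
            'cd-3.12.0', 'ab-5.2.1', 'cd-3.1.3', 'cd-3.5.2',
            'ab-3.0.2', 'cd-3.10.1', 'cd-3.20.1', 'cd-3.11.4']


def version_key(label):
    """Hypothetical helper: 'cd-3.10.1' -> ('cd', (3, 10, 1)).

    Assumes the version follows the last dash and has only numeric parts.
    """
    name, _, version = label.rpartition("-")
    return name, tuple(int(part) for part in version.split("."))


# Positions of the entries in natural order (similar in spirit to argsort).
order = sorted(range(len(versions)), key=lambda i: version_key(versions[i]))
print(order)  # [8, 5, 0, 6, 2, 7, 9, 1, 3, 11, 4, 10]
```

With the DataFrame from the question, `df.iloc[order]` should then return the rows in this order, since `order` holds positions rather than index labels. Version strings with suffixes such as `rc1` would break the `int()` conversion, which is one reason natsort is the more robust choice.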