If I have this pandas v1.3.4 dataframe:

index         col1          col2
  1      ['1','2','3']       'a'
  2      ['2','4','2']       'b'
  3      ['5','2','1']       'c'
  4      ['3','2','1']       'd'
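A reproducible version of this frame, as a sketch. It assumes the list elements and the col2 values are plain Python strings, and that index is the DataFrame's index rather than an ordinary column:

```python
import pandas as pd

# Rebuild the example frame; 'index' is assumed to be the DataFrame index
# and every list element is assumed to be a plain Python string.
df = pd.DataFrame(
    {
        "col1": [["1", "2", "3"], ["2", "4", "2"], ["5", "2", "1"], ["3", "2", "1"]],
        "col2": ["a", "b", "c", "d"],
    },
    index=pd.Index([1, 2, 3, 4], name="index"),
)
print(df)
```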

How can I sort each value in col1 without changing the index or any other values (col2 in this case)? For this example, if I sort from lowest to highest (assuming lexicographic sorting matches numerical sorting), I would get:

index         col1          col2
  1      ['1','2','3']       'a'
  2      ['2','2','4']       'b'
  3      ['1','2','5']       'c'
  4      ['1','2','3']       'd'

I don't particularly care which sorting approach I use; I just want lists with the same items to end up in the same order so they are recognised as equivalent in some downstream data visualisation.

Thanks!

Tim

4 Answers

Since you want to sort string representations of integers, use natsort:

from natsort import natsorted
# natsorted compares the numeric parts of each string by value, so '2' sorts before '10'
df['col1'] = df['col1'].apply(natsorted)

output:

   index             col1 col2
0      1  ['1', '2', '3']  'a'
1      2  ['2', '2', '4']  'b'
2      3  ['1', '2', '5']  'c'
3      4  ['1', '2', '3']  'd'
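If every element is a plain integer string, the built-in sorted with key=int gives the same numeric order without installing natsort, and the elements stay strings. This is a minimal sketch, assuming no element has a non-numeric part (natsort also handles mixed strings such as 'a10'; key=int raises ValueError on them):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "col1": [["1", "2", "3"], ["2", "4", "2"], ["5", "2", "1"], ["3", "2", "1"]],
        "col2": ["a", "b", "c", "d"],
    },
    index=[1, 2, 3, 4],
)

# key=int compares the strings by numeric value but returns the original strings,
# so "10" would sort after "2" rather than before it.
df["col1"] = df["col1"].apply(lambda items: sorted(items, key=int))
print(df)
```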

In case you don't want to use any import (apart from pandas, of course):

import pandas as pd
df = pd.DataFrame({'col1': [['1', '2', '20'], ['2', '10', '2'], ['30', '2', '1'], ['3', '2', '1']]})

You can sort each list numerically (this also converts the elements to int) using:

df[['col1']].apply(lambda x: sorted(map(int,x["col1"])), axis=1)

OUTPUT

0    [1, 2, 20]
1    [2, 2, 10]
2    [1, 2, 30]
3     [1, 2, 3]

Or as strings (lexicographically, so '10' sorts before '2') using:

df[['col1']].apply(lambda x: sorted(map(str,x["col1"])), axis=1)

OUTPUT

0    [1, 2, 20]
1    [10, 2, 2]
2    [1, 2, 30]
3     [1, 2, 3]
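The string version puts '10' before '2' because strings are compared character by character. For the question's purpose that is still fine: any deterministic sort maps lists holding the same items to the same order. A small pure-Python check, using hypothetical lists:

```python
# Two lists with the same items in different orders (hypothetical data).
a = ["2", "10", "2"]
b = ["10", "2", "2"]

# Lexicographic: "1" < "2", so "10" comes first.
print(sorted(a))  # ['10', '2', '2']

# Either ordering makes lists with the same items compare equal.
print(sorted(a) == sorted(b))                    # True
print(sorted(a, key=int) == sorted(b, key=int))  # True
```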

Or good old list comprehension.

df['col1'] = [sorted(i) for i in df.col1]

Example using iris:

import pandas as pd
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
# Collect each row's four measurements into one list, then sort each list
iris['test'] = iris[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']].values.tolist()
iris['test2'] = [sorted(i) for i in iris.test]
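Assigning the comprehension's result back as a plain list sets the column positionally, so the existing index (1 to 4 in the question) and col2 are left untouched. A sketch on the question's frame, assuming the elements are strings:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "col1": [["1", "2", "3"], ["2", "4", "2"], ["5", "2", "1"], ["3", "2", "1"]],
        "col2": ["a", "b", "c", "d"],
    },
    index=[1, 2, 3, 4],
)

# sorted() returns a new list, so the original list objects are not mutated.
df["col1"] = [sorted(items) for items in df["col1"]]

# Rows 1 and 4 held the same items in different orders; now they compare equal.
print(df.loc[1, "col1"] == df.loc[4, "col1"])  # True
```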

If col1 actually holds string representations of lists (e.g. "['1','2','3']") rather than real lists, you could parse each value into a list with ast.literal_eval and then sort it with apply:

import ast
df.col1 = df.col1.apply(lambda x: sorted(ast.literal_eval(x)))
print(df)

Output:

            col1 col2
index
1      [1, 2, 3]  'a'
2      [2, 2, 4]  'b'
3      [1, 2, 5]  'c'
4      [1, 2, 3]  'd'
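A minimal sketch, assuming the strings came from something like a CSV round trip (an assumption; the question shows real lists). The helper name parse_and_sort is hypothetical, and it leaves cells that are already lists unparsed, since ast.literal_eval expects a string:

```python
import ast

import pandas as pd

# Hypothetical frame where col1 holds the *text* of a list, e.g. read from a CSV.
df = pd.DataFrame(
    {
        "col1": ["['1','2','3']", "['2','4','2']", "['5','2','1']", "['3','2','1']"],
        "col2": ["a", "b", "c", "d"],
    },
    index=[1, 2, 3, 4],
)


def parse_and_sort(value):
    # Parse only if the cell is still a string; real lists are sorted directly.
    items = ast.literal_eval(value) if isinstance(value, str) else value
    return sorted(items)


df["col1"] = df["col1"].apply(parse_and_sort)
print(df)
```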
