Sorting selected multiple columns based on list in Pandas

Question

The objective is to sort a DataFrame by several columns, where the order of each column is defined by its own list, as shown below. Thanks to sammywemmy for the hint.

However, the suggested approach produces NaN in several columns, including var, which is not meant to be part of the sort.

import pandas as pd
sort_a=['a','d','e']
sort_b=['s1','s3','s6']
sort_c=['t1','t2','t3']
df=pd.DataFrame(zip([1,2,3,4,5,6,7],['a', 'e', 'd','a','a','d','e'], ['s3', 's1', 's6','s6','s3','s3','s1'], ['t3', 't2', 't1','t2','t2','t3','t3']),columns=['var',"a", "b", "c"])

categories = {col : pd.CategoricalDtype(categories=cat, ordered=True)
              for col, cat
              in zip(df.columns, [sort_a, sort_b, sort_c])}

df_output=df.astype(categories).sort_values([*df.columns])

This gives:

   var    a    b   c
2  NaN  NaN  NaN  t1
1  NaN  NaN  NaN  t2
3  NaN  NaN  NaN  t2
4  NaN  NaN  NaN  t2
0  NaN  NaN  NaN  t3
5  NaN  NaN  NaN  t3
6  NaN  NaN  NaN  t3

The expected output is:

var a   b   c
5   a   s3  t2
1   a   s3  t3
4   a   s6  t2
6   d   s3  t3
3   d   s6  t1
2   e   s1  t2
7   e   s1  t3

Anurag Dabas · Accepted Answer · 2021-08-04 05:16:41Z

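zip(df.columns, ...) pairs var with sort_a, a with sort_b, and b with sort_c, so none of the values match their categories and they become NaN. Instead of passing df.columns, pass only the names of the columns you want to convert: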

categories = {col : pd.CategoricalDtype(categories=cat, ordered=True)
              for col, cat
              in zip(['a','b','c'], [sort_a, sort_b, sort_c])}

Finally, in sort_values(), pass the unpacked keys of categories as the by parameter instead of unpacking df.columns:

df=df.astype(categories).sort_values([*categories.keys()])

Output of df:

  var   a   b   c
4   5   a   s3  t2
0   1   a   s3  t3
3   4   a   s6  t2
5   6   d   s3  t3
2   3   d   s6  t1
1   2   e   s1  t2
6   7   e   s1  t3
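
To see why the original attempt went wrong, it may help to look at what zip actually pairs up. The following is a minimal, self-contained sketch that rebuilds the question's frame, shows the mismatched mapping, and then applies the fix. The names wrong, right, and result are illustrative and do not appear in the original post.

```python
import pandas as pd

sort_a = ['a', 'd', 'e']
sort_b = ['s1', 's3', 's6']
sort_c = ['t1', 't2', 't3']

df = pd.DataFrame({
    'var': [1, 2, 3, 4, 5, 6, 7],
    'a': ['a', 'e', 'd', 'a', 'a', 'd', 'e'],
    'b': ['s3', 's1', 's6', 's6', 's3', 's3', 's1'],
    'c': ['t3', 't2', 't1', 't2', 't2', 't3', 't3'],
})

# zip stops at the shortest input, so df.columns pairs 'var' with sort_a,
# 'a' with sort_b and 'b' with sort_c; 'c' never gets a category list.
wrong = dict(zip(df.columns, [sort_a, sort_b, sort_c]))
print(list(wrong))  # ['var', 'a', 'b']

# Naming the target columns explicitly keeps each list with its own column.
right = dict(zip(['a', 'b', 'c'], [sort_a, sort_b, sort_c]))
categories = {col: pd.CategoricalDtype(categories=cats, ordered=True)
              for col, cats in right.items()}

# Sort by the converted columns only; 'var' is carried along untouched.
result = df.astype(categories).sort_values(list(categories))
print(result['var'].tolist())  # [5, 1, 4, 6, 3, 2, 7]
```

Note that a, b, and c stay categorical in result; if plain strings are needed afterwards, they can be cast back with astype(str).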

edited Aug 4, 2021 at 5:16

answered Aug 4, 2021 at 5:10

Anurag Dabas


rpb · Accepted Answer · 2021-08-30 03:31:42Z

This is not directly about sorting against a reference list, but my underlying goal when posting the question was to sort columns that contain strings.

Using sort_values for pandas >= 1.1.0

Since pandas 1.1.0, DataFrame.sort_values accepts a key argument, so we can sort the columns directly, without setting them as an index, using natsort.natsort_keygen:

from natsort import natsort_keygen
df=df.sort_values(
    by=['a','b','c'],
    key=natsort_keygen()
)

Output:

   var  a   b   c
4    5  a  s3  t2
0    1  a  s3  t3
3    4  a  s6  t2
5    6  d  s3  t3
2    3  d  s6  t1
1    2  e  s1  t2
6    7  e  s1  t3
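
If installing natsort is not an option, the same key argument can also encode the explicit order from the question's lists. The sketch below is an alternative, not the method used in either answer. It assumes pandas >= 1.1.0 and that every value in a column appears in its list; order and rank_by_list are hypothetical names introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'var': [1, 2, 3, 4, 5, 6, 7],
    'a': ['a', 'e', 'd', 'a', 'a', 'd', 'e'],
    'b': ['s3', 's1', 's6', 's6', 's3', 's3', 's1'],
    'c': ['t3', 't2', 't1', 't2', 't2', 't3', 't3'],
})

# Desired order per column, taken from the lists in the question.
order = {
    'a': ['a', 'd', 'e'],
    'b': ['s1', 's3', 's6'],
    'c': ['t1', 't2', 't3'],
}

def rank_by_list(col: pd.Series) -> pd.Series:
    # sort_values applies key to each column in `by` separately;
    # col.name tells us which reference list to use.
    ranks = {value: position for position, value in enumerate(order[col.name])}
    return col.map(ranks)

result = df.sort_values(by=['a', 'b', 'c'], key=rank_by_list)
print(result['var'].tolist())  # [5, 1, 4, 6, 3, 2, 7]
```

Unlike the categorical approach, this leaves the column dtypes unchanged, at the cost of building the rank mapping on every sort.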

edited Aug 30, 2021 at 3:31

answered Aug 30, 2021 at 2:30

rpb
