The objective is to sort multiple columns in pandas, each according to its own reference list, as shown below. Thanks to sammywemmy for the hint.

However, the suggestion produced NaN values in the columns that were not being considered.

import pandas as pd
sort_a=['a','d','e']
sort_b=['s1','s3','s6']
sort_c=['t1','t2','t3']
df=pd.DataFrame(zip([1,2,3,4,5,6,7],['a', 'e', 'd','a','a','d','e'], ['s3', 's1', 's6','s6','s3','s3','s1'], ['t3', 't2', 't1','t2','t2','t3','t3']),columns=['var',"a", "b", "c"])

categories = {col : pd.CategoricalDtype(categories=cat, ordered=True)
              for col, cat
              in zip(df.columns, [sort_a, sort_b, sort_c])}

df_output=df.astype(categories).sort_values([*df.columns])

This produces:

   var    a    b   c
2  NaN  NaN  NaN  t1
1  NaN  NaN  NaN  t2
3  NaN  NaN  NaN  t2
4  NaN  NaN  NaN  t2
0  NaN  NaN  NaN  t3
5  NaN  NaN  NaN  t3
6  NaN  NaN  NaN  t3

Whereas the expected output is:

var a   b   c
5   a   s3  t2
1   a   s3  t3
4   a   s6  t2
6   d   s3  t3
3   d   s6  t1
2   e   s1  t2
7   e   s1  t3

2 Answers

Instead of zipping over df.columns, which also contains var, pass only the names of the columns you want to sort:

categories = {col : pd.CategoricalDtype(categories=cat, ordered=True)
              for col, cat
              in zip(['a','b','c'], [sort_a, sort_b, sort_c])}
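To see why the original mapping went wrong, it may help to print the pairs that zip builds. The sketch below uses plain Python only; `columns` is a stand-in for `df.columns` from the question, and the names `wrong` and `right` are just illustrative.

```python
# Stand-in for df.columns in the question (same names, same order).
columns = ['var', 'a', 'b', 'c']

sort_a = ['a', 'd', 'e']
sort_b = ['s1', 's3', 's6']
sort_c = ['t1', 't2', 't3']

# zip pairs items by position and stops at the shorter input,
# so every list is shifted one column to the left and 'c' gets nothing.
wrong = dict(zip(columns, [sort_a, sort_b, sort_c]))

# Naming the target columns explicitly lines each list up with its column.
right = dict(zip(['a', 'b', 'c'], [sort_a, sort_b, sort_c]))

for col, cats in wrong.items():
    print('wrong:', col, '->', cats)
for col, cats in right.items():
    print('right:', col, '->', cats)
```

With the shifted pairing, var, a and b are cast to categories that contain none of their values, which is consistent with the NaN columns shown in the question, while c is left untouched.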

Finally, in sort_values(), pass the keys of categories instead of unpacking df.columns:

df=df.astype(categories).sort_values([*categories.keys()])

output of df:

  var   a   b   c
4   5   a   s3  t2
0   1   a   s3  t3
3   4   a   s6  t2
5   6   d   s3  t3
2   3   d   s6  t1
1   2   e   s1  t2
6   7   e   s1  t3
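
For convenience, here is the whole approach as one self-contained sketch using the data from the question. The helper name `sort_by_reference` and the `sort_orders` dict are my own, not part of pandas, and the second call uses a made-up order for column a just to show that the lists, not the alphabet, drive the result.

```python
import pandas as pd

df = pd.DataFrame({
    'var': [1, 2, 3, 4, 5, 6, 7],
    'a': ['a', 'e', 'd', 'a', 'a', 'd', 'e'],
    'b': ['s3', 's1', 's6', 's6', 's3', 's3', 's1'],
    'c': ['t3', 't2', 't1', 't2', 't2', 't3', 't3'],
})

# Reference order per column, as in the question.
sort_orders = {
    'a': ['a', 'd', 'e'],
    'b': ['s1', 's3', 's6'],
    'c': ['t1', 't2', 't3'],
}


def sort_by_reference(frame, orders):
    """Hypothetical helper: sort `frame` by the columns in `orders`,
    ranking each column by the position of its values in the given list."""
    dtypes = {
        col: pd.CategoricalDtype(categories=cats, ordered=True)
        for col, cats in orders.items()
    }
    # Only the listed columns are converted; 'var' keeps its integers.
    return frame.astype(dtypes).sort_values(list(dtypes))


result = sort_by_reference(df, sort_orders)
print(result['var'].tolist())  # [5, 1, 4, 6, 3, 2, 7]

# Illustrative only: a non-alphabetical order for column 'a'.
custom = sort_by_reference(df, {**sort_orders, 'a': ['e', 'a', 'd']})
print(custom['var'].tolist())  # [2, 7, 5, 1, 4, 6, 3]
```

Note that the sorted columns come back with a categorical dtype; if that is unwanted, they can be cast back afterwards.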

This does not sort against a reference list, but it addresses the underlying need behind the original question: sorting columns that contain strings.

Using sort_values for pandas >= 1.1.0

With the new key argument in DataFrame.sort_values, since pandas 1.1.0, we can directly sort a column without setting it as an index using natsort.natsort_keygen:

from natsort import natsort_keygen
df=df.sort_values(
    by=['a','b','c'],
    key=natsort_keygen()
)

Output:

   var  a   b   c
4    5  a  s3  t2
0    1  a  s3  t3
3    4  a  s6  t2
5    6  d  s3  t3
2    3  d  s6  t1
1    2  e  s1  t2
6    7  e  s1  t3
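
If natsort is not available, the same key argument can probably also carry the reference lists from the question. This is a rough sketch rather than a tested recipe for every case: `sort_orders`, `ranks` and `reference_key` are names I made up, and it relies on sort_values passing each column in `by` to the key function as a Series carrying its column name.

```python
import pandas as pd

df = pd.DataFrame({
    'var': [1, 2, 3, 4, 5, 6, 7],
    'a': ['a', 'e', 'd', 'a', 'a', 'd', 'e'],
    'b': ['s3', 's1', 's6', 's6', 's3', 's3', 's1'],
    'c': ['t3', 't2', 't1', 't2', 't2', 't3', 't3'],
})

sort_orders = {
    'a': ['a', 'd', 'e'],
    'b': ['s1', 's3', 's6'],
    'c': ['t1', 't2', 't3'],
}

# For each column, map every allowed value to its position in the list.
ranks = {
    col: {value: pos for pos, value in enumerate(order)}
    for col, order in sort_orders.items()
}


def reference_key(column):
    # Called once per column in `by`; column.name selects the lookup table.
    return column.map(ranks[column.name])


result = df.sort_values(by=list(sort_orders), key=reference_key)
print(result['var'].tolist())  # [5, 1, 4, 6, 3, 2, 7]
```

Unlike the categorical approach, the sorted columns keep their original dtype here. Values missing from a reference list map to NaN and end up wherever na_position puts them.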
