11

I'm looking for a way to sort pandas DataFrame. pd.DataFrame.sort_values doesn't accept a key function, pre-1.0. I can convert it to list and apply a key to sorted function, but that will be slow. The other way seems something related to categorical index. I don't have a fixed number of rows so I don't know if categorical index will be applicable.

I have given an example case of what kind of data I want to sort:

Input DataFrame:

     clouds  fluff
0    {[}      1
1    >>>      2
2     {1      3
3    123      4
4  AAsda      5
5    aad      6

Output DataFrame:

     clouds  fluff
0    >>>      2
1    {[}      1
2     {1      3
3    123      4
4    aad      6
5  AAsda      5

The rule for sorting (priority):

  • First special characters (sort by ascii among themselves)

  • Next is by numbers

  • next is by lower case alphabets (lexicographical)

  • next is Capital case alphabets (lexicographical)

In plain python I'd do it like

from functools import cmp_to_key

def ks(a, b):
    # "Not exactly this but similar"
    if a.isupper():
        return -1
    else:
        return 1

Case

sorted(['aa', 'AA', 'dd', 'DD'], key=cmp_to_key(ks))

Answer:

['DD', 'AA', 'aa', 'dd']

How would you do it with Pandas?

4
  • stackoverflow.com/questions/38662826/… have you checked this? Commented Sep 24, 2018 at 8:55
  • @hellpanderr, Yes but I don't know how to translate this problem to that way. Commented Sep 25, 2018 at 7:36
  • @Goyo, Changed the example from Column to pd.DataFrame. Commented Sep 25, 2018 at 7:48
  • 1
    There is an issue filed on GitHub related to this. Commented Sep 25, 2018 at 7:51

4 Answers 4

9

As of pandas 1.1.0, pandas.DataFrame.sort_values accepts an argument key with type callable.

So in this case we would use:

df.sort_values(by='clouds', key=kf)

where kf is the key function that operates on type Series. Accepts and returns Series.

Sign up to request clarification or add additional context in comments.

Comments

8

As of pandas 1.2.0, I did this

import numpy as np
import pandas as pd

df = pd.DataFrame(['aa', 'dd', 'DD', 'AA'], columns=["data"])

# This is the sorting rule
rule = {
    "DD": 1,
    "AA": 10,
    "aa": 20,
    "dd": 30,
    }


def particular_sort(series):
    """
    Must return one Series
    """
    return series.apply(lambda x: rule.get(x, 1000))


new_df = df.sort_values(by=["data"], key=particular_sort)
print(new_df)  # DD, AA, aa, dd

Of course, you can do this too, but it may be difficult to understand,smile

new_df = df.sort_values(by=["data"], key=lambda x: x.apply(lambda y: rule.get(y, 1000)))
print(new_df)  # DD, AA, aa, dd

Comments

2

This might be useful, yet still not sure about special characters! can they actally be sorted!!

import pandas as pd

a = [2, 'B', 'c', 1, 'a', 'b',3, 'C', 'A']

df = pd.DataFrame({"a": a})
df['upper'] = df['a'].str.isupper()
df['lower'] = df['a'].str.islower()
df['int'] = df['a'].apply(isinstance,args = [int])

df2 = pd.concat([df[df['int'] == True].sort_values(by=['a']), 
           df[df['lower'] == True].sort_values(by=['a']),
           df[df['upper'] == True].sort_values(by=['a'])])

print(df2)

   a    upper   lower   int
3   1   NaN     NaN     True
0   2   NaN     NaN     True
6   3   NaN     NaN     True
4   a   False   True    False
5   b   False   True    False
2   c   False   True    False
8   A   True    False   False
1   B   True    False   False
7   C   True    False   False

you can also do it in one step with creating new True False columns!

a = [2, 'B', 'c', 1, 'a', 'b',3, 'C', 'A']
df = pd.DataFrame({"a": a})
df2 = pd.concat([df[df['a'].apply(isinstance,args = [int])].sort_values(by=['a']), 
           df[df['a'].str.islower() == True].sort_values(by=['a']),
           df[df['a'].str.isupper() == True].sort_values(by=['a'])])

    a
3   1
0   2
6   3
4   a
5   b
2   c
8   A
1   B
7   C

Comments

2

This seems to work:

def sort_dataframe_by_key(dataframe: DataFrame, column: str, key: Callable) -> DataFrame:
    """ Sort a dataframe from a column using the key """
    sort_ixs = sorted(np.arange(len(dataframe)), key=lambda i: key(dataframe.iloc[i][column]))
    return DataFrame(columns=list(dataframe), data=dataframe.iloc[sort_ixs].values)

It passes tests:

def test_sort_dataframe_by_key():
    dataframe = DataFrame([{'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 1, 'c': 1}, {'a': 3, 'b': 4, 'c': 0}])
    assert sort_dataframe_by_key(dataframe, column='a', key=lambda x: x).equals(
           DataFrame([{'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 1, 'c': 1}, {'a': 3, 'b': 4, 'c': 0}]))
    assert sort_dataframe_by_key(dataframe, column='a', key=lambda x: -x).equals(
           DataFrame([{'a': 3, 'b': 4, 'c': 0}, {'a': 2, 'b': 1, 'c': 1}, {'a': 1, 'b': 2, 'c': 3}]))
    assert sort_dataframe_by_key(dataframe, column='b', key=lambda x: -x).equals(
           DataFrame([{'a': 3, 'b': 4, 'c': 0}, {'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 1, 'c': 1}]))
    assert sort_dataframe_by_key(dataframe, column='c', key=lambda x: x).equals(
           DataFrame([{'a': 3, 'b': 4, 'c': 0}, {'a': 2, 'b': 1, 'c': 1}, {'a': 1, 'b': 2, 'c': 3}]))

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.