25

Following up on "python, sort descending dataframe with pandas":

Given:

from pandas import DataFrame
import pandas as pd

d = {'x':[2,3,1,4,5],
     'y':[5,4,3,2,1],
     'letter':['a','a','b','b','c']}

df = DataFrame(d)

df then looks like this:

df:
      letter    x    y
    0      a    2    5
    1      a    3    4
    2      b    1    3
    3      b    4    2
    4      c    5    1

I would like to have something like:

f = lambda x,y: x**2 + y**2
test = df.sort(f('x', 'y'))

This should order the complete dataframe with respect to the sum of the squared values of column 'x' and 'y' and give me:

test:
      letter    x    y
    2      b    1    3
    3      b    4    2
    1      a    3    4
    4      c    5    1
    0      a    2    5

Ascending or descending order does not matter. Is there a nice and simple way to do that? I could not yet find a solution.

5 Answers

34

You can create a temporary column to use in sort and then drop it:

df.assign(f = df['x']**2 + df['y']**2).sort_values('f').drop('f', axis=1)
Out: 
  letter  x  y
2      b  1  3
3      b  4  2
1      a  3  4
4      c  5  1
0      a  2  5
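
For anyone who wants to paste and run this, here is a self-contained sketch against the frame from the question. The temporary column name `_key` is my own placeholder (anything that cannot clash with a real column will do), and the descending variant is added only because the question says either order is acceptable.

```python
import pandas as pd

df = pd.DataFrame({'x': [2, 3, 1, 4, 5],
                   'y': [5, 4, 3, 2, 1],
                   'letter': ['a', 'a', 'b', 'b', 'c']})

# '_key' is a hypothetical temporary column name; it exists only on the
# intermediate copy returned by assign(), so df itself is never modified.
ascending = (df.assign(_key=df['x']**2 + df['y']**2)
               .sort_values('_key')
               .drop(columns='_key'))

# Same idea, largest sum of squares first.
descending = (df.assign(_key=df['x']**2 + df['y']**2)
                .sort_values('_key', ascending=False)
                .drop(columns='_key'))

print(ascending)
print(descending)
```

Because `assign` returns a new frame, the original `df` keeps its columns and row order.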

4 Comments

this seems to be the best way to go, but it sorta sucks... it would be way more elegant to pass a lambda function into sort_values, the same way you'd do that for python's native sorted() call
@AlexSpangher, looks like we still don't have this feature supported yet for now, 2020 Feb :-(
The advantage of python is that when it doesn't exist you can just add the method.
How is this the top voted answer? This is not going to work well for anything except purely numerical data with no NaNs and where the function doesn't result in floating-point round-off.
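
Regarding the wish for a lambda in sort_values: as far as I know, pandas 1.1 and later accept a `key` argument there. It is applied to each `by` column on its own, so it cannot take two columns as arguments directly. The sketch below is a workaround under that assumption: the callable ignores the column it is handed and returns the combined key computed from the enclosing `df`. It needs pandas 1.1 or newer.

```python
import pandas as pd

df = pd.DataFrame({'x': [2, 3, 1, 4, 5],
                   'y': [5, 4, 3, 2, 1],
                   'letter': ['a', 'a', 'b', 'b', 'c']})

# The key callable receives the 'x' column but ignores it; it returns a
# Series with one value per row, which is what sort_values sorts on.
result = df.sort_values(by='x', key=lambda col: df['x']**2 + df['y']**2)

print(result)
```

No temporary column is created, but the lambda depends on `df` from the surrounding scope, which makes it less reusable than a helper taking the frame as a parameter.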
18
df.loc[(df.x ** 2 + df.y ** 2).sort_values().index]

Adapted from "How to sort pandas dataframe by custom order on string index".

3 Comments

Thank you, this is a really nice solution! The index of the sorted data is used in combination with iloc. This is neat. No further column is needed.
That indeed looks like the correct approach. On the other hand, you should use .loc instead of .iloc, because .iloc only works with indexes like list(range(n)). I'll add an alternative just in case.
There is also the option of using iloc with argsort, which is very similar to this strategy.
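
To illustrate the .loc versus .iloc point from the comments, here is a small sketch with a non-default index. The labels 10 to 50 are made up for the example. The sorted index holds labels, not positions, so `.loc` works and `.iloc` does not.

```python
import pandas as pd

df = pd.DataFrame({'x': [2, 3, 1, 4, 5],
                   'y': [5, 4, 3, 2, 1],
                   'letter': ['a', 'a', 'b', 'b', 'c']},
                  index=[10, 20, 30, 40, 50])  # hypothetical labels

key = df['x']**2 + df['y']**2
sorted_labels = key.sort_values().index  # labels, in sorted-key order

# Label-based lookup: works for any unique index.
by_label = df.loc[sorted_labels]

# Position-based lookup with labels: 30, 40, ... are not valid positions
# in a 5-row frame, so pandas raises IndexError.
try:
    df.iloc[sorted_labels]
    iloc_failed = False
except IndexError:
    iloc_failed = True

print(by_label)
print("iloc failed:", iloc_failed)
```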
3

Have you tried creating a new column and then sorting on that? I cannot comment on the original post, so I am just posting my solution.

df['c'] = df.x**2 + df.y**2
df = df.sort_values('c')

1 Comment

The "problem" with this solution is that it actually adds another column, which is not the goal here (the input and output columns should be the same).
1
import pandas as pd

d = {'one':[2,3,1,4,5],
     'two':[5,4,3,2,1],
     'letter':['a','a','b','b','c']}

df = pd.DataFrame(d)

#f = lambda x,y: x**2 + y**2
# Compute the sum of squares row by row.
# df.ix was removed in pandas 1.0, so look the values up by label with .loc.
array = []
for i in df.index:
    array.append(df.loc[i, 'one']**2 + df.loc[i, 'two']**2)
array = pd.DataFrame(array, columns = ['Sum of Squares'], index = df.index)
test = pd.concat([df,array],axis = 1, join = 'inner')
# sort_index(by=...) was removed as well; sort_values is the replacement.
test = test.sort_values(by = "Sum of Squares", ascending = True).drop('Sum of Squares',axis =1)

Just realized that you wanted this:

    letter  one  two
2      b    1    3
3      b    4    2
1      a    3    4
4      c    5    1
0      a    2    5


0

Another approach, similar to this one, is to use argsort, which returns the index permutation directly:

f = lambda r: r.x**2 + r.y**2
df.iloc[df.apply(f, axis=1).argsort()]

I think using argsort better translates the idea than a regular sort (we don't care about the value of this computation, only about the resulting indexes).

It could also be interesting to patch the DataFrame to add this functionality:

def apply_sort(self, *, key):
    return self.iloc[self.apply(key, axis=1).argsort()]

pd.DataFrame.apply_sort = apply_sort

We can then simply write:

>>> df.apply_sort(key=f)

   x  y letter
2  1  3      b
3  4  2      b
1  3  4      a
4  5  1      c
0  2  5      a

1 Comment

Since you do a row-wise apply here, wouldn't this give up a fair bit of performance compared to a vectorized operation such as andrewkittredge's method? Does using argsort instead of sort offset these concerns?
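
One way to keep the argsort idea without the row-wise apply is to let the key take the whole frame, so the arithmetic stays vectorized. This is only a sketch: `sort_by` is a made-up helper name, not part of pandas, and I have not benchmarked it against the other answers.

```python
import numpy as np
import pandas as pd

def sort_by(frame, key):
    """Hypothetical helper: key takes the whole frame and returns one value per row."""
    # argsort yields positions, so .iloc is the right accessor here;
    # kind='stable' keeps tied rows in their original relative order.
    order = np.argsort(key(frame).to_numpy(), kind='stable')
    return frame.iloc[order]

df = pd.DataFrame({'x': [2, 3, 1, 4, 5],
                   'y': [5, 4, 3, 2, 1],
                   'letter': ['a', 'a', 'b', 'b', 'c']})

result = sort_by(df, lambda d: d['x']**2 + d['y']**2)
print(result)
```

Since the positions come from a plain NumPy array, this does not depend on what the index labels look like.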
