sort_values() with key in Python

Question

I have a dataframe where the column names are times (0:00, 0:10, 0:20, ..., 23:50). Right now they're sorted in string order (so 0:00 is first and 9:50 is last), but I want to sort them by time (so 0:00 is first and 23:50 is last).

If time is a column, you can use

df = df.sort(columns='Time',key=float)

But 1) that only works if time is a column itself, rather than the column names, and 2) sort() is deprecated so I try to abstain from using it.

I'm trying to use

df = df.sort_index(axis = 1)

but since the column names are in string format, they get sorted according to a string key. I've tried

df = df.sort_index(key=float, axis=1)

but that gives an error message:

Traceback (most recent call last):
  File "<ipython-input-112-5663f277da66>", line 1, in <module>
      df.sort_index(key=float, axis=1)
TypeError: sort_index() got an unexpected keyword argument 'key'

Does anyone have ideas for how to fix this? So annoying that sort_index() - and sort_values() for that matter - don't have the key argument!!

Abdou: Your solution gave an error: OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-01 00:00:00
– J.D, Commented May 4, 2017 at 15:48
@J.Dahlgren, my example below works just fine with the provided data. Please share your actual headers or the first 5 or 10 rows of your data. Also, 1-01-01 00:00:00 is clearly not a date or time.
– Abdou, Commented May 4, 2017 at 15:53

Abdou · Accepted Answer · 2017-05-04 15:47:50Z

3

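Try sorting the column labels with the built-in sorted() function and using the result to index the dataframe. Iterating over a DataFrame yields its column labels, so sorted(df, key=...) returns them in the new order. The following should serve as a working example: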

import pandas as pd


records = [(2, 33, 23, 45), (3, 4, 2, 4), (4, 5, 7, 19), (4, 6, 71, 2)]
df = pd.DataFrame.from_records(records, columns = ('0:00', '23:40', '12:30', '11:23'))
df
#    0:00  23:40  12:30  11:23
# 0     2     33     23     45
# 1     3      4      2      4
# 2     4      5      7     19
# 3     4      6     71      2

df[sorted(df,key=pd.to_datetime)]

#    0:00  11:23  12:30  23:40
# 0     2     45     23     33
# 1     3      4      2      4
# 2     4     19      7      5
# 3     4      2     71      6
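A variant of the same idea, sketched with the standard library only: the labels are parsed with an explicit '%H:%M' format instead of letting pd.to_datetime infer one. The label list and the helper name as_time are made up for illustration, and the sketch assumes every label has the form H:MM or HH:MM.

```python
from datetime import datetime

# Hypothetical column labels, mixing one- and two-digit hours.
labels = ['0:00', '23:40', '12:30', '9:10', '11:23']

def as_time(label):
    """Parse 'H:MM' or 'HH:MM' into a datetime.time for comparison."""
    return datetime.strptime(label, '%H:%M').time()

# sorted() returns the original strings, ordered by their parsed time.
ordered = sorted(labels, key=as_time)
print(ordered)
```

With a dataframe, the reordered frame would then be df[ordered], exactly as in the answer. An explicit format also raises a ValueError on a label that is not a time rather than guessing at its meaning.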

I hope this helps

answered May 4, 2017 at 15:47

Abdou


1 Comment

J.D Over a year ago

I wanted to make sure this worked with a time such as '9:10' too and it did. I did google a lot before posting this question so I'm surprised I didn't come across the sorted() function.

Martin Krämer · Accepted Answer · 2017-05-04 15:29:12Z

2

Just prepend a leading zero to one-digit hours. This should be the simplest solution as you can simply sort lexically then.

E.g. 5:30 -> 05:30.

answered May 4, 2017 at 15:29

Martin Krämer


2 Comments

J.D Over a year ago

Breaking out column names, extracting the one-digit hours, prepending a zero, and replacing the column names with the new prepended ones seems like not a very smooth solution.

Martin Krämer Over a year ago

You do not have to actually change the data. You can just use a sorting function that respects this, i.e. call reindex_axis from pandas on sorted(cmp=x) where x sorts the data respecting the implicit leading zero. This basically is just a check whether the third character is a colon or not. I doubt you can make it more efficient. Any conversion routines are bound to be more expensive.
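A minimal sketch of that comparison for Python 3, where sorted() no longer accepts cmp= and a key function takes its place. The helper name padded and the sample labels are hypothetical; the only assumption about the data is that every label is H:MM or HH:MM.

```python
def padded(label):
    """Return the label with its implicit leading zero, e.g. '5:30' -> '05:30'."""
    # 'HH:MM' has its colon at index 2; 'H:MM' has it at index 1.
    return label if label[2] == ':' else '0' + label

# Hypothetical column labels; the strings themselves are never modified.
labels = ['23:40', '0:00', '19:19', '5:30', '09:00']
ordered = sorted(labels, key=padded)
print(ordered)
```

The sorted labels can then be used to reorder the columns, for example with df[ordered] or df.reindex(columns=ordered). Note that reindex_axis, mentioned above, has since been removed from pandas.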

Community · Accepted Answer · 2017-05-23 12:26:17Z

2

Here is a working demo, which implements @MartinKrämer's idea:

import re

In [259]: df
Out[259]:
   23:40  0:00  19:19  12:30  09:00  11:23
0     33     2      1     23     12     45
1      4     3      1      2     13      4
2      5     4      1      7     14     19
3      6     4      1     71     14      2

In [260]: df.rename(columns=lambda x: re.sub(r'^(\d{1})\:', r'0\1:', x)).sort_index(axis=1)
Out[260]:
   00:00  09:00  11:23  12:30  19:19  23:40
0      2     12     45     23      1     33
1      3     13      4      2      1      4
2      4     14     19      7      1      5
3      4     14      2     71      1      6

edited May 23, 2017 at 12:26

CommunityBot


answered May 4, 2017 at 16:07

MaxU - stand with Ukraine


1 Comment

J.D Over a year ago

This works too, but I think Abdou's solution is cleaner.

W. Streyer · Accepted Answer · 2021-08-06 18:31:55Z

0

I know this question is a few years old, but since it's the top Google result for this question, I wanted to provide the root cause of the error.

The 'key' argument was added to sort_values in version 1.1.0. See the note in the documentation linked below.

pandas.DataFrame.sort_values

This feature will very likely work as you intended if you upgrade to 1.1.0 or higher.
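One caveat, shown as a rough sketch: in pandas 1.1.0 and later the key function is vectorized, meaning it receives the whole column as a Series and must return something of the same length. The key=float from the question would therefore still fail. The frame below and its column names are invented for illustration, for the case where the times are values in a column.

```python
import pandas as pd

# Hypothetical frame where the times are values in a column.
df = pd.DataFrame({'Time': ['9:50', '0:00', '23:50', '10:20'],
                   'value': [1, 2, 3, 4]})

# key receives the whole 'Time' column as a Series and returns a Series
# of parsed timestamps; it is not called once per element.
by_time = df.sort_values('Time', key=lambda s: pd.to_datetime(s, format='%H:%M'))
print(by_time['Time'].tolist())
```

The original strings stay in the column; only the ordering is derived from the parsed values.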

answered Aug 6, 2021 at 18:31

W. Streyer


shiva · Accepted Answer · 2023-01-04 13:10:29Z

0

It seems sort_values() with key may not work here. However, sort_index() with key does the job (referring to Abdou's answer).
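A sketch of what sort_index() with key could look like for the column labels in the question, assuming pandas 1.1.0 or later. The helper name minutes_since_midnight and the sample data are illustrative only. As with sort_values(), the key receives the whole Index and must return an Index of the same length.

```python
import pandas as pd

records = [(2, 33, 23, 45), (3, 4, 2, 4)]
df = pd.DataFrame.from_records(records, columns=['0:00', '23:40', '12:30', '9:10'])

def minutes_since_midnight(labels):
    """Map an Index of 'H:MM' / 'HH:MM' labels to an Index of integers."""
    return labels.map(lambda s: int(s.split(':')[0]) * 60 + int(s.split(':')[1]))

# axis=1 sorts the columns; the key decides the order, the labels stay strings.
result = df.sort_index(axis=1, key=minutes_since_midnight)
print(list(result.columns))
```

Converting to integer minutes is just one choice of key; parsing with pd.to_datetime and an explicit format would be another.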

edited Jan 4, 2023 at 13:10

shiva


answered Jan 4, 2023 at 12:54

Sachin Gurarikar

