I have a pandas DataFrame df with a datetime column ('DateTime') and a numeric column ('load'). I want to sort the DataFrame by 'DateTime'.

So I used the following code:

df.sort_values('DateTime')

However, the result is clearly not sorted correctly. I have entries for every hour of the year, yet the output starts at 2017-01-04 and shows 13:00 before 12:00 on 2017-12-28:

    DateTime             load
0   2017-01-04 00:00:00 52223.4500
1   2017-01-04 01:00:00 51392.4225
2   2017-01-04 02:00:00 51523.6875
3   2017-01-04 03:00:00 52356.4525
4   2017-01-04 04:00:00 54685.1125
5   2017-01-04 05:00:00 60150.9925
6   2017-01-04 06:00:00 66820.7375
7   2017-01-04 07:00:00 70047.9175
8   2017-01-04 08:00:00 71457.6350
9   2017-01-04 09:00:00 72288.9975
10  2017-01-04 10:00:00 73059.6850
11  2017-01-04 11:00:00 72965.4000
12  2017-01-04 12:00:00 71860.8625
13  2017-01-04 13:00:00 70186.3825
14  2017-01-04 14:00:00 69362.5425
15  2017-01-04 15:00:00 70146.8800
16  2017-01-04 16:00:00 71641.2275
17  2017-01-04 17:00:00 70686.6700
18  2017-01-04 18:00:00 69214.0275
19  2017-01-04 19:00:00 65552.7600
20  2017-01-04 20:00:00 62177.0875
21  2017-01-04 21:00:00 60257.1750
22  2017-01-04 22:00:00 56170.3500
23  2017-01-04 23:00:00 52265.3050
24  2017-01-15 00:00:00 46725.7725
25  2017-01-15 01:00:00 45447.4650
26  2017-01-15 02:00:00 44887.1600
27  2017-01-15 03:00:00 44230.0025
28  2017-01-15 04:00:00 43838.2300
29  2017-01-15 05:00:00 42747.1475
... ... ...
8730    2017-12-28 02:00:00 40675.2025
8731    2017-12-28 03:00:00 42022.7050
8732    2017-12-28 04:00:00 44010.7025
8733    2017-12-28 05:00:00 46842.8875
8734    2017-12-28 06:00:00 51119.2625
8735    2017-12-28 07:00:00 55059.5600
8736    2017-12-28 08:00:00 58077.6375
8737    2017-12-28 09:00:00 59538.5075
8738    2017-12-28 10:00:00 60753.6975
8739    2017-12-28 11:00:00 60720.7275
8740    2017-12-28 13:00:00 58208.7925
8741    2017-12-28 12:00:00 59299.2325
8742    2017-12-28 15:00:00 58370.4075
8743    2017-12-28 16:00:00 61120.1675
8744    2017-12-28 17:00:00 61194.5025
8745    2017-12-28 18:00:00 59644.1900
8746    2017-12-28 19:00:00 56113.4500
8747    2017-12-28 20:00:00 53672.4725
8748    2017-12-28 21:00:00 52312.3350
8749    2017-12-28 22:00:00 48750.4325
8750    2017-12-28 23:00:00 45816.2225
8751    2017-12-29 00:00:00 43684.6650
8752    2017-12-29 01:00:00 42797.5800
8753    2017-12-29 02:00:00 42608.9925
8754    2017-12-29 03:00:00 43510.8925
8755    2017-12-29 04:00:00 44424.2175
8756    2017-12-29 05:00:00 46470.2750
8757    2017-12-29 06:00:00 50801.7100
8758    2017-12-29 07:00:00 54854.4375
8759    2017-12-29 08:00:00 56226.2575

I think that the columns are in the correct data type:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8760 entries, 0 to 8759
Data columns (total 2 columns):
DateTime    8760 non-null datetime64[ns]
load        8760 non-null float64
dtypes: datetime64[ns](1), float64(1)
memory usage: 136.9 KB

If I look up the min or max value in my DateTime column, I find the correct entries. Only the sorting does not seem to work. What can I try?

df.loc[df['DateTime'].idxmax()]

DateTime    2017-12-31 23:00:00
load                    43802.8
Name: 8706, dtype: object



df.loc[df['DateTime'].idxmin()]

DateTime    2017-01-01 00:00:00
load                    43202.4
Name: 48, dtype: object
  • That looks sorted to me. Is there a place where it isn't sorted? Commented Feb 12, 2019 at 12:00
  • It starts with 2017-01-04 and ends with 2017-12-29, but as the min/max code shows, there are also records with 2017-01-01 and 2017-12-31. Commented Feb 12, 2019 at 12:14
  • Are you sure this isn't an assignment issue? Try df = df.sort_values('DateTime') (or df.sort_values('DateTime', inplace=True)) Commented Feb 12, 2019 at 12:15
  • Thank you @JoshFriedlander! This worked, so easy. However, I don't really understand why the sorting did not work without direct assignment. Commented Feb 12, 2019 at 12:24
  • Not sure. Was the output you pasted the direct result of the sorting call? If not, that explains it: pandas doesn't change df itself, so printing df afterwards still shows the original order. Commented Feb 12, 2019 at 12:27

1 Answer

(Turning my comment into an answer as suggested by @Wai Ha Lee)

df.sort_values('DateTime') returns a sorted copy of the DataFrame; it doesn't modify the original, so the result is lost unless you keep it.

You can keep it either by explicit reassignment:

df = df.sort_values('DateTime')

or by using the inplace flag:

df.sort_values('DateTime', inplace=True)

The latter is generally discouraged, though, and deprecating inplace has been discussed by the pandas developers.
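
To see the difference for yourself, here is a minimal sketch. The four-row frame is a made-up miniature of the data in the question (the names unchanged_first and sorted_df are mine, not from the question), and the reset_index call is an optional extra for when you also want the row labels renumbered after sorting:

```python
import pandas as pd

# Hypothetical miniature of the question's frame: rows stored out of order.
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2017-01-04 00:00:00",
        "2017-01-01 00:00:00",
        "2017-12-31 23:00:00",
        "2017-01-15 00:00:00",
    ]),
    "load": [52223.45, 43202.4, 43802.8, 46725.7725],
})

# Without assignment, the sorted copy is built and immediately discarded.
df.sort_values("DateTime")
unchanged_first = df["DateTime"].iloc[0]  # df still starts at 2017-01-04

# Assigning the return value keeps the sorted rows.
sorted_df = df.sort_values("DateTime")

# Optional: the old row labels travel with the rows, so renumber them 0..n-1.
sorted_df = sorted_df.reset_index(drop=True)

print(unchanged_first)
print(sorted_df)
```

Note that the index labels in the question's min/max lookups (48 and 8706) are the original row labels. They stay attached to their rows after sorting unless you reset the index as above.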
