I have read a couple of similar posts about this issue, but none of the solutions worked for me. I have the following CSV:


    Score    date       term
0      72   3 Feb ·      1
1      47   1 Feb ·      1
2     119   6 Feb ·      1
8     101   7 hrs ·      1
9     536  11 min ·      1
10     53   2 hrs ·      1
11     20  11 Feb ·      3
3      15   1 hrs ·      2
4      33   7 Feb ·      1
5     153   4 Feb ·      3
6      34   3 min ·      2
7      26   3 Feb ·      3

I want to sort the CSV by date. What's the easiest way to do that?

  • What will happen to 11 min and 2 hrs? Also, are all the years the same? Commented Feb 15, 2020 at 11:21
  • The years are all the same. The date is when something happened, something like "happened before". Commented Feb 15, 2020 at 12:18
  • My answer was unaccepted; is there some problem? Commented Feb 27, 2020 at 6:14

1 Answer

You can create two helper columns: one of datetimes created by to_datetime and a second of timedeltas created by to_timedelta. The timedelta strings need the HH:MM:SS format, so the min and hrs values are first rewritten with Series.replace and regexes. After that, it is possible to sort by both columns with DataFrame.sort_values:

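import pandas as pd

# Parse day-month strings; relative values such as '3 min' become NaT
df['date1'] = pd.to_datetime(df['date'], format='%d %b', errors='coerce')
# Rewrite '3 min' -> '00:03:00' and '7 hrs' -> '7:00:00' for to_timedelta
times = df['date'].replace({r'(\d+)\s+min': r'00:\1:00',
                            r'\s+hrs': ':00:00'}, regex=True)
df['times'] = pd.to_timedelta(times, errors='coerce')

# NaT sorts last, so the relative times come first, then the dates
df = df.sort_values(['times', 'date1'])
print(df)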

    Score    date  term      date1    times
6      34   3 min     2        NaT 00:03:00
9     536  11 min     1        NaT 00:11:00
3      15   1 hrs     2        NaT 01:00:00
10     53   2 hrs     1        NaT 02:00:00
8     101   7 hrs     1        NaT 07:00:00
1      47   1 Feb     1 1900-02-01      NaT
0      72   3 Feb     1 1900-02-03      NaT
7      26   3 Feb     3 1900-02-03      NaT
5     153   4 Feb     3 1900-02-04      NaT
2     119   6 Feb     1 1900-02-06      NaT
4      33   7 Feb     1 1900-02-07      NaT
11     20  11 Feb     3 1900-02-11      NaT
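
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same approach. It assumes the date column still carries the trailing " ·" shown in the question, so it first keeps only the number and the following word; the function name sort_by_mixed_date and the cleaning regex are my own illustration, not part of the original answer, and only a few of the question's rows are included.

```python
import pandas as pd


def sort_by_mixed_date(df):
    """Sort rows so relative times (min, hrs) come first, then calendar dates.

    Hypothetical helper for illustration; expects a 'date' column of strings.
    """
    # Assumption: keep only "<number> <word>", dropping a trailing marker or spaces
    clean = df['date'].str.extract(r'(\d+\s+[A-Za-z]+)', expand=False)

    # Day-month strings parse to datetimes; everything else becomes NaT
    date1 = pd.to_datetime(clean, format='%d %b', errors='coerce')

    # '11 min' -> '00:11:00', '7 hrs' -> '7:00:00'; dates become NaT here
    times = clean.replace({r'(\d+)\s+min': r'00:\1:00',
                           r'\s+hrs': ':00:00'}, regex=True)
    times = pd.to_timedelta(times, errors='coerce')

    # NaT sorts last by default, so timedelta rows are placed before date rows
    return df.assign(date1=date1, times=times).sort_values(['times', 'date1'])


# A few rows copied from the question, trailing marker included
df = pd.DataFrame({
    'Score': [72, 47, 101, 536, 15, 20],
    'date': ['3 Feb ·', '1 Feb ·', '7 hrs ·', '11 min ·', '1 hrs ·', '11 Feb ·'],
    'term': [1, 1, 1, 1, 2, 3],
})

result = sort_by_mixed_date(df)
print(result[['Score', 'date', 'term']])
```

If your file has no trailing marker or whitespace, the extract step is harmless and can be dropped.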

3 Comments

Hey, @jezrael, thanks for your answer! I just tried it, and it worked very well for sorting the minutes and hours. But for some reason, it didn't sort the days correctly. Here is a screenshot of the problem: prnt.sc/r2p0d2. I should say that I tried it on other files, with other times, but the concept is the same. There weren't any minutes in the test file; can that cause the problem?
@AlexKalaidjiev - I think there may be some whitespace in the data, like '1 Feb '? If so, use df['date1'] = pd.to_datetime(df['date'].str.strip(), format='%d %b', errors='coerce')
You were right, there was whitespace. I fixed it with:
nf = len(df)
for i in range(nf):
    asd = df.at[i, "date"]
    asd = asd.split()
    new = asd[0] + " " + asd[1]
    df.at[i, "date"] = new
Thank you very much for your time!
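
The same cleanup can probably be done without a row-by-row loop. This is a small sketch using the pandas string accessor; it assumes every value has at least two whitespace-separated tokens, and the sample values are made up for illustration.

```python
import pandas as pd

# Illustrative values with stray whitespace and a trailing marker
df = pd.DataFrame({'date': ['1 Feb ', ' 3  Feb', '7 hrs ·']})

# Split on whitespace, keep the first two tokens, join them with one space
df['date'] = df['date'].str.split().str[:2].str.join(' ')
print(df['date'].tolist())
```

Unlike the loop, this does not depend on the index being 0..n-1, which matters if the frame has already been sorted or filtered.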
