4

I'm trying to sort a CSV file and I want the items sorted by date in reverse order: newest first.

def SortCsvByField( filename, fieldNo, sep = ',' ):
   records = [line.split(sep) for line in file(filename)]

As far as that it's pretty easy, but how do I compare dates?

2
  • What you show is pretty easy but also pretty fragile. It is best to use Python's CSV library: docs.python.org/library/csv.html About comparing dates (I'm assuming some dates are in a specific column on each row) you can use the datetime module: docs.python.org/library/datetime.html. You can use the standard comparison operators on date objects. Commented Nov 20, 2009 at 23:48
  • it depends on the format of dates Commented Nov 20, 2009 at 23:49

3 Answers 3

5

I'd recommend installing the excellent dateutil module. (In Ubuntu/Debian, it is provided by the python-dateutil package).

dateutil can parse date strings into datetime objects: It can handle many different date formats without you having to lift a finger(*):

import dateutil.parser as dparser
date=dparser.parse("Mon May 7 1883 10:36:28")
print(date)
# 1883-05-07 10:36:28

date=dparser.parse("1685-3-21")
print(date)
# 1685-03-21 00:00:00

date=dparser.parse("12/17/1770")
print(date)
# 1770-12-17 00:00:00

Note that parse is interpretting "12/17/1770" as being of the form "MM/DD/YYYY". You can change this behavior using parse's dayfirst and yearfirst options. (See http://labix.org/python-dateutil)

print(type(date))
# <type 'datetime.datetime'>

datetime objects can be sorted easily:

dates=[dparser.parse("Mon May 7 1883 10:36:28"),dparser.parse("1685-3-21"),dparser.parse("12/17/1770"),]
dates.sort()
print(dates)
# [datetime.date(1685, 3, 21), datetime.date(1770, 12, 17), datetime.date(1833, 5, 7)]

If you prefer to not install the dateutil package, then you'll have to roll your own method of converting date strings into datetime objects. This requires more work since you'll have to define the format. Below, '%Y-%m-%d' defines the YYYY-MM-DD format. See https://www.php.net/strftime (or the man page of strftime) for more information on available format codes.

For example,

dates=[datetime.datetime.strptime(date_str,'%Y-%m-%d') for date_str in
       ('1883-5-7','1685-3-21','1770-12-17',)]
print([str(date) for date in dates])
# ['1883-05-07 00:00:00', '1685-03-21 00:00:00', '1770-12-17 00:00:00']
dates.sort()
print([str(date) for date in dates])
# ['1685-03-21 00:00:00', '1770-12-17 00:00:00', '1883-05-07 00:00:00']

To control the format when converting datetime objects back into printable strings, you can use the datetime.datetime.strftime() method.

Sign up to request clarification or add additional context in comments.

6 Comments

(1) "If you are using linux"?? Is there something platform dependant about dateutil? It appears to work fine on Windows [but see point 3 below] (2) dates preceded by a # e.g. # 2009-09-25 10:36:28?? Output from print()? If so, they're wrong. Should be e.g. 1883-05-07 10:36:28 (3) It will interpret "01/02/2009" as the first day of February for those adhering to the DD/MM/YYYY convention without them lifting a finger, will it? It wouldn't do something nasty like silently taking "31/12/2008" as 31 December while also taking "01/02/2009" as the 2nd of January, would it?
That's a good start, but you haven't addressed the issue of it cheerfully accepting mixed dd/mm and mm/dd dates by default with (AFAICT) no option to select "strict" behaviour.
I don't understand why it is appending 10-19 by default to 'month,date'-less dates. For instance 1946 --> 1946-10-19 and 46 --> 2046-10-19.. Can we change the default behavior?
@ThinkCode: It's appending 10-19 because today is Oct 19. What would you like dparser.parse('1946') to return?
oops, how did I even miss that! well, I applied this to date of birth column and those that had just 1946 was showing as 1946-10-19. Thanks for clearing it up. Us specifying what default date/month should be applied will make it more awesome. It is just if else blocks I guess. Thanks misspelled Ubuntu!
|
2

If your dates are in ISO-8601 format (YYYY-MM-DD) then you can sort them as strings, otherwise you will have to parse them first (datetime.strptime).

Then you can sort using for example sorted(records, key=lambda a:a[1]), if the date is the second field.

Comments

2

Assuming that you know the format of the dates, and that they're in column 1 of your CSV file:

>>> import csv
>>> from datetime import datetime
>>> def date_key(row):
        return datetime.strptime(row[1].strip(), "%m/%d/%Y")

>>> with open('c:\\temp\\test\\date_test.csv', 'rb') as f:
        data = list(csv.reader(f))

>>> data
[['foo', ' 3/11/2004'], ['bar', ' 2/15/2001'], ['baz', '11/15/2007'], ['bat', '10/13/2002']]
>>> data.sort(key=date_key)
>>> data
[['bar', ' 2/15/2001'], ['bat', '10/13/2002'], ['foo', ' 3/11/2004'], ['baz', '11/15/2007']]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.