
I have a list of CSV files from a folder on my Google Drive. The list is called all_files, and the file paths look like this:

all_files

['/content/drive/MyDrive/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/11-07-2020.csv',
 '/content/drive/MyDrive/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/11-12-2020.csv',
 '/content/drive/MyDrive/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/07-28-2020.csv',
 '/content/drive/MyDrive/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/03-16-2020.csv',
 '/content/drive/MyDrive/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/07-18-2020.csv']

I'm trying to sort these files by the date in the file name (e.g. 03-16-2020), from January 1 up to the latest file date, and return the sorted list of files while retaining all file data. I'm not sure whether I should sort them as strings, because the resulting sort would not carry the file data with it.

Thanks in advance for any help.

  • Do you have a list of strings as indicated, or a list of open()ed file handles, or something else? Commented Dec 16, 2020 at 13:42
  • Just a list of strings. Files are imported using glob.glob and the data path is assigned to all_files. When all_files is run the output is all the files names listed exactly as they are above, just more of them. Commented Dec 16, 2020 at 13:52
  • Ah okay, so by "would not carry the file data with it" you mean it doesn't have the original filename? Luckily sort[ed] has the key= keyword for exactly that sort of thing Commented Dec 16, 2020 at 13:57
  • Yes I believe that will do the trick Commented Dec 16, 2020 at 14:03
  • I posted some code below for you to have a look Commented Dec 16, 2020 at 14:05
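To illustrate the point about key=: sorted calls the key function on each element only to decide the order, and the list it returns still contains the original, unmodified strings. A minimal sketch, using two shortened, hypothetical paths in place of the real glob.glob output:

```python
# Two shortened, hypothetical paths standing in for the real glob.glob output
paths = [
    '/reports/11-07-2020.csv',
    '/reports/03-16-2020.csv',
]

# The key only extracts the 'mm-dd-yyyy' text used for comparison
# (comparing it as a string is fine here because both dates are in 2020)
def file_date(path):
    return path.rsplit('/', 1)[-1][:-4]

# ...but sorted() returns the full original paths, just in the new order
ordered = sorted(paths, key=file_date)
print(ordered)
# ['/reports/03-16-2020.csv', '/reports/11-07-2020.csv']
```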

2 Answers


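This solution builds a (year, month, day) tuple for each filename fn and then relies on Python's built-in sorting, which compares tuples element by element, front to back. The fields stay strings, which works here because every month and day is zero-padded to two digits.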

import os

# gets the 'mm-dd-yyyy' part of the filename as ['mm', 'dd', 'yyyy']
def get_date(fn):
    return os.path.splitext(os.path.basename(fn))[0].split('-')

# actual sorting using (year, month, day) tuples
sorted_files = sorted(all_files,
                      key=lambda fn: (get_date(fn)[2], get_date(fn)[0], get_date(fn)[1]))
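A variation on the same idea, not part of the original answer: converting each field to int makes the comparison numeric, so the order would stay correct even if some file names lacked zero-padding (for example a hypothetical 3-16-2020.csv). It also splits each name once instead of three times. A sketch, assuming every file name has the form m-d-yyyy.csv; numeric_date_key is a hypothetical helper name:

```python
import os

def numeric_date_key(fn):
    # '/.../03-16-2020.csv' -> '03-16-2020' -> ['03', '16', '2020']
    month, day, year = os.path.splitext(os.path.basename(fn))[0].split('-')
    # numeric (year, month, day) tuple; tuples compare element by element
    return (int(year), int(month), int(day))

# hypothetical, non-zero-padded names where string tuples would misorder 11-7 and 11-12
files = ['/data/11-7-2020.csv', '/data/3-16-2020.csv', '/data/11-12-2020.csv']
sorted_files = sorted(files, key=numeric_date_key)
print(sorted_files)
# ['/data/3-16-2020.csv', '/data/11-7-2020.csv', '/data/11-12-2020.csv']
```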

Since your dates are all in 2020, the month comes before the day, every month and day has exactly two digits, and the only difference between two filenames is the date, sorting the list of strings naively would produce the correct order.
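Those conditions matter. As a quick illustration with hypothetical file names (not from the question's data), once the files span more than one year a plain string sort puts January 2021 before December 2020, because the comparison starts at the month field:

```python
# hypothetical names spanning a year boundary
names = ['12-31-2020.csv', '01-01-2021.csv']

# plain string sort compares '12...' with '01...' first, so 2021 comes first
naive = sorted(names)
print(naive)
# ['01-01-2021.csv', '12-31-2020.csv']  (not chronological)
```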

In general, though, to parse a date correctly I recommend datetime.datetime.strptime. You can call it inside the key function you pass to sorted or list.sort:

import datetime

allfiles = [
  '/content/drive/MyDrive/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/11-07-2020.csv',
  '/content/drive/MyDrive/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/11-12-2020.csv',
  '/content/drive/MyDrive/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/07-28-2020.csv',
  '/content/drive/MyDrive/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/03-16-2020.csv',
  '/content/drive/MyDrive/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/07-18-2020.csv'
]

naively_sorted = sorted(allfiles)

sorted_by_date = sorted(allfiles,
                        key=lambda x: datetime.datetime.strptime(x[-14:-4], '%m-%d-%Y'))

print(naively_sorted)
print(sorted_by_date)

# ['/content/drive/MyDrive/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/03-16-2020.csv', '/content/drive/MyDrive/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/07-18-2020.csv', '/content/drive/MyDrive/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/07-28-2020.csv', '/content/drive/MyDrive/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/11-07-2020.csv', '/content/drive/MyDrive/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/11-12-2020.csv']
# ['/content/drive/MyDrive/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/03-16-2020.csv', '/content/drive/MyDrive/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/07-18-2020.csv', '/content/drive/MyDrive/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/07-28-2020.csv', '/content/drive/MyDrive/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/11-07-2020.csv', '/content/drive/MyDrive/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/11-12-2020.csv']

Note that the slice x[-14:-4] makes no assumption about what the beginning of the string looks like; however, it does assume that the string ends with "mm-dd-yyyy.csv".
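If that assumption might not hold (say, some hypothetical files were named 07-28-2020_v2.csv or 03-16-2020.CSV), one alternative, not part of the original answer, is to search the file name for the date pattern with the re module and parse whatever it matches. A sketch, assuming every file name contains one mm-dd-yyyy date; file_date and DATE_RE are hypothetical names, and the example uses list.sort to sort in place:

```python
import re
from datetime import datetime

# matches the first 'mm-dd-yyyy' run of characters
DATE_RE = re.compile(r'\d{2}-\d{2}-\d{4}')

def file_date(path):
    # look only at the final path component so directory names are ignored
    name = path.rsplit('/', 1)[-1]
    match = DATE_RE.search(name)
    if match is None:
        raise ValueError(f'no mm-dd-yyyy date in {path!r}')
    return datetime.strptime(match.group(), '%m-%d-%Y')

# hypothetical paths with inconsistent suffixes and a different year
files = ['/r/07-28-2020_v2.csv', '/r/03-16-2020.CSV', '/r/12-01-2019.csv']
files.sort(key=file_date)  # sorts in place and returns None
print(files)
# ['/r/12-01-2019.csv', '/r/03-16-2020.CSV', '/r/07-28-2020_v2.csv']
```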
