
I'm using the pandas built-in DataReader to download data from the Fama-French data library. The dates are initially just integers in yyyymm format:

import pandas as pd
import pandas.io.data as web
ff = web.DataReader("F-F_Research_Data_Factors", "famafrench")[0]
ff.head()


I want to convert the index to a datetime, where the date is the last day of the month. Right now, I'm doing this:

ff.reset_index(inplace=True)

import calendar
def dateParser(dt):
    yyyy = int(dt[0:4])
    mm = int(dt[4:6])
    dd = calendar.monthrange(yyyy,mm)[1]   #last day of month
    return pd.datetime(yyyy,mm,dd)

ff['date'] = ff['index'].astype(str).apply(dateParser)
ff.index = ff['date']

ff.drop(['index', 'date'], axis=1, inplace=True)

Is there a faster/more elegant way to accomplish this? For example, is there a way to apply dateParser directly to the index (perhaps inplace) so I don't have to reset_index first?

  • possible duplicate of How to change Pandas dataframe index value? Commented Jun 11, 2015 at 15:31
  • I think it's different. I'm asking about applying a date function to an index. Commented Jun 11, 2015 at 15:37

1 Answer

In [35]: ff = web.DataReader("F-F_Research_Data_Factors", "famafrench")[0]

In [36]: ff.head()
Out[36]: 
        1 Mkt-RF  2 SMB  3 HML  4 RF
192607      2.96  -2.30  -2.87  0.22
192608      2.64  -1.40   4.19  0.25
192609      0.36  -1.32   0.01  0.23
192610     -3.24   0.04   0.51  0.32
192611      2.53  -0.20  -0.35  0.31

In [38]: ff.index
Out[38]: 
Int64Index([192607, 192608, 192609, 192610, 192611, 192612, 192701, 192702, 192703, 192704, 
            ...
            201407, 201408, 201409, 201410, 201411, 201412, 201501, 201502, 201503, 201504],
           dtype='int64', length=1066)

In [39]: ff.index = pd.to_datetime(ff.index,format='%Y%m') + pd.offsets.MonthEnd()

In [40]: ff.index
Out[40]: 
DatetimeIndex(['1926-07-31', '1926-08-31', '1926-09-30', '1926-10-31', '1926-11-30', '1926-12-31', '1927-01-31', '1927-02-28', '1927-03-31', '1927-04-30', 
               ...
               '2014-07-31', '2014-08-31', '2014-09-30', '2014-10-31', '2014-11-30', '2014-12-31', '2015-01-31', '2015-02-28', '2015-03-31', '2015-04-30'],
              dtype='datetime64[ns]', length=1066, freq='M', tz=None)

In [41]: ff.head()
Out[41]: 
            1 Mkt-RF  2 SMB  3 HML  4 RF
1926-07-31      2.96  -2.30  -2.87  0.22
1926-08-31      2.64  -1.40   4.19  0.25
1926-09-30      0.36  -1.32   0.01  0.23
1926-10-31     -3.24   0.04   0.51  0.32
1926-11-30      2.53  -0.20  -0.35  0.31
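
For anyone who wants to try the conversion without downloading the data, here is a minimal sketch on a small made-up index. The values in idx are hypothetical stand-ins for the yyyymm integers above, and the explicit astype(str) is an assumption added so the format string is applied to text rather than to raw integers.

```python
import pandas as pd

# Hypothetical stand-in for the Fama-French index: integers in yyyymm form.
idx = pd.Index([192607, 192608, 192609, 192702, 202402])

# Parsing with '%Y%m' yields the first day of each month;
# adding MonthEnd() then rolls each date forward to the last day of that month.
month_end = pd.to_datetime(idx.astype(str), format="%Y%m") + pd.offsets.MonthEnd()

print(month_end)
```

Because every parsed date is the first of its month, adding MonthEnd() always lands in the same month, including February in leap years.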

Note that it's actually faster to convert the index as follows, because the %Y%m%d format has a fast path:

pd.to_datetime(ff.index*100+1,format='%Y%m%d') + pd.offsets.MonthEnd()
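
The arithmetic in that variant may not be obvious at first glance, so here is a small sketch checking that it produces the same dates as the '%Y%m' version. The sample values in idx are made up for illustration, and the sketch only checks equivalence; it does not measure speed, which may differ between pandas versions.

```python
import pandas as pd

# Hypothetical sample of yyyymm integers.
idx = pd.Index([192607, 192612, 192702, 201504])

# yyyymm * 100 + 1 gives yyyymm01, i.e. the first day of each month as yyyymmdd.
as_yyyymmdd = (idx * 100 + 1).astype(str)

via_ymd = pd.to_datetime(as_yyyymmdd, format="%Y%m%d") + pd.offsets.MonthEnd()
via_ym = pd.to_datetime(idx.astype(str), format="%Y%m") + pd.offsets.MonthEnd()

# Both routes should give identical month-end dates.
assert via_ymd.equals(via_ym)
print(via_ymd)
```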