python pandas zipline

Get the last date in each month of a pandas time series


Currently I'm generating a DatetimeIndex using zipline.utils.tradingcalendar.get_trading_days. The time series is roughly daily but has some gaps.

My goal is to get the last date in the DatetimeIndex for each month.

.to_period('M') and .to_timestamp('M') don't work, since they give the last calendar day of each month rather than the last date that actually appears in the index for that month.

For example, given the time series below, I would want to select '2015-05-29', even though the last calendar day of the month is '2015-05-31'.

['2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21', '2015-05-22', '2015-05-26', '2015-05-27', '2015-05-28', '2015-05-29', '2015-06-01']
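
To make the mismatch concrete, here is a minimal sketch that rebuilds the example index and compares the calendar month ends with the last date actually present for May. The variable names are illustrative, and `to_timestamp(how="end")` followed by `normalize()` is used here only as one way to obtain calendar month ends; it is an assumption that this matches what the `.to_period('M')` / `.to_timestamp('M')` attempt produced.

```python
import pandas as pd

# The example dates from the question (names below are illustrative)
dates = pd.DatetimeIndex([
    "2015-05-18", "2015-05-19", "2015-05-20", "2015-05-21", "2015-05-22",
    "2015-05-26", "2015-05-27", "2015-05-28", "2015-05-29", "2015-06-01",
])

# Calendar month end of each date's month: convert to monthly periods,
# take the end of each period, then drop the time-of-day component.
calendar_month_ends = (
    dates.to_period("M").to_timestamp(how="end").normalize().unique()
)

# Last date that actually appears in the index for May 2015
may = dates[(dates.year == 2015) & (dates.month == 5)]
last_observed_may = may.max()

print(list(calendar_month_ends))
print(last_observed_may)
```

The calendar month end for May (2015-05-31) is not in the index at all, which is why the period-based conversion cannot be used directly to pick existing dates.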


Solution

  • Condla's answer came closest to what I needed. However, since my time index spans more than a year, I needed to group by both year and month and then select the maximum date in each group. Below is the code I ended up with.

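    import pandas as pd

    # tempTradeDays is the initial DatetimeIndex
    dateRange = []
    # DatetimeIndex.groupby returns a dict of {year: dates in that year}
    dictYears = tempTradeDays.groupby(tempTradeDays.year)
    for yr in dictYears:
        # Within each year, group the dates by month
        yearDates = pd.DatetimeIndex(dictYears[yr])
        dictMonths = yearDates.groupby(yearDates.month)
        for m in dictMonths:
            # The latest date in each (year, month) group is the one we want
            dateRange.append(max(dictMonths[m]))
    # Sort the collected dates (sort_values() replaces the removed Index.order())
    dateRange = pd.DatetimeIndex(dateRange).sort_values()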
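
The same idea, grouping by year and month together and taking the maximum of each group, can probably be written in one pass with a Series groupby. This is only a sketch of an alternative, not the code used above: `trade_days` is a hypothetical stand-in for `tempTradeDays`, filled with a few made-up dates spanning two years so that the year component of the key matters.

```python
import pandas as pd

# Hypothetical stand-in for tempTradeDays; the dates are illustrative only
trade_days = pd.DatetimeIndex([
    "2014-05-29", "2014-05-30",
    "2015-05-28", "2015-05-29", "2015-06-01",
])

# Turn the index into a Series so the dates themselves can be aggregated
s = trade_days.to_series()

# Group by (year, month) in one step and keep the latest date in each group;
# groupby sorts the group keys, so the result is already in date order.
last_days = pd.DatetimeIndex(
    s.groupby([trade_days.year, trade_days.month]).max()
)

print(last_days)
```

Grouping on the (year, month) pair keeps May 2014 and May 2015 in separate groups, which is the reason the original loop nests a month grouping inside a year grouping.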