pythonpandasohlc

5 minute OHLC data to hourly at quarter past the hour


I have OHLC data at 5 minute interval. I want to convert it to hourly, but I want those values at quarter past the hour, meaning between 9:15-10:15, 10:15-11:15, 11:15-12:15 and so forth. The following code can do it between the o'clock measurements. How can I modify the code to get the behaviour I want?

import pandas as pd

df = pd.read_csv('data.csv')
df['Datetime'] = pd.to_datetime(df['Datetime'])

df.set_index('Datetime', inplace=True)

hourly_data = df.resample('H').agg({'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last', 'Volume': 'sum'})

Sample data.csv:

Datetime,Open,High,Low,Close,Volume
2015-02-03 09:15:00+05:30,200.4,202,199.1,200.6,286228
2015-02-03 09:20:00+05:30,200.6,201.5,200.45,201.15,110749
2015-02-03 09:25:00+05:30,201.2,201.7,201.15,201.65,108342
2015-02-03 09:30:00+05:30,201.75,202.5,201.45,202.2,162976
2015-02-03 09:35:00+05:30,202.2,202.2,201.4,202,118370
2015-02-03 09:40:00+05:30,201.9,202,201.5,201.7,49873
2015-02-03 09:45:00+05:30,201.7,201.85,201.6,201.65,27798
2015-02-03 09:50:00+05:30,201.65,201.7,201.5,201.5,41560
2015-02-03 09:55:00+05:30,201.5,201.75,201.5,201.65,105351
2015-02-03 10:00:00+05:30,201.65,201.9,201.6,201.8,79346
2015-02-03 10:05:00+05:30,201.65,201.9,201.65,201.75,23715
2015-02-03 10:10:00+05:30,201.65,202.85,201.65,202.8,112551
2015-02-03 10:15:00+05:30,202.85,203.2,202.65,202.95,165270
2015-02-03 10:20:00+05:30,202.95,203.25,202.8,203,105614
2015-02-03 10:25:00+05:30,203,203.4,203,203.2,51944
2015-02-03 10:30:00+05:30,203.2,203.8,203,203.75,78474
2015-02-03 10:35:00+05:30,203.75,204.35,203.75,203.75,125514
2015-02-03 10:40:00+05:30,203.75,205,203.75,204.9,249231
2015-02-03 10:45:00+05:30,204.9,206.2,204.6,205.6,388224
2015-02-03 10:50:00+05:30,205.85,205.85,204.7,205,118267
2015-02-03 10:55:00+05:30,205,205.25,204.4,204.9,72310
2015-02-03 11:00:00+05:30,204.9,205.35,204.05,205,94266
2015-02-03 11:05:00+05:30,205,205.4,204.8,205,53333

Solution

  • resample with 'h' plus an offset of 15m:

    hourly_data = df.resample("h", offset="15m").agg(
        {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}
    )
    
                                 Open    High     Low  Close   Volume
    Datetime                                                         
    2015-02-03 09:15:00+05:30  200.40  202.85  199.10  202.8  1226859
    2015-02-03 10:15:00+05:30  202.85  206.20  202.65  205.0  1502447