I have a time series OHLC 1 min date.
date,open,high,low,close,volume
2023-04-20 09:15:00+05:30,262.0,267.85,262.0,262.5,7300
2023-04-20 09:16:00+05:30,262.5,264.3,261.8,262.6,6300
2023-04-20 09:17:00+05:30,262.6,266.95,262.0,265.75,3600
2023-04-20 09:18:00+05:30,265.75,274.25,265.75,274.25,2200
2023-04-20 09:19:00+05:30,273.95,275.8,271.95,274.9,5750
2023-04-20 09:20:00+05:30,275.0,284.4,275.0,283.55,6350
2023-04-20 09:21:00+05:30,285.0,286.95,282.6,283.55,3650
2023-04-20 09:22:00+05:30,283.0,283.45,280.1,281.2,3150
2023-04-20 09:23:00+05:30,281.2,286.55,281.2,286.55,1800
2023-04-20 09:24:00+05:30,286.6,288.5,286.5,287.2,1600
2023-04-20 09:25:00+05:30,287.2,288.75,286.4,286.7,1450
2023-04-20 09:26:00+05:30,287.0,288.0,287.0,287.45,500
2023-04-20 09:27:00+05:30,287.45,287.45,285.0,285.0,900
2023-04-20 09:28:00+05:30,284.9,286.85,283.55,286.4,2600
2023-04-20 09:29:00+05:30,286.4,286.4,285.0,286.4,1600
2023-04-20 09:30:00+05:30,287.25,291.7,286.9,290.45,4500
2023-04-20 09:31:00+05:30,290.55,291.9,289.9,290.9,1750
2023-04-20 09:32:00+05:30,289.25,289.3,286.7,286.7,1550
2023-04-20 09:33:00+05:30,286.7,290.15,286.7,288.0,250
2023-04-20 09:34:00+05:30,288.0,291.0,288.0,289.85,850
2023-04-20 09:35:00+05:30,289.85,289.85,289.4,289.85,1500
2023-04-20 09:36:00+05:30,289.85,290.0,288.1,288.75,4050
2023-04-20 09:37:00+05:30,288.75,288.75,287.45,288.05,1150
2023-04-20 09:38:00+05:30,288.05,288.45,285.85,287.45,1650
2023-04-20 09:39:00+05:30,287.45,293.25,287.45,291.15,2250
2023-04-20 09:40:00+05:30,291.1,291.15,290.7,291.15,550
2023-04-20 09:41:00+05:30,291.15,292.0,291.0,291.95,550
2023-04-20 09:42:00+05:30,291.95,291.95,288.35,288.55,200
2023-04-20 09:43:00+05:30,288.55,290.0,288.0,288.55,1350
2023-04-20 09:44:00+05:30,288.55,290.0,288.55,289.0,100
2023-04-20 09:45:00+05:30,289.0,289.0,286.05,286.05,400
2023-04-20 09:46:00+05:30,286.05,286.05,282.25,283.3,2650
2023-04-20 09:47:00+05:30,283.3,283.5,283.3,283.3,450
2023-04-20 09:48:00+05:30,283.3,283.3,279.85,279.85,750
I need to resample the same to 4 min data starting from 9:15.
I wrote the following code.
import pandas as pd
df = pd.read_csv('re_s.csv',parse_dates=True,index_col=0)
ohlc_dict = {'open':'first','high':'max','low':'min','close': 'last','volume':'sum'}
new_df=df.resample('4T').apply(ohlc_dict).ffill().between_time('09:15','15:30')
>>> print(new_df)
open high low close volume
date
2023-04-20 09:16:00+05:30 262.50 275.80 261.80 274.90 17850
2023-04-20 09:20:00+05:30 275.00 286.95 275.00 286.55 14950
2023-04-20 09:24:00+05:30 286.60 288.75 285.00 285.00 4450
2023-04-20 09:28:00+05:30 284.90 291.90 283.55 290.90 10450
2023-04-20 09:32:00+05:30 289.25 291.00 286.70 289.85 4150
... ... ... ... ... ...
2023-04-24 13:44:00+05:30 282.35 285.95 282.00 283.45 8250
2023-04-24 13:48:00+05:30 283.45 284.50 274.05 277.35 9900
2023-04-24 13:52:00+05:30 277.10 280.70 276.00 276.15 12150
2023-04-24 13:56:00+05:30 276.15 279.00 274.90 275.10 5600
2023-04-24 14:00:00+05:30 275.10 282.60 273.00 280.50 4350
[448 rows x 5 columns]
If we see , the same starts from 09:16 , so its skipping by 1 min.
I am trying to get the data starting from 9:15.
Example: The values should be as follows (Taken from the portal) 09:15 09:19 09:23 09:27 09:31
Thanks Saurabh
Tried to use resample method and between_time method.
Code
ohlc_dict = {'open':'first','high':'max','low':'min','close': 'last','volume':'sum'}
out = df.resample('4T', origin='start').agg(ohlc_dict)
out
open high low close volume
date
2023-04-20 09:15:00+05:30 262.00 274.25 261.80 274.25 19400
2023-04-20 09:19:00+05:30 273.95 286.95 271.95 281.20 18900
2023-04-20 09:23:00+05:30 281.20 288.75 281.20 287.45 5350
2023-04-20 09:27:00+05:30 287.45 291.70 283.55 290.45 9600
2023-04-20 09:31:00+05:30 290.55 291.90 286.70 289.85 4400
2023-04-20 09:35:00+05:30 289.85 290.00 285.85 287.45 8350
2023-04-20 09:39:00+05:30 287.45 293.25 287.45 288.55 3550
2023-04-20 09:43:00+05:30 288.55 290.00 282.25 283.30 4500
2023-04-20 09:47:00+05:30 283.30 283.50 279.85 279.85 1200