I have some stock 5min data, like as:
Date Open High Low Close Volume
0 2024-11-19 09:35:00 11.75 11.79 11.55 11.78 32673600
1 2024-11-19 09:40:00 11.78 11.81 11.73 11.79 14802700
2 2024-11-19 09:45:00 11.79 11.84 11.79 11.82 13837400
3 2024-11-19 09:50:00 11.81 11.83 11.76 11.82 8534200
4 2024-11-19 09:55:00 11.82 11.87 11.80 11.87 8540500
5 2024-11-19 10:00:00 11.87 11.96 11.87 11.90 20659800
6 2024-11-19 10:05:00 11.89 11.90 11.82 11.82 11691000
7 2024-11-19 10:10:00 11.82 11.82 11.73 11.74 8762900
8 2024-11-19 10:15:00 11.74 11.74 11.71 11.73 6870500
9 2024-11-19 10:20:00 11.73 11.73 11.68 11.70 6244800
10 2024-11-19 10:25:00 11.70 11.70 11.66 11.69 5083000
11 2024-11-19 10:30:00 11.70 11.73 11.69 11.71 5342400
12 2024-11-19 10:35:00 11.72 11.74 11.71 11.73 3311800
13 2024-11-19 10:40:00 11.73 11.74 11.71 11.72 2331900
14 2024-11-19 10:45:00 11.72 11.72 11.70 11.72 3024100
15 2024-11-19 10:50:00 11.71 11.74 11.70 11.71 2774200
16 2024-11-19 10:55:00 11.70 11.72 11.70 11.71 1313000
17 2024-11-19 11:00:00 11.72 11.75 11.71 11.74 1737400
18 2024-11-19 11:05:00 11.75 11.75 11.73 11.75 1690600
19 2024-11-19 11:10:00 11.74 11.76 11.73 11.76 1751800
20 2024-11-19 11:15:00 11.76 11.76 11.72 11.73 2248700
21 2024-11-19 11:20:00 11.73 11.73 11.70 11.71 2464200
22 2024-11-19 11:25:00 11.71 11.71 11.69 11.70 1033600
23 2024-11-19 11:30:00 11.69 11.70 11.67 11.69 2063600
I use df.resample to convert them to 30m data, the code is:
df = df.set_index('Date')
df = df.resample('30T').agg({'Open':'first', 'High':'max', 'Low':'min','Close':'last',
'Volume':'sum'}, closed='right', label = 'right').dropna()
But I got strange result like these:
Open High Low Close Volume
Date
2024-11-19 09:30:00 11.75 11.87 11.55 11.87 78388400
2024-11-19 10:00:00 11.87 11.96 11.66 11.69 59312000
2024-11-19 10:30:00 11.70 11.74 11.69 11.71 18097400
2024-11-19 11:00:00 11.72 11.76 11.69 11.70 10926300
2024-11-19 11:30:00 11.69 11.70 11.67 11.69 2063600
Here are correct 30m data export from my trading software:
Time Open High Low Close Volume
2024/11/19-10:00 11.75 11.96 11.55 11.9 99048200
2024/11/19-10:30 11.89 11.9 11.66 11.71 43994600
2024/11/19-11:00 11.72 11.75 11.7 11.74 14492400
2024/11/19-11:30 11.75 11.76 11.67 11.69 11252500
The data at 9:30 is irrelevant, mainly because the following data are not correct. but I did not find more parameters of df.sample. How can I correctly aggregate the data?
By default the reference in resample
is the start of the day. It looks like you want the start of the data. You should set origin='start'
instead of the default origin='start_day'
:
(df.resample('30min', origin='start')
.agg({'Open':'first', 'High':'max', 'Low':'min','Close':'last',
'Volume':'sum'}, closed='right', label = 'right')
.dropna()
)
Output:
Open High Low Close Volume
Date
2024-11-19 09:35:00 11.75 11.96 11.55 11.90 99048200
2024-11-19 10:05:00 11.89 11.90 11.66 11.71 43994600
2024-11-19 10:35:00 11.72 11.75 11.70 11.74 14492400
2024-11-19 11:05:00 11.75 11.76 11.67 11.69 11252500
And presumably, you want to pass the label
parameter to resample
:
(df.resample('30min', origin='start', label='right')
.agg({'Open':'first', 'High':'max', 'Low':'min','Close':'last',
'Volume':'sum'})
.dropna()
)
Output:
Open High Low Close Volume
Date
2024-11-19 10:05:00 11.75 11.96 11.55 11.90 99048200
2024-11-19 10:35:00 11.89 11.90 11.66 11.71 43994600
2024-11-19 11:05:00 11.72 11.75 11.70 11.74 14492400
2024-11-19 11:35:00 11.75 11.76 11.67 11.69 11252500
Finally, if you want to round to 30 minutes:
(df.resample('30min', origin='start', label='right')
.agg({'Open':'first', 'High':'max', 'Low':'min',
'Close':'last', 'Volume':'sum'})
.dropna()
.pipe(lambda x: x.set_axis(x.index.floor('30min')))
)
Output:
Open High Low Close Volume
Date
2024-11-19 10:00:00 11.75 11.96 11.55 11.90 99048200
2024-11-19 10:30:00 11.89 11.90 11.66 11.71 43994600
2024-11-19 11:00:00 11.72 11.75 11.70 11.74 14492400
2024-11-19 11:30:00 11.75 11.76 11.67 11.69 11252500