I'm resampling daily OHLC data fetched from Yahoo Finance to weekly with the following code:
import pandas as pd
import yfinance as yf
ticker = yf.Ticker('AAPL')
daily_data = ticker.history(start='2024-04-01', end='2024-05-03', interval='1d')
weekly_data = daily_data.resample('W-Sat').agg({
'Open': 'first',
'High': 'max',
'Low': 'min',
'Close': 'last',
'Volume': 'sum'
}).shift(-1).dropna()
The code works, the weekly data matches Yahoo Finance weekly data that would be fetched by:
ticker = yf.Ticker('AAPL')
yahoo_weekly_data = ticker.history(
start='2024-04-01', end='2024-05-03', interval='1w')
However I do not understand why I have to use shift(-1)
to get the correct data.
Any insight would be much appreciated.
I don't think you should shift
. This will get you to lose one row of data.
If you compare closely the outputs (without shift
):
weekly_data = daily_data.resample('W-Sat').agg({
'Open': 'first',
'High': 'max',
'Low': 'min',
'Close': 'last',
'Volume': 'sum'
})
Open High Low Close Volume
Date
2024-04-06 00:00:00-04:00 171.190002 171.919998 168.229996 169.580002 239021300
2024-04-13 00:00:00-04:00 169.029999 178.360001 167.110001 176.550003 322249600
2024-04-20 00:00:00-04:00 175.360001 176.630005 164.080002 165.000000 309039200
2024-04-27 00:00:00-04:00 165.520004 171.339996 164.770004 169.300003 241302700
2024-05-04 00:00:00-04:00 173.369995 176.029999 169.110001 173.029999 277367900
ticker = yf.Ticker('AAPL')
yahoo_weekly_data = ticker.history(
start='2024-04-01', end='2024-05-03', interval='1wk')
Open High Low Close Volume Dividends Stock Splits
Date
2024-04-01 00:00:00-04:00 171.190002 171.919998 168.229996 169.580002 239021300 0.0 0.0
2024-04-08 00:00:00-04:00 169.029999 178.360001 167.110001 176.550003 322249600 0.0 0.0
2024-04-15 00:00:00-04:00 175.360001 176.630005 164.080002 165.000000 309039200 0.0 0.0
2024-04-22 00:00:00-04:00 165.520004 171.339996 164.770004 169.300003 241302700 0.0 0.0
2024-04-29 00:00:00-04:00 173.369995 176.029999 169.110001 173.029999 277367900 0.0 0.0
The values are identical, but the labels are a bit different.
For resample
, this is by default the right boundary, for yfinance, it seems to be the Monday of the week.
To get the same output you could use:
weekly_data = daily_data.resample('W-Mon', closed='left', label='left').agg({
'Open': 'first',
'High': 'max',
'Low': 'min',
'Close': 'last',
'Volume': 'sum'
})
Open High Low Close Volume
Date
2024-04-01 00:00:00-04:00 171.190002 171.919998 168.229996 169.580002 239021300
2024-04-08 00:00:00-04:00 169.029999 178.360001 167.110001 176.550003 322249600
2024-04-15 00:00:00-04:00 175.360001 176.630005 164.080002 165.000000 309039200
2024-04-22 00:00:00-04:00 165.520004 171.339996 164.770004 169.300003 241302700
2024-04-29 00:00:00-04:00 173.369995 176.029999 169.110001 173.029999 277367900