pythonpython-3.xpandasyfinance

Pandas daily to weekly OHLC data resampling


I'm resampling daily OHLC data fetched from Yahoo Finance to weekly with the following code:

import pandas as pd
import yfinance as yf

ticker = yf.Ticker('AAPL')
daily_data = ticker.history(start='2024-04-01', end='2024-05-03', interval='1d')

weekly_data = daily_data.resample('W-Sat').agg({
    'Open': 'first',
    'High': 'max',
    'Low': 'min',
    'Close': 'last',
    'Volume': 'sum'
}).shift(-1).dropna()

The code works, the weekly data matches Yahoo Finance weekly data that would be fetched by:

ticker = yf.Ticker('AAPL')
yahoo_weekly_data = ticker.history(
    start='2024-04-01', end='2024-05-03', interval='1w')

However I do not understand why I have to use shift(-1) to get the correct data.
Any insight would be much appreciated.


Solution

  • I don't think you should shift. This will get you to lose one row of data.

    If you compare closely the outputs (without shift):

    weekly_data = daily_data.resample('W-Sat').agg({
        'Open': 'first',
        'High': 'max',
        'Low': 'min',
        'Close': 'last',
        'Volume': 'sum'
    })
    
                                     Open        High         Low       Close     Volume
    Date                                                                                
    2024-04-06 00:00:00-04:00  171.190002  171.919998  168.229996  169.580002  239021300
    2024-04-13 00:00:00-04:00  169.029999  178.360001  167.110001  176.550003  322249600
    2024-04-20 00:00:00-04:00  175.360001  176.630005  164.080002  165.000000  309039200
    2024-04-27 00:00:00-04:00  165.520004  171.339996  164.770004  169.300003  241302700
    2024-05-04 00:00:00-04:00  173.369995  176.029999  169.110001  173.029999  277367900
    
    ticker = yf.Ticker('AAPL')
    yahoo_weekly_data = ticker.history(
        start='2024-04-01', end='2024-05-03', interval='1wk')
    
                                     Open        High         Low       Close     Volume  Dividends  Stock Splits
    Date                                                                                                         
    2024-04-01 00:00:00-04:00  171.190002  171.919998  168.229996  169.580002  239021300        0.0           0.0
    2024-04-08 00:00:00-04:00  169.029999  178.360001  167.110001  176.550003  322249600        0.0           0.0
    2024-04-15 00:00:00-04:00  175.360001  176.630005  164.080002  165.000000  309039200        0.0           0.0
    2024-04-22 00:00:00-04:00  165.520004  171.339996  164.770004  169.300003  241302700        0.0           0.0
    2024-04-29 00:00:00-04:00  173.369995  176.029999  169.110001  173.029999  277367900        0.0           0.0
    

    The values are identical, but the labels are a bit different.

    For resample, this is by default the right boundary, for yfinance, it seems to be the Monday of the week.

    To get the same output you could use:

    weekly_data = daily_data.resample('W-Mon', closed='left', label='left').agg({
        'Open': 'first',
        'High': 'max',
        'Low': 'min',
        'Close': 'last',
        'Volume': 'sum'
    })
    
                                     Open        High         Low       Close     Volume
    Date                                                                                
    2024-04-01 00:00:00-04:00  171.190002  171.919998  168.229996  169.580002  239021300
    2024-04-08 00:00:00-04:00  169.029999  178.360001  167.110001  176.550003  322249600
    2024-04-15 00:00:00-04:00  175.360001  176.630005  164.080002  165.000000  309039200
    2024-04-22 00:00:00-04:00  165.520004  171.339996  164.770004  169.300003  241302700
    2024-04-29 00:00:00-04:00  173.369995  176.029999  169.110001  173.029999  277367900