pythonpandasdataframepython-datetimepython-dateutil

Calculating rate of return for multiple time frames (annualized, quarterly) with daily time series data (S&P 500 (SPX index) daily prices)


I have a CSV file with some 30 years worth of daily close prices for the S&P 500 (SPX) stock market index, and I read it as Dataframe Series with Dates set as Index.

Dataframe:

Date Open High Low Close
2023-01-13 3960.60 4003.95 3947.67 3999.09
2023-01-12 3977.57 3997.76 3937.56 3983.17
2023-01-11 3932.35 3970.07 3928.54 3969.61
2023-01-10 3888.57 3919.83 3877.29 3919.25
2023-01-09 3910.82 3950.57 3890.42 3892.09
1990-01-08 353.79 354.24 350.54 353.79
1990-01-05 352.20 355.67 351.35 352.20
1990-01-04 355.67 358.76 352.89 355.67
1990-01-03 358.76 360.59 357.89 358.76
1990-01-02 359.69 359.69 351.98 359.69

It effectively has a date (as index) column, and four columns (open, high, low, close) of daily prices. I am using close prices.

I would like a flexible function to calculate annual returns from the chosen start date to the end date using the formula:

(end_price / beginning_price - 1) * 100

So, the annual return for 2022 would be:

(SPX_Close_price_at_31_December_2022 - SPX_Close_price_at_31_December_2021 - 1)*100

It would be ideal if the same function could handle monthly or quarterly date inputs. Then, I would like these periodic returns (%) to be added to the dataframe in a separate column, and/or a new dataframe, and match the start and end dates across rows, so I can produce consecutive annual returns on a Matplotlib line chart. And I would like to do this for the whole time series of 30 years.

This is the what I would like for the final dataframe to look like (return numbers below are examples only):

Date Annual Return (%)
m/d/2022 -18
m/d/2021 20
m/d/2020 15
m/d/2019 18

I am a beginner with Python am and still struggling working with date and datetime formats and matching those dates to data in columns across selected rows.

Below is what I got to so far, but it doesn't work properly. I will try the dateutil library, but I think that concepts of building out efficient functions is still something I need to work on. This is my first question on Stack Overflow, so thanks for having me :)

def spx_return(df, sdate, edate):

    delta = dt.timedelta(days=365)
    while (sdate <= edate):

        df2 = df['RoR'] = (df['Close'] / df['Close'].shift(-365) - 1) * 100

        sdate += delta
        #print(sdate, end="\n")

    return df2

Solution

  • To calculate annual and quarterly rates in a generic way as well, I came up with a function that takes as arguments the start date, end date, and a pattern that distinguishes between years and quarters as the type of frequency. For the data frames extracted by start and end date, we use pd.Grouper() to extract the target data rows. For the result of that extraction, we will incorporate your formula in the next line. Also, when determining the rate from the start date, we need to go back further in time, so we subtract '366 days' or '90 days' for the frequency keyword. I have not verified that this value leads to the correct result in all cases. This is due to market holidays such as the year-end and New Year holidays. Setting a larger number of days may solve this problem.

    import pandas as pd
    import yfinance as yf
    
    df = yf.download("^GSPC", start="2016-01-01", end="2022-01-01")
    df.index = pd.to_datetime(df.index)
    df.index = df.index.tz_localize(None)
    
    def rating(data, startdate, enddate, freq):
        offset = '366 days' if freq == 'Y' else '90 days'
        #dff = df.loc[(df.index >= startdate) & (df.index <= enddate)]
        dff = df.loc[(df.index >= pd.Timestamp(startdate) - pd.Timedelta(offset)) & (df.index <= pd.Timestamp(enddate))]
        dfy = dff.groupby(pd.Grouper(level='Date', freq=freq)).tail(1)
        ratio = (dfy['Close'] / dfy['Close'].shift()-1)*100
        return ratio
        
    period_rating = rating(df, '2017-01-01', '2019-12-31', freq='Y')
    print(period_rating)
    
    Date
    2016-12-30          NaN
    2017-12-29    19.419966
    2018-12-31    -6.237260
    2019-12-31    28.878070
    Name: Close, dtype: float64
    
    period_rating = rating(df, '2017-01-01', '2019-12-31', freq='Q')
    print(period_rating)
    
    Date
    2016-12-30          NaN
    2017-03-31     5.533689
    2017-06-30     2.568647
    2017-09-29     3.959305
    2017-12-29     6.122586
    2018-03-29    -1.224561
    2018-06-29     2.934639
    2018-09-28     7.195851
    2018-12-31   -13.971609
    2019-03-29    13.066190
    2019-06-28     3.787754
    2019-09-30     1.189083
    2019-12-31     8.534170
    Name: Close, dtype: float64