python pandas pandas-groupby apply rolling-computation

Python: Calculate 5-year rolling CAGR of values that need to be grouped from a dataframe


I have a dataframe of historical market caps and need to compute 5-year compound annual growth rates (CAGRs) from it. However, the dataframe has hundreds of companies with 20 years of values each, so I need to isolate each company's data before computing its CAGRs. How do I go about doing this?

The function to calculate a CAGR is: (end/start)^(1/# years)-1. I have never used .groupby() or .apply(), so I don't know how to implement the CAGR equation for rolling values.
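As a quick check of the formula, here is a minimal sketch for a single pair of values. The function name `cagr` and the sample numbers are made up for illustration and are not part of the dataframe.

```python
def cagr(start, end, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end / start) ** (1 / years) - 1


# A value that doubles over 5 years grows at roughly 14.9% per year.
growth = cagr(100.0, 200.0, 5)
print(round(growth, 4))
```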

Here is a screenshot of part of the dataframe so you have a visual representation of what I am trying to use: Screenshot of dataframe.

Any guidance would be greatly appreciated!


Solution

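  • Assuming there is one value per company per year, you can reduce the date to a year and join the dataframe to itself. This is a lot simpler, and there is no need for groupby or apply.

    Say your dataframe is named df. First, reduce the date to a year (this assumes the 'Date' column already has a datetime dtype):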

    df['year'] = df['Date'].dt.year
    

    Second, add a column holding the year five years later:

    df['year+5'] = df['year'] + 5
    

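    Third, merge df with itself (this needs import pandas). Match each row's 'year+5' on the left to 'year' on the right, so that the left side supplies the start value and the right side supplies the value five years later:

    df_new = pandas.merge(df, df, how='inner', left_on=['Instrument', 'year+5'], right_on=['Instrument', 'year'], suffixes=('_start', '_end'))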
    

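    Finally, calculate the rolling CAGR. Each row of df_new pairs a start year with the year five years later, so the exponent is 1/5 = 0.2: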

    df_new['CAGR'] = (df_new['Company Market Cap_end']/df_new['Company Market Cap_start'])**(0.2)-1
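The steps above can be sketched end to end as follows. The toy data is made up for illustration (two hypothetical companies growing at exactly 10% and 20% per year), and the column names 'Instrument', 'Date' and 'Company Market Cap' are assumed to match the real dataframe. The merge joins each row's 'year+5' to the 'year' of the later row, so the '_start' columns hold the earlier value and the '_end' columns hold the value five years on.

```python
import pandas as pd

# Toy data (hypothetical): one market-cap value per company per year, 2010-2016.
years = list(range(2010, 2017))
df = pd.DataFrame({
    "Instrument": ["AAA"] * len(years) + ["BBB"] * len(years),
    "Date": pd.to_datetime([f"{y}-12-31" for y in years] * 2),
    "Company Market Cap": [100.0 * 1.10 ** i for i in range(len(years))]
                          + [50.0 * 1.20 ** i for i in range(len(years))],
})

# Step 1: reduce the date to a year.
df["year"] = df["Date"].dt.year

# Step 2: the year whose value should act as the end of the 5-year window.
df["year+5"] = df["year"] + 5

# Step 3: self-merge. Left rows are window starts, right rows are window ends.
df_new = pd.merge(
    df, df, how="inner",
    left_on=["Instrument", "year+5"],
    right_on=["Instrument", "year"],
    suffixes=("_start", "_end"),
)

# Step 4: CAGR = (end / start) ** (1 / years) - 1 with years = 5.
df_new["CAGR"] = (
    df_new["Company Market Cap_end"] / df_new["Company Market Cap_start"]
) ** (1 / 5) - 1

result = df_new[["Instrument", "year_start", "year_end", "CAGR"]]
print(result)
```

Because the merge is an inner join, years without a matching value five years later (here 2012 to 2016 as start years) simply drop out of the result. If a company can have more than one row per year, the join will produce duplicate pairs, so the one-value-per-year assumption matters.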