pythonpandaspandas-groupbyfinancepandas-rolling

Workaround for pct_change not working with .rolling() in Pandas?


I have a dataframe that is over 2mm rows. It has the adjusted closing prices for all of the stocks in the S&P 500 over the past 20 years (so, a mixture of companies/tickers). The bottom section is a snapshot of my dataframe. I will be putting this data into a predictive model that uses various momentum columns as its features and it will try to predict if a stock is likely to have a 6+% monthly return. However, producing the appropriate "momentum columns" is proving to be difficult.

The "one_month_6_pct+_gain" column is my target & it is derived from the "21_day_performance" column. My momentum columns can't "see" that data.

The below code works to produce columns that use information from the current row (where I don't have to shift).

    df['daily_performance'] = df.groupby('Ticker').adj_close.pct_change().fillna(0)
    df['twenty_one_day_performance'] = df.groupby('Ticker').adj_close.pct_change(periods=21).fillna(0)
    df.loc[df['twenty_one_day_performance'] >= 0.06, 'one_month_6_pct+_gain'] = "Yes"
    df.loc[df['twenty_one_day_performance'] < 0.06, 'one_month_6_pct+_gain'] = "No"

However, when I try to create new columns that are based on adj_close from prior days (days that don't "see" the 21 days I mentioned before), I have various problems.

This is close, but, using "sum()" here isn't correct. If a stock is trading at $20 on day one and goes up 100% on day two it'll be at $40. However, if it goes down 100% on day three it'll be at $0, not $20. So, summing isn't correct here. But it's as close to working code as I've gotten.

    df['preceeding_5_day_perf']=df.groupby('Ticker').daily_performance.rolling(26).sum().fillna(0).reset_index(0,drop=True)-df.groupby('Ticker').daily_performance.rolling(21).sum().fillna(0).reset_index(0,drop=True)

I think the below would work if pct_change() was supported after rolling(). But, I get "'RollingGroupby' object has no attribute 'pct_change'"

    df['preceeding_5_day_perf']=df.groupby('Ticker').adj_close.rolling(26).pct_change().fillna(0).reset_index(0,drop=True)-df.groupby('Ticker').adj_close.rolling(21).pct_change().fillna(0).reset_index(0,drop=True)

Below throws an error. "TypeError: unsupported operand type(s) for -: 'RollingGroupby' and 'RollingGroupby'"

    df['preceeding_5_day_perf']=df.groupby('Ticker').adj_close.rolling(26)-df.groupby('Ticker').adj_close.rolling(21)

Below is close, but, it gives the negated/opposite value of "twenty_one_day_performance" for the 5 days between 21 and 26. I'd like it to give 0.0 for those 5 days.

    df['preceeding_5_day_perf'] = (df.groupby('Ticker').adj_close.pct_change(periods=26).fillna(0)) - (df.groupby('Ticker').adj_close.pct_change(periods=21).fillna(0))

Below is also close, but, since there are 500 different stocks in the df, when one stock ends and another begins, this is a problem. Below works until you consider the overlapping stocks/tickers problem. I can't figure out how to get groupby to work here.

    preceeding_5_day_perf = (df['adj_close'].shift(26) - df['adj_close'].shift(21)) / df['adj_close'].shift(21)
    df = df.assign(preceeding_5_day_perf=preceeding_5_day_perf)

A small sample of my dataframe:

Index     Date      High        Low         adj_close   Ticker daily_performance 21_day_performance one_month_6_pct+_gain
8930    2020-10-15  35.380001   33.639999   34.250000   UAL -0.038192   -0.105744                               No
8931    2000-01-03  18.625000   17.312500   10.834452   TROW    0.000000    0.000000                             No
8932    2000-01-04  17.562500   17.000000   10.506723   TROW    -0.030249   0.000000                             No
8933    2000-01-05  17.468750   16.593750   10.487448   TROW    -0.001835   0.000000                             No
8934    2000-01-06  17.500000   16.687500   10.795900   TROW    0.029412    0.000000                             No
8935    2000-01-07  17.750000   17.250000   10.718788   TROW    -0.007143   0.000000                             No
8936    2000-01-10  17.875000   17.437500   10.834452   TROW    0.010791    0.000000                             No
8937    2000-01-11  17.781250   17.281250   10.795900   TROW    -0.003558   0.000000                             No
8938    2000-01-12  18.250000   17.593750   11.258587   TROW    0.042858    0.000000                             No
8939    2000-01-13  19.843750   18.296875   11.875495   TROW    0.054794    0.000000                             No
8940    2000-01-14  19.562500   19.187500   12.048991   TROW    0.014610    0.000000                             No
8941    2000-01-18  19.437500   18.593750   11.586312   TROW    -0.038400   0.000000                             No
8942    2000-01-19  19.625000   18.812500   11.759820   TROW    0.014975    0.000000                             No
8943    2000-01-20  19.312500   18.875000   11.759820   TROW    0.000000    0.000000                             No
8944    2000-01-21  20.093750   19.187500   12.318892   TROW    0.047541    0.000000                             No
8945    2000-01-24  20.937500   19.625000   12.183943   TROW    -0.010955   0.000000                             No
8946    2000-01-25  21.250000   19.250000   13.070749   TROW    0.072785    0.000000                             No
8947    2000-01-26  21.312500   19.812500   12.955079   TROW    -0.008850   0.000000                             No
8948    2000-01-27  21.437500   19.593750   12.106827   TROW    -0.065476   0.000000                             No
8949    2000-01-28  19.687500   18.187500   11.258587   TROW    -0.070063   0.000000                             No
8950    2000-01-31  19.437500   17.937500   11.991161   TROW    0.065068    0.000000                             No
8951    2000-02-01  19.656250   18.687500   12.068275   TROW    0.006431    0.000000                             No
8952    2000-02-02  19.875000   19.000000   12.222500   TROW    0.012779    0.128114                             Yes
8953    2000-02-03  20.062500   18.750000   11.856213   TROW    -0.029968   0.128441                             Yes
8954    2000-02-04  19.375000   18.406250   11.451367   TROW    -0.034146   0.091912                             Yes
8955    2000-02-07  19.406250   18.281250   11.316417   TROW    -0.011785   0.048214                             No
8956    2000-02-08  18.625000   18.250000   11.316417   TROW    0.000000    0.055755                             No
8957    2000-02-09  18.750000   17.812500   11.085073   TROW    -0.020443   0.023132                             No
8958    2000-02-10  18.437500   17.750000   11.007962   TROW    -0.006956   0.019643                             No
8959    2000-02-11  18.000000   17.500000   10.824816   TROW    -0.016638   -0.038528                            No
8960    2000-02-14  17.718750   16.593750   10.391053   TROW    -0.040071   -0.125000                            No

ā€‹


Solution

  • I ended up finding a solution to this problem by creating new dataframes. I'm sure there is a better solution.

        #creating an array of tickers
        unique_tickers = df.Ticker.unique()
        #creating a dictionary to be used in a for loop
        companydict = {elem : pd.DataFrame() for elem in unique_tickers}
        for key in companydict.keys():
            companydict[key] = df[:][df.Ticker == key]
            #creating the 5 day performance column that preceeds the 21 day prediction period
            preceeding_5_day_perf = (companydict[key]['adj_close'].shift(21) - companydict[key]['adj_close'].shift(26)) / companydict[key]['adj_close'].shift(26)
            companydict[key] = companydict[key].assign(preceeding_5_day_perf=preceeding_5_day_perf).fillna(0)
        #creating a new df for each ticker (eg. CSCO ticker will be in df_CSCO)
        globals()['df_{}'.format(key)] = companydict[key]
    

    Then, I just created a list of all of the dfs and used concat to combine all of the dataframes vertically.

        df_final = pd.concat(df_list)
    

    If pct_change() was supported after rolling() in Pandas, this would've been much easier.