I have a dataframe with over 2 million rows. It holds the adjusted closing prices for all of the stocks in the S&P 500 over the past 20 years (so, a mixture of companies/tickers); a small sample of it appears further down. I will be putting this data into a predictive model that uses various momentum columns as its features and tries to predict whether a stock is likely to have a 6%+ monthly return. However, producing the appropriate "momentum columns" is proving to be difficult.
The "one_month_6_pct+_gain" column is my target & it is derived from the "21_day_performance" column. My momentum columns can't "see" that data.
The below code works to produce columns that use information from the current row (where I don't have to shift).
# day-over-day % change, computed separately for each ticker
df['daily_performance'] = df.groupby('Ticker').adj_close.pct_change().fillna(0)
# % change over the prior 21 trading days (roughly one month), per ticker
df['twenty_one_day_performance'] = df.groupby('Ticker').adj_close.pct_change(periods=21).fillna(0)
# label the target: "Yes" when the 21-day return is 6% or more
df.loc[df['twenty_one_day_performance'] >= 0.06, 'one_month_6_pct+_gain'] = "Yes"
df.loc[df['twenty_one_day_performance'] < 0.06, 'one_month_6_pct+_gain'] = "No"
However, when I try to create new columns based on adj_close from prior days (days that don't "see" the 21-day window I mentioned above), I run into various problems.
This is close, but using sum() here isn't correct. If a stock is trading at $20 on day one and goes up 100% on day two, it's at $40. If it then goes down 100% on day three, it's at $0, not back at $20, even though the two daily returns sum to zero. So summing daily returns isn't right here, but it's as close to working code as I've gotten.
df['preceeding_5_day_perf'] = (
    df.groupby('Ticker').daily_performance.rolling(26).sum().fillna(0).reset_index(0, drop=True)
    - df.groupby('Ticker').daily_performance.rolling(21).sum().fillna(0).reset_index(0, drop=True)
)
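To make that concrete, here is a tiny throwaway check of summed versus compounded daily returns, using the hypothetical +100%/-100% moves from the example above:
import pandas as pd
daily_returns = pd.Series([1.00, -1.00])   # +100% on day two, -100% on day three
print(daily_returns.sum())                 # 0.0  -> summing implies the $20 stock is back at $20
print((1 + daily_returns).prod() - 1)      # -1.0 -> compounding shows it is actually at $0
So a compounded version of that rolling window would need a product of (1 + daily_performance) rather than a sum.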
I think the code below would work if pct_change() were supported after rolling(), but I get "'RollingGroupby' object has no attribute 'pct_change'".
df['preceeding_5_day_perf']=df.groupby('Ticker').adj_close.rolling(26).pct_change().fillna(0).reset_index(0,drop=True)-df.groupby('Ticker').adj_close.rolling(21).pct_change().fillna(0).reset_index(0,drop=True)
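As an aside, rolling() has no pct_change(), but rolling().apply() can emulate one: an n-day change is the last value of an (n + 1)-row window divided by its first, minus one. A sketch for a 26-day change (twenty_six_day_change is just a throwaway name, and this will be slow on 2 million rows):
# emulate a per-ticker pct_change(periods=26) with a 27-row rolling window
twenty_six_day_change = (
    df.groupby('Ticker')['adj_close']
      .transform(lambda s: s.rolling(27).apply(lambda w: w[-1] / w[0] - 1, raw=True))
      .fillna(0)
)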
Below throws an error. "TypeError: unsupported operand type(s) for -: 'RollingGroupby' and 'RollingGroupby'"
df['preceeding_5_day_perf']=df.groupby('Ticker').adj_close.rolling(26)-df.groupby('Ticker').adj_close.rolling(21)
Below is close, but for each ticker it gives the negated value of "twenty_one_day_performance" for the 5 days between day 21 and day 26 (where the 26-day change isn't available yet). I'd like it to give 0.0 for those 5 days.
df['preceeding_5_day_perf'] = (df.groupby('Ticker').adj_close.pct_change(periods=26).fillna(0)) - (df.groupby('Ticker').adj_close.pct_change(periods=21).fillna(0))
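One way to patch that, sketched on the assumption that each ticker's rows are sorted by Date: zero out every ticker's first 26 rows afterwards, since a 26-day change can't exist there yet.
# each ticker's rows 0-25 have no valid 26-day change, so force them to 0.0
early_rows = df.groupby('Ticker').cumcount() < 26
df.loc[early_rows, 'preceeding_5_day_perf'] = 0.0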
Below is also close, but since 500 different tickers are stacked in the dataframe, the shifts bleed across the boundary where one ticker ends and the next begins. It works until you hit that ticker-boundary problem, and I couldn't figure out how to get groupby to work here.
preceeding_5_day_perf = (df['adj_close'].shift(21) - df['adj_close'].shift(26)) / df['adj_close'].shift(26)
df = df.assign(preceeding_5_day_perf=preceeding_5_day_perf)
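The grouped version of those shifts would look something like this sketch (assuming the frame is sorted by Ticker and then Date; grouped_close is just a throwaway name):
# shifting inside the groupby keeps each ticker's history separate,
# so nothing bleeds across the boundary between two tickers
grouped_close = df.groupby('Ticker')['adj_close']
preceeding_5_day_perf = (grouped_close.shift(21) - grouped_close.shift(26)) / grouped_close.shift(26)
df = df.assign(preceeding_5_day_perf=preceeding_5_day_perf.fillna(0))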
A small sample of my dataframe:
Index Date High Low adj_close Ticker daily_performance 21_day_performance one_month_6_pct+_gain
8930 2020-10-15 35.380001 33.639999 34.250000 UAL -0.038192 -0.105744 No
8931 2000-01-03 18.625000 17.312500 10.834452 TROW 0.000000 0.000000 No
8932 2000-01-04 17.562500 17.000000 10.506723 TROW -0.030249 0.000000 No
8933 2000-01-05 17.468750 16.593750 10.487448 TROW -0.001835 0.000000 No
8934 2000-01-06 17.500000 16.687500 10.795900 TROW 0.029412 0.000000 No
8935 2000-01-07 17.750000 17.250000 10.718788 TROW -0.007143 0.000000 No
8936 2000-01-10 17.875000 17.437500 10.834452 TROW 0.010791 0.000000 No
8937 2000-01-11 17.781250 17.281250 10.795900 TROW -0.003558 0.000000 No
8938 2000-01-12 18.250000 17.593750 11.258587 TROW 0.042858 0.000000 No
8939 2000-01-13 19.843750 18.296875 11.875495 TROW 0.054794 0.000000 No
8940 2000-01-14 19.562500 19.187500 12.048991 TROW 0.014610 0.000000 No
8941 2000-01-18 19.437500 18.593750 11.586312 TROW -0.038400 0.000000 No
8942 2000-01-19 19.625000 18.812500 11.759820 TROW 0.014975 0.000000 No
8943 2000-01-20 19.312500 18.875000 11.759820 TROW 0.000000 0.000000 No
8944 2000-01-21 20.093750 19.187500 12.318892 TROW 0.047541 0.000000 No
8945 2000-01-24 20.937500 19.625000 12.183943 TROW -0.010955 0.000000 No
8946 2000-01-25 21.250000 19.250000 13.070749 TROW 0.072785 0.000000 No
8947 2000-01-26 21.312500 19.812500 12.955079 TROW -0.008850 0.000000 No
8948 2000-01-27 21.437500 19.593750 12.106827 TROW -0.065476 0.000000 No
8949 2000-01-28 19.687500 18.187500 11.258587 TROW -0.070063 0.000000 No
8950 2000-01-31 19.437500 17.937500 11.991161 TROW 0.065068 0.000000 No
8951 2000-02-01 19.656250 18.687500 12.068275 TROW 0.006431 0.000000 No
8952 2000-02-02 19.875000 19.000000 12.222500 TROW 0.012779 0.128114 Yes
8953 2000-02-03 20.062500 18.750000 11.856213 TROW -0.029968 0.128441 Yes
8954 2000-02-04 19.375000 18.406250 11.451367 TROW -0.034146 0.091912 Yes
8955 2000-02-07 19.406250 18.281250 11.316417 TROW -0.011785 0.048214 No
8956 2000-02-08 18.625000 18.250000 11.316417 TROW 0.000000 0.055755 No
8957 2000-02-09 18.750000 17.812500 11.085073 TROW -0.020443 0.023132 No
8958 2000-02-10 18.437500 17.750000 11.007962 TROW -0.006956 0.019643 No
8959 2000-02-11 18.000000 17.500000 10.824816 TROW -0.016638 -0.038528 No
8960 2000-02-14 17.718750 16.593750 10.391053 TROW -0.040071 -0.125000 No
I ended up finding a solution by building a separate dataframe for each ticker. I'm sure there is a better way.
#creating an array of the unique tickers
unique_tickers = df.Ticker.unique()
#creating a dictionary of per-ticker dataframes to fill inside the loop
companydict = {ticker: pd.DataFrame() for ticker in unique_tickers}
for key in companydict.keys():
    companydict[key] = df[df.Ticker == key]
    #creating the 5-day performance column that precedes the 21-day prediction period
    preceeding_5_day_perf = (companydict[key]['adj_close'].shift(21) - companydict[key]['adj_close'].shift(26)) / companydict[key]['adj_close'].shift(26)
    companydict[key] = companydict[key].assign(preceeding_5_day_perf=preceeding_5_day_perf).fillna(0)
    #creating a new df for each ticker (e.g. the CSCO ticker ends up in df_CSCO)
    globals()['df_{}'.format(key)] = companydict[key]
Then I just collected all of those per-ticker dataframes into a list and used concat to combine them vertically.
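The list itself was just the per-ticker frames pulled back out of the dictionary, something like:
df_list = list(companydict.values())   # one dataframe per ticker, in the same order as unique_tickers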
df_final = pd.concat(df_list)
If pct_change() were supported after rolling() in Pandas, this would have been much easier.
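For the record, here is a sketch of the fully vectorized version I was after, assuming the frame is sorted by Ticker and then Date (as in the sample above); I haven't benchmarked it against the loop:
# 5-day return inside each ticker, then shifted back 21 rows inside each ticker,
# so each value covers the window from day t-26 to day t-21
df['preceeding_5_day_perf'] = (
    df.groupby('Ticker')['adj_close']
      .transform(lambda s: s.pct_change(periods=5).shift(21))
      .fillna(0)
)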