I have a question related to plotting the moving average by doing group by. I have taken the dataset from Kaggle https://www.kaggle.com/code/kp4920/s-p-500-stock-data-time-series-analysis/comments. I have extracted the few rows by applying the below condition.
new_df_A = new_df[(new_df.Name == 'A')]
new_df_A.sort_values(by=['Name', 'Date'])
And I tried to calculate the moving average for 30 days by implementing this code
for cols in new_df_A.columns:
if cols not in ['Name', 'Date',]:
new_df_A['ma_'+cols]=new_df_A.groupby('Name').rolling(30)[cols].mean().reset_index(drop=True)
And I got this warning error
/var/folders/6j/0bj57ss10ggbdk87dtdkbgyw0000gn/T/ipykernel_130/1482748670.py:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
new_df_A['ma_'+cols]=new_df_A.groupby('Name').rolling(30)[cols].mean().reset_index(drop=True)
/var/folders/6j/0bj57ss10ggbdk87dtdkbgyw0000gn/T/ipykernel_130/1482748670.py:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
new_df_A['ma_'+cols]=new_df_A.groupby('Name').rolling(30)[cols].mean().reset_index(drop=True)
/var/folders/6j/0bj57ss10ggbdk87dtdkbgyw0000gn/T/ipykernel_130/1482748670.py:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
And When I tried to plot the figure, it is blank. Can somebody help me with this?
Thank You
To get a moving average for time series data, the period of interest is specified differently: for 30 days, use '30D'. And since it is column-wise, we use loc
to specify the column. Since it is already a single issue, groupby
is not necessary. To create the graph, I used pandas visualization, which is the simplest way to do it.
df_A = new_df_A.copy()
df_A['Date'] = pd.to_datetime(df_A['Date'])
df_A.set_index('Date', inplace=True)
for cols in df_A.columns:
if cols not in ['Name', 'Date',]:
df_A['ma_'+cols] = df_A.loc[:,cols].rolling('30D').mean()
df_A.iloc[:,6:10].plot()
import seaborn as sns
sns.set(rc={'figure.figsize':(20,8)})
for cols in df_A.columns:
if cols not in ['Name', 'Date', 'Open', 'High', 'Close']:
sns.lineplot(x=df_A.index, y=df_A[cols])
#plt.show()