Say I have a pd.DataFrame and want to calculate the rolling Standard deviation. In pandas I can use rolling(window=x).std()
, but it gives me the SD by column. I however want the standard deviation over all columns in a given row.
As an example consider the pd dataframe
df = pd.DataFrame({'col1': [1,2,3,4,5,6], 'col2': [-1,-2,-3,-4,-5,-6], 'col3': [1,2,3,4,5,6]})
df
col1 col2 col3
0 1 -1 1
1 2 -2 2
2 3 -3 3
3 4 -4 4
4 5 -5 5
5 6 -6 6
When calculating for a window size of 2, for instance, I would like to have the standard deviation in row 2 as the sum of the two rows divided by 6 (or 6-1), doesn't matter), so: np.std([2,-2,2,1,-1,1]).
I tried to calculate it on a melted dataframe, but I didn't get the result as expected:
df.reset_index().melt(id_vars='index').set_index('index')['value'].rolling(2).std()
Does anyone have an idea how to do it? I appreciate your feedback.
n = len(df.columns)
window = 2
out = df.stack().rolling(window * n).std().xs(df.columns[-1], level=-1)
out:
0 NaN
1 1.643168
2 2.639444
3 3.656045
4 4.679744
5 5.706721
dtype: float64
out
is only std.
you said divide by 6 or 5, so i dont divide anything for you to choose.