pythonpandasrolling-computationstandard-deviation

Pandas - calculate rolling standard deviation over all columns


Say I have a pd.DataFrame and want to calculate the rolling Standard deviation. In pandas I can use rolling(window=x).std(), but it gives me the SD by column. I however want the standard deviation over all columns in a given row.

As an example consider the pd dataframe

df = pd.DataFrame({'col1': [1,2,3,4,5,6], 'col2': [-1,-2,-3,-4,-5,-6], 'col3': [1,2,3,4,5,6]})
df
   col1  col2  col3
0     1    -1     1
1     2    -2     2
2     3    -3     3
3     4    -4     4
4     5    -5     5
5     6    -6     6

When calculating for a window size of 2, for instance, I would like to have the standard deviation in row 2 as the sum of the two rows divided by 6 (or 6-1), doesn't matter), so: np.std([2,-2,2,1,-1,1]).

I tried to calculate it on a melted dataframe, but I didn't get the result as expected:

df.reset_index().melt(id_vars='index').set_index('index')['value'].rolling(2).std()

Does anyone have an idea how to do it? I appreciate your feedback.


Solution

  • n = len(df.columns)
    window = 2
    out = df.stack().rolling(window * n).std().xs(df.columns[-1], level=-1)
    

    out:

    0         NaN
    1    1.643168
    2    2.639444
    3    3.656045
    4    4.679744
    5    5.706721
    dtype: float64
    

    out is only std.

    you said divide by 6 or 5, so i dont divide anything for you to choose.