python · arrays · numpy · geometric-mean

How to efficiently compute running geometric mean of a Numpy array?


A rolling arithmetic mean can simply be computed with NumPy's 'convolve' function, but how can I efficiently create an array of running geometric means of some array a for a given window size?

To give an example, for an array: [0.5, 2.0, 4.0]

and window size 2 (with the window size decreasing at the edges),

I want to quickly generate the array: [0.5, 1.0, 2.83, 4.0]
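
For reference, the kind of convolve-based rolling arithmetic mean referred to above might look like the following sketch (full windows only, using the example values):

    import numpy as np

    a = np.array([0.5, 2.0, 4.0])
    window = 2
    # Each output element is the arithmetic mean of one full window
    rolling_mean = np.convolve(a, np.ones(window) / window, mode="valid")
    print(rolling_mean)
    # >>> [1.25 3.  ]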


Solution

  • import numpy as np
    from numpy.lib.stride_tricks import sliding_window_view
    from scipy.stats import gmean
    
    window = 2
    a = [0.5, 2.0, 4.0]
    
    # Pad both ends with NaNs so that the windows shrink at the edges
    padded = np.pad(a, window - 1, mode="constant", constant_values=np.nan)
    # View of all windows, with shape (len(a) + window - 1, window)
    windowed = sliding_window_view(padded, window)
    # nan_policy="omit" ignores the padded NaNs, i.e. smaller windows at the edges
    result = gmean(windowed, axis=1, nan_policy="omit")
    print(result)
    # >>> [0.5        1.         2.82842712 4.        ]
    

    If you don't need the decreasing window sizes at the boundaries (referring to your comment), you can skip the padding step and choose whichever nan_policy suits you best.
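
    For instance, a minimal sketch without the boundary handling, reusing a and window from above (the result then has len(a) - window + 1 entries):

    # Only full windows, so no padding and no NaNs to handle
    windowed = sliding_window_view(np.asarray(a), window)
    result = gmean(windowed, axis=1)
    print(result)
    # >>> [1.         2.82842712]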

    Update: Realizing that gmean() provides a weights argument, we can replace the NaN padding with an equivalent weights array (1 for the actual values, 0 for the padded values) and are then free to choose whichever nan_policy we like, even with decreasing window sizes at the boundaries. This means we could write:

    # Pad with 1.0 (neutral for a product) instead of NaN
    padded = np.pad(a, window - 1, mode="constant", constant_values=1.)
    windowed = sliding_window_view(padded, window)
    # Matching weights: 1 for actual values, 0 for padded values
    weights = sliding_window_view(
        np.pad(np.ones_like(a), window - 1, mode="constant", constant_values=0.),
        window)
    result = gmean(windowed, axis=1, weights=weights)
    

    This will produce exactly the same result as above. My gut feeling tells me that the original version is faster, but I did not do any speed tests.
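
    If you want to check that gut feeling yourself, an (untested) timing sketch could look roughly like this, with array size, window, and repetition count chosen arbitrarily:

    import timeit

    rng = np.random.default_rng(0)
    a = rng.uniform(0.1, 10.0, size=10_000)
    window = 50

    def nan_version():
        padded = np.pad(a, window - 1, mode="constant", constant_values=np.nan)
        return gmean(sliding_window_view(padded, window), axis=1, nan_policy="omit")

    def weights_version():
        padded = np.pad(a, window - 1, mode="constant", constant_values=1.)
        weights = sliding_window_view(
            np.pad(np.ones_like(a), window - 1, mode="constant", constant_values=0.),
            window)
        return gmean(sliding_window_view(padded, window), axis=1, weights=weights)

    print("nan padding:", timeit.timeit(nan_version, number=10))
    print("weights:    ", timeit.timeit(weights_version, number=10))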