Tags: python, gpu, correlation, numba, sliding

How to calculate the correlation coefficient on a rolling window of a vector using numba?


People were kind enough to explain this in How to calculate the correlation coefficient on a rolling window of a vector using numpy?, with an answer from which I picked up:

f_PH_numpy is my approach, which uses sliding_window_view and a vectorized function for the row-wise calculation of the vector correlation coefficient

I'm now trying to solve the same problem using my GPU with numba. Unfortunately, adding

import numba as nb
from numba import jit, njit, prange, jit_module
@njit(fastmath=True, cache=True, parallel=True)

on top of the code leads to errors I cannot solve (I'm not good enough at this), such as:

TypingError: Unknown attribute 'sliding_window_view' of type Module(<module 'numpy.lib.stride_tricks' from 'C:\\Users\\didie\\miniconda3\\envs\\spyder-cf\\lib\\site-packages\\numpy\\lib\\stride_tricks.py'>)

Any (other) idea on how to get a solution with Numba (or an equivalent GPU-capable library)?

EDIT: As requested, the code I run (data is a float series):

import numpy as np

# v is a 1-D array that stores all correlation coefficients
max_l = int(serie_o.shape[0] * 0.5)  # max_l is half the serie_o size
for l in range(3, max_l, 1):
    v = np.zeros(serie_o.shape[0]).astype('float32')
    y = np.arange(l)  # sequential sequence for the correlation
    ym = y - np.mean(y)
    swindow = np.lib.stride_tricks.sliding_window_view(serie_o.copy(), (l,))  # all sub-series to be correlated
    # https://stackoverflow.com/questions/75072435/how-to-create-a-rolling-correlation-using-only-numpy-from-a-1d-array#75072873
    xm = swindow - np.mean(swindow, axis=1, keepdims=True)
    xm[xm == 0] = np.nan  # to avoid any potential division by 0
    v[-(serie_o.shape[0] - l + 1):] = np.roll(np.sum(xm * ym, axis=1) / np.sqrt(np.sum(xm**2, axis=1) * np.sum(ym**2)), 1)  # all correlation coefficients

EDIT: the code I am trying to develop (x is only there for timing purposes):

import numpy as np
from numba import njit

# v is a 1-D array that stores all correlation coefficients
@njit(fastmath=True, cache=True, parallel=True)
def vecteur_correlation_46(s, max_l):
    for l in range(3, max_l, 1):
        v = np.zeros(s.shape[0]).astype('float32')
        y = np.arange(l)  # sequential sequence for the correlation
        ym = y - np.mean(y)
        swindow = np.lib.stride_tricks.sliding_window_view(s.copy(), (l,))  # all sub-series to be correlated; this is the call Numba rejects
        xm = swindow - np.mean(swindow, axis=1, keepdims=True)
        xm[xm == 0] = np.nan  # to avoid any potential division by 0
        correl = np.sum(xm * ym, axis=1) / np.sqrt(np.sum(xm**2, axis=1) * np.sum(ym**2))
        v[-(s.shape[0] - l + 1):] = np.roll(correl, 1)
    return v

# start
max_l = int(serie_o.shape[0] * 0.5)  # max_l is half the serie_o size
x = 1000
for i in range(x):
    b = vecteur_correlation_46(data, max_l)

Solution

  • np.lib.stride_tricks.sliding_window_view is not supported by Numba, but that is fine: using stride_tricks is not efficient here anyway, because it creates big temporary arrays that are slow to fill and read back (main memory is slow compared to the computing power of modern CPU cores). Instead, keep your initial loop-based code. Plain loops work well with Numba and are often actually faster, especially if you avoid allocations inside them; a sketch of such a loop-based version is given after this list.

    Besides, parallel=True may not be a good idea here, because it can only parallelize individual Numpy calls (it cannot automagically parallelize sequential loops; an explicit prange is needed for that). It is also only worth it when the array is fairly big, since dispatching work to many threads and waiting for them introduces a significant overhead.

    Additionally, Numba does not use a GPU by default: it cannot run CPU-based code on a GPU transparently, and doing so efficiently is simply not possible since GPUs and CPUs operate very differently. If you just want to convert CPU Numpy code to GPU code without much effort, then CuPy is certainly a better choice; CuPy apparently supports np.lib.stride_tricks.sliding_window_view (see the second sketch below).
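
Here is a minimal loop-based sketch of the first point above; it is not the asker's vecteur_correlation_46, and the function name rolling_corr_vs_ramp, the single fixed window length l and the float32 output are my assumptions. For one window length it correlates every window of s against the ramp 0, 1, ..., l-1 with plain loops, without building any window matrix, and prange lets parallel=True spread the independent windows over the CPU cores.

import numpy as np
from numba import njit, prange

@njit(fastmath=True, cache=True, parallel=True)
def rolling_corr_vs_ramp(s, l):
    n = s.shape[0]
    m = n - l + 1                     # number of sliding windows
    out = np.empty(m, dtype=np.float32)

    # The ramp y = 0..l-1 and its centred version depend only on l,
    # so they are computed once, outside the window loop.
    ym = np.empty(l, dtype=np.float32)
    y_mean = (l - 1) * 0.5
    ym_sq_sum = 0.0
    for j in range(l):
        ym[j] = j - y_mean
        ym_sq_sum += ym[j] * ym[j]

    # Each window is independent, so prange can parallelize this loop.
    for i in prange(m):
        x_mean = 0.0
        for j in range(l):
            x_mean += s[i + j]
        x_mean /= l

        num = 0.0
        xm_sq_sum = 0.0
        for j in range(l):
            xm = s[i + j] - x_mean
            num += xm * ym[j]
            xm_sq_sum += xm * xm

        denom = np.sqrt(xm_sq_sum * ym_sq_sum)
        out[i] = num / denom if denom > 0.0 else np.nan

    return out

A call such as rolling_corr_vs_ramp(serie_o.astype(np.float32), 50) computes the correlation of every length-50 window against the ramp without allocating any large temporary array; the outer loop over l from the original code can simply call it repeatedly.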
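
For the GPU route mentioned in the last point, here is a hedged CuPy sketch. It assumes cupy.lib.stride_tricks.sliding_window_view is available in the installed CuPy version (the answer only says it apparently is), and the helper name rolling_corr_vs_ramp_gpu is mine; it simply mirrors the Numpy expression from the question on the GPU.

import cupy as cp

def rolling_corr_vs_ramp_gpu(s, l):
    # Transfer the series to the GPU once and mirror the Numpy formulation.
    s = cp.asarray(s, dtype=cp.float32)
    y = cp.arange(l, dtype=cp.float32)
    ym = y - y.mean()
    # All length-l sub-series as a strided view (assumed to exist in CuPy).
    swindow = cp.lib.stride_tricks.sliding_window_view(s, (l,))
    xm = swindow - swindow.mean(axis=1, keepdims=True)
    corr = (xm * ym).sum(axis=1) / cp.sqrt((xm ** 2).sum(axis=1) * (ym ** 2).sum())
    return cp.asnumpy(corr)  # copy the coefficients back to the host

Note that every call pays a host-to-device and a device-to-host transfer, so this only pays off when the series (and therefore the window matrix) is large enough to keep the GPU busy.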