Tags: python, arrays, numpy, numba, contiguous

NumPy arrays from Numba-accelerated QR decomposition are not contiguous


I encounter a strange warning when performing matrix multiplication after QR decomposition in a Numba-accelerated function. For example:

# Python 3.10

import numpy as np
from numba import jit

@jit
def qr_check(x):
    q, r = np.linalg.qr(x)
    return q @ r

x = np.random.rand(3, 3)
qr_check(x)

Running the above code, I get the following NumbaPerformanceWarning:

'@' is faster on contiguous arrays, called on (array(float64, 2d, A), array(float64, 2d, F))

I'm not sure what's going wrong here. I know F is for Fortran, so array r is Fortran-contiguous, but why isn't array q as well?


Solution

  • This comes down to how QR decomposition is implemented in Numba.

    As you noted, F stands for Fortran-contiguous (column-major).

    A stands for a strided ("any") layout: Numba cannot guarantee at compile time that the array is contiguous in either order.
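
    Outside of jitted code, numba.typeof shows which layout code Numba assigns to a given array. A quick illustration (the reprs in the comments are what recent Numba versions print and may differ slightly):

    import numpy as np
    from numba import typeof

    a = np.zeros((4, 4))                  # C-contiguous (row-major)
    print(typeof(a))                      # array(float64, 2d, C)
    print(typeof(np.asfortranarray(a)))   # array(float64, 2d, F)
    print(typeof(a[:, :2]))               # array(float64, 2d, A) -- a strided, non-contiguous view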

    Numba does not call numpy.linalg.qr directly. Let's take a look at Numba's source code:

    https://github.com/numba/numba/blob/251061051ea13c8618c5fbd5e6b3f90a3315fec9/numba/np/linalg.py#L1418

    @overload(np.linalg.qr)
    def qr_impl(a):
        ...
    

    As you can see, Numba overloads the qr function. Inside this overload it calls the LAPACK routine for the QR decomposition, which is implemented in Fortran, so the raw result is Fortran-contiguous. However, before being returned, q is additionally sliced:

    q[:, :minmn]
    

    https://github.com/numba/numba/blob/251061051ea13c8618c5fbd5e6b3f90a3315fec9/numba/np/linalg.py#L1490
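
    The slice bound minmn is only known at run time, so Numba's type inference cannot prove that the sliced view is still contiguous and falls back to layout A. You can see the same effect with any basic slice; a minimal sketch (take_cols is just an illustrative name, and the printed signature format may vary between Numba versions):

    import numpy as np
    from numba import njit

    @njit
    def take_cols(a, k):
        # a basic slice: Numba types the result with layout 'A',
        # since contiguity cannot be guaranteed at compile time
        return a[:, :k]

    take_cols(np.ones((4, 4)), 2)
    # the compiled signature should end in something like
    # -> array(float64, 2d, A)
    print(take_cols.nopython_signatures)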

    So the final layouts are:

    q: A (strided)

    r: F (Fortran-contiguous)
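
    You can confirm those layouts by returning the factors directly and inspecting the compiled signature (a sketch; the exact repr of the printed types may vary between Numba versions):

    import numpy as np
    from numba import njit

    @njit
    def qr_parts(x):
        # return q and r themselves so their inferred layouts
        # show up in the compiled signature
        return np.linalg.qr(x)

    qr_parts(np.random.rand(3, 3))
    # expect a return type that reads roughly
    # Tuple(array(float64, 2d, A), array(float64, 2d, F))
    print(qr_parts.nopython_signatures)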

    You will get the same kind of warning in any similar case where an operand of a matrix product comes from slicing:

    @jit
    def qr_check(x):
        q = np.zeros((100, 64))   # C-contiguous
        r = np.zeros((64, 200))   # C-contiguous
        # the slice covers all of r (NumPy clamps the bounds), but Numba
        # still types it with layout 'A', so the warning fires again
        return q @ r[:1000, :1000]
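
    If the warning matters for performance, one common workaround (a sketch, not the only option) is to copy the strided factor into a contiguous array before the product, e.g. with np.ascontiguousarray, which Numba supports in nopython mode:

    import numpy as np
    from numba import jit

    @jit
    def qr_matmul_contig(x):
        q, r = np.linalg.qr(x)
        # copying q into a C-contiguous array removes the A-layout operand,
        # which should silence the warning at the cost of one extra copy
        return np.ascontiguousarray(q) @ r

    Whether that extra copy actually pays off depends on the matrix sizes, so it is worth benchmarking both versions.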