numpymemorynumpy-ndarraystride

How strides help in traversing an array in numpy?


arr = np.arange(16).reshape((2, 2, 4))

arr.strides
(32, 16, 4)

So, I believe from my knowledge that in memory it would be something like the image below. The strides are marked along with the axis (on the arrows).

Initial stacking in memory block

And this is what I think I have after transposing one of the axes using the command:

arr.transpose((1, 0, 2))

After transposing i.e. switching 2 of the axes

I understand there is no changes in the memory block but I am not able to understand how exactly does the strides help in traversing the array in the memory block to produce the expected array. (does it traverse the elements in different axes in reverse order?)

[[[ 0  1  2  3]
[ 4  5  6  7]] 
[[ 8  9 10 11]
[12 13 14 15]]]

I tried going through the official numpy code in C to understand, but I was not able to understand the same.

It would be great if somebody could just provide the explanation in a simpler way.


Solution

  • Numpy always iterate through the axis from the biggest one to the smallest one (ie. decreasing order) unless explicitly ask for (eg. with the axis parameter). Thus, in your example, it first read the item of the view at the offset 0 in memory, then add the stride of the axis 2 (4 here) and read the next item and so on until the end of the axis is reached. It then add the stride of the axis 1 once and repeat the previous loop again, and so on with other axis.

    The internal Numpy C code behave like this:

    // Numpy array are not strongly typed internally.
    // The raw memory buffer is always contiguous.
    char* data = view.rawData;
    
    const size_t iStride = view.stride[0];
    const size_t jStride = view.stride[1];
    const size_t kStride = view.stride[2];
    
    for(int i=0 ; i<view.shape[0] ; ++i) {
        for(int j=0 ; j<view.shape[1] ; ++j) {
            for(int k=0 ; k<view.shape[2] ; ++k) {
                // Compute the offset from the view strides
                const size_t offset = iStride * i + jStride * j + kStride * k;
    
                // Extract an item at the memory offset
                Type item = (Type*)(data + offset);
    
                // Do something with item here (eg. print it)
            }
        }
    }
    

    When you apply a transposition, Numpy change the stride so that iStride and jStride are swapped. It also update the shape (view.shape[0] and view.shape[1] are swapped too). The code will execute as if the two loops are swapped except the memory accesses will be less efficient because they are less contiguous. Here is an example:

    arr = np.arange(16).reshape((2, 2, 4))
    arr.strides   # (32, 16, 4)
    view = arr.transpose((1, 0, 2))
    view.strides  # (16, 32, 4)  <--  note that 16 and 32 have been swapped
    

    Note that the strides are in bytes (and not a number of items).