arr = np.arange(16).reshape((2, 2, 4))
arr.strides
(32, 16, 4)
So, I believe from my knowledge that in memory it would be something like the image below. The strides are marked along with the axis (on the arrows).
And this is what I think I have after transposing one of the axes using the command:
arr.transpose((1, 0, 2))
I understand there is no changes in the memory block but I am not able to understand how exactly does the strides help in traversing the array in the memory block to produce the expected array. (does it traverse the elements in different axes in reverse order?)
[[[ 0 1 2 3]
[ 4 5 6 7]]
[[ 8 9 10 11]
[12 13 14 15]]]
I tried going through the official numpy code in C to understand, but I was not able to understand the same.
It would be great if somebody could just provide the explanation in a simpler way.
Numpy always iterate through the axis from the biggest one to the smallest one (ie. decreasing order) unless explicitly ask for (eg. with the axis
parameter). Thus, in your example, it first read the item of the view at the offset 0 in memory, then add the stride of the axis 2 (4 here) and read the next item and so on until the end of the axis is reached. It then add the stride of the axis 1 once and repeat the previous loop again, and so on with other axis.
The internal Numpy C code behave like this:
// Numpy array are not strongly typed internally.
// The raw memory buffer is always contiguous.
char* data = view.rawData;
const size_t iStride = view.stride[0];
const size_t jStride = view.stride[1];
const size_t kStride = view.stride[2];
for(int i=0 ; i<view.shape[0] ; ++i) {
for(int j=0 ; j<view.shape[1] ; ++j) {
for(int k=0 ; k<view.shape[2] ; ++k) {
// Compute the offset from the view strides
const size_t offset = iStride * i + jStride * j + kStride * k;
// Extract an item at the memory offset
Type item = (Type*)(data + offset);
// Do something with item here (eg. print it)
}
}
}
When you apply a transposition, Numpy change the stride so that iStride
and jStride
are swapped. It also update the shape (view.shape[0]
and view.shape[1]
are swapped too). The code will execute as if the two loops are swapped except the memory accesses will be less efficient because they are less contiguous. Here is an example:
arr = np.arange(16).reshape((2, 2, 4))
arr.strides # (32, 16, 4)
view = arr.transpose((1, 0, 2))
view.strides # (16, 32, 4) <-- note that 16 and 32 have been swapped
Note that the strides are in bytes (and not a number of items).