
numpy - why is Z[(0,2)] a view but Z[(0, 2), (0)] a copy?


Question

Why are the numpy tuple indexing behaviors inconsistent? Please explain the rationale or design decision behind these behaviors. In my understanding, Z[(0,2)] and Z[(0, 2), (0)] are both tuple indexing, so I expected consistent copy/view behavior. If this is incorrect, please explain.

import numpy as np
Z = np.arange(36).reshape(3, 3, 4)
print("Z is \n{}\n".format(Z))

b =  Z[
    (0,2)      # Select Z[0][2]
]
print("Tuple indexing Z[(0,2)] is \n{}\nIs view? {}\n".format(
    b,
    b.base is not None
))

c = Z[         # Select Z[0][0][1] & Z[0][2][1]
    (0,2),
    (0)
]
print("Tuple indexing Z[(0, 2), (0)] is \n{}\nIs view? {}\n".format(
    c,
    c.base is not None
))
Output:

Z is
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]

 [[24 25 26 27]
  [28 29 30 31]
  [32 33 34 35]]]

Tuple indexing Z[(0,2)] is 
[ 8  9 10 11]
Is view? True

Tuple indexing Z[(0, 2), (0)] is 
[[ 0  1  2  3]
 [24 25 26 27]]
Is view? False

NumPy indexing is confusing, and I wonder how people build an understanding of it. If there is a good way to understand it, or a cheat sheet, please advise.


Solution

  • It's the comma that creates a tuple; the parentheses () only group things where needed.

    Thus

    Z[(0,2)]
    Z[0,2]
    

    are the same: both select on the first 2 dimensions. Whether that returns an element or an array depends on how many dimensions Z has.
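To see what Python actually hands to indexing, a tiny object whose __getitem__ just returns its key makes the tuple visible. ShowKey is a hypothetical helper for illustration only; the sketch also compares a 3-d and a 2-d array to show how the number of dimensions decides between a row and a single element.

```python
import numpy as np

class ShowKey:
    """Hypothetical helper: returns the key Python passes to __getitem__."""
    def __getitem__(self, key):
        return key

k = ShowKey()
print(k[0, 2])         # (0, 2)
print(k[(0, 2)])       # (0, 2): the parentheses add nothing
print(k[(0, 2), (0)])  # ((0, 2), 0): (0) is just the integer 0

Z = np.arange(36).reshape(3, 3, 4)
M = np.arange(9).reshape(3, 3)
print(Z[(0, 2)])  # 3-d array: one row, [ 8  9 10 11]
print(M[0, 2])    # 2-d array: a single element, 2
```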

    The same interpretation applies to the other case.

    Z[(0, 2), (0)]
    Z[(np.array([0,2]), 0)]
    Z[np.array([0,2]), 0]
    

    are the same: the first dimension is indexed with a list/array, which triggers advanced indexing, so the result is a copy.
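A quick check of both claims, as a sketch: the three spellings give equal results, and np.shares_memory (which reports whether two arrays overlap in memory) tells the view from the copy more directly than testing .base.

```python
import numpy as np

Z = np.arange(36).reshape(3, 3, 4)

a = Z[(0, 2), (0)]
b = Z[(np.array([0, 2]), 0)]
c = Z[np.array([0, 2]), 0]
print(np.array_equal(a, b) and np.array_equal(b, c))  # True

# np.shares_memory reports whether two arrays overlap in memory
print(np.shares_memory(Z, Z[0, 2]))  # True: basic indexing gives a view
print(np.shares_memory(Z, a))        # False: advanced indexing gives a copy

# Writing into the copy leaves Z unchanged
a[0, 0] = -1
print(Z[0, 0, 0])  # 0
```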

    [ 8  9 10 11]
    

    is a row of the 3d array; it's a contiguous block of Z.

    [[ 0  1  2  3]
     [24 25 26 27]]
    

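    is 2 rows of Z that are not adjacent in memory. In general an arbitrary list of indices can't be described with just shape and strides (and an offset into the data buffer), so NumPy makes every advanced-indexing result a copy. The decision comes from the type of the index, not its values: these two rows happen to be evenly spaced, and the basic slice Z[::2, 0] selects the same values as a view.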
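The view/copy decision depends on the kind of index, not on which positions are selected. This sketch compares a list index with a step-2 slice that picks the same two rows. It assumes int64 elements (set explicitly here, matching the '<i8' shown in the session), so the strides are in units of 8 bytes.

```python
import numpy as np

Z = np.arange(36, dtype=np.int64).reshape(3, 3, 4)  # int64, as in the session

fancy = Z[[0, 2], 0]   # advanced indexing: list on axis 0
sliced = Z[::2, 0]     # basic indexing: step-2 slice on axis 0

print(np.array_equal(fancy, sliced))  # True: same values
print(np.shares_memory(Z, fancy))     # False: copy
print(np.shares_memory(Z, sliced))    # True: view
print(sliced.strides)                 # (192, 8): skip one 96-byte block per row
```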

    Details

    __array_interface__ gives details about the underlying data of an array: the buffer address, shape, strides (None means C-contiguous), and dtype.

    In [146]: Z = np.arange(36).reshape(3,3,4)
    In [147]: Z.__array_interface__
    Out[147]: 
    {'data': (38255712, False),
     'strides': None,
     'descr': [('', '<i8')],
     'typestr': '<i8',
     'shape': (3, 3, 4),
     'version': 3}
    In [148]: Z.strides
    Out[148]: (96, 32, 8)
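The strides are the byte steps along each axis, so the byte offset of Z[i, j, k] from the start of the buffer is i*96 + j*32 + k*8. A small sketch of that arithmetic; byte_offset is a hypothetical helper, and int64 is set explicitly so the numbers match the session.

```python
import numpy as np

Z = np.arange(36, dtype=np.int64).reshape(3, 3, 4)
print(Z.strides)  # (96, 32, 8)

def byte_offset(arr, idx):
    """Hypothetical helper: byte offset of arr[idx] from the start of arr's buffer."""
    return sum(i * s for i, s in zip(idx, arr.strides))

# Z came from arange, so each value equals its flat position in the buffer,
# and offset // itemsize recovers that value.
print(byte_offset(Z, (2, 1, 3)), byte_offset(Z, (2, 1, 3)) // Z.itemsize, Z[2, 1, 3])  # 248 31 31
```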
    

    For the view:

    In [149]: Z1 = Z[0,2]
    In [150]: Z1
    Out[150]: array([ 8,  9, 10, 11])
    In [151]: Z1.__array_interface__
    Out[151]: 
    {'data': (38255776, False),    # 38255712+8*8
     'strides': None,
     'descr': [('', '<i8')],
     'typestr': '<i8',
     'shape': (4,),
     'version': 3}
    

    The data buffer pointer is 8 elements (64 bytes) further along in Z's buffer, and the shape is reduced to (4,). No data is copied.
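The same offset can be read off programmatically. This sketch subtracts the two buffer addresses from __array_interface__ and checks the result against the strides; data_address is a hypothetical helper, and the actual addresses will differ from run to run even though their difference does not.

```python
import numpy as np

Z = np.arange(36, dtype=np.int64).reshape(3, 3, 4)
Z1 = Z[0, 2]

def data_address(arr):
    """Hypothetical helper: first item of the 'data' tuple is the buffer address."""
    return arr.__array_interface__['data'][0]

offset_bytes = data_address(Z1) - data_address(Z)
print(offset_bytes, offset_bytes // Z.itemsize)  # 64 8

# Same offset from the strides: 0*96 + 2*32 bytes
print(0 * Z.strides[0] + 2 * Z.strides[1])  # 64
```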

    In [152]: Z2 = Z[[0,2],0]
    In [153]: Z2
    Out[153]: 
    array([[ 0,  1,  2,  3],
           [24, 25, 26, 27]])
    In [154]: Z2.__array_interface__
    Out[154]: 
    {'data': (31443104, False),     # an entirely different location
     'strides': None,
     'descr': [('', '<i8')],
     'typestr': '<i8',
     'shape': (2, 4),
     'version': 3}
    

    Z2 holds the same values as these two separate selections:

    In [158]: Z[0,0]
    Out[158]: array([0, 1, 2, 3])
    In [159]: Z[2,0]
    Out[159]: array([24, 25, 26, 27])
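Put another way, advanced indexing gathers the selected pieces into a fresh array. As a sketch, stacking the two basic selections by hand reproduces Z2 exactly.

```python
import numpy as np

Z = np.arange(36).reshape(3, 3, 4)
Z2 = Z[[0, 2], 0]

# Gather the two rows into a new array, as advanced indexing does
manual = np.stack([Z[0, 0], Z[2, 0]])
print(np.array_equal(Z2, manual))  # True
print(Z2.shape)                    # (2, 4)
```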
    

    It is not what the comment in the question assumed:

    Z[0][0][1] & Z[0][2][1]
    Z[0,0,1] & Z[0,2,1]
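If those two elements are what you actually want, each axis needs its own index. In this sketch the scalars 0 and 1 are broadcast against the list [0, 2], so the result pairs them up as Z[0, 0, 1] and Z[0, 2, 1]. Because a list is involved, this is still advanced indexing and returns a copy.

```python
import numpy as np

Z = np.arange(36).reshape(3, 3, 4)

# Scalars 0 and 1 broadcast against [0, 2]: picks Z[0, 0, 1] and Z[0, 2, 1]
picked = Z[0, [0, 2], 1]
print(picked)  # [1 9]
```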
    

    Compare that with a 2-row slice:

    In [156]: Z3 = Z[0:2,0]
    In [157]: Z3.__array_interface__
    Out[157]: 
    {'data': (38255712, False),   # same as Z's
     'strides': (96, 8),
     'descr': [('', '<i8')],
     'typestr': '<i8',
     'shape': (2, 4),
     'version': 3}
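Because Z3 is a view on Z's buffer, writing through it changes Z. A short sketch, again assuming int64 elements so the strides match the session:

```python
import numpy as np

Z = np.arange(36, dtype=np.int64).reshape(3, 3, 4)
Z3 = Z[0:2, 0]

print(Z3.strides)               # (96, 8)
print(np.shares_memory(Z, Z3))  # True

Z3[1, 0] = -1     # Z3[1] is the row Z[1, 0]
print(Z[1, 0, 0])  # -1: the write went into Z's buffer
```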
    

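    A view is returned when the new array can be described with shape, strides, and all or part of the original data buffer. Basic indexing (integers, slices, Ellipsis, np.newaxis) always meets that condition, so it returns a view (or a scalar when every dimension gets an integer). Advanced indexing (lists, arrays, or a tuple nested inside the index) always returns a copy, even when a view would have been possible.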
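As a rough cheat sheet, this sketch runs a few common index forms on Z and classifies each result with np.shares_memory. The labels are only descriptive, and the list of cases is illustrative rather than exhaustive.

```python
import numpy as np

Z = np.arange(36).reshape(3, 3, 4)

# Each entry: a descriptive label and the result of one indexing expression
cases = {
    "Z[0, 2]       integers": Z[0, 2],
    "Z[0:2, 0]     slice + integer": Z[0:2, 0],
    "Z[..., None]  Ellipsis + newaxis": Z[..., None],
    "Z[[0, 2], 0]  list (advanced)": Z[[0, 2], 0],
    "Z[Z > 30]     boolean mask (advanced)": Z[Z > 30],
}

kinds = {}
for label, result in cases.items():
    # True when the result reads Z's buffer, i.e. it is a view
    kinds[label] = "view" if np.shares_memory(Z, result) else "copy"
    print(f"{label:40} -> {kinds[label]}")
```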