
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (3,)


I have an np.ndarray of shape (5, 5, 2, 2, 2, 10, 8) named table. I can successfully slice it like this:

table[4, [0, 1], 1, 1, 1, slice(0, 10, None), slice(0, 8, None)]
table[4, [0, 1], 1, 1, 1, [0, 2], slice(0, 8, None)]

But for some reason when I try to specify three values for dimension 5 (of length 10) like this:

table[4, [0, 1], 1, 1, 1, [0, 2, 6], slice(0, 8, None)]

I get:

>>> IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (3,) 

The same happens with:

table[4, [0, 1, 4], 1, 1, 1, [0, 2], slice(0, 8, None)]

This does not happen with:

table[4, [0, 1, 4], 1, 1, 1, slice(0, 10, None), slice(0, 8, None)]
table[4, [1, 0, 4], 1, 1, 1, slice(0, 10, None), slice(0, 8, None)]

which output the correct result.

I tried to read similar questions here on broadcasting, but I am still confused about why NumPy can't make sense of this slice notation. Why does it fail when I give it more than two points along an axis when there's already another array in the indices?


Solution

  • In [219]: table = np.zeros((5, 5, 2, 2, 2, 10, 8),int)    
    In [220]: table.shape
    Out[220]: (5, 5, 2, 2, 2, 10, 8)
    

    Using slice(...) objects instead of : makes no difference, and the trailing slices could be omitted altogether.

    In [221]: table[4, [0, 1], 1, 1, 1, slice(0, 10, None), slice(0, 8, None)].shape
    Out[221]: (2, 10, 8)
    

    This has one advanced indexing array/list of length 2; the other dimensions are indexed with either scalars or slices. The scalar-indexed dimensions disappear, and the sliced ones 'pass through'.

    In [222]: table[4, [0, 1], 1, 1, 1, [0, 2], slice(0, 8, None)].shape
    Out[222]: (2, 8)
    

    Here you have two advanced indexing lists - both length 2, so they 'broadcast' together to select 2 values (I think of this as a kind of 'diagonal').
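To make the 'diagonal' pairing concrete, here is a small sketch. It assumes a table filled with np.arange (my choice, not the asker's data) so that every element is distinct, and compares the indexed result with the two selections written out by hand.

```python
import numpy as np

# Assumption: distinct values via arange so selections can be compared.
table = np.arange(5 * 5 * 2 * 2 * 2 * 10 * 8).reshape(5, 5, 2, 2, 2, 10, 8)

paired = table[4, [0, 1], 1, 1, 1, [0, 2], :]

# The two lists are walked in lockstep: (0, 0) first, then (1, 2).
by_hand = np.stack([
    table[4, 0, 1, 1, 1, 0, :],
    table[4, 1, 1, 1, 1, 2, :],
])

print(paired.shape)                     # (2, 8)
print(np.array_equal(paired, by_hand))  # True
```

So two same-length lists select pairs of positions, not every combination of them.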

    In [223]: table[4, [0, 1, 4], 1, 1, 1, slice(0, 10, None), slice(0, 8, None)].shape
    Out[223]: (3, 10, 8)
    

    Same as before but with a length 3 list.

    But when the 2 lists have different length you get an error:

    In [225]: table[4, [0, 1], 1, 1, 1, [0, 2, 6], slice(0, 8, None)].shape
    ---------------------------------------------------------------------------
    IndexError                                Traceback (most recent call last)
    Input In [225], in <cell line: 1>()
    ----> 1 table[4, [0, 1], 1, 1, 1, [0, 2, 6], slice(0, 8, None)].shape
    
    IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (3,) 
    

    If one list has shape (2,1), it works: it selects 2 along one dimension and 3 along the other:

    In [226]: table[4, [[0], [1]], 1, 1, 1, [0, 2, 6], slice(0, 8, None)].shape
    Out[226]: (2, 3, 8)
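As a rough check of what that (2, 3, 8) result contains, the sketch below builds it two ways: in one step with a (2,1) index, and in two steps that apply one list at a time. The arange-filled table and the names rows and cols are assumptions for illustration only.

```python
import numpy as np

# Assumption: distinct values via arange so results can be compared.
table = np.arange(5 * 5 * 2 * 2 * 2 * 10 * 8).reshape(5, 5, 2, 2, 2, 10, 8)

rows = [0, 1]      # positions wanted along axis 1
cols = [0, 2, 6]   # positions wanted along axis 5

# One step: make the first index a (2,1) column so it broadcasts against (3,).
one_step = table[4, np.array(rows)[:, None], 1, 1, 1, cols, :]

# Two steps: apply one list at a time, so the lists never have to broadcast.
two_step = table[4, rows, 1, 1, 1][:, cols]

print(one_step.shape)                      # (2, 3, 8)
print(np.array_equal(one_step, two_step))  # True
```

The two-step form makes an intermediate copy, so the one-step form is usually the better choice for large arrays.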
    

    In indexing, 'broadcasting' follows the same rules as when adding (or multiplying) arrays.

    (2,) with (2,) => (2,)
    (2,1) with (3,) => (2,3)
    (2,) with (3,) error
    
    In [227]: np.ones(2)+np.ones(2)
    Out[227]: array([2., 2.])
    
    In [228]: np.ones(2)+np.ones(3)
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    Input In [228], in <cell line: 1>()
    ----> 1 np.ones(2)+np.ones(3)
    
    ValueError: operands could not be broadcast together with shapes (2,) (3,) 
    
    In [229]: np.ones((2,1))+np.ones(3)
    Out[229]: 
    array([[2., 2., 2.],
           [2., 2., 2.]])
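If you want to test a set of index shapes without building any arrays, np.broadcast_shapes (available in NumPy 1.20 and later, as far as I know) applies the same rules. The helper name below is hypothetical; it is just a thin wrapper for illustration.

```python
import numpy as np

def broadcast_or_none(*shapes):
    """Hypothetical helper: return the broadcast shape, or None on mismatch."""
    try:
        return np.broadcast_shapes(*shapes)
    except ValueError:
        # Raised when the shapes cannot be broadcast together.
        return None

print(broadcast_or_none((2,), (2,)))    # (2,)
print(broadcast_or_none((2, 1), (3,)))  # (2, 3)
print(broadcast_or_none((2,), (3,)))    # None
```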
    

    edit

    Look at a simpler 2d array:

    In [261]: arr = np.arange(6).reshape(2,3)
    
    In [262]: arr
    Out[262]: 
    array([[0, 1, 2],
           [3, 4, 5]])
    

    If I index with 2 (2,) arrays I get 2 values:

    In [264]: arr[np.array([0,1]), np.array([1,2])]
    Out[264]: array([1, 5])
    

    But if I index with a (2,1) and (2,), I get a (2,2) shape result. Note where the [1,5] values are:

    In [265]: arr[np.array([0,1])[:,None], np.array([1,2])]
    Out[265]: 
    array([[1, 2],
           [4, 5]])
    

    ix_ is a handy tool for constructing such a "cartesian" set of indexing arrays. For example, with 3 lists I get:

    In [266]: np.ix_([1,2],[3,4,5],[6,7])
    Out[266]: 
    (array([[[1]],
     
            [[2]]]),
     array([[[3],
             [4],
             [5]]]),
     array([[[6, 7]]]))
    
    In [267]: [i.shape for i in np.ix_([1,2],[3,4,5],[6,7])]
    Out[267]: [(2, 1, 1), (1, 3, 1), (1, 1, 2)]
    

    Together those will select a block of shape (2,3,2) from a 3d (or larger) array.
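Applied to the table from the question, a minimal sketch could look like this. The arange fill and the names i1 and i5 are my assumptions; only the two list-indexed axes go through ix_, while the scalars and the trailing slice stay as they were.

```python
import numpy as np

# Assumption: distinct values via arange so individual entries can be checked.
table = np.arange(5 * 5 * 2 * 2 * 2 * 10 * 8).reshape(5, 5, 2, 2, 2, 10, 8)

# ix_ reshapes the two lists to (2,1) and (1,3), which broadcast to (2,3).
i1, i5 = np.ix_([0, 1], [0, 2, 6])

block = table[4, i1, 1, 1, 1, i5, :]

print(i1.shape, i5.shape)  # (2, 1) (1, 3)
print(block.shape)         # (2, 3, 8)
```

Here block[i, j] holds table[4, [0, 1][i], 1, 1, 1, [0, 2, 6][j], :], that is, every combination of the two lists.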

    Formally this is described in https://numpy.org/doc/stable/user/basics.indexing.html#advanced-indexing

    (Your slices are all at the end. There is a nuance to this indexing when slices occur in the middle. See the subsection about Combining advanced and basic indexing if that arises.)
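For a feel of that nuance, here is a small sketch on a made-up 4d array (the array x and its shape are assumptions, not part of the question). When the advanced indices are adjacent, the broadcast dimension stays where they were; when a slice separates them, it moves to the front.

```python
import numpy as np

x = np.zeros((5, 6, 7, 8))

# Advanced indices next to each other: the broadcast dimension stays in place.
adjacent = x[:, [0, 1], [0, 1]]
print(adjacent.shape)   # (5, 2, 8)

# Advanced indices separated by a slice: the broadcast dimension goes first.
separated = x[[0, 1], :, [0, 1]]
print(separated.shape)  # (2, 6, 8)

# A scalar next to an array index counts as an advanced index too.
with_scalar = x[0, :, [0, 1]]
print(with_scalar.shape)        # (2, 6, 8)
print(x[0][:, [0, 1]].shape)    # (6, 2, 8)
```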