
Will the NumPy broadcast array ever be created during a binary operation?

I have two numpy.ndarray instances with different shapes. If I add these two arrays, broadcasting will occur between them:

import numpy as np

x = np.array([1, 2, 3])
y = np.array([[2,  3,  5],
              [7, 11, 13]])

print(x + y)
# [[ 3  5  8]
#  [ 8 13 16]]

Will the broadcast array ever be created? That is, will the following array be physically created from x before the operation?

[[1, 2, 3],
 [1, 2, 3]]

This matters little for small arrays, but for large arrays the difference can be considerable. If implicit broadcasting creates a new array, a lot of memory is wasted on repeating the same numbers:

x = np.random.rand(10000)
y = np.random.rand(10000, 10000)

print(x + y)

If the broadcast array were actually created from x, it would hold 10000 copies of the same 10000 values, which is a lot of wasted memory.
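To put a rough number on it, here is a small sketch of the arithmetic. It assumes float64 elements (the default for np.random.rand); the names rows, bytes_for_x and bytes_if_copied are only for illustration.

```python
import numpy as np

x = np.random.rand(10000)   # float64, 8 bytes per element
rows = 10000                # number of rows in y

# Memory that x actually occupies.
bytes_for_x = x.nbytes

# Memory a physically repeated copy of x (one row per row of y) would need.
bytes_if_copied = x.nbytes * rows

print(bytes_for_x)       # 80000
print(bytes_if_copied)   # 800000000
```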

If such an array is created, is there a way to avoid it? If it is not created, how are binary operations between mismatched shapes implemented?


  • The array is expanded virtually by means of strides; no data is copied. np.broadcast_arrays and np.broadcast_to let you see the intermediate product, which is a view.

    In [75]: x=np.array([1,2,3]); y = np.array([[1,2,3],[4,5,6]])
    In [76]: X=np.broadcast_to(x,y.shape)
    In [77]: y.shape, y.strides
    Out[77]: ((2, 3), (12, 4))
    In [78]: X.shape, X.strides
    Out[78]: ((2, 3), (0, 4))
    In [79]: X.base
    Out[79]: array([1, 2, 3])

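    So X has the same shape as y, but its leading stride is 0. A stride of 0 means that stepping along that dimension does not move through memory, so the row is 'repeated' without actually copying anything. Note the base: X is a view of x. (The strides of 4 and 12 show that the elements here are 4-byte integers.)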

    In [80]: X+y
    Out[80]: 
    array([[2, 4, 6],
           [5, 7, 9]])

    We get the same strides if we add a leading dimension with None/np.newaxis (the shape is then (1, 3), which broadcasts against y):

    In [81]: x.strides
    Out[81]: (4,)
    In [82]: x[None,:].strides
    Out[82]: (0, 4)