python numpy split

Is there an efficient way to sum over NumPy splits?


I would like to split an array into chunks, sum the values in each chunk, and return the result as another array. The chunks can have different sizes. This can be done naively with NumPy's split function, like this:

import numpy as np

def split_sum(a: np.ndarray, breakpoints: np.ndarray) -> np.ndarray:
    return np.array([np.sum(subarr) for subarr in np.split(a, breakpoints)])

However, this still loops over the chunks in Python, so it is inefficient when the array is large and split into many chunks. Is there a faster way?


Solution

  • You wouldn't normally split an array in NumPy unless splitting is the last step, since the pieces then have to be processed one by one in Python.

    NumPy can handle this operation natively with numpy.add.reduceat. One minor difference from your function is how the breakpoints are defined: reduceat expects the start index of every chunk, so you need to prepend 0 to breakpoints:

    import numpy as np
    
    arr = np.arange(20)
    breakpoints = np.array([2, 5, 10, 12])
    
    def split_sum(a: np.ndarray, breakpoints: np.ndarray) -> np.ndarray:
        return np.array([np.sum(subarr) for subarr in np.split(a, breakpoints)])
    
    split_sum(arr, breakpoints)
    # array([  1,   9,  35,  21, 124])
    
    np.add.reduceat(arr, np.r_[0, breakpoints])
    # array([  1,   9,  35,  21, 124])
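
One caveat worth checking before swapping the two: reduceat and split disagree when a chunk is empty. With a repeated breakpoint, split yields an empty sub-array (sum 0), while reduceat returns the single element at that index; a breakpoint equal to len(a) makes reduceat raise an IndexError. If that case can occur in your data, a prefix-sum approach may be a reasonable alternative. The sketch below is only an illustration under that assumption; split_sum_cumsum is a hypothetical helper name, not part of NumPy.

```python
import numpy as np


def split_sum_cumsum(a: np.ndarray, breakpoints) -> np.ndarray:
    """Chunk sums via prefix sums (hypothetical helper, not a NumPy function)."""
    # Prefix sums with a leading 0, so csum[k] == a[:k].sum().
    csum = np.concatenate(([0], np.cumsum(a)))
    # Chunk edges: start of the array, each breakpoint, end of the array.
    edges = np.concatenate(([0], np.asarray(breakpoints, dtype=np.intp), [len(a)]))
    # The sum of a[i:j] is csum[j] - csum[i]; an empty chunk gives 0.
    return np.diff(csum[edges])


arr = np.arange(20)
repeated = np.array([2, 2, 5])  # the middle chunk arr[2:2] is empty

naive = np.array([np.sum(s) for s in np.split(arr, repeated)])
# array([  1,   0,   9, 180])

via_reduceat = np.add.reduceat(arr, np.r_[0, repeated])
# array([  1,   2,   9, 180])   <- arr[2] instead of 0 for the empty chunk

via_cumsum = split_sum_cumsum(arr, repeated)
# array([  1,   0,   9, 180])
```

Note that for floating-point input the prefix-sum version subtracts two running totals, so its results can differ from a direct per-chunk sum by rounding error. When no chunk is empty, reduceat is the simpler choice.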