
Add repeated elements of array indexed by another array


I have a relatively simple problem that I cannot solve without using loops, and it is difficult for me to come up with the right title for it. Let's say we have two NumPy arrays:

import numpy as np

array_1 = np.array([[0, 1, 2],
                    [3, 3, 3],
                    [3, 3, 4],
                    [3, 6, 2]])

array_2 = np.array([[0, 0, 0],
                    [1, 1, 1],
                    [2, 2, 2],
                    [3, 3, 3],
                    [4, 4, 4],
                    [5, 5, 5],
                    [6, 6, 6]])

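array_1 tells us which rows of array_2 to sum, and where to put them: for every element array_1[i, j], row i of array_2 is added to row array_1[i, j] of the result. So, for example, the 4th row of the result (index 3) should contain the sum of the rows of array_2 whose indices match the row indices of all the 3s in array_1, counting every occurrence.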

It is much easier to understand in code:

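# Start from zeros: np.empty would leave arbitrary values in result
result = np.zeros_like(array_2)

for i in range(array_1.shape[0]):
    for j in range(array_1.shape[1]):
        index = array_1[i, j]        # destination row in result
        result[index] += array_2[i]  # row i of array_2 is the source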

The result should be:

[[ 0  0  0]
 [ 0  0  0]
 [ 3  3  3]
 [10 10 10]
 [ 2  2  2]
 [ 0  0  0]
 [ 3  3  3]]

I tried to use np.einsum, but I need to use both the elements of array_1 and its row numbers as indices, so I'm not sure np.einsum is the best path here.
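For comparison, np.einsum can express this scatter if array_1 is first turned into a one-hot matrix, so that both index roles become ordinary array axes. This is a minimal sketch, not the approach used in the answer below; `one_hot` and `n_out` are illustrative names, and array_2 is rebuilt with np.repeat to get the same rows as in the question.

```python
import numpy as np

array_1 = np.array([[0, 1, 2],
                    [3, 3, 3],
                    [3, 3, 4],
                    [3, 6, 2]])
array_2 = np.repeat(np.arange(7)[:, None], 3, axis=1)  # rows [0,0,0] ... [6,6,6]

n_out = array_2.shape[0]  # number of rows in the result (7 here)

# one_hot[i, j, k] is 1 when array_1[i, j] == k, else 0; shape (4, 3, 7)
one_hot = (array_1[:, :, None] == np.arange(n_out)).astype(array_2.dtype)

# result[k, l] = sum over i, j of one_hot[i, j, k] * array_2[i, l]
result = np.einsum('ijk,il->kl', one_hot, array_2[:array_1.shape[0]])
print(result)
```

The one-hot array has one entry per (triangle, corner, vertex) triple, so its memory grows with the number of triangles times the number of vertices. That is fine for a toy example but not for a real mesh.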

This is a problem I have in graphics: array_1 holds the vertex indices of the triangles, and array_2 holds normals, where the index of a row corresponds to the index of a vertex.
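In the loop above, array_2 is read by triangle (row i of array_1), so in mesh terms it behaves like one normal per triangle that gets accumulated onto that triangle's vertices. The sketch below shows that use case end to end under extra assumptions not in the question: the hypothetical `vertex_normals` function takes vertex positions, computes unnormalized (area-weighted) face normals with a cross product, scatter-adds them with np.add.at, and then normalizes.

```python
import numpy as np

def vertex_normals(vertices, faces):
    """Hypothetical helper: area-weighted vertex normals (a sketch).

    vertices: (V, 3) float array of vertex positions
    faces:    (F, 3) int array of vertex indices per triangle, like array_1
    """
    v0 = vertices[faces[:, 0]]
    v1 = vertices[faces[:, 1]]
    v2 = vertices[faces[:, 2]]
    # One normal per triangle; its length is twice the triangle's area
    face_normals = np.cross(v1 - v0, v2 - v0)

    # Add each face normal to its three vertices. normals[faces] has shape
    # (F, 3, 3) and face_normals[:, None, :] of shape (F, 1, 3) broadcasts
    # against it; np.add.at accumulates repeated vertex indices correctly.
    normals = np.zeros_like(vertices, dtype=float)
    np.add.at(normals, faces, face_normals[:, None, :])

    # Normalize, leaving unused vertices (zero vectors) at zero
    lengths = np.linalg.norm(normals, axis=1, keepdims=True)
    return np.divide(normals, lengths, out=np.zeros_like(normals),
                     where=lengths > 0)
```

For example, a unit square in the xy-plane split into two counter-clockwise triangles gives every used vertex the normal [0, 0, 1].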


Solution

  • Any time you're adding values at a repeated index, an in-place fancy-index update like output[idx] += values doesn't work out of the box, because it is buffered: a repeated index receives only one of its contributions instead of all of them. Instead, you have to use the unbuffered in-place version of the ufunc, which is np.add.at.

    Here, each element of array_1 gives you a pair of indices: its row number in array_1 is the row index into array_2, and its value is the row index into the output.

    First, construct the indices explicitly as fancy indices. This will make it much simpler to use them:

    output_row = array_1.ravel()  # destination row for each element
    input_row = np.repeat(np.arange(array_1.shape[0]), array_1.shape[1])  # source row for each element
    

    You can apply input_row directly to array_2, since repeated indices are fine when reading, but output_row contains repeats, so writing through it needs np.add.at:

    output = np.zeros_like(array_2)
    np.add.at(output, output_row, array_2[input_row])
    

    You really only use the first four rows of array_2 (one per row of array_1), so it could be truncated to:

    array_2 = array_2[:array_1.shape[0]]
    

    In that case, the output can no longer copy the shape of array_2, so you would want to initialize it with enough rows for the largest index in array_1:

    output = np.zeros_like(array_2, shape=(output_row.max() + 1, array_2.shape[1]))
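Putting the steps together, here is a self-contained sketch that checks each variant against the question's loop. It also shows the buffered `+=` failing, and adds one variation not in the answer: passing array_1 itself as the index and letting np.add.at broadcast the truncated array_2 against it, which avoids building input_row. The names `expected`, `naive`, `array_2_short`, `output_short` and `output_bc` are just local names for this illustration.

```python
import numpy as np

# Example data from the question
array_1 = np.array([[0, 1, 2],
                    [3, 3, 3],
                    [3, 3, 4],
                    [3, 6, 2]])
array_2 = np.repeat(np.arange(7)[:, None], 3, axis=1)  # rows [0,0,0] ... [6,6,6]

# Reference result: the question's double loop, starting from zeros
expected = np.zeros_like(array_2)
for i in range(array_1.shape[0]):
    for j in range(array_1.shape[1]):
        expected[array_1[i, j]] += array_2[i]

# Explicit fancy indices, as in the answer
output_row = array_1.ravel()  # destination row of each element
input_row = np.repeat(np.arange(array_1.shape[0]), array_1.shape[1])  # source row

# Buffered fancy-index +=: a repeated destination row keeps only one contribution
naive = np.zeros_like(array_2)
naive[output_row] += array_2[input_row]

# Unbuffered np.add.at: every occurrence of a repeated index is accumulated
output = np.zeros_like(array_2)
np.add.at(output, output_row, array_2[input_row])

# Truncated variant: only the first array_1.shape[0] rows of array_2 are read,
# so the output shape is given explicitly (shape= needs NumPy >= 1.17)
array_2_short = array_2[:array_1.shape[0]]
output_short = np.zeros_like(array_2_short,
                             shape=(output_row.max() + 1, array_2_short.shape[1]))
np.add.at(output_short, output_row, array_2_short[input_row])

# Broadcasting variant without np.repeat: output_bc[array_1] has shape (4, 3, 3),
# and array_2_short[:, None, :] of shape (4, 1, 3) broadcasts against it
output_bc = np.zeros_like(array_2)
np.add.at(output_bc, array_1, array_2_short[:, None, :])

print(np.array_equal(naive, expected))         # False
print(np.array_equal(output, expected))        # True
print(np.array_equal(output_short, expected))  # True
print(np.array_equal(output_bc, expected))     # True
```

The naive version can never match here: each write to row 3 starts from zero and adds a single row of array_2, so it cannot reach the expected 10.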