Tags: python, arrays, numpy, weighted-average

How to calculate the weighted average of the rows of a matrix, but with different weights per row?


As the title implies, I have a numpy matrix (2d array) that is symmetric with 0s on its diagonal. I want to use np.average to collapse its rows into a 1d array of weighted averages, using a weight array of the same length as the rows of the matrix. However, since the diagonal entries are zero for a justified reason, I don't want them to count toward each row's weighted average. In other words, I want a different set of weights for each row: for row i, weights[i] should be zero and the rest of the weights should remain the same.

Is it possible to do this without an explicit loop?
What is the best way to do it?

Code example:
Generate the matrix and the weights:

import numpy as np

mat = np.array([[       0,     2436,     2434,     2428,     2416],
                [    2436,        0,     2454,     2446,     2435],
                [    2434,     2454,        0,     2447,     2436],
                [    2428,     2446,     2447,        0,     2428],
                [    2416,     2435,     2436,     2428,        0]])
weights = np.array([262140, 196608, 196608, 196608, 196608])

Current (wrong) implementation, which applies the same weights to every row, so the zero on the diagonal is counted with its full weight:

weighted_avg = np.average(mat, axis=-1, weights=weights)
print(weighted_avg)

Out: [1821.38194802 1984.31077694 1984.18578409 1979.68578982 1972.56080841]

Loop implementation:

weighted_avg = []
for i in range(mat.shape[0]):
    curr_weights = weights.copy()
    curr_weights[i] = 0
    weighted_avg.append(np.average(mat[i], axis=-1, weights=curr_weights))

weighted_avg = np.array(weighted_avg)
print(weighted_avg)

Out: [2428.5        2442.23079848 2442.076961   2436.53850163 2427.76928603]

How can I replace this loop with a vectorized, idiomatic numpy implementation?


Solution

  • This can be done in a vectorized way by giving each row its own copy of the weights and zeroing the diagonal of that weight matrix:

    # expand the weights array to match the shape of mat (one copy per row)
    wght_repeat = np.repeat(weights[None, :], repeats=mat.shape[0], axis=0)
    # fill the diagonal with 0 so that row i ignores weights[i]
    np.fill_diagonal(wght_repeat, 0)
    wght_avg = np.average(mat, axis=-1, weights=wght_repeat)
    print(wght_avg)

    Out: [2428.5        2442.23079848 2442.076961   2436.53850163 2427.76928603]