pythonnumpy

aggregate 3D array using zone and time index arrays


Using the small example below, I'm seeking to aggregate (sum) the values in the 3D dat_arr array using two other arrays to guide the grouping. The first index of dat_arr is related to time. The second and third indices are related to spatial (X, Y) locations. How can I sum values in dat_arr such that the temporal binning is guided by the contents of tim_idx (same length as the first dimension of dat_arr) and the spatial binning uses zon_arr (has same dimensions as the last two indices of dat_arr)?

import numpy as np
import matplotlib.pyplot as plt

zon_arr = np.zeros((3,5))
tim_idx = np.array([0,0,1,1,2,2,3,3])

# set up arbitrary zones
zon_arr[1, :3] = 1
zon_arr[1, 3:] = 2

# plt.imshow(zon_arr)
# plt.show()

# generate arbitrary array with data
# first index = time; last 2 indices represent X-Y pts in space
# last two indices must have same dims as zon_arr
np.random.seed(100)
dat_arr = np.random.rand(8, 3, 5)

So the output I'm expecting would give me the sum of values contained in dat_arr for each unique value in tim_idx and zon_arr. In other words, I would expect output that has 3 value (corresponding to 3 zones) for each of the 4 unique time values in tim_idx?


Solution

  • First, let's compute this with a loop to get a sense of the potential output:

    sums = {}
    # for each combination of coordinates
    for i in range(len(tim_idx)):
        for j in range(zon_arr.shape[0]):
            for k in range(zon_arr.shape[1]):
                # add the value to the (time, zone) key combination
                key = (tim_idx[i], zon_arr[j, k])
                sums[key] = sums.get(key, 0) + dat_arr[i, j, k]
    

    which gives us:

    {(0, 0): 8.204124414317336,
     (0, 1): 3.8075543426771645,
     (0, 2): 1.2233223229754382,
     (1, 0): 7.920231812858928,
     (1, 1): 4.150642040307019,
     (1, 2): 2.4211020016615836,
     (2, 0): 10.363684964675313,
     (2, 1): 3.06163710842573,
     (2, 2): 1.9547272492467518,
     (3, 0): 10.841595367423158,
     (3, 1): 2.6617183569891893,
     (3, 2): 2.0222766813453674}
    

    Now we can leverage numpy indexing to perform the same thing in a vectorial way. meshgrid to generate the indexer, unique to get the unique combinations, and bincount to compute the sum per group:

    # create the indexer from the combination of time/zone
    i, j = np.meshgrid(tim_idx, zon_arr, indexing='ij')
    coord = np.c_[i.ravel(), j.ravel()]
    # alternatively
    # coord = np.c_[np.repeat(tim_idx, zon_arr.size),
    #               np.tile(zon_arr.flat, len(tim_idx))]
    
    # identify the unique combinations for later aggregation
    keys, idx = np.unique(coord, return_inverse=True, axis=0)
    
    # compute the counts per key
    sums = np.bincount(idx, dat_arr.ravel())
    

    Output:

    # keys
    array([[0, 0],
           [0, 1],
           [0, 2],
           [1, 0],
           [1, 1],
           [1, 2],
           [2, 0],
           [2, 1],
           [2, 2],
           [3, 0],
           [3, 1],
           [3, 2]])
    
    # sums
    array([ 8.20412441,  3.80755434,  1.22332232,  7.92023181,  4.15064204,
            2.421102  , 10.36368496,  3.06163711,  1.95472725, 10.84159537,
            2.66171836,  2.02227668])