Using the small example below, I'm seeking to aggregate (sum) the values in the 3D dat_arr
array using two other arrays to guide the grouping. The first index of dat_arr
is related to time. The second and third indices are related to spatial (X, Y) locations. How can I sum values in dat_arr
such that the temporal binning is guided by the contents of tim_idx
(same length as the first dimension of dat_arr
) and the spatial binning uses zon_arr
(has same dimensions as the last two indices of dat_arr
)?
import numpy as np
import matplotlib.pyplot as plt
zon_arr = np.zeros((3,5))
tim_idx = np.array([0,0,1,1,2,2,3,3])
# set up arbitrary zones
zon_arr[1, :3] = 1
zon_arr[1, 3:] = 2
# plt.imshow(zon_arr)
# plt.show()
# generate arbitrary array with data
# first index = time; last 2 indices represent X-Y pts in space
# last two indices must have same dims as zon_arr
np.random.seed(100)
dat_arr = np.random.rand(8, 3, 5)
So the output I'm expecting would give me the sum of values contained in dat_arr
for each unique value in tim_idx
and zon_arr
. In other words, I would expect output that has 3 value (corresponding to 3 zones) for each of the 4 unique time values in tim_idx
?
First, let's compute this with a loop to get a sense of the potential output:
sums = {}
# for each combination of coordinates
for i in range(len(tim_idx)):
for j in range(zon_arr.shape[0]):
for k in range(zon_arr.shape[1]):
# add the value to the (time, zone) key combination
key = (tim_idx[i], zon_arr[j, k])
sums[key] = sums.get(key, 0) + dat_arr[i, j, k]
which gives us:
{(0, 0): 8.204124414317336,
(0, 1): 3.8075543426771645,
(0, 2): 1.2233223229754382,
(1, 0): 7.920231812858928,
(1, 1): 4.150642040307019,
(1, 2): 2.4211020016615836,
(2, 0): 10.363684964675313,
(2, 1): 3.06163710842573,
(2, 2): 1.9547272492467518,
(3, 0): 10.841595367423158,
(3, 1): 2.6617183569891893,
(3, 2): 2.0222766813453674}
Now we can leverage numpy indexing to perform the same thing in a vectorial way. meshgrid
to generate the indexer, unique
to get the unique combinations, and bincount
to compute the sum per group:
# create the indexer from the combination of time/zone
i, j = np.meshgrid(tim_idx, zon_arr, indexing='ij')
coord = np.c_[i.ravel(), j.ravel()]
# alternatively
# coord = np.c_[np.repeat(tim_idx, zon_arr.size),
# np.tile(zon_arr.flat, len(tim_idx))]
# identify the unique combinations for later aggregation
keys, idx = np.unique(coord, return_inverse=True, axis=0)
# compute the counts per key
sums = np.bincount(idx, dat_arr.ravel())
Output:
# keys
array([[0, 0],
[0, 1],
[0, 2],
[1, 0],
[1, 1],
[1, 2],
[2, 0],
[2, 1],
[2, 2],
[3, 0],
[3, 1],
[3, 2]])
# sums
array([ 8.20412441, 3.80755434, 1.22332232, 7.92023181, 4.15064204,
2.421102 , 10.36368496, 3.06163711, 1.95472725, 10.84159537,
2.66171836, 2.02227668])