The problem I am facing is to create a new array from a set of indices. Namely, I have a set of particles and jets, for each jet there is a list of indices of which particles belong to the given jet.
import awkward as ak
import vector
particles = ak.Array({
"mass": ak.Array([1, 2, 3, 4, 5]),
"px": ak.Array([1, 2, 3, 4, 5]),
"py": ak.Array([1, 2, 3, 4, 5]),
"pz": ak.Array([1, 2, 3, 4, 5]),
})
particles_p4 = vector.awk(ak.zip({
"mass": particles["mass"],
"x": particles["px"],
"y": particles["py"],
"z": particles["pz"]
}))
jet = ak.Array({
"constituents": ak.Array([[1, 2, 4, 5], [3]]),
"energy": ak.Array([1.2, 3.4])
})
What I would like to get is for each jet the particle_p4 values in a new array like so:
<Array [[{x: 1, y: 1, z: 1, ... z: 4, tau: 4}]] type='2 * var * {"x": int64, "y"...'>
where the first element of that would be:
<Array [{x: 1, y: 1, z: 1, ... z: 5, tau: 5}] type='4 * {"x": int64, "y": int64,...'>
Doing this with a for loop is trivial, however I have the notion that this can be done in a more efficient way with the tools available in Awkward array.
Bonus: What about even more nested arrays, for example where each event has multiple jets?
First, I think you mean you have
jet = ak.Array({
"constituents": ak.Array([[0, 1, 3, 4], [2]]),
"energy": ak.Array([1.2, 3.4])
})
because I'd expect the "constituents"
indexes to be 0-based, not 1-based. But even if it is 1-based, just start by subtracting 1.
>>> jet.constituents - 1
<Array [[0, 1, 3, 4], [2]] type='2 * var * int64'>
The biggest problem here is that these indexes are nested one level deeper than the particles_p4
that you want to slice. You want the 0
, 1
, 3
, 4
, and also the 2
, in your jet.constituents
to be indexes in the not-nested list, particles_p4
.
If we just arbitrarily flatten them (axis=-1
means to squash the last/deepest dimension):
>>> ak.flatten(jet.constituents, axis=-1)
<Array [0, 1, 3, 4, 2] type='5 * int64'>
these indexes are exactly what you'd need to apply to particles_p4
. Here, I'm using the current (2.x) version of Awkward Array, so that I can use .show()
, but the integer-array slice works in any version of Awkward Array.
>>> particles_p4[ak.flatten(jet.constituents, axis=-1)].show(type=True)
type: 5 * Momentum4D[
x: int64,
y: int64,
z: int64,
tau: int64
]
[{x: 1, y: 1, z: 1, tau: 1},
{x: 2, y: 2, z: 2, tau: 2},
{x: 4, y: 4, z: 4, tau: 4},
{x: 5, y: 5, z: 5, tau: 5},
{x: 3, y: 3, z: 3, tau: 3}]
If we take that as a partial solution, all we need to do now is put the nested structure back into the result.
ak.flatten has an opposite, ak.unflatten, which takes a flat array and adds nestedness from an array of list lengths. You can get the list lengths from the original jet.constituents
with ak.num. Again, I'll use axis=-1
so that this answer will generalize to deeper nestings.
>>> lengths = ak.num(jet.constituents, axis=-1)
>>> lengths
<Array [4, 1] type='2 * int64'>
>>> rearranged = particles_p4[ak.flatten(jet.constituents, axis=-1)]
>>> rearranged
<MomentumArray4D [{x: 1, y: 1, z: 1, tau: 1}, ..., {...}] type='5 * Momentu...'>
>>> result = ak.unflatten(rearranged, lengths, axis=-1)
>>> result.show(type=True)
type: 2 * var * Momentum4D[
x: int64,
y: int64,
z: int64,
tau: int64
]
[[{x: 1, y: 1, z: 1, tau: 1}, {x: 2, ...}, ..., {x: 5, y: 5, z: 5, tau: 5}],
[{x: 3, y: 3, z: 3, tau: 3}]]
For the bonus round, if all of the above arrays (particles_p4
and jet
) were arrays of lists, where each list represents one event, rather than an array representing one event, then the above would hold. I'm taking it as a given that the length of the particles_p4_by_event
is equal to the length of the jet_by_event
arrays, and the values of jet_by_event.constituents
are indexes within each event in particles_p4_by_event
(not global indexes; each event should restart at zero). That is, all of your arrays agree on how many events there are, and each event is handled individually, with no cross-over between events.