I'm trying to use the excellent uproot and awkward-array to read some analysis data stored in a TTree. I understand that ROOT doesn't write nested vectors (i.e. std::vector<std::vector<int>>) in a columnar format, but following this discussion, I modified my tree output to contain two separate branches: one std::vector<int> with the content, and one std::vector<int> with the offsets. The content vector has values pushed into it multiple times between tree fills. Each time values are pushed in, the new size of the content vector is stored in the offsets.
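To make the storage scheme concrete, here is a minimal plain-Python sketch of how one tree entry's two branches could be built. The helper name fill_entry and the leading 0 in the offsets are assumptions for illustration (the leading 0 matches the example data further down); the real filling code is C++ and is not shown in this post.

```python
def fill_entry(groups):
    """Build the content and offsets vectors for one tree entry.

    `groups` is a hypothetical stand-in for the batches of values that
    get pushed into the content vector between two tree fills.
    """
    content = []
    offsets = [0]  # assumption: offsets start at 0, as in the example data
    for group in groups:
        content.extend(group)          # push the values into the content
        offsets.append(len(content))   # record the content size afterwards
    return content, offsets


content, offsets = fill_entry([[12], [14, 3, 4]])
```

With this input the entry holds content [12, 14, 3, 4] and offsets [0, 1, 4], so consecutive pairs of offsets delimit each inner list.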
My idea was to recreate the structure I need as a nested JaggedArray when I read the tree. However, reading through the awkward-array documentation, I can't figure out the right way to construct this nested JaggedArray without looping in Python. fromoffsets requires a 1D index, which means that the jagged indices must be flattened, which loses their structure. None of the other classmethods seem to fit. The example below uses a generator, which I expect to be rather slow because it loops in Python. Is there a better way to construct the JaggedArray? Or a better way to store the data in the tree?
import awkward as ak
all_jagged_indices = ak.fromiter([[0, 1, 4], [0, 1, 2, 3]])
all_constituents = ak.fromiter([[12, 14, 3, 4], [2, 8, 3]])
output = ak.fromiter(
    (ak.JaggedArray.fromoffsets(jagged_indices, constituents)
     for jagged_indices, constituents in
     zip(all_jagged_indices, all_constituents))
)
expected = ak.fromiter([[[12], [14, 3, 4]], [[2], [8], [3]]])
assert (output == expected).all().all().all()
Thanks!
You've got the right idea, but ultimately, there isn't a way to convert a jagged ObjectArray into a doubly jagged array without a "for" loop. The structure of the data requires it.
This is a key issue, though, and it's one reason why some of these algorithms are being ported to C++. The last plot in this talk directly addresses this kind of data (jagged^N of numbers) with the "for" loop moved into C++. This is in development for Awkward 1.0 and Uproot 4.0, which are scheduled to be ready for users at the end of April. (At that point, the conversion of std::vector<std::vector<numbers>> will be automatic, because there will no longer be a performance penalty.)
At the moment, however, a Python "for" loop, implicitly within fromiter, is the best you can do.
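For reference, the work that the per-entry loop has to do can be sketched with plain Python lists. This is only an illustration of the logic, not Awkward's API: split_by_offsets and nest_entries are hypothetical helper names, and the input is assumed to be already read out as one offsets list and one content list per tree entry.

```python
def split_by_offsets(offsets, content):
    """Cut one entry's flat content into sublists at consecutive offsets."""
    return [content[start:stop]
            for start, stop in zip(offsets[:-1], offsets[1:])]


def nest_entries(all_offsets, all_content):
    """Apply split_by_offsets entry by entry (the unavoidable "for" loop)."""
    return [split_by_offsets(offsets, content)
            for offsets, content in zip(all_offsets, all_content)]


# Same data as in the question, as plain lists.
all_offsets = [[0, 1, 4], [0, 1, 2, 3]]
all_content = [[12, 14, 3, 4], [2, 8, 3]]
nested = nest_entries(all_offsets, all_content)
assert nested == [[[12], [14, 3, 4]], [[2], [8], [3]]]
```

The outer comprehension is the loop over entries; each entry needs its own pass because its offsets only make sense relative to its own content.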