python physics uproot awkward-array

Combine awkward-array JaggedArray contents and offsets into nested JaggedArray


I'm trying to use the excellent uproot and awkward-array to read some analysis data stored in a TTree. I understand that ROOT doesn't write nested vectors (i.e., std::vector<std::vector<int>>) in a columnar format, but following this discussion, I modified my tree output to contain two separate branches: one std::vector<int> with the contents and one std::vector<int> with the offsets. Values are pushed into the contents vector several times between fills of the tree, and after each push the current size of the contents vector is stored in the offsets vector.
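To make the layout concrete, here is a minimal pure-Python sketch of the fill logic described above. The function name `fill_event` is hypothetical, and the leading 0 in the offsets is an assumption inferred from the example offsets shown further down (`[0, 1, 4]`); the real tree-writing code is C++ and may differ.

```python
def fill_event(groups):
    """Flatten one tree entry's groups into (contents, offsets).

    Hypothetical stand-in for the C++ fill code: `groups` is a list of
    lists of ints that would otherwise be a nested vector.
    """
    contents = []
    offsets = [0]  # assumption: offsets start at 0, as in the example data
    for group in groups:
        contents.extend(group)          # push this group's values
        offsets.append(len(contents))   # record the size after each push
    return contents, offsets


contents, offsets = fill_event([[12], [14, 3, 4]])
```

Each consecutive pair of offsets then marks where one group starts and stops inside the contents.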

My idea was to recreate the structure I need as a nested JaggedArray when I read the tree. However, after reading through the awkward-array documentation, I can't figure out the right way to construct this nested JaggedArray without looping in Python. JaggedArray.fromoffsets requires a 1D array of offsets, so the jagged indices would have to be flattened, which loses their per-entry structure. None of the other classmethods seems to fit. The example below uses a generator, which I expect to be rather slow because it loops in Python. Is there a better way to construct the JaggedArray? Or a better way to store the data in the tree?

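import awkward as ak  # Awkward Array 0.x API (fromiter, JaggedArray)

# One list of offsets and one list of contents per tree entry
all_jagged_indices = ak.fromiter([[0, 1, 4], [0, 1, 2, 3]])
all_constituents = ak.fromiter([[12, 14, 3, 4], [2, 8, 3]])
# Build one JaggedArray per entry in a Python-level loop, then combine them
output = ak.fromiter(
    (ak.JaggedArray.fromoffsets(jagged_indices, constituents)
     for jagged_indices, constituents in
     zip(all_jagged_indices, all_constituents))
)
expected = ak.fromiter([[[12], [14, 3, 4]], [[2], [8], [3]]])
# Reduce over all three levels of nesting
assert (output == expected).all().all().all()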

Thanks!


Solution

  • You've got the right idea, but ultimately, there isn't a way to convert a jagged ObjectArray into a doubly jagged array without a "for" loop. The structure of the data requires it.

    This is a key issue, though, and it's one reason why some of these algorithms are being ported to C++. The last plot in this talk directly addresses this kind of data (jagged^N of numbers) with the "for" loop moved into C++. This is in development for Awkward 1.0 and Uproot 4.0, which are scheduled to be ready for users at the end of April. (At that point, the conversion of std::vector<std::vector<numbers>> will be automatic, because there's no performance penalty anymore.)

    At the moment, however, a Python "for" loop, implicitly within fromiter, is the best you can do.
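For illustration, the per-entry loop that the generator and fromiter perform can be written out in plain Python. This is only a sketch of the logic using nested lists rather than awkward arrays; the helper name `nest_by_offsets` is hypothetical and not part of awkward-array or uproot.

```python
def nest_by_offsets(all_offsets, all_contents):
    """Rebuild [entry][group][value] nesting from per-entry offsets and contents."""
    nested = []
    for offsets, contents in zip(all_offsets, all_contents):
        # Consecutive offset pairs give the [start, stop) range of each group
        entry = [contents[start:stop]
                 for start, stop in zip(offsets[:-1], offsets[1:])]
        nested.append(entry)
    return nested


# Same data as in the question, as plain Python lists
all_jagged_indices = [[0, 1, 4], [0, 1, 2, 3]]
all_constituents = [[12, 14, 3, 4], [2, 8, 3]]
result = nest_by_offsets(all_jagged_indices, all_constituents)
```

The outer loop runs once per tree entry, which is the part that has to stay in Python until the conversion is done in compiled code.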