python awkward-array

Accessing elements of an awkward array that are not a passed-in index


I'm trying to access the elements of an awkward array that do not correspond to some particular set of indices. I have 3 events in total with one jet per event and some number of leptons. Each lepton has a particular flag associated with it. For each jet I keep track of the indices of the leptons in that jet:

import awkward as ak

jet_lepton_indices = ak.Array([[0, 2], [1], [2,3]])
print(f'jet_lepton_indices\n{jet_lepton_indices}\n')

lepton_flags = ak.Array([[0, 10, 20, 30], [0, 10, 20, 30], [0, 10, 20, 30, 40]])
print(f'lepton_flags\n{lepton_flags}\n')

The output:

jet_lepton_indices
[[0, 2], [1], [2, 3]]

lepton_flags
[[0, 10, 20, 30], [0, 10, 20, 30], [0, 10, 20, 30, 40]]

If I want the flags of the leptons that are in each jet I do lepton_flags[jet_lepton_indices] and get:

[[0, 20],
 [10],
 [20, 30]]

But I also need to access all the lepton flags associated with leptons that are not in the jets. I'd like to be able to produce:

[[10, 30],
 [0, 20, 30],
 [0, 10, 40]]

I thought I could do lepton_flags[~jet_lepton_indices], but that has behavior I don't understand. I also couldn't figure out a way to do it by flattening and unflattening.


Solution

  • (The "TL;DR" is the summary function at the bottom.)

    The ~ (bitwise not) didn't work on your array of integers because it just inverted the bits in the integers:

    >>> jet_lepton_indices
    <Array [[0, 2], [1], [2, 3]] type='3 * var * int64'>
    >>> ~jet_lepton_indices
    <Array [[-1, -3], [-2], [-3, -4]] type='3 * var * int64'>
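
To see why the result of slicing with those inverted integers looks confusing, here is a minimal pure-Python sketch using the first event only. It relies on the identity ~n == -n - 1 and on negative indexes counting from the end of a list; I'm assuming Awkward Array's integer slices follow the same negative-index convention as Python lists and NumPy.

```python
# First event only, as plain Python lists.
flags = [0, 10, 20, 30]
indices = [0, 2]

# Bitwise NOT on an integer n gives -n - 1.
inverted = [~i for i in indices]       # [-1, -3]

# Negative indexes count from the end, so this picks other elements,
# not the complement of the original selection.
picked = [flags[i] for i in inverted]  # [30, 10]

print(inverted, picked)
```

So the inverted integers are still a valid selection, just not the complement you were after.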
    

    Ultimately, what you want is to convert your integer array slice into a boolean array slice. As slices, integer arrays have strictly more information than boolean arrays: they can duplicate and change the order of elements from the sliced array in addition to just dropping elements. Thus, integer array slices can always be converted into boolean array slices, but not the other way around. In fact, there has been a request for such a function, #497, and that issue describes several ways of getting it, all different from the one I worked out below. (I'm still going to show the example that I just worked out because it's simpler and demonstrates a common pattern: cartesian to increase dimensions, do something in the new dimension, then aggregate over it to get back to the old number of dimensions.)

    Another fact about boolean array slices is that they have to agree with the list lengths of the array that they slice. (Footnote: to invert a selection, you need to know the universal set, which is why only a boolean array slice can be inverted.) Therefore, to convert an integer array slice into a boolean array slice, we need the array to be sliced, lepton_flags. We can use ak.local_index to make an array of the integer indexes of all elements that exist in lepton_flags:

    >>> all_indices = ak.local_index(lepton_flags)
    >>> all_indices
    <Array [[0, 1, 2, 3], [0, ..., 3], [0, 1, 2, 3, 4]] type='3 * var * int64'>
    

    Now the goal will be to find booleans for each one of these indices that say whether the index is in jet_lepton_indices or not. That kind of question has the form, "for each item in X (the local index), is there any item in Y (jet_lepton_indices) for which Z (they're equal)?" The "for each" of one array with another array is handled by ak.cartesian, and since we'll want to aggregate over everything associated with a single item of X ("is ak.any item equal?") we'll need nested=True to make a new dimension, to later aggregate over.
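
Before doing this with Awkward Array functions, it may help to see the same "for each X, is there any Y for which Z" question written as plain Python loops. This is only a slow reference sketch on ordinary lists, not the vectorized method described here, but it pins down what the result should be.

```python
jet_lepton_indices = [[0, 2], [1], [2, 3]]
lepton_flags = [[0, 10, 20, 30], [0, 10, 20, 30], [0, 10, 20, 30, 40]]

# For each local index i in an event (X), is there any jet index j (Y)
# for which i == j (Z)?
in_jet = [
    [any(i == j for j in jet) for i in range(len(flags))]
    for flags, jet in zip(lepton_flags, jet_lepton_indices)
]

# Keep only the flags whose index is NOT in the jet.
outside = [
    [flag for flag, keep in zip(flags, mask) if not keep]
    for flags, mask in zip(lepton_flags, in_jet)
]

print(in_jet[0])  # [True, False, True, False]
print(outside)    # [[10, 30], [0, 20, 30], [0, 10, 40]]
```

The inner any(...) is the aggregation over the extra dimension; the steps that follow do the same thing without Python loops.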

    >>> pairs = ak.cartesian([all_indices, jet_lepton_indices], nested=True)
    >>> pairs.show(type=True)
    type: 3 * var * var * (
        int64,
        int64
    )
    [[[(0, 0), (0, 2)], [(1, 0), (1, 2)], [(2, ...), ...], [(3, 0), (3, 2)]],
     [[(0, 1)], [(1, 1)], [(2, 1)], [(3, 1)]],
     [[(0, 2), (0, 3)], [(1, 2), (1, 3)], ..., [(3, ...), ...], [(4, 2), (4, 3)]]]
    

    The pairs are more deeply nested (var * var *) than all_indices and jet_lepton_indices (var *) because we asked for the results to be grouped by same-first-index (nested=True).

    The left item in each of these pairs is from all_indices and the right is from jet_lepton_indices, for all combinations. To separate them, use ak.unzip:

    >>> whole_set, in_set = ak.unzip(pairs)
    >>> whole_set
    <Array [[[0, 0], [1, 1], [...], [3, 3]], ...] type='3 * var * var * int64'>
    >>> in_set
    <Array [[[0, 2], [0, 2], [...], [0, 2]], ...] type='3 * var * var * int64'>
    

    The whole_set and in_set line up because they come from the same pairs. Since they line up, we can use == on them, to get a boolean that's True if and only if a member of the whole_set is in the in_set.

    >>> whole_set == in_set
    <Array [[[True, False], ..., [False, ...]], ...] type='3 * var * var * bool'>
    

    If any (ak.any) of these innermost lists (axis=-1) is True, then we want to say that the whole group, representing an item from all_indices, is in jet_lepton_indices.

    >>> jet_lepton_boolean = ak.any(whole_set == in_set, axis=-1)
    >>> jet_lepton_boolean
    <Array [[True, False, True, False], ..., [False, ...]] type='3 * var * bool'>
    

    This jet_lepton_boolean is a boolean array that can be used as a slice to produce the same elements as jet_lepton_indices. As a boolean, it can be negated with ~.

    >>> lepton_flags[~jet_lepton_boolean]
    <Array [[10, 30], [0, 20, 30], [0, 10, 40]] type='3 * var * int64'>
    

    That's the selection of lepton_flags that you want: it's everything except what was in

    >>> lepton_flags[jet_lepton_indices]
    <Array [[0, 20], [10], [20, 30]] type='3 * var * int64'>
    

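    As an alternative, you could have constructed the negated booleans directly by using != instead of ==, together with ak.all instead of ak.any: an index is outside the jet only if it differs from every index in the jet.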


    Here's a summary of this method, as a function:

    def indices_to_booleans(indices, array_to_slice):
        whole_set, in_set = ak.unzip(ak.cartesian([
            ak.local_index(array_to_slice), indices
        ], nested=True))
        return ak.any(whole_set == in_set, axis=-1)
    

    This solution depends on the fact that your original arrays are only one level deep (var *). I think it might generalize if you pass the appropriate axis argument to ak.cartesian, but I haven't thought about it enough to be sure.

    Also, #497 provides more ways of doing it.