uproot awkward-array

awkward array ak.unzip behaviour


I access a ROOT file and extract the data I want, as in the following example:

import uproot

events=uproot.open(filename)["btagana/ttree;6"]
jet_data=events.arrays(filter_name=["Jet_nFirstTrack","Jet_nLastTrack","Jet_pt","Jet_phi","Jet_eta"],library="ak")

The order of the keys in the resulting array doesn't match the order of the list used to filter them. If I now use ak.unzip():

jet_data=ak.unzip(jet_data)

is the ordering reliable and reproducible? If I open different ROOT files, will I get the same ordering?


Solution

  • This is actually a question about Uproot. In this line:

    >>> jet_data=events.arrays(filter_name=["Jet_nFirstTrack","Jet_nLastTrack","Jet_pt","Jet_phi","Jet_eta"],library="ak")
    

    the filter_name is just a filter, accepting or rejecting branches from the ROOT file. Those branches have a natural order in the file, and the output is probably that order (and therefore stable upon repeated attempts, unless a dict is involved at some point and you're using Python <= 3.5).

    If you want to enforce an order, pass your list of branch names as expressions rather than filter_name. The two arguments have different meanings: expressions can be simple formulas, while filter_name can contain wildcards, so a character like * means something very different in each (multiplication in one, a wildcard in the other)!

    Alternatively, you can reorder the fields after reading the array by slicing with a list of strings. There's no performance penalty for doing so—it's just rearranging metadata (time to completion does not scale with the length of the array). This documentation has some examples (including more complex cases where you're selecting fields within fields, but the simple case is enough for your issue).

    Edit: I should add that fields of records in Awkward Arrays have a reproducible order. They're not unstable hashmaps like dicts in Python <= 3.5. They're actually two equal-length lists: the ordered fields (which is what ak.unzip returns) and the ordered field names (which ak.fields returns). The names are optional: without field names, records become tuples.