TLorentz vector features in uproot4/vector when calculating invariant mass of a jet

I want to sum the 4-momenta of all the constituents in a jet. In uproot3 (with uproot3-methods), you could create a TLorentzVectorArray and just call .sum() on it.

So this worked fine:

import uproot3
import uproot3_methods
import awkward0 as ak

input_file = uproot3.open(input_path)
tree = input_file['Jets']
pt = tree.array('Constituent_pt')
phi = tree.array('Constituent_phi')
eta = tree.array('Constituent_eta')
energy = tree.array('Constituent_energy')
mass = tree.array('Constituent_mass')
p4 = uproot3_methods.TLorentzVectorArray.from_ptetaphie(pt, eta, phi, energy)
jet_p4_u3 = p4.sum()
jet_pt_u3 = jet_p4_u3.pt
jet_eta_u3 = jet_p4_u3.eta
jet_phi_u3 = jet_p4_u3.phi
jet_energy_u3 = jet_p4_u3.energy

However, since uproot3 is deprecated, the way to go, according to the question "TLorentz vector in Uproot 4", seems to be the vector package. This is what I tried:

import uproot
import awkward
import vector

input_file = uproot.open(input_path)
tree = input_file['Jets']
pt = tree.arrays()['Constituent_pt']
phi = tree.arrays()['Constituent_phi']
eta = tree.arrays()['Constituent_eta']
energy = tree.arrays()['Constituent_energy']
mass = tree.arrays()['Constituent_mass']
p4 = vector.awk({"pt": pt, "phi": phi, "eta": eta, "energy": energy})

The problem is that p4.sum() does not seem to exist there. The other possibility I found is shown in vector discussion #117. So I now add vector.register_awkward() after the imports and jet_p4_u4 = ak.Array(p4, with_name="Momentum4D") at the end:

import uproot
import awkward as ak
import vector
vector.register_awkward()

input_file = uproot.open(input_path)
tree = input_file['Jets']
pt = tree.arrays()['Constituent_pt']
phi = tree.arrays()['Constituent_phi']
eta = tree.arrays()['Constituent_eta']
energy = tree.arrays()['Constituent_energy']
mass = tree.arrays()['Constituent_mass']
p4 = ak.Array({"pt": pt, "phi": phi, "eta": eta, "energy": energy})
jet_p4_u4 = ak.Array(p4, with_name="Momentum4D")

The question remains: how do I sum the 4-momenta? With ak.sum(jet_p4_u4, axis=-1), only pt and energy seem to have the correct values; eta and phi are completely different from the uproot3 result.

Update: It seems that ak.sum cannot add the angles in the way I want, so summing x, y, z and energy instead and then constructing the vector from those sums solves the problem. However, I believe there must be a better way than this. The current working version:

import uproot
import awkward as ak
import vector

input_file = uproot.open(input_path)
tree = input_file['Jets']
pt = tree.arrays()['Constituent_pt']
phi = tree.arrays()['Constituent_phi']
eta = tree.arrays()['Constituent_eta']
energy = tree.arrays()['Constituent_energy']
mass = tree.arrays()['Constituent_mass']
p4 = vector.awk({"pt": pt, "phi": phi, "eta": eta, "energy": energy})
p4_lz = vector.awk({"x": p4.x, "y": p4.y, "z": p4.z, "t": energy})
lz_sum = ak.sum(p4_lz, axis=-1)
jet_p4 = vector.awk({
    "x": lz_sum.x,
    "y": lz_sum.y,
    "z": lz_sum.z,
    "t": lz_sum.t
})
jet_energy = jet_p4.t
jet_mass = jet_p4.tau
jet_phi = jet_p4.phi
jet_pt = jet_p4.rho

Solution

For a solution that works equally well for flat arrays of Lorentz vectors as for jagged arrays of Lorentz vectors, try this:

import uproot
import awkward as ak
import vector
vector.register_awkward()   # any record named "Momentum4D" will be Lorentz

with uproot.open(input_path) as input_file:
    tree = input_file["Jets"]
    arrays = tree.arrays(filter_name="Constituent_*")
    p4 = ak.zip({
        "pt": arrays.Constituent_pt,
        "phi": arrays.Constituent_phi,
        "eta": arrays.Constituent_eta,
        "energy": arrays.Constituent_energy,
    }, with_name="Momentum4D")
    jet_p4 = ak.zip({
        "px": ak.sum(p4.px, axis=-1),
        "py": ak.sum(p4.py, axis=-1),
        "pz": ak.sum(p4.pz, axis=-1),
        "energy": ak.sum(p4.energy, axis=-1)
    }, with_name="Momentum4D")

Note that the uproot.TTree.arrays function, if given no arguments, will read all TBranches in the TTree. In your code, you read all the data once per variable (five times in total), each time selecting a different column from the data that had been read and throwing the rest out.

Also, I don't like the vector.awk function because it can construct arrays of type:

N * Momentum4D[px: var * float64, py: var * float64, pz: var * float64, E: var * float64]

(in other words, each "px" value is a list of floats), rather than what you want:

N * var * Momentum4D[px: float64, py: float64, pz: float64, E: float64]

ak.zip combines the lists so that the "px" of each Lorentz vector is just a number, but you can have nested lists of Lorentz vectors. This only makes a difference if you have jagged arrays, but I'm pointing it out so that no one falls into this trap.

The with_name="Momentum4D" argument labels the records with that name, and having Lorentz-vector behaviors registered with vector.register_awkward() gives all such records Lorentz vector methods. In this case, we're using it so that p4, defined in terms of pt, phi, eta, energy, has properties px, py, pz—in other words, doing coordinate transformations on demand.

There isn't a Lorentz vector summation method that sums over each item in a jagged array (the uproot-methods one was a hack that only works for jagged arrays of Lorentz vectors, no other structures, like jagged-jagged, etc.), so sum the components with ak.sum in Cartesian coordinates.
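The reason the sum has to happen in Cartesian coordinates can be checked with plain Python. The following is a minimal sketch using only the standard library; the helper names and the two constituents are hypothetical. It compares the component-wise sum with a naive sum of pt and phi.

```python
import math

def to_cartesian(pt, eta, phi, energy):
    """Convert (pt, eta, phi, E) to (px, py, pz, E)."""
    return (pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta), energy)

def sum_p4(constituents):
    """Sum four-momenta component by component in Cartesian coordinates."""
    px = py = pz = energy = 0.0
    for constituent in constituents:
        cx, cy, cz, ce = to_cartesian(*constituent)
        px += cx
        py += cy
        pz += cz
        energy += ce
    return px, py, pz, energy

def to_pt_eta_phi_mass(px, py, pz, energy):
    """Convert (px, py, pz, E) back to (pt, eta, phi, mass); assumes pt > 0."""
    pt = math.hypot(px, py)
    eta = math.asinh(pz / pt)
    phi = math.atan2(py, px)
    mass_squared = energy**2 - px**2 - py**2 - pz**2
    mass = math.sqrt(max(mass_squared, 0.0))  # clamp tiny negative rounding errors
    return pt, eta, phi, mass

# Two hypothetical massless constituents as (pt, eta, phi, energy).
constituents = [(1.0, 0.0, 0.0, 1.0), (1.0, 0.0, math.pi / 2, 1.0)]

jet_pt, jet_eta, jet_phi, jet_mass = to_pt_eta_phi_mass(*sum_p4(constituents))

# Naive sums of the non-Cartesian coordinates, for comparison.
naive_pt = sum(c[0] for c in constituents)
naive_phi = sum(c[2] for c in constituents)

print(jet_pt, jet_phi, jet_mass)
print(naive_pt, naive_phi)
```

Energy adds linearly, so its sum is correct either way. pt, eta and phi do not, so in general they have to be recomputed from the summed px, py, pz.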