I'm trying to reproduce parts of the Higgs discovery in the Higgs --> 4 leptons channel with open data and making use of awkward
. I can do it when the leptons are the same (e.g. 4 muons) with zip/unzip, but is there a way to do it in the 2 muon/2 electron channel? I started with the example in the HSF tutorial
https://hsf-training.github.io/hsf-training-scikit-hep-webpage/04-awkward/index.html
So I now have the following. First get the input file
curl http://opendata.cern.ch/record/12361/files/SMHiggsToZZTo4L.root --output SMHiggsToZZTo4L.root
Then I do the following
import numpy as np
import matplotlib.pylab as plt
import uproot
import awkward as ak
# Helper functions
def energy(m, px, py, pz):
E = np.sqrt( (m**2) + (px**2 + py**2 + pz**2))
return E
def invmass(E, px, py, pz):
m2 = (E**2) - (px**2 + py**2 + pz**2)
if m2 < 0:
m = -np.sqrt(-m2)
else:
m = np.sqrt(m2)
return m
def convert(pt, eta, phi):
px = pt * np.cos(phi)
py = pt * np.sin(phi)
pz = pt * np.sinh(eta)
return px, py, pz
####
# Read in the file
infile_name = 'SMHiggsToZZTo4L.root'
infile = uproot.open(infile_name)
# Convert to Cartesian
muon_pt = infile_signal['Events/Muon_pt'].array()
muon_eta = infile_signal['Events/Muon_eta'].array()
muon_phi = infile_signal['Events/Muon_phi'].array()
muon_q = infile_signal['Events/Muon_charge'].array()
muon_mass = infile_signal['Events/Muon_mass'].array()
muon_px,muon_py,muon_pz = convert(muon_pt, muon_eta, muon_phi)
muon_energy = energy(muon_mass, muon_px, muon_py, muon_pz)
# Do the magic
nevents = len(infile['Events/Muon_pt'].array())
# nevents = 1000 # For testing
max_entries = nevents
muons = ak.zip({
"px": muon_px[0:max_entries],
"py": muon_py[0:max_entries],
"pz": muon_pz[0:max_entries],
"e": muon_energy[0:max_entries],
"q": muon_q[0:max_entries],
})
quads = ak.combinations(muons, 4)
mu1, mu2, mu3, mu4 = ak.unzip(quads)
mass_try = (mu1.e + mu2.e + mu3.e + mu4.e)**2 - ((mu1.px + mu2.px + mu3.px + mu4.px)**2 + (mu1.py + mu2.py + mu3.py + mu4.py)**2 + (mu1.pz + mu2.pz + mu3.pz + mu4.pz)**2)
mass_try = np.sqrt(mass_try)
qtot = mu1.q + mu2.q + mu3.q + mu4.q
plt.hist(ak.flatten(mass_try[qtot==0]), bins=100,range=(0,300));
And the histogram looks good!
So how would I do this for 2-electron + 2-muon combinations? I would guess there's a way to make lepton_xxx
arrays? But I'm not sure how to do this elegantly (quickly) such that I could also create a flag to keep track of what the lepton combinations are?
Thanks!
Matt
This could be answered in a variety of ways:
n=2
for the muons, ak.combinations with n=2
again for the electrons, and then combine them with ak.cartesian (and deal with tuples of tuples, rather than one level of tuples, which would mean two calls to ak.unzip)I'll go with the last method because I've decided that it's easiest.
Another thing you probably want to know about is the Vector library. After
import vector
vector.register_awkward()
you don't have to do explicit coordinate transformations or mass calculations. I'll be using that. Here's how I read in the data:
infile = uproot.open("/tmp/SMHiggsToZZTo4L.root")
muon_branch_arrays = infile["Events"].arrays(filter_name="Muon_*")
electron_branch_arrays = infile["Events"].arrays(filter_name="Electron_*")
muons = ak.zip({
"pt": muon_branch_arrays["Muon_pt"],
"phi": muon_branch_arrays["Muon_phi"],
"eta": muon_branch_arrays["Muon_eta"],
"mass": muon_branch_arrays["Muon_mass"],
"charge": muon_branch_arrays["Muon_charge"],
}, with_name="Momentum4D")
electrons = ak.zip({
"pt": electron_branch_arrays["Electron_pt"],
"phi": electron_branch_arrays["Electron_phi"],
"eta": electron_branch_arrays["Electron_eta"],
"mass": electron_branch_arrays["Electron_mass"],
"charge": electron_branch_arrays["Electron_charge"],
}, with_name="Momentum4D")
And this reproduces your plot:
quads = ak.combinations(muons, 4)
quad_charge = quads["0"].charge + quads["1"].charge + quads["2"].charge + quads["3"].charge
mu1, mu2, mu3, mu4 = ak.unzip(quads[quad_charge == 0])
plt.hist(ak.flatten((mu1 + mu2 + mu3 + mu4).mass), bins=100, range=(0, 200));
The quoted number slices (e.g. "0"
and "1"
) are picking tuple fields, rather than array entries; it's a manual ak.unzip. (The fields could have had real names if I had given a fields
argument to ak.combinations.)
For the two muons, two electrons case, let's make four distinct collections.
muons_plus = muons[muons.charge > 0]
muons_minus = muons[muons.charge < 0]
electrons_plus = electrons[electrons.charge > 0]
electrons_minus = electrons[electrons.charge < 0]
The ak.combinations function (with default axis=1
) returns the Cartesian product of each list in the array of lists with itself, excluding an item with itself, (x, x)
(unless you want that, specify replacement=True
), and excluding one of each symmetric pair (x, y)
/(y, x)
.
If you want just a plain Cartesian product of lists from one array with lists from another array, that's ak.cartesian. The four collections muons_plus
, muons_minus
, electrons_plus
, electrons_minus
are non-overlapping and we want each four-lepton group to have exactly one from each, so that's a plain Cartesian product.
mu1, mu2, e1, e2 = ak.unzip(ak.cartesian([muons_plus, muons_minus, electrons_plus, electrons_minus]))
plt.hist(ak.flatten((mu1 + mu2 + e1 + e2).mass), bins=100, range=(0, 200));
Separating particles by flavor (electron vs muon) but not by charge is an artifact of the typing constraints imposed by C++. Electrons are measured in different ways from muons, so they have different attributes in the dataset. In a statically typed language like C++, we couldn't put them in the same collection because they differ by type (have different attributes). But charges only differ by value (an integer), so there was no reason they had to be in different collections.
But now, the only thing distinguishing a type is (a) what attributes the objects have and (b) what objects with that set of attributes are named. Here, I named them "Momentum4D"
because that allowed Vector to recognize them as Lorentz vectors and give them Lorentz vector methods. But they could have had other attributes (e.g. e_over_p
for electrons, but not for muons). charge
is a non-Momentum4D attribute.
So if we're going to the trouble to put different-flavor leptons in different arrays, why not put different-charge leptons in different arrays, too? Physically, flavor and charge are both particle properties at the same level of distinction, so we probably want to do the analysis that way. No reason to let a C++ type distinction get in the way of the analysis logic!
Also, you probably want
nevents = infile["Events"].num_entries
to get the number of entries from a TTree without reading the array data from it.