I'm trying to output a TTree with the same general format as an input TTree which has the structure:
ttree.show(datatypes.keys())
name                 | typename                 | interpretation
---------------------+--------------------------+-------------------------------
Weight               | float                    | AsDtype('>f4')
E_Beam               | float                    | AsDtype('>f4')
Px_Beam              | float                    | AsDtype('>f4')
Py_Beam              | float                    | AsDtype('>f4')
Pz_Beam              | float                    | AsDtype('>f4')
NumFinalState        | int32_t                  | AsDtype('>i4')
E_FinalState         | float[]                  | AsJagged(AsDtype('>f4'))
Px_FinalState        | float[]                  | AsJagged(AsDtype('>f4'))
Py_FinalState        | float[]                  | AsJagged(AsDtype('>f4'))
Pz_FinalState        | float[]                  | AsJagged(AsDtype('>f4'))
The NumFinalState branch holds, for each entry, the number of elements in every one of the *_FinalState array branches. This will always be the case in my work, so it seems wasteful to do the following:
outfile = uproot.recreate("myData_OUT.root")
datatypes = {"Weight": "float32", "E_Beam": "float32", "Px_Beam": "float32", "Py_Beam": "float32", "Pz_Beam": "float32", "NumFinalState": "int32", "E_FinalState": "var * float32", "Px_FinalState": "var * float32", "Py_FinalState": "var * float32", "Pz_FinalState": "var * float32"}
outfile.mktree("kin", datatypes)
outfile["kin"].show()
name                 | typename                 | interpretation
---------------------+--------------------------+-------------------------------
Weight               | float                    | AsDtype('>f4')
E_Beam               | float                    | AsDtype('>f4')
Px_Beam              | float                    | AsDtype('>f4')
Py_Beam              | float                    | AsDtype('>f4')
Pz_Beam              | float                    | AsDtype('>f4')
NumFinalState        | int32_t                  | AsDtype('>i4')
nE_FinalState        | int32_t                  | AsDtype('>i4')
E_FinalState         | float[]                  | AsJagged(AsDtype('>f4'))
nPx_FinalState       | int32_t                  | AsDtype('>i4')
Px_FinalState        | float[]                  | AsJagged(AsDtype('>f4'))
nPy_FinalState       | int32_t                  | AsDtype('>i4')
Py_FinalState        | float[]                  | AsJagged(AsDtype('>f4'))
nPz_FinalState       | int32_t                  | AsDtype('>i4')
Pz_FinalState        | float[]                  | AsJagged(AsDtype('>f4'))
In the documentation, it appears I can use the counter_name argument in mktree to give the counter branches custom names, but it runs into trouble if I try to give them all the same name:
outfile = uproot.recreate("myData_OUT.root")
datatypes = {"Weight": "float32", "E_Beam": "float32", "Px_Beam": "float32", "Py_Beam": "float32", "Pz_Beam": "float32", "NumFinalState": "int32", "E_FinalState": "var * float32", "Px_FinalState": "var * float32", "Py_FinalState": "var * float32", "Pz_FinalState": "var * float32"}
def counter_name(in_str: str) -> str:
    if "FinalState" in in_str:
        return "NumFinalState"
    return f"n{in_str}"
outfile.mktree("kin", datatypes, counter_name=counter_name)
This code throws an error:
---------------------------------------------------------------------------
error Traceback (most recent call last)
/var/folders/td/j379rd296477649qvl1k8n180000gn/T/ipykernel_24226/1447106219.py in <module>
5 return "NumFinalState"
6 return f"n{in_str}"
----> 7 outfile.mktree("kin", datatypes, counter_name=counter_name)
/opt/homebrew/Caskroom/miniforge/base/lib/python3.9/site-packages/uproot/writing/writable.py in mktree(self, name, branch_types, title, counter_name, field_name, initial_basket_capacity, resize_factor)
1268 path,
1269 directory._file,
-> 1270 directory._cascading.add_tree(
1271 directory._file.sink,
1272 treename,
/opt/homebrew/Caskroom/miniforge/base/lib/python3.9/site-packages/uproot/writing/_cascade.py in add_tree(self, sink, name, title, branch_types, counter_name, field_name, initial_basket_capacity, resize_factor)
1796 resize_factor,
1797 )
-> 1798 tree.write_anew(sink)
1799 return tree
1800
/opt/homebrew/Caskroom/miniforge/base/lib/python3.9/site-packages/uproot/writing/_cascadetree.py in write_anew(self, sink)
1114 # reference to fLeafCount
1115 out.append(
-> 1116 uproot.deserialization._read_object_any_format1.pack(
1117 datum["counter"]["tleaf_reference_number"]
1118 )
error: required argument is not an integer
I figure this error is related to Uproot trying to make several branches with the same name. Is there any way around this? It would probably be okay to just create a NumFinalState branch manually, since a subsequent program reads it in, but for compactness it would be nice not to create a bunch of unnecessary branches.
Uproot makes one counter branch for each Awkward Array in the dict it's given. Since your Awkward Arrays are arrays of lists of numbers, they're all presumed to have different counters. There isn't a way to manually force them to share a counter; the way it's supposed to work is to join them all into one Awkward Array, which Uproot will recognize as something that should have one counter.
So suppose you have
>>> import awkward as ak
>>> import uproot
>>> E_FinalState = ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]])
>>> Px_FinalState = ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]])
>>> Py_FinalState = ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]])
>>> Pz_FinalState = ak.Array([[11, 22, 33], [], [44, 55]])
Passing each of these individually into the output TTree will make nE_FinalState, nPx_FinalState, etc., as you've seen. So make them one array with ak.zip:
>>> finalstate = ak.zip({"E": E_FinalState, "px": Px_FinalState, "py": Py_FinalState, "pz": Pz_FinalState})
>>> finalstate
<Array [[{E: 1.1, px: 1.1, ... pz: 55}]] type='3 * var * {"E": float64, "px": fl...'>
>>> print(finalstate.type)
3 * var * {"E": float64, "px": float64, "py": float64, "pz": int64}
The key thing is that the type is now number of entries * var * {record}, rather than number of entries * var * float64 separately for each array. It's in the ak.zip function that you find out whether the lists really do have the same length in every entry. (It's a one-time cross-check; creating the finalstate array is itself zero-copy.)
Now you can use this when writing a TTree:
>>> outfile = uproot.recreate("myData_OUT.root")
>>> outfile["kin"] = {"finalstate": finalstate}
>>> outfile["kin"].show()
name                 | typename                 | interpretation
---------------------+--------------------------+-------------------------------
nfinalstate          | int32_t                  | AsDtype('>i4')
finalstate_E         | double[]                 | AsJagged(AsDtype('>f8'))
finalstate_px        | double[]                 | AsJagged(AsDtype('>f8'))
finalstate_py        | double[]                 | AsJagged(AsDtype('>f8'))
finalstate_pz        | int64_t[]                | AsJagged(AsDtype('>i8'))
The counter_name and field_name arguments only control how names are generated. By default they follow the CMS NanoAOD convention: the "finalstate" key in the dict gets "n" prepended to form the counter name, and "_" plus each field name appended to form the branch names. Those arguments exist so that you can apply a different naming convention, but they don't change which counter branches get created. In fact, defining these functions so that they produce the same name for several counters can trigger a confusing error message like the one you saw; I don't think that case is explicitly checked.
Oh, and you should also be able to use the uproot.WritableDirectory.mktree constructor, which makes a TTree without data so that every uproot.WritableTree.extend call, including the first, works the same way. The dict syntax would be
>>> outfile.mktree("kin", {"finalstate": finalstate.type})
<WritableTree '/kin' at 0x7f48ff139550>
>>> outfile["kin"].extend({"finalstate": finalstate})
i.e., use finalstate.type rather than the finalstate array itself.