I am using uproot to convert a ROOT.TTree into a pandas.DataFrame. The structure of the dataframe can be seen below. Note that ‘met’ is an entry-level variable, while ‘mu_cells_*’ are sub-entry-level variables.
Now I want to create a ROOT.TH1 histogram of 'met'. I asked in the ROOT forum and was told that this can only be done by looping over the dataframe and calling ROOT.TH1.Fill() for every entry (not every sub-entry, to avoid multiple counting), see link. What is the best way to do this?
Similarly, how do I make a TH1 of ‘mu_cells_e’, which requires looping over the sub-entries?
Best,
Yosse
                          met  mu_cells_e  mu_cells_side  mu_cells_tower
entry subentry
0     0          71755.648438  179.995682             -1               6
      1          71755.648438 -308.388519             -1               7
      2          71755.648438   15.558195             -1               8
      3          71755.648438  252.033691             -1               6
      4          71755.648438  459.172119             -1               7
...                       ...         ...            ...             ...
7107  22         26328.087891  611.708374              1               4
      23         26328.087891  -13.317616              1               6
      24         26328.087891   12.681366              1               2
      25         26328.087891   -4.776075              1               4
      26         26328.087891  -17.860764              1               6
[173410 rows x 4 columns]
You'll need to pull out a Series first for any further computation, because ROOT, boost-histogram, and other tools do not know about Pandas sub-indexing. Selecting the first sub-entry of every entry gives exactly one value per entry, which is what you want for an entry-level column such as met. That can be done like this:
mu_cells_side = frame.mu_cells_side.xs(0, level='subentry')
Now you can use the TH1's .FillN(len(mu_cells_side), mu_cells_side, ROOT.nullptr)
or boost-histogram's fill or NumPy, as it is a flat, one-dimensional Series at this point (and feel free to call mu_cells_side = np.asarray(mu_cells_side)
if any of those need a true NumPy array, but I don't think they do). This will be much faster than trying to loop in Python. For a sub-entry-level variable such as mu_cells_e, skip the .xs selection and fill from the whole column, since every row is then its own measurement.
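Here is a rough sketch of both cases on a small made-up frame (the values, binning, and the fill_th1 helper are my own assumptions, not anything from the original tree). It histograms with NumPy so it runs without ROOT. The helper shows one way the same arrays could be handed to TH1D.FillN, passing explicit unit weights rather than a null pointer; I have not tuned it against any particular ROOT version.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the uproot output: 'met' repeats once per sub-entry,
# 'mu_cells_e' differs per sub-entry. All values here are made up.
index = pd.MultiIndex.from_arrays(
    [[0, 0, 0, 1, 1, 2], [0, 1, 2, 0, 1, 0]],
    names=["entry", "subentry"],
)
frame = pd.DataFrame(
    {
        "met": [10.0, 10.0, 10.0, 25.0, 25.0, 40.0],
        "mu_cells_e": [1.5, -0.5, 2.5, 0.5, 3.5, -1.5],
    },
    index=index,
)

# Entry-level variable: keep only the first sub-entry of each entry,
# so each event is counted once.
met = frame["met"].xs(0, level="subentry").to_numpy(dtype=np.float64)

# Sub-entry-level variable: every row is its own measurement.
mu_cells_e = frame["mu_cells_e"].to_numpy(dtype=np.float64)

# Vectorised fills with NumPy (no Python-level loop).
met_counts, met_edges = np.histogram(met, bins=5, range=(0.0, 50.0))
e_counts, e_edges = np.histogram(mu_cells_e, bins=4, range=(-2.0, 4.0))


def fill_th1(values, name="h", nbins=5, lo=0.0, hi=50.0):
    """Hypothetical helper: fill a ROOT TH1D in one call (needs PyROOT)."""
    import ROOT  # imported lazily so the rest runs without ROOT installed

    values = np.ascontiguousarray(values, dtype=np.float64)
    hist = ROOT.TH1D(name, name, nbins, lo, hi)
    # Unit weights are passed explicitly instead of a null pointer.
    hist.FillN(len(values), values, np.ones(len(values), dtype=np.float64))
    return hist
```

With ROOT available, fill_th1(met, "h_met") and fill_th1(mu_cells_e, "h_e", 4, -2.0, 4.0) would be the corresponding calls. The conversion to a contiguous float64 array is there because FillN takes a pointer to doubles, so an integer column such as mu_cells_side would need the cast as well.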
A MWE would have been useful. Here is how to set up a similar DataFrame:
import pandas as pd
indarr = [[0, 0, 1, 1, 2, 2, 2, 3],
[0, 1, 0, 1, 0, 1, 2, 0]]
ind = pd.MultiIndex.from_tuples(list(zip(*indarr)), names=['entry', 'subentry'])
frame = pd.DataFrame({"mu_cells_side": [2, 2, 3, 3, 1, 1, 1, 8], "mu_cells_tower": [1, 2, 3, 4, 5, 6, 7, 8]}, index=ind)
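To check the selection on this toy frame, the sketch below rebuilds it (using MultiIndex.from_arrays, which I believe is equivalent to the from_tuples call here) and compares .xs(0, level='subentry') with a groupby on the entry level. Treating mu_cells_side as the per-entry column and mu_cells_tower as the per-sub-entry column is just an assumption for illustration.

```python
import pandas as pd

indarr = [[0, 0, 1, 1, 2, 2, 2, 3],
          [0, 1, 0, 1, 0, 1, 2, 0]]
ind = pd.MultiIndex.from_arrays(indarr, names=["entry", "subentry"])
frame = pd.DataFrame(
    {"mu_cells_side": [2, 2, 3, 3, 1, 1, 1, 8],
     "mu_cells_tower": [1, 2, 3, 4, 5, 6, 7, 8]},
    index=ind,
)

# One value per entry: take sub-entry 0 of each entry.
per_entry = frame["mu_cells_side"].xs(0, level="subentry")

# Alternative with the same result here: first row of each entry group.
per_entry_alt = frame["mu_cells_side"].groupby(level="entry").first()

# One value per sub-entry: the full column, no selection needed.
per_subentry = frame["mu_cells_tower"]

print(per_entry.tolist())      # [2, 3, 1, 8]
print(per_entry_alt.tolist())  # [2, 3, 1, 8]
print(len(per_subentry))       # 8
```

Four entries give four values in the per-entry Series, while the per-sub-entry column keeps all eight rows.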