I am working in a Jupyter notebook and have a pandas DataFrame from which I would like to fill a ROOT TH3F histogram and save it to a ROOT file using uproot. I haven't been able to find many examples that illustrate how to do this.
Below is some example code that shows how I tried to go about it (incorrectly, because it segfaults).
import ROOT as R
import uproot as ur
import numpy as np
import pandas as pd
# Example dataframe
data = {
    'x': [9.5, 5.0, 2.2, 8.1, 5.5, 1.4, 2.5, 9.2, 3.0, 7.9],
    'y': [2.0, 5.7, 1.3, 9.1, 6.0, 6.2, 5.8, 1.8, 5.8, 3.1],
    'z': [7.5, 4.1, 3.1, 1.6, 2.4, 8.2, 1.3, 4.4, 2.3, 5.0]
}
df = pd.DataFrame(data)
# Fill TH3F
xyz_hist = R.TH3F('xyz', 'xyz', 100, 0, 10, 100, 0, 10, 100, 0, 10)
for index, row in df.iterrows():
    xyz_hist.Fill(row['x'], row['y'], row['z'])
# Open file and write histogram
outfile = ur.recreate('outfile.root')
outfile['xyz'] = xyz_hist
Could someone please clarify what is the correct way to go about it? Or is this wrong because I am trying to use uproot for something that it wasn't intended/built for, and the solution is to just use ROOT for opening the file, storing the histogram, etc.?
I ran your code exactly as written and encountered no issues, regardless of whether I read the histogram back into ROOT:
import ROOT
f = ROOT.TFile("outfile.root")
h = f.Get("xyz")
h.Draw()
or Uproot and hist:
import uproot
f = uproot.open("outfile.root")
h = f["xyz"]
h.to_hist()
so you might just have an old version of one of the packages and are seeing a bug that was fixed since then. Here are the versions that successfully tested the above:
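To compare against your own environment, the installed versions can be listed with a short sketch like the one below. It only uses the standard library's importlib.metadata; the helper name installed_versions and the default package list are illustrative assumptions, not part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("uproot", "hist", "numpy", "pandas")):
    """Return {package name: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            # The package is not visible to this Python environment
            versions[name] = None
    return versions


for name, ver in installed_versions().items():
    print(f"{name}: {ver}")

# ROOT is often not installed through pip, so importlib.metadata may not see it;
# once ROOT is imported, ROOT.gROOT.GetVersion() reports its version instead.
```

Running this in the same Jupyter kernel that segfaults matters, since a notebook kernel can point at a different environment than your shell.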
More generally, I'd like to point out a few things.
You shouldn't loop over the rows of a DataFrame with df.iterrows(); doing so defeats the purpose of putting data into arrays that can be manipulated with precompiled routines. I'll show an example in a moment.
Here's a way that it can be done entirely with Uproot and hist:
import uproot
from hist import Hist
import pandas as pd
data = {
    "x": [9.5, 5.0, 2.2, 8.1, 5.5, 1.4, 2.5, 9.2, 3.0, 7.9],
    "y": [2.0, 5.7, 1.3, 9.1, 6.0, 6.2, 5.8, 1.8, 5.8, 3.1],
    "z": [7.5, 4.1, 3.1, 1.6, 2.4, 8.2, 1.3, 4.4, 2.3, 5.0],
}
df = pd.DataFrame(data)
xyz_hist = Hist.new.Reg(100, 0, 10).Reg(100, 0, 10).Reg(100, 0, 10).Double()
xyz_hist.fill(data["x"], data["y"], data["z"])
outfile = uproot.recreate("outfile.root")
outfile["xyz"] = xyz_hist
and here's how it can be done entirely with ROOT:
import ROOT
import pandas as pd
data = {
    "x": [9.5, 5.0, 2.2, 8.1, 5.5, 1.4, 2.5, 9.2, 3.0, 7.9],
    "y": [2.0, 5.7, 1.3, 9.1, 6.0, 6.2, 5.8, 1.8, 5.8, 3.1],
    "z": [7.5, 4.1, 3.1, 1.6, 2.4, 8.2, 1.3, 4.4, 2.3, 5.0],
}
df = pd.DataFrame(data)
rdf = ROOT.RDF.FromNumpy({
    "x": df["x"].values, "y": df["y"].values, "z": df["z"].values
})
h = rdf.Histo3D(("xyz", "", 100, 0, 10, 100, 0, 10, 100, 0, 10), "x", "y", "z")
outfile = ROOT.TFile("outfile.root", "RECREATE")
h.Write()
outfile.Close()
(In both cases, the Pandas DataFrame is also superfluous; both the Hist.fill and the ROOT.RDF.FromNumpy methods actually want NumPy arrays. In the ROOT case, I have to explicitly pull NumPy arrays out of the DataFrame. However, I assume that you have a reason for wanting to use Pandas that goes beyond this example.)
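To make the point about df.iterrows() concrete, here is a minimal sketch of binning the three DataFrame columns in one vectorized call. It uses NumPy's histogramdd rather than hist or ROOT, purely to show the array-at-a-time style; the variable names edges and counts are my own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": [9.5, 5.0, 2.2, 8.1, 5.5, 1.4, 2.5, 9.2, 3.0, 7.9],
    "y": [2.0, 5.7, 1.3, 9.1, 6.0, 6.2, 5.8, 1.8, 5.8, 3.1],
    "z": [7.5, 4.1, 3.1, 1.6, 2.4, 8.2, 1.3, 4.4, 2.3, 5.0],
})

# 100 equal-width bins from 0 to 10 on each axis (101 edges per axis),
# the same binning as the histograms above
edges = [np.linspace(0, 10, 101)] * 3

# One call bins every row at once; there is no Python-level loop over rows
counts, _ = np.histogramdd(df[["x", "y", "z"]].to_numpy(), bins=edges)

print(counts.shape)       # one count per (x, y, z) bin
print(int(counts.sum()))  # total number of entries that fell inside the ranges
```

Because these example values sit exactly on bin edges, floating-point rounding may place an entry in a neighbouring bin compared with ROOT or hist, so treat this as an illustration of the vectorized approach rather than a bin-for-bin replacement.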