
Writing TH3 histogram to ROOT file with uproot


I am working in a Jupyter notebook and have a pandas DataFrame from which I would like to fill a ROOT TH3F histogram and save it to a ROOT file using uproot. I haven't been able to find many examples that illustrate how to do this, but here is what I assume the procedure is:

  1. Declare a ROOT TH3F and iterate over the dataframe to fill the histogram.
  2. Open ("recreate") a new ROOT file with uproot and write this histogram to it.

Below is some example code that shows how I tried to go about it (incorrectly, because it segfaults).

import ROOT as R
import uproot as ur
import numpy as np
import pandas as pd

# Example dataframe
data = {
'x': [9.5, 5.0, 2.2, 8.1, 5.5, 1.4, 2.5, 9.2, 3.0, 7.9],
'y': [2.0, 5.7, 1.3, 9.1, 6.0, 6.2, 5.8, 1.8, 5.8, 3.1],
'z': [7.5, 4.1, 3.1, 1.6, 2.4, 8.2, 1.3, 4.4, 2.3, 5.0]
}
df = pd.DataFrame(data)

# Fill TH3F
xyz_hist = R.TH3F('xyz', 'xyz', 100, 0, 10, 100, 0, 10, 100, 0, 10)
for index, row in df.iterrows():
    xyz_hist.Fill(row['x'], row['y'], row['z'])

# Open file and write histogram
outfile = ur.recreate('outfile.root')
outfile['xyz'] = xyz_hist

Could someone please clarify what is the correct way to go about it? Or is this wrong because I am trying to use uproot for something that it wasn't intended/built for, and the solution is to just use ROOT for opening the file, storing the histogram, etc.?


Solution

  • I ran your code exactly as written and encountered no problems, whether I read the histogram back with ROOT:

    import ROOT
    f = ROOT.TFile("outfile.root")
    h = f.Get("xyz")
    h.Draw()
    

    or with Uproot and hist:

    import uproot
    f = uproot.open("outfile.root")
    h = f["xyz"]
    h.to_hist()
    

    so you might just have an old version of one of the packages and be seeing a bug that has since been fixed. Here are the versions with which I successfully tested the above:
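
    To compare against your own environment, you can print what is installed there. The following is only a sketch: the helper name installed_versions and the list of packages it checks are my own choices for illustration, and ROOT is queried separately because it is usually not installed through pip.

```python
from importlib import metadata


def installed_versions(packages=("uproot", "hist", "numpy", "pandas", "awkward")):
    """Return {package: version string, or None if the package is not installed}.

    The default package list is an assumption; adjust it to what you actually use.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")

    # ROOT typically comes from conda or a system install, so ask it directly.
    try:
        import ROOT
        print("ROOT:", ROOT.gROOT.GetVersion())
    except ImportError:
        print("ROOT: not importable")
```

    If any of these are old, upgrading them is the first thing to try before looking for another cause of the segfault.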


    More generally, I'd like to point out a few things.

    1. It's not necessary to iterate over the DataFrame with df.iterrows(); doing so defeats the purpose of putting data into arrays that can be manipulated with precompiled routines. I'll show an example in a moment.
    2. It's not necessary to make a ROOT object (TH3) in order to save the data with Uproot. If ROOT is available, you can write the histogram with ROOT itself. Uproot is a pure-Python alternative to ROOT, so there's not much point in mixing them. (Mixing does work: Uproot asks ROOT to serialize the object and knows how to deserialize the result, so the TH3 is effectively written to an in-memory file and read back from it before Uproot writes it to the file on disk.)
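
    Point 1 can be illustrated without ROOT or hist. This is a minimal sketch that assumes only NumPy is installed; np.histogramdd is used here purely as a stand-in to show array-at-a-time filling, and it is not the method used elsewhere in this answer.

```python
import numpy as np

# Same example values as in the question
x = np.array([9.5, 5.0, 2.2, 8.1, 5.5, 1.4, 2.5, 9.2, 3.0, 7.9])
y = np.array([2.0, 5.7, 1.3, 9.1, 6.0, 6.2, 5.8, 1.8, 5.8, 3.1])
z = np.array([7.5, 4.1, 3.1, 1.6, 2.4, 8.2, 1.3, 4.4, 2.3, 5.0])

# Stack the columns into one (n_rows, 3) array
sample = np.column_stack([x, y, z])

# A single call bins every row; there is no Python-level loop over rows
counts, edges = np.histogramdd(
    sample,
    bins=(100, 100, 100),
    range=[(0, 10), (0, 10), (0, 10)],
)

print(counts.shape)   # one count per (x, y, z) bin
print(counts.sum())   # total number of rows that fell inside the ranges
```

    The binning work happens inside compiled code, which is what the row-by-row df.iterrows() loop gives up.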

    Here's a way that it can be done entirely with Uproot and hist:

    import uproot
    from hist import Hist
    import pandas as pd
    
    data = {
        "x": [9.5, 5.0, 2.2, 8.1, 5.5, 1.4, 2.5, 9.2, 3.0, 7.9],
        "y": [2.0, 5.7, 1.3, 9.1, 6.0, 6.2, 5.8, 1.8, 5.8, 3.1],
        "z": [7.5, 4.1, 3.1, 1.6, 2.4, 8.2, 1.3, 4.4, 2.3, 5.0],
    }
    df = pd.DataFrame(data)
    
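    # Three regular axes (100 bins from 0 to 10) with double-precision bin storage
    xyz_hist = Hist.new.Reg(100, 0, 10).Reg(100, 0, 10).Reg(100, 0, 10).Double()
    # One vectorized call fills all rows; no Python loop is needed
    xyz_hist.fill(data["x"], data["y"], data["z"])
    
    # The context manager closes the file once the histogram has been written
    with uproot.recreate("outfile.root") as outfile:
        outfile["xyz"] = xyz_hist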
    

    and here's how it can be done entirely with ROOT:

    import ROOT
    import pandas as pd
    
    data = {
        "x": [9.5, 5.0, 2.2, 8.1, 5.5, 1.4, 2.5, 9.2, 3.0, 7.9],
        "y": [2.0, 5.7, 1.3, 9.1, 6.0, 6.2, 5.8, 1.8, 5.8, 3.1],
        "z": [7.5, 4.1, 3.1, 1.6, 2.4, 8.2, 1.3, 4.4, 2.3, 5.0],
    }
    df = pd.DataFrame(data)
    
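    # Wrap the DataFrame columns (as NumPy arrays) in an RDataFrame
    rdf = ROOT.RDF.FromNumpy({
        "x": df["x"].values, "y": df["y"].values, "z": df["z"].values
    })
    # Book a TH3D; RDataFrame fills it lazily, when the result is first accessed
    h = rdf.Histo3D(("xyz", "", 100, 0, 10, 100, 0, 10, 100, 0, 10), "x", "y", "z")
    
    outfile = ROOT.TFile("outfile.root", "RECREATE")
    h.Write()  # triggers the fill and writes the histogram to outfile
    outfile.Close()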
    

    (In both cases, the Pandas DataFrame is also superfluous; both the Hist.fill and the ROOT.RDF.FromNumpy methods actually want NumPy arrays. In the ROOT case, I have to explicitly pull NumPy arrays out of the DataFrame. However, I assume that you have a reason for wanting to use Pandas that goes beyond this example.)