So I'm given a TFile that contains two TTree objects, which contain track/tower pT, eta, and phi, organized by event. My goal is to extract each track and tower in an event and then cluster the whole event using the FastJet package. If I do this task in pure ROOT, my analysis takes 30 minutes at most (with a ~100 GB TFile). Meanwhile, uproot will process only 10,000 events in that time...
It is apparent that I'm doing something wrong, so I wanted to ask: what would be the proper way to access track-by-track information to get the same speed as in ROOT?
Uproot gets its efficiency from operating on many events per Python function call. The FastJet interface, last time I checked, would only accept one particle at a time: a Python function call for every particle in every event. Without even profiling it, I suspect that this is the bottleneck.
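For concreteness, the per-particle pattern looks roughly like this. This is only a sketch, assuming the classic FastJet SWIG Python bindings (`fastjet.PseudoJet`, `fastjet.JetDefinition`, `fastjet.ClusterSequence`), uproot 4's `library="np"` reading, and made-up file/tree/branch names:

```python
import math
import fastjet   # classic SWIG bindings; details may differ by version
import uproot

jet_def = fastjet.JetDefinition(fastjet.antikt_algorithm, 0.4)
tree = uproot.open("tracks.root")["trackTree"]            # made-up names

for arrays in tree.iterate(["pt", "eta", "phi"], library="np"):
    for pt, eta, phi in zip(arrays["pt"], arrays["eta"], arrays["phi"]):
        particles = []
        # one Python call (and one PseudoJet) per particle -- the bottleneck
        for pt_i, eta_i, phi_i in zip(pt, eta, phi):
            px = pt_i * math.cos(phi_i)
            py = pt_i * math.sin(phi_i)
            pz = pt_i * math.sinh(eta_i)
            E  = pt_i * math.cosh(eta_i)                  # massless particles
            particles.append(fastjet.PseudoJet(px, py, pz, E))
        cs = fastjet.ClusterSequence(particles, jet_def)
        jets = cs.inclusive_jets()
```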
There's another library called pyjet that improves upon this by feeding FastJet a whole event at a time. All the particles in one event are put into a large, contiguous NumPy array. Then, at least, there's only one Python function call per event.
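A minimal sketch of that pattern, using pyjet's documented `cluster` function and `DTYPE_PTEPM` structured dtype (the pT/eta/phi values below are placeholders):

```python
import numpy as np
from pyjet import cluster, DTYPE_PTEPM

# One contiguous structured array holds the whole event: fields pT, eta, phi, mass.
event = np.zeros(3, dtype=DTYPE_PTEPM)
event["pT"]  = [50.0, 30.0, 10.0]     # placeholder particles
event["eta"] = [ 0.1, -1.2,  2.0]
event["phi"] = [ 0.3,  2.9, -1.5]
# mass is left at zero

sequence = cluster(event, R=0.4, p=-1)   # p=-1 is anti-kt
jets = sequence.inclusive_jets()
for jet in jets:
    print(jet.pt, jet.eta, jet.phi, jet.mass)
```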
To do multiple events per array would require jagged arrays (to indicate where one event stops and the next event begins). There have been some plans to link Awkward Array to FastJet to supply this functionality, but for now, pyjet is the best you can do. If you have many particles per event, like hundreds, this might be okay.
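Putting the two together, here is a hedged sketch of how the whole file could be processed: read the jagged pT/eta/phi branches in bulk with uproot (Python overhead per batch of events, not per particle), then hand pyjet one contiguous event at a time. The file, tree, and branch names are assumptions; substitute your own, and repeat for the tower tree.

```python
import numpy as np
import uproot
from pyjet import cluster, DTYPE_PTEPM

tree = uproot.open("tracks.root")["trackTree"]            # assumed names

# Each branch comes back as an array of per-event arrays (jagged data).
for arrays in tree.iterate(["pt", "eta", "phi"], library="np"):
    for pt, eta, phi in zip(arrays["pt"], arrays["eta"], arrays["phi"]):
        # Pack this event's particles into one contiguous structured array.
        event = np.zeros(len(pt), dtype=DTYPE_PTEPM)
        event["pT"]  = pt
        event["eta"] = eta
        event["phi"] = phi                                # mass stays zero

        jets = cluster(event, R=0.4, p=-1).inclusive_jets()
        # ... fill histograms, write output, etc.
```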