This feature has been released as part of pandas 0.20.1 (on my birthday :] )
PR has been merged!
It seems like this question may have contributed to re-opening the PR for IntervalIndex in pandas.
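Now that IntervalIndex exists, the original point-in-interval lookup can be sketched roughly as follows. This is a minimal sketch that assumes the intervals in A do not overlap (IntervalIndex.get_indexer refuses an overlapping index) and that both endpoints are inclusive. The column names start, end and pos and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: A holds closed intervals [start, end], B holds points.
A = pd.DataFrame({"start": [0, 10, 20], "end": [5, 15, 25]})
B = pd.DataFrame({"pos": [3, 7, 12, 20, 30]})

# Build an IntervalIndex from the two integer columns; closed="both"
# makes both endpoints inclusive.
idx = pd.IntervalIndex.from_arrays(A["start"], A["end"], closed="both")

# For each point, the position of the interval containing it, or -1 if none.
# This only works when the intervals do not overlap.
hits = idx.get_indexer(B["pos"])

B["interval"] = hits
# Keep matched points and pull in the interval columns from A by row label.
matched = B[hits >= 0].join(A, on="interval")
print(matched)
```

Points outside every interval get -1 and are dropped before the join.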
I no longer have this problem, since I'm actually now querying for overlapping ranges from A and B, not points from B which fall within ranges in A, which is a full interval tree problem. I won't delete the question though, because I think it's still a valid question, and I don't have a good answer.
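For the overlapping-ranges version of the problem, a brute-force sketch using NumPy broadcasting (a swapped-in technique, not an interval tree) tests every pair of ranges at once. It assumes closed integer ranges and hypothetical column names a_start, a_end, b_start and b_end. Memory grows as len(A) * len(B), so this only suits modestly sized frames.

```python
import numpy as np
import pandas as pd

# Hypothetical ranges; both frames use closed [start, end] integer intervals.
A = pd.DataFrame({"a_start": [0, 10, 20], "a_end": [5, 15, 25]})
B = pd.DataFrame({"b_start": [4, 16, 14], "b_end": [11, 19, 30]})

a_s = A["a_start"].to_numpy()[:, None]  # shape (len(A), 1)
a_e = A["a_end"].to_numpy()[:, None]
b_s = B["b_start"].to_numpy()[None, :]  # shape (1, len(B))
b_e = B["b_end"].to_numpy()[None, :]

# Two closed intervals overlap iff each one starts no later than the other ends.
overlap = (a_s <= b_e) & (b_s <= a_e)

# Row positions of every overlapping (A, B) pair.
ai, bi = np.nonzero(overlap)
pairs = pd.concat(
    [A.iloc[ai].reset_index(drop=True), B.iloc[bi].reset_index(drop=True)],
    axis=1,
)
print(pairs)
```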
I have two dataframes.
In dataframe A, two of the integer columns taken together represent an interval.
In dataframe B, one integer column represents a position.
I'd like to do a sort of join, such that points are assigned to each interval they fall within.
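One way to express this "sort of join" entirely in pandas is a cross join followed by a filter. This is a minimal sketch that assumes pandas 1.2 or later (for merge(how="cross")), closed intervals, and hypothetical column names. Because every point is compared with every interval, a point inside an overlap naturally appears once per interval, but the intermediate frame has len(A) * len(B) rows.

```python
import pandas as pd

# Hypothetical sample data; the intervals [10, 15] and [13, 20] overlap.
A = pd.DataFrame({"start": [0, 10, 13], "end": [5, 15, 20]})
B = pd.DataFrame({"pos": [3, 7, 14, 18]})

# Pair every interval with every point (how="cross" needs pandas >= 1.2),
# then keep only the pairs where the point lies inside the closed interval.
pairs = A.merge(B, how="cross")
joined = pairs[(pairs["pos"] >= pairs["start"]) & (pairs["pos"] <= pairs["end"])]
joined = joined.reset_index(drop=True)
print(joined)
```

Here point 14 appears twice (once for each overlapping interval) and point 7 is dropped because it falls in no interval.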
Intervals occasionally overlap, though rarely. If a point falls within an overlap, it should be assigned to both intervals. About half of the points won't fall within any interval, but nearly every interval will have at least one point within its range.
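Because each interval can be looked up independently, a sort-based approach handles the rare overlaps without extra work and also reports unmatched points. This is a swapped-in technique, not pandas' own IntervalIndex machinery, and a minimal sketch assuming closed integer intervals with start <= end and hypothetical column names. The points are sorted once, then np.searchsorted finds, for every interval, the slice of sorted points inside it. The cost is roughly O((n + m) log n) plus the size of the output, rather than the O(n * m) of a cross join.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the intervals [10, 15] and [13, 20] overlap.
A = pd.DataFrame({"start": [0, 10, 13], "end": [5, 15, 20]})
B = pd.DataFrame({"pos": [3, 7, 14, 18]})

# Sort the points once, remembering their original row positions.
order = np.argsort(B["pos"].to_numpy(), kind="stable")
pos_sorted = B["pos"].to_numpy()[order]

# For each closed interval [start, end], the sorted points inside it
# occupy pos_sorted[lo:hi].
lo = np.searchsorted(pos_sorted, A["start"].to_numpy(), side="left")
hi = np.searchsorted(pos_sorted, A["end"].to_numpy(), side="right")
counts = hi - lo

# Expand every slice lo:hi into explicit indices without a Python loop.
interval_idx = np.repeat(np.arange(len(A)), counts)
offsets = np.arange(counts.sum()) - np.repeat(np.cumsum(counts) - counts, counts)
point_idx = order[np.repeat(lo, counts) + offsets]

joined = pd.concat(
    [A.iloc[interval_idx].reset_index(drop=True),
     B.iloc[point_idx].reset_index(drop=True)],
    axis=1,
)

# Points that were never matched fall outside every interval.
matched_mask = np.zeros(len(B), dtype=bool)
matched_mask[point_idx] = True
unmatched = B[~matched_mask]
print(joined)
print(unmatched)
```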
I was initially going to dump my data out of pandas and use intervaltree, banyan, or maybe bx-python, but then I came across this gist. It turns out that the ideas shoyer has in there never made it into pandas, but it got me thinking: it might be possible to do this within pandas. Since I want this code to be as fast as Python can possibly go, I'd rather not dump my data out of pandas until the very end. I also get the feeling that this is possible with bins and the pandas cut function, but I'm a total newbie to pandas, so I could use some guidance! Thanks!
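On the bins idea: in more recent pandas versions, pd.cut accepts an IntervalIndex as bins. That works here only if the intervals never overlap, because pd.cut raises a ValueError for an overlapping IntervalIndex. Here is a minimal sketch with hypothetical data and column names:

```python
import pandas as pd

# Hypothetical, non-overlapping closed intervals and points.
A = pd.DataFrame({"start": [0, 10, 20], "end": [5, 15, 25]})
B = pd.DataFrame({"pos": [3, 7, 12, 20, 30]})

bins = pd.IntervalIndex.from_arrays(A["start"], A["end"], closed="both")

# Each point becomes the Interval containing it, or NaN if it falls in a gap.
B["interval"] = pd.cut(B["pos"], bins)

# Integer codes into `bins`, i.e. the row position in A (-1 if no interval).
B["interval_pos"] = B["interval"].cat.codes
print(B)
```

Points in gaps come back as NaN with code -1. A point in an overlap cannot be assigned to two bins this way.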
Potentially related? Pandas DataFrame groupby overlapping intervals of variable length