The PyRanges class from the pyranges package has two methods with slightly different functionality: intersect and overlap.
The description of intersect is very similar to that of overlap: "Return overlapping subintervals." vs. "Return overlapping intervals."
I can't quite see the difference between the two (yes, I noticed the "sub" prefix).
Is overlap intended to return the full intervals that overlap at at least one position?
Setup:
>>> import pyranges as pr
>>> gr = pr.from_dict({"Chromosome": ["chr1"] * 3, "Start": [1, 4, 10],
... "End": [3, 9, 11], "ID": ["a", "b", "c"]})
>>> gr
+--------------+-----------+-----------+------------+
| Chromosome | Start | End | ID |
| (category) | (int32) | (int32) | (object) |
|--------------+-----------+-----------+------------|
| chr1 | 1 | 3 | a |
| chr1 | 4 | 9 | b |
| chr1 | 10 | 11 | c |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 3 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr2 = pr.from_dict({"Chromosome": ["chr1"] * 3, "Start": [2, 2, 9], "End": [3, 9, 10]})
>>> gr2
+--------------+-----------+-----------+
| Chromosome | Start | End |
| (category) | (int32) | (int32) |
|--------------+-----------+-----------|
| chr1 | 2 | 3 |
| chr1 | 2 | 9 |
| chr1 | 9 | 10 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
With overlap, you get back the intervals in self that overlap those in other. If an interval overlaps more than once, it is still returned only once (by default):
>>> gr.overlap(gr2)
+--------------+-----------+-----------+------------+
| Chromosome | Start | End | ID |
| (category) | (int32) | (int32) | (object) |
|--------------+-----------+-----------+------------|
| chr1 | 1 | 3 | a |
| chr1 | 4 | 9 | b |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 2 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
With intersect, the returned intervals are the intersections of the overlapping intervals in self and other. All overlaps are returned by default, which is why a appears twice in the result:
>>> gr.intersect(gr2)
+--------------+-----------+-----------+------------+
| Chromosome | Start | End | ID |
| (category) | (int32) | (int32) | (object) |
|--------------+-----------+-----------+------------|
| chr1 | 2 | 3 | a |
| chr1 | 2 | 3 | a |
| chr1 | 4 | 9 | b |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 3 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
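To make the difference concrete, here is a minimal plain-Python sketch of the two default behaviours on the same data. It is only an illustration of the semantics, not how pyranges implements them: the helper names overlaps, overlap_sketch and intersect_sketch are hypothetical, intervals are assumed to be half-open [Start, End) tuples, and chromosome and strand handling are ignored.

```python
# Plain-Python illustration of overlap vs intersect semantics.
# Assumptions: intervals are half-open (start, end, *extra) tuples on one
# chromosome; these helpers are hypothetical and are NOT the pyranges API.

def overlaps(a, b):
    """True if half-open intervals a and b share at least one position."""
    return a[0] < b[1] and b[0] < a[1]

def overlap_sketch(self_ivs, other_ivs):
    """Return each interval of self_ivs unchanged, at most once,
    if it overlaps any interval in other_ivs."""
    return [a for a in self_ivs if any(overlaps(a, b) for b in other_ivs)]

def intersect_sketch(self_ivs, other_ivs):
    """Return one clipped subinterval per overlapping (self, other) pair,
    keeping any extra fields (such as ID) from the self interval."""
    return [
        (max(a[0], b[0]), min(a[1], b[1])) + tuple(a[2:])
        for a in self_ivs
        for b in other_ivs
        if overlaps(a, b)
    ]

gr = [(1, 3, "a"), (4, 9, "b"), (10, 11, "c")]
gr2 = [(2, 3), (2, 9), (9, 10)]

full_intervals = overlap_sketch(gr, gr2)    # whole intervals from gr
subintervals = intersect_sketch(gr, gr2)    # clipped pieces, one per pair

print(full_intervals)
print(subintervals)
```

In this sketch, a is reported once by overlap_sketch as the full interval 1-3, but twice by intersect_sketch as 2-3, because it overlaps two intervals in gr2. Interval b keeps its coordinates 4-9 in both cases because it lies entirely inside 2-9, and c is dropped because 9-10 ends where c starts.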
See the docs for more info: