Hi, I am working with GRanges and finding overlaps with the findOverlaps function from IRanges. I get the hits showing which query and subject ranges overlap, but I also want the coordinates of the query and subject where they overlap, so that I can retrieve the corresponding sequence.
How can I get the coordinates of both subject and query where they overlap? I am using the following code:
library(GenomicRanges)
library(regioneR) # toGRanges
# example data (defined before it is used below)
df1 <- structure(list(df1c = c("chr2", "chr2", "chr2", "chr2"), df1c2 = c(2800,
3600, 3719, 3893), df1c3 = c(3270, 4152, 5092, 4547)), class = "data.frame", row.names = c(NA,
-4L))
df2 <- structure(list(df2c = c("chr2", "chr2", "chr2", "chr2", "chr2L"
), df2c2 = c(263, 342, 424, 846, 1030), df2c3 = c(20091, 17222,
2612, 4265, 11575)), class = "data.frame", row.names = c(NA,
-5L))
# hits where a df1 range lies entirely within a df2 range
fo <- findOverlaps(query = toGRanges(df1), subject = toGRanges(df2), type = "within")
The expected output should look like this:
chr CoDF1 CoDF2
1 100-200 90-210
1 150-280 100-285
CoDF1 = coordinates of the df1 range where it overlaps a df2 range
CoDF2 = coordinates of the df2 range where it overlaps a df1 range
You'd be better off using intersect():
> intersect(toGRanges(df1),toGRanges(df2))
GRanges object with 2 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr2 2800-3270 *
[2] chr2 3600-5092 *
-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
But note that your data frames' column names are not the conventional ones for creating a GRanges object; they should be seqnames/start/end.
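If you would rather not rename the columns, one option is to point makeGRangesFromDataFrame() at the existing column names through its seqnames.field, start.field and end.field arguments. This is only a sketch that assumes GenomicRanges is loaded; gr1 is just an illustrative name, and df1 is rebuilt here so the snippet runs on its own:

```r
library(GenomicRanges)

# df1 as posted in the question
df1 <- data.frame(df1c  = c("chr2", "chr2", "chr2", "chr2"),
                  df1c2 = c(2800, 3600, 3719, 3893),
                  df1c3 = c(3270, 4152, 5092, 4547))

# Tell makeGRangesFromDataFrame() which columns hold chromosome, start and end
gr1 <- makeGRangesFromDataFrame(df1,
                                seqnames.field = "df1c",
                                start.field    = "df1c2",
                                end.field      = "df1c3")
gr1
```

The same call with the df2 column names should work for the second data frame.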
EDIT:
To see every overlapping pair, with the coordinates of both ranges:
intersection = findOverlaps(query = toGRanges(df1), subject = toGRanges(df2), type = "any")
df = data.frame(df1[queryHits(intersection),], df2[subjectHits(intersection),])
df
seqnames start end seqnames.1 start.1 end.1
1 chr2 2800 3270 chr2 263 20091
1.1 chr2 2800 3270 chr2 342 17222
1.2 chr2 2800 3270 chr2 846 4265
2 chr2 3600 4152 chr2 263 20091
2.1 chr2 3600 4152 chr2 342 17222
2.2 chr2 3600 4152 chr2 846 4265
3 chr2 3719 5092 chr2 263 20091
3.1 chr2 3719 5092 chr2 342 17222
3.2 chr2 3719 5092 chr2 846 4265
4 chr2 3893 4547 chr2 263 20091
4.1 chr2 3893 4547 chr2 342 17222
4.2 chr2 3893 4547 chr2 846 4265
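If you also need the coordinates of the shared part of each pair (for example to pull out the sequence), here is a rough sketch of one way to do it. It assumes the data frames have been renamed to seqnames/start/end; the names gr1, gr2, q, s and res are only illustrative. Since every hit is on the same chromosome, the overlap runs from the larger of the two starts to the smaller of the two ends. GenomicRanges also has pintersect(), which I believe gives the same ranges, but the plain pmax()/pmin() version is used here to keep the logic visible:

```r
library(GenomicRanges)

# Same data as in the question, with conventional column names (assumption)
df1 <- data.frame(seqnames = rep("chr2", 4),
                  start = c(2800, 3600, 3719, 3893),
                  end   = c(3270, 4152, 5092, 4547))
df2 <- data.frame(seqnames = c("chr2", "chr2", "chr2", "chr2", "chr2L"),
                  start = c(263, 342, 424, 846, 1030),
                  end   = c(20091, 17222, 2612, 4265, 11575))

gr1 <- makeGRangesFromDataFrame(df1)
gr2 <- makeGRangesFromDataFrame(df2)

# One row per overlapping (query, subject) pair
hits <- findOverlaps(gr1, gr2, type = "any")
q <- gr1[queryHits(hits)]    # df1 range of each pair
s <- gr2[subjectHits(hits)]  # df2 range of each pair

# Shared region of each pair: larger start to smaller end
ov_start <- pmax(start(q), start(s))
ov_end   <- pmin(end(q), end(s))

res <- data.frame(chr     = as.character(seqnames(q)),
                  CoDF1   = paste(start(q), end(q), sep = "-"),
                  CoDF2   = paste(start(s), end(s), sep = "-"),
                  overlap = paste(ov_start, ov_end, sep = "-"))
res
```

With type = "within" every df1 range is fully contained in its df2 partner, so the overlap column would simply repeat CoDF1; it only differs for partial overlaps such as 3719-5092 against 846-4265.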