r bioinformatics overlap genomicranges

How to get the coordinates of the overlapping region from findOverlaps?


Hi, I am working with GRanges and finding overlaps with the findOverlaps function. I get the hits telling me which query and subject ranges overlap, but I also want the coordinates of the query and the subject where they overlap, so that I can retrieve the corresponding sequence.

How can I get the coordinates of both the subject and the query where they overlap? I am using the following code:

library(GenomicRanges)
library(regioneR) # toGRanges

df1 <- structure(list(df1c = c("chr2", "chr2", "chr2", "chr2"), df1c2 = c(2800, 
3600, 3719, 3893), df1c3 = c(3270, 4152, 5092, 4547)), class = "data.frame", row.names = c(NA, 
-4L))

df2 <- structure(list(df2c = c("chr2", "chr2", "chr2", "chr2", "chr2L"
), df2c2 = c(263, 342, 424, 846, 1030), df2c3 = c(20091, 17222, 
2612, 4265, 11575)), class = "data.frame", row.names = c(NA, 
-5L))

# df1 and df2 must be defined before they are passed to findOverlaps()
fo <- findOverlaps(query = toGRanges(df1), subject = toGRanges(df2), type = "within")


The expected output should look like this:

chr  CoDF1    CoDF2
1    100-200  90-210
1    150-280  100-285

CoDF1 = coordinates of the df1 range where it overlaps with df2 reads
CoDF2 = coordinates of the df2 range where it overlaps with df1 reads

Solution

  • You are better off using intersect():

    > intersect(toGRanges(df1),toGRanges(df2))
    
    GRanges object with 2 ranges and 0 metadata columns:
          seqnames    ranges strand
             <Rle> <IRanges>  <Rle>
      [1]     chr2 2800-3270      *
      [2]     chr2 3600-5092      *
      -------
      seqinfo: 2 sequences from an unspecified genome; no seqlengths
    

    Note, however, that the column names of your data.frames are not the ones expected when creating a GRanges object; they should be seqnames/start/end.
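
    The renaming mentioned above could look like the following minimal sketch. It assumes makeGRangesFromDataFrame() from GenomicRanges is used for the conversion (a swap for toGRanges() made only for illustration), and it rebuilds df1 from the question with data.frame() so that the block runs on its own.

    ```r
    library(GenomicRanges)

    # Same values as df1 in the question, with the original column names
    df1 <- data.frame(df1c  = c("chr2", "chr2", "chr2", "chr2"),
                      df1c2 = c(2800, 3600, 3719, 3893),
                      df1c3 = c(3270, 4152, 5092, 4547))

    # Rename the columns to the conventional seqnames/start/end
    names(df1) <- c("seqnames", "start", "end")

    # makeGRangesFromDataFrame() picks up these column names automatically
    gr1 <- makeGRangesFromDataFrame(df1)
    ```

    If renaming is not desirable, makeGRangesFromDataFrame() also accepts seqnames.field, start.field and end.field arguments naming the columns to use.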

    Edit:

    To list every overlapping pair of df1 and df2 ranges:

    intersection = findOverlaps(query = toGRanges(df1), subject = toGRanges(df2), type = "any")
    df = data.frame(df1[queryHits(intersection),], df2[subjectHits(intersection),])
    df
        seqnames start  end seqnames.1 start.1 end.1
    1       chr2  2800 3270       chr2     263 20091
    1.1     chr2  2800 3270       chr2     342 17222
    1.2     chr2  2800 3270       chr2     846  4265
    2       chr2  3600 4152       chr2     263 20091
    2.1     chr2  3600 4152       chr2     342 17222
    2.2     chr2  3600 4152       chr2     846  4265
    3       chr2  3719 5092       chr2     263 20091
    3.1     chr2  3719 5092       chr2     342 17222
    3.2     chr2  3719 5092       chr2     846  4265
    4       chr2  3893 4547       chr2     263 20091
    4.1     chr2  3893 4547       chr2     342 17222
    4.2     chr2  3893 4547       chr2     846  4265
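
The table above lists each overlapping query/subject pair, but not the coordinates the two ranges share. One possible way to get them, sketched here under a few assumptions, is pintersect() from GenomicRanges, which intersects two GRanges objects element by element. The GRanges objects are built directly with GRanges() and IRanges() rather than toGRanges() so the block is self-contained, and the names gr1, gr2, hits, q, s, ov and overlap_df are illustrative only.

```r
library(GenomicRanges)

# Same coordinates as df1 and df2 in the question
gr1 <- GRanges("chr2",
               IRanges(start = c(2800, 3600, 3719, 3893),
                       end   = c(3270, 4152, 5092, 4547)))
gr2 <- GRanges(c("chr2", "chr2", "chr2", "chr2", "chr2L"),
               IRanges(start = c(263, 342, 424, 846, 1030),
                       end   = c(20091, 17222, 2612, 4265, 11575)))

hits <- findOverlaps(query = gr1, subject = gr2, type = "any")

# Expand both sides so that element i of q and element i of s
# form one overlapping pair
q <- gr1[queryHits(hits)]
s <- gr2[subjectHits(hits)]

# Element-wise intersection: one shared range per hit
ov <- pintersect(q, s)

# One row per hit: query coordinates, subject coordinates, shared region
overlap_df <- data.frame(
  chr     = as.character(seqnames(ov)),
  CoDF1   = paste(start(q), end(q), sep = "-"),
  CoDF2   = paste(start(s), end(s), sep = "-"),
  overlap = paste(start(ov), end(ov), sep = "-")
)
overlap_df
```

For example, the df1 range 3719-5092 and the df2 range 846-4265 share the region 3719-4265. With type = "within", as in the question, every query lies entirely inside its subject, so the shared region is simply the query range itself.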