rintervals

How to classify intervals according to their overlapping?


I am looking for an algorithm that classifies (with letters) a set of intervals according to their overlap.

Graphically the problem is as follows:

enter image description here

I am working in R and my intervals are these:

structure(list(Interval = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), Start = c(5.3, 
6.5, 7.6, 7.8, 8, 8.3, 8.5, 8.7, 8.8, 9.9), End = c(7.5, 8.7, 
9.8, 10, 10.2, 10.5, 10.7, 10.9, 11, 12.1)), row.names = c(NA, 
-10L), spec = structure(list(cols = list(Interval = structure(list(), class = c("collector_double", 
"collector")), Start = structure(list(), class = c("collector_double", 
"collector")), End = structure(list(), class = c("collector_double", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), delim = "\t"), class = "col_spec"), class = c("spec_tbl_df","tbl_df", "tbl", "data.frame"))

I think the ivs package may have the solution but I don't know the procedure.


Solution

  • I think this function does what you need.

    label_overlaps <- function(data, Start = "Start", End = "End", 
                               label = "label", labs = letters) {
      data <- data[order(data[[Start]]), ]
      data[[label]] <- ""
      for(i in labs) {
        n <- which(data[[label]] == "")
        if(length(n) == 0) break
        n <- n[1]
        m <- which(data[[Start]] < data[[End]][n] & data[[End]] > data[[Start]][n])
        data[[label]][c(n, m)] <- paste0(data[[label]][c(n, m)], i)
      }
      if(any(!nzchar(data[[label]]))) warning("All labels exhausted")
      return(data)
    }
    

    Use is very simple:

    label_overlaps(df)
    #> # A tibble: 10 x 4
    #>    Interval Start   End label
    #>       <dbl> <dbl> <dbl> <chr>
    #>  1        1   5.3   7.5 a    
    #>  2        2   6.5   8.7 ab   
    #>  3        3   7.6   9.8 b    
    #>  4        4   7.8  10   bc   
    #>  5        5   8    10.2 bc   
    #>  6        6   8.3  10.5 bc   
    #>  7        7   8.5  10.7 bc   
    #>  8        8   8.7  10.9 bc   
    #>  9        9   8.8  11   bc   
    #> 10       10   9.9  12.1 c
    

    If we use data that matches your sample image, we get:

    library(geomtextpath)
    
    df <- data.frame(Interval = c(3, 2, 1, 3, 2, 2),
                     Start = c(1, 3, 7, 14, 15, 18.3),
                     End = c(4, 13, 10, 17, 17.7, 22))
    
    ggplot(label_overlaps(df), aes(Start, Interval, xend = End, yend = Interval)) +
      geom_segment(linewidth = 9, lineend = "round") +
      geom_textsegment(aes(label = label, group = seq_along(label)), 
                       textcolour = "black", gap = FALSE, 
                       linewidth = 8, lineend = "round", color = "#bffec0") +
      coord_cartesian(clip = "off") +
      theme_void(base_size = 16) +
      theme(aspect.ratio = 1/8)
    

    enter image description here

    And with your actual data, we get:

    ggplot(label_overlaps(df), aes(Start, Interval)) +
      geom_textsegment(aes(label = label, xend = End, yend = Interval, 
                           color = label), textcolour = "black", gap = FALSE, 
                       vjust = -1, linewidth = 6, lineend = "round") +
      scale_color_manual(values = c("red", "orangered", "orange",
                                    "green4", "dodgerblue"), guide = "none") +
      theme_minimal(base_size = 16)
    

    enter image description here