Tags: r, cumulative-frequency

density of ongoing events from density of starting times


I have a data frame containing a column of starting times for event A and a column of event lengths in hours, like so:

df = structure(list(StartTime = c(10.1401724605821, 8.34114734060131, 
10.1930766354781, 9.49644518946297, 9.36002452136017, 10.8311833878979, 
9.44229844841175, 8.48090101312846, 9.31779155065306, 9.57179348240606
), Length = c(3.28013235144317, 3.97817114274949, 4.29317499510944, 
2.63135516550392, 3.49188423063606, 4.08827690966427, 3.63062007538974, 
3.82309223059565, 1.52407871372998, 1.80725628975779)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

In practice, df contains thousands of records. I would like to calculate the density (or histogram, but density makes more sense because each increment of time contains many events) of the number of ongoing events. For example, if an event starts at 8.02 and takes 1 hour, then that record contributes one count of an ongoing event at 8.03, 8.04, ..., 9.02. Each record similarly contributes to many time points.

What is the best way of approaching this?
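For illustration, a brute-force way to compute this directly would be to count, at each point of a time grid, how many events cover it (the 0.01-hour grid step is an arbitrary choice); this is far too slow for thousands of records:

# Brute force: at each grid time t, count events with StartTime <= t < StartTime + Length
grid <- seq(min(df$StartTime), max(df$StartTime + df$Length), by = 0.01)
ongoing <- sapply(grid, function(t) sum(df$StartTime <= t & t < df$StartTime + df$Length))
plot(grid, ongoing, type = "s")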


Solution

  • Here's a tidyverse solution:

    library(dplyr)
    library(tidyr)
    library(ggplot2)
    
    df %>% 
      mutate(end = StartTime + Length) %>%                       # end time of each event
      pivot_longer(c("StartTime", "end")) %>%                    # one row per start/end boundary
      arrange(value) %>%                                         # order boundaries in time
      mutate(active = cumsum(2 * (name == "StartTime") - 1)) %>% # +1 at each start, -1 at each end
      ggplot(aes(value, active)) +
      geom_step() +
      labs(x = "time", y = "count")
    

    Created on 2020-10-16 by the reprex package (v0.3.0)
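
    If a numeric density is needed rather than the plot, one possible extension is to evaluate the active-count step function on a regular time grid with stats::approx and normalise it; the 0.01-hour grid resolution used here is an arbitrary choice:

    counts <- df %>%
      mutate(end = StartTime + Length) %>%
      pivot_longer(c("StartTime", "end")) %>%
      arrange(value) %>%
      mutate(active = cumsum(2 * (name == "StartTime") - 1))

    # Evaluate the right-continuous step function on a regular grid
    grid_step <- 0.01
    grid <- seq(min(counts$value), max(counts$value), by = grid_step)
    active <- approx(counts$value, counts$active, xout = grid,
                     method = "constant", f = 0, rule = 2)$y

    # Scale so the values integrate to 1 over the grid
    dens <- active / (sum(active) * grid_step)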