r ggplot2 ecdf

Adding more information to an eCDF plot in R


I used the following code to get an eCDF plot:

library(dplyr)
library(ggplot2)

df %>%
  group_by(group1, group2) %>%
  # number of distinct sessions per group combination
  summarise(n = n_distinct(sessionID)) %>%
  ggplot(aes(n)) +
  stat_ecdf(geom = "step") +
  scale_x_continuous(n.breaks = 30) +
  theme_classic()


I would like to add quartiles, mean, and median to the plot, somewhat similar to the plot below.

[Image: example eCDF plot annotated with quartiles, mean, and median]


Solution

  • My suggestion is to just calculate the quantiles outside the plot and use an extra data.frame as input to the annotation layers.

    library(ggplot2)
    
    # Dummy data (the seed is an arbitrary choice, only for reproducibility)
    set.seed(42)
    df <- data.frame(n = rpois(100, lambda = 5))
    
    # Quantiles to annotate; 0.5 is the median,
    # and c(0.25, 0.5, 0.75) would give the quartiles
    q <- c(0.2, 0.5, 0.8)
    
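    # data.frame for quantiles: three path vertices per quantile
    # (left panel edge -> corner at the quantile -> bottom panel edge)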
    qdf <- data.frame(
      q = factor(q),
      x = c(rep(-Inf, length(q)), rep(quantile(df$n, q), 2)),
      y = c(rep(q, 2), rep(-Inf, length(q)))
    )
    
    ggplot(df, aes(n)) +
      stat_ecdf() +
      # Line segments to/from the ecdf line
      geom_path(
        data = qdf,
        aes(x = x, y = y, colour = q),
        linetype = "dotted"
      ) +
      # Labels at x
      geom_text(
        data = subset(qdf, is.finite(x) & is.finite(y)),
        aes(x = x, y = 0.5 * y, label = x, colour = q),
        hjust = -1
      ) +
      # Labels at y
      geom_text(
        data = subset(qdf, is.finite(x) & is.finite(y)),
        aes(x = x - 0.5 * min(x), y = y, label = y, colour = q),
        vjust = -1
      )
    

    Created on 2022-10-27 by the reprex package (v2.0.0)
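
The question also asks for the mean, which the quantile-based code above does not cover. One possible extension is sketched below, assuming that vertical reference lines are an acceptable way to mark each statistic. The `stats` data.frame, its labels, and the seed are illustrative choices, not part of the original answer.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed (assumption), only so the sketch is reproducible
df <- data.frame(n = rpois(100, lambda = 5))  # same kind of dummy data as above

# Quartiles (the 0.5 quantile is the median) plus the mean
stats <- data.frame(
  stat = c("Q1", "median", "Q3", "mean"),
  x = c(unname(quantile(df$n, c(0.25, 0.5, 0.75))), mean(df$n))
)

# Height of the eCDF at each statistic, used to place the labels
stats$y <- ecdf(df$n)(stats$x)

p <- ggplot(df, aes(n)) +
  stat_ecdf(geom = "step") +
  # One dotted vertical line per statistic
  geom_vline(
    data = stats,
    aes(xintercept = x, colour = stat),
    linetype = "dotted"
  ) +
  # Label each line with the statistic's name and value
  geom_text(
    data = stats,
    aes(x = x, y = y, label = paste0(stat, " = ", round(x, 2)), colour = stat),
    hjust = -0.1, vjust = 1.5, show.legend = FALSE
  ) +
  theme_classic()

p
```

Note that `quantile()` interpolates between observations by default (type 7), so a quantile can fall between two steps of the eCDF. Passing `type = 1` to `quantile()` uses the inverse of the empirical distribution function instead, which keeps the values on observed data points.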