rggplot2area

Colour above and below intersecting curves separately using ggplot in R?


Other similar questions have been asked here, but they are slightly less complicated than my issue.

Im plotting two curves geom_line using ggplot2 in R. Let's call curve 1 the "baseline". Curve 2 intersects curve 1 at multiple points, and can be above the baseline or below the baseline at many points. Im trying to calculate the areas between the curves and then plot them!

Im trying to calculate:

1: The total area in between the baseline and curve 2

2: The Area between the curve 2 and baseline that is above the baseline

3: The Area between the curve 2 and baseline that is below the baseline.

I have some working code (that may or may not be correct!) to calculate the areas, but now I would like to plot the curves and the area where curve 2 is above the baseline, I want to colour it blue. And when curve 2 is below the baseline, I want it coloured red. Hopefully my example below clears up what Im trying to achieve.

The 1st example is a simple example where my method works:

library(dplyr)
library(ggplot2)

df <- data.frame(time = c(0,1,2,3),
                 baseline = c(50, 50, 10, 10),
                 curve2 = c(40,50,50,10))

calculate_area_between_curves <- function(df, x_col, y1_col, y2_col) {
  df <- df %>% arrange(!!sym(x_col))
  
  x <- df[[x_col]]
  y1 <- df[[y1_col]]
  y2 <- df[[y2_col]]
  
  dx <- diff(x)
  
  # Calculate areas
  total_area <- sum(abs((y1[-1] + y1[-length(y1)])/2 - (y2[-1] + y2[-length(y2)])/2) * dx)
  blue_area <- sum(pmax((y1[-1] + y1[-length(y1)])/2 - (y2[-1] + y2[-length(y2)])/2, 0) * dx)
  red_area <- sum(pmax((y2[-1] + y2[-length(y2)])/2 - (y1[-1] + y1[-length(y1)])/2, 0) * dx)
  
  return(list(total = total_area, blue = blue_area, red = red_area))
}

# Calculate areas
areas <- calculate_area_between_curves(df, "time", "baseline", "curve2")
> areas
$total
[1] 45

$blue
[1] 5

$red
[1] 40

# Plot 
ggplot(df, aes(time)) +
  geom_ribbon(aes(ymin = pmin(baseline, curve2), ymax = baseline), fill = "blue", alpha = 0.3) +
  geom_ribbon(aes(ymin = baseline, ymax = pmax(baseline, curve2)), fill = "red", alpha = 0.3) +
  geom_line(aes(y = baseline), col = 'red') +
  geom_line(aes(y = curve2), col = 'blue') + 
  theme_bw()

coloured_area_1

It seems like my area function has calculated the area correctly and was able to colour the areas in the plot correctly. However, if I change the dataframe to something slightly more complex:

df <- data.frame(time = c(0, 120, 300, 600, 900),
                 baseline = c(100, 62.3, 56.7, 47.9, 44.7),
                 curve2 = c(92.2, 58.7, 58.2, 52.4, 51.1))

And run the same code, I now get these areas and this plot:

> areas
$total
[1] 3408

$blue
[1] 873

$red
[1] 2535

coloured_area_2

The issue is, I dont know if the areas are correct for the more complex data and as you can see, the coloured areas are spilling outside of their lines at the intersection point. And for some of the other plots in my data, curve2 intersects the baseline many times.

Any suggestions as to how I could fix this?


Solution

  • The problem in your second example is that where the two lines intersect is not one of the points in your dataframe, like it is in your first example.

    One solution would be to add all the points where the lines intersect to your dataframe. Something like this:

    x <- df$time
    y1 <- df$baseline
    y2 <- df$curve2
    
    xp <- (y1[-length(y1)]-y2[-length(y2)])*diff(x) / (diff(y2)-diff(y1)) + x[-length(x)]
    yp <- y1[-length(y1)] + diff(y1)/diff(x) * (xp-x[-length(x)])
    
    intersections <- data.frame(time = xp, baseline = yp, curve2 = yp) %>%
      .[(xp>x[-length(x)])&(xp<x[-1]),]
    df <- df %>%
      rbind(., intersections) %>%
      arrange(time)
    

    Which gives you slightly different areas. They're higher, not lower, than before because before, it was counting some negative area, in a sense, at the intersection.

    > areas
    $total
    [1] 3487.412
    
    $blue
    [1] 912.7059
    
    $red
    [1] 2574.706
    

    And the graph:

    enter image description here

    Alternatively, if your data include many more datapoints than these examples, the difference this makes may be negligible, and the overlapping at the intersections might not be noticeable.