rggplot2smoothingloess

Forcing a slope on a loess smooth


I have a set of two variables measured over time:

data <- structure(list(
time = c(0, 30, 60, 90, 120, 150, 210, 270, 330, 1350, 1470, 1710), 
var1 = c(0.963, 0.942, 0.921, 0.933, 0.868, 0.727, 0.742, 0.652, 0.485, 0.402, 0.401, 0.376), 
var2 = c(0.887, 0.897, 0.839, 0.812, 0.779, 0.744, 0.589, 0.495, 0.227, 0.185, 0.153, 0.119)),
class = c("tbl_df", "tbl", "data.frame"), 
row.names = c(NA, -12L)
)

When plotting, the loess smooth dips below zero:

ggplot(data, aes(x = time)) +
  geom_point(aes(y = var1), color = "steelblue") +
  geom_point(aes(y = var2), color = "darkorange") +
  geom_smooth(aes(y = var1, color = "Water activity"), method = "loess", se = F) +
  geom_smooth(aes(y = var2, color = "Dryweight/100"), method = "loess", se = F)

loess smooth plot output from previous code showing line dipping below 0

This presents two problems:

  1. The data I have CANNOT assume negative values and
  2. The slope changing sign (from -ve to +ve to -ve again) is not a representation of the truth.

The true curve should flatten over time, keeping a negative slope until then.

I know that this is mostly due to loess inferring "missing values" in the long gap between the 9th (time = 330) and 10th (time = 1350) measurements, and I have tried playing with the span parameter of the geom_smooth()smooth function:

ggplot(df_s, aes(x = time)) +
  geom_point(aes(y = var1), color = "steelblue") +
  geom_point(aes(y = var2), color = "darkorange") +
  geom_smooth(aes(y = var1, color = "Water activity"), method = "loess", span = 5, se = F) +
  geom_smooth(aes(y = var2, color = "Dryweight/100"), method = "loess", span = 5, se = F)

which is somewhat better as it does not dip below zero. However, the slope does still changes sign: loess smooth plot output from previous code with span parameter set to 5 showing line changing sign

How do I represent the true nature of my data (a curve, with a very negative slope changing to an almost flat line) without inserting dummy measurements? Should I use a different function?


Solution

  • We can incorporate your assumptions (monotonic decreasing) with a better fitting function, e.g. Monotonic GAM.

    Code would look something like:

    library(scam)
    
    fit1 <- scam(var1 ~ s(time, bs = "mpd", k = 10), data = data)
    fit2 <- scam(var2 ~ s(time, bs = "mpd", k = 10), data = data)
    
    # Prediction grid
    new_time <- data.frame(time = seq(min(data$time), max(data$time), length.out = 200))
    pred1 <- data.frame(time = new_time$time, var = predict(fit1, new_time), series = "Water activity")
    pred2 <- data.frame(time = new_time$time, var = predict(fit2, new_time), series = "Dryweight/100")
    pred <- rbind(pred1, pred2)
    
    ggplot() +
      geom_point(data = data, aes(x = time, y = var1), color = "steelblue") +
      geom_point(data = data, aes(x = time, y = var2), color = "darkorange") +
      geom_line(data = pred, aes(x = time, y = var, color = series), linewidth = 1) +
      scale_color_manual(values = c("Water activity" = "steelblue", "Dryweight/100" = "darkorange")) +
      labs(x = "Time", y = "Value", color = "Variable") +
      theme_minimal()
    

    which gives me this, which looks to fit your requirements
    enter image description here