I have time series data where values increase over time (t) and then become stable: I would like to fit an exponential curve NPQ_Lss = a*(1- exp(-b*t)
. Later I would like to compare the fitted curves between populations (P1
and P2
) to find out if they are significantly different.
How could I do so in R? I understand the limitation that I have few data points.
My data looks like this:
t | NPQ_Lss | Population |
---|---|---|
-2 | 1.777 | P2 |
0 | 2.224 | P2 |
3 | 2.492 | P2 |
5 | 2.384 | P2 |
7 | 2.407 | P2 |
-2 | 1.719 | P1 |
0 | 2.191 | P1 |
3 | 2.418 | P1 |
5 | 2.303 | P1 |
7 | 2.403 | P1 |
In principle,
library(nlme)
fit <- gnls(NPQ_Lss ~ SSasympOrig(t, a, log_b),
data = DF,
params = a + log_b ~ Population)
should fit the model with population-specific parameters. (SS
stands for "self-start", so you shouldn't need to provide starting values if your data are reasonably well-behaved.)
But, it actually doesn't: the main problem here is that your data are actually a terrible match for the curve a*(1-exp(-b*t))
, which assumes that the curves passes through y=0 at time 0. (Your curve is most of the way to its asymptote at time t=0
.)
Fitting to the data as a whole with SSasymp
, which doesn't assume the curve goes through the origin, works OK:
fit <- gnls(NPQ_Lss ~ SSasymp(t, a, y0, log_b),
data = DF)
Unfortunately this specification, which should work, doesn't (" supplied 3 starting values, need 6")
fit2 <- gnls(NPQ_Lss ~ SSasymp(t, a, y0, log_b),
data = DF,
params = a + y0 + log_b ~ Population)
Let's try giving it starting values
fit2 <- gnls(NPQ_Lss ~ SSasymp(t, a, y0, log_b),
data = DF,
params = a + y0 + log_b ~ Population,
start = c(rbind(coef(fit), 0)))
The start
value is a sneaky way of setting the starting values to {a, 0, y0, 0, log_b, 0}
, knowing that this model will be parameterized according to {"a for population 1", "difference in a between pops", "y0 for P1", "y0 diff", "log_b for P1", "log_b diff"}
...
Comparing the models with those fitted by @G.Grothendieck (using method = "plinear"
, which is a little more robust):
newdf <- expand.grid(Population = c("P1", "P2"), t = seq(-3, 10, length = 51))
df_pred <- (list(orig_0=fm1, orig_pop = fm2, flex_0 = fit, flex_pop = fit2)
|> purrr::map_dfr(~ data.frame(newdf, NPQ_Lss = predict(., newdata = newdf)), .id = "model")
)
library(ggplot2); theme_set(theme_bw())
ggplot(df_pred,
aes(t, NPQ_Lss, colour = Population)) +
geom_line(aes(linetype = model)) +
geom_point(data = DF)
ggsave("SO77667833.png")
The model that insists on fitting a curve through (0,0) gives a prediction (the concave-up/accelerating curve) that is technically correct (it's the least-squares fit of that model to these data), but ridiculous; the model (SSasymp
) that doesn't constrain the fit to go through the origin gives reasonable results.
summary()
of the two-population model (fit2
) will give you the p-values for the individual parameter differences; anova(fit, fit2)
as suggested by @G.Grothendieck will test the differences in the parameters jointly.