rexponentialnls

How to do exponential curve fitting for y = a*(1- exp(-b*t)


I have time series data where values increase over time (t) and then become stable: I would like to fit an exponential curve NPQ_Lss = a*(1- exp(-b*t). Later I would like to compare the fitted curves between populations (P1 and P2) to find out if they are significantly different.

How could I do so in R? I understand the limitation that I have few data points.

My data looks like this:

t NPQ_Lss Population
-2 1.777 P2
0 2.224 P2
3 2.492 P2
5 2.384 P2
7 2.407 P2
-2 1.719 P1
0 2.191 P1
3 2.418 P1
5 2.303 P1
7 2.403 P1

Solution

  • In principle,

    library(nlme)
    fit <- gnls(NPQ_Lss ~ SSasympOrig(t, a, log_b),
                data = DF,
                params = a + log_b ~ Population)
    

    should fit the model with population-specific parameters. (SS stands for "self-start", so you shouldn't need to provide starting values if your data are reasonably well-behaved.)

    But, it actually doesn't: the main problem here is that your data are actually a terrible match for the curve a*(1-exp(-b*t)), which assumes that the curves passes through y=0 at time 0. (Your curve is most of the way to its asymptote at time t=0.)

    Fitting to the data as a whole with SSasymp, which doesn't assume the curve goes through the origin, works OK:

    fit <- gnls(NPQ_Lss ~ SSasymp(t, a, y0, log_b),
                data = DF)
    

    Unfortunately this specification, which should work, doesn't (" supplied 3 starting values, need 6")

    fit2 <- gnls(NPQ_Lss ~ SSasymp(t, a, y0, log_b),
                data = DF,
                params = a + y0 + log_b ~ Population)
    

    Let's try giving it starting values

    fit2 <- gnls(NPQ_Lss ~ SSasymp(t, a, y0, log_b),
                data = DF,
                params = a + y0 + log_b ~ Population,
                start = c(rbind(coef(fit), 0)))
    

    The start value is a sneaky way of setting the starting values to {a, 0, y0, 0, log_b, 0}, knowing that this model will be parameterized according to {"a for population 1", "difference in a between pops", "y0 for P1", "y0 diff", "log_b for P1", "log_b diff"} ...

    Comparing the models with those fitted by @G.Grothendieck (using method = "plinear", which is a little more robust):

    newdf <- expand.grid(Population = c("P1", "P2"), t = seq(-3, 10, length = 51))
    df_pred <- (list(orig_0=fm1, orig_pop = fm2, flex_0 = fit, flex_pop = fit2)
        |> purrr::map_dfr(~ data.frame(newdf, NPQ_Lss = predict(., newdata = newdf)), .id = "model")
    )
    
    library(ggplot2); theme_set(theme_bw())
    ggplot(df_pred,
           aes(t, NPQ_Lss, colour = Population)) +
        geom_line(aes(linetype = model)) +
        geom_point(data = DF)
    ggsave("SO77667833.png")
    

    enter image description here

    The model that insists on fitting a curve through (0,0) gives a prediction (the concave-up/accelerating curve) that is technically correct (it's the least-squares fit of that model to these data), but ridiculous; the model (SSasymp) that doesn't constrain the fit to go through the origin gives reasonable results.

    summary() of the two-population model (fit2) will give you the p-values for the individual parameter differences; anova(fit, fit2) as suggested by @G.Grothendieck will test the differences in the parameters jointly.