rggplot2regressionexponential

Error in lm.fit(x,y,offset = offset, singular.ok,...) for exponential lm() model in R


I am trying to apply a exponential regression on the given data frame :

df <- structure(list(Type_stat = c("Vitesse_min", "Vitesse_max", "Vitesse_min", 
"Vitesse_max", "Vitesse_min", "Vitesse_max", "Vitesse_min", "Vitesse_max", 
"Vitesse_min", "Vitesse_max", "Vitesse_min", "Vitesse_max", "Vitesse_min", 
"Vitesse_max"), x = c(1.21117779847283, 95.6649728882555, 0.697324074344613, 
106.970986891676, 0.920357699258461, 101.063665819829, 3.24997333259321, 
272.317116262567, 6.04974316697968, 202.804835071838, 10.0554146811207, 
361.891536087235, 30.7768185135629, 765.018255121896), y = c(0.546201232032854, 
0.546201232032854, 0.450376454483231, 0.450376454483231, 0.288843258042437, 
0.288843258042437, 0.108145106091718, 0.108145106091718, 0.0520191649555099, 
0.0520191649555099, 0.027378507871321, 0.027378507871321, 0.0082135523613963, 
0.0082135523613963)), row.names = c(NA, -14L), class = c("tbl_df", 
"tbl", "data.frame"))

Whichever, I try to use that code to obtain an unique regression :

exp.model <-lm(y ~ exp(x), df)

I obtain that error :

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  NA/NaN/Inf in 'x'

In fine I would like to have grouped regressions by "Type_stat" on a plot with geom_smooth() :

library(tidyverse)
df %>% ggplot() + geom_smooth(method="lm", aes(x = x, y = y, color=as.factor(Type_stat)), formula= (y ~ exp(x)), se=FALSE, na.rm = FALSE, linetype = 1)

But the error is similar in both case.


Solution

  • Your proximal problem is that exponentiating numbers as large as 765 gives an overflow error: exp(max(x$df)) is Inf.

    Based on what your data look like, I suspect it would be better to fit log(y) ~ x rather than y ~ exp(x). Note that this is a different model: your model corresponds to y = a + b*exp(x) + ɛ, my suggestion would be log(y) = a + b*x + ɛ (so that y = exp(a + b*x + ɛ) or y = exp(a)*exp(b*x)*exp(ɛ), i.e. a log-Normal/multiplicative model. In general the latter makes way more sense in applied contexts (i.e., we want to estimate the scale parameter in exp(c*x), not assume that the data are strictly exponential and multiplied by a scaling factor (c*exp(x)).

    If you really want to fit your model (a + b*exp(x) + ɛ) you can do it by shifting your x values by a constant: y = a + exp(c)*b*exp(x-c) + ɛ.

    m1 <- lm(log(y) ~ x, data = df); cc1 <- coef(m1)
    S <- 200 ## shift
    m2 <- lm(y ~ exp(x-S), data = df); cc2 <- coef(m2)
    plot(y ~ x, data = df)
    curve(exp(cc1[1] + cc1[2]*x), add = TRUE, col = 2, lwd = 2)
    curve(cc2[1] + exp(S)*cc2[2]*exp(x-S), add = TRUE, col = 4, lwd = 2)
    

    You can fit the model y = a*exp(b*x) + ɛ (Normal rather than log-Normal error) by using glm(y~x, family = gaussian(link = "log")); this will also work within ggplot2 ...

    ggplot(df, aes(x, y)) + 
       geom_smooth(method = "glm", method.args = list(family = gaussian(link = "log"))) + 
       geom_point() + 
       scale_y_continuous(limits=c(NA, 0.5), oob = scales::squish)
    

    (I truncated the y-axis scale because the confidence intervals explode at the upper end of the x-range.)

    Follow-up questions should probably go to Cross Validated, as I'm guessing that in reality this is more of a statistical/data analysis question than a programming one ...