I am trying to apply a exponential regression on the given data frame :
df <- structure(list(Type_stat = c("Vitesse_min", "Vitesse_max", "Vitesse_min",
"Vitesse_max", "Vitesse_min", "Vitesse_max", "Vitesse_min", "Vitesse_max",
"Vitesse_min", "Vitesse_max", "Vitesse_min", "Vitesse_max", "Vitesse_min",
"Vitesse_max"), x = c(1.21117779847283, 95.6649728882555, 0.697324074344613,
106.970986891676, 0.920357699258461, 101.063665819829, 3.24997333259321,
272.317116262567, 6.04974316697968, 202.804835071838, 10.0554146811207,
361.891536087235, 30.7768185135629, 765.018255121896), y = c(0.546201232032854,
0.546201232032854, 0.450376454483231, 0.450376454483231, 0.288843258042437,
0.288843258042437, 0.108145106091718, 0.108145106091718, 0.0520191649555099,
0.0520191649555099, 0.027378507871321, 0.027378507871321, 0.0082135523613963,
0.0082135523613963)), row.names = c(NA, -14L), class = c("tbl_df",
"tbl", "data.frame"))
Whichever, I try to use that code to obtain an unique regression :
exp.model <-lm(y ~ exp(x), df)
I obtain that error :
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
NA/NaN/Inf in 'x'
In fine I would like to have grouped regressions by "Type_stat" on a plot with geom_smooth()
:
library(tidyverse)
df %>% ggplot() + geom_smooth(method="lm", aes(x = x, y = y, color=as.factor(Type_stat)), formula= (y ~ exp(x)), se=FALSE, na.rm = FALSE, linetype = 1)
But the error is similar in both case.
Your proximal problem is that exponentiating numbers as large as 765 gives an overflow error: exp(max(x$df))
is Inf
.
Based on what your data look like, I suspect it would be better to fit log(y) ~ x
rather than y ~ exp(x)
. Note that this is a different model: your model corresponds to y = a + b*exp(x) + ɛ
, my suggestion would be log(y) = a + b*x + ɛ
(so that y = exp(a + b*x + ɛ)
or y = exp(a)*exp(b*x)*exp(ɛ)
, i.e. a log-Normal/multiplicative model. In general the latter makes way more sense in applied contexts (i.e., we want to estimate the scale parameter in exp(c*x)
, not assume that the data are strictly exponential and multiplied by a scaling factor (c*exp(x)
).
If you really want to fit your model (a + b*exp(x) + ɛ
) you can do it by shifting your x
values by a constant: y = a + exp(c)*b*exp(x-c) + ɛ
.
m1 <- lm(log(y) ~ x, data = df); cc1 <- coef(m1)
S <- 200 ## shift
m2 <- lm(y ~ exp(x-S), data = df); cc2 <- coef(m2)
plot(y ~ x, data = df)
curve(exp(cc1[1] + cc1[2]*x), add = TRUE, col = 2, lwd = 2)
curve(cc2[1] + exp(S)*cc2[2]*exp(x-S), add = TRUE, col = 4, lwd = 2)
You can fit the model y = a*exp(b*x) + ɛ
(Normal rather than log-Normal error) by using glm(y~x, family = gaussian(link = "log"))
; this will also work within ggplot2
...
ggplot(df, aes(x, y)) +
geom_smooth(method = "glm", method.args = list(family = gaussian(link = "log"))) +
geom_point() +
scale_y_continuous(limits=c(NA, 0.5), oob = scales::squish)
(I truncated the y-axis scale because the confidence intervals explode at the upper end of the x-range.)
Follow-up questions should probably go to Cross Validated, as I'm guessing that in reality this is more of a statistical/data analysis question than a programming one ...