I am having trouble fitting a power regression model of the form Y = aX^b. Here is some context.
I have the following data vectors:
x <- c(2.08, 1.99, 2.03, 2.01, 2.10, 1.91, 1.84, 2.16, 2.04, 2.05, 2.04, 1.97, 2.03, 2.11, 2.06, 2.07, 2.12, 1.98, 2.13, 2.13, 1.97, 1.79, 2.11, 2.09, 2.19, 2.07, 1.99, 2.03, 2.12, 2.14)*100
y <- c(157.91, 138.47, 146.26, 142.81, 161.77, 123.76, 109.68, 175.48, 149.84, 151.99, 149.39, 134.55, 147.54, 164.49, 153.63, 154.44, 167.12, 136.43, 169.25, 168.22, 134.32, 101.56, 164.96, 160.17, 182.02, 154.95, 137.78, 147.75, 166.54, 171.11)
plot(x,y)
Although this toy data fits a linear model well (R^2 = 0.997), my real data covers a much wider range of X, from 5 to 450, and I suspect that a function of the form Y = aX^b fits it better.
I am trying to fit the model by linearizing it, taking log(x) and log(y):
fit <- lm(log(y)~log(x))
plot(x,y)
lines(x, exp(fit$fitted.values), col="red")
However, the plot does not make sense: instead of a single curve, many crisscrossing line segments appear. How can I improve this graph? Am I fitting the model incorrectly, or am I plotting it wrong?
If I print the following I can get the summary of the model:
summary(fit)
Output:
> summary(fit)
Call:
lm(formula = log(y) ~ log(x))
Residuals:
       Min         1Q     Median         3Q        Max
-0.0064932 -0.0026293 -0.0003367  0.0026992  0.0065128
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -10.49654    0.08586  -122.3   <2e-16 ***
log(x)        2.91462    0.01614   180.6   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.003912 on 28 degrees of freedom
Multiple R-squared: 0.9991, Adjusted R-squared: 0.9991
F-statistic: 3.261e+04 on 1 and 28 DF, p-value: < 2.2e-16
How can I obtain the RMSE? And how can I get the equation that defines the model? That is, what are the values of a and b in the equation Y = aX^b?
The issue is in the plot rather than the model. lines() connects the points in the order they are given, so the data should be sorted by x; otherwise you get a zig-zag, as you noticed. Try
lines(sort(x), exp(fit$fitted.values)[order(x)], col="red")
or alternatively sort your data before running the model.
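As for a, b and the RMSE, here is a minimal sketch, assuming the same x, y and fit as in the question. Because the fitted model is log(y) = log(a) + b*log(x), a should be the exponential of the intercept and b the slope. The names a, b, pred, rmse, rmse_log, x_grid and y_grid are just illustrative, and whether you want the RMSE on the original scale or on the log scale depends on how you intend to use it.

```r
x <- c(2.08, 1.99, 2.03, 2.01, 2.10, 1.91, 1.84, 2.16, 2.04, 2.05,
       2.04, 1.97, 2.03, 2.11, 2.06, 2.07, 2.12, 1.98, 2.13, 2.13,
       1.97, 1.79, 2.11, 2.09, 2.19, 2.07, 1.99, 2.03, 2.12, 2.14) * 100
y <- c(157.91, 138.47, 146.26, 142.81, 161.77, 123.76, 109.68, 175.48,
       149.84, 151.99, 149.39, 134.55, 147.54, 164.49, 153.63, 154.44,
       167.12, 136.43, 169.25, 168.22, 134.32, 101.56, 164.96, 160.17,
       182.02, 154.95, 137.78, 147.75, 166.54, 171.11)

fit <- lm(log(y) ~ log(x))

# Back-transform the coefficients: log(y) = log(a) + b * log(x)
a <- exp(coef(fit)[["(Intercept)"]])  # a = exp(intercept)
b <- coef(fit)[["log(x)"]]            # b = slope

# Predictions on the original scale of y
pred <- exp(fitted(fit))

# RMSE on the original scale (same units as y)
rmse <- sqrt(mean((y - pred)^2))

# RMSE on the log scale (the scale the model was fitted on)
rmse_log <- sqrt(mean(residuals(fit)^2))

# Draw the fitted curve on an increasing grid, so no sorting is needed
x_grid <- seq(min(x), max(x), length.out = 200)
y_grid <- a * x_grid^b
plot(x, y)
lines(x_grid, y_grid, col = "red")
```

With the coefficients from your summary, this gives a = exp(-10.49654), roughly 2.76e-05, and b = 2.91462. Note that the "Residual standard error" in the summary is on the log scale and divides by the residual degrees of freedom (28) rather than n, so it will not be exactly equal to rmse_log.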