
Perform Power Regression in R


I am having trouble fitting a power regression model of the form Y = aX^b. Here is some context.

I have the following data vectors:

x <- c(2.08, 1.99, 2.03, 2.01, 2.10, 1.91, 1.84, 2.16, 2.04, 2.05, 2.04, 1.97, 2.03, 2.11, 2.06, 2.07, 2.12, 1.98, 2.13, 2.13, 1.97, 1.79, 2.11, 2.09, 2.19, 2.07, 1.99, 2.03, 2.12, 2.14)*100
y <- c(157.91, 138.47, 146.26, 142.81, 161.77, 123.76, 109.68, 175.48, 149.84, 151.99, 149.39, 134.55, 147.54, 164.49, 153.63, 154.44, 167.12, 136.43, 169.25, 168.22, 134.32, 101.56, 164.96, 160.17, 182.02, 154.95, 137.78, 147.75, 166.54, 171.11)

plot(x,y)

Although this toy data fits a linear model well (R^2 = 0.997), my actual data covers a much wider range of X, from 5 to 450, and I suspect that a function of the form Y = aX^b fits it better.

I am trying to fit the model by linearizing it, taking the logarithm of both x and y:

fit <- lm(log(y)~log(x))
plot(x,y)
lines(x, exp(fit$fitted.values), col="red")


However, the plot does not make sense: instead of a single curve, many lines appear. How can I fix this graph? Am I fitting the model incorrectly, or am I plotting it wrong?

If I print the following I can get the summary of the model:

summary(fit)

Output:

> summary(fit)

Call:
lm(formula = log(y) ~ log(x))

Residuals:
       Min         1Q     Median         3Q        Max 
-0.0064932 -0.0026293 -0.0003367  0.0026992  0.0065128 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -10.49654    0.08586  -122.3   <2e-16 ***
log(x)        2.91462    0.01614   180.6   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.003912 on 28 degrees of freedom
Multiple R-squared:  0.9991,    Adjusted R-squared:  0.9991 
F-statistic: 3.261e+04 on 1 and 28 DF,  p-value: < 2.2e-16

How can I obtain the RMSE? And how can I get the equation that defines the model? That is, what are the values of a and b in the equation Y = aX^b?


Solution

  • The issue is in the plot, not the model. lines() connects the points in the order they are given, so if x is not sorted you get a zig-zag, as you noticed. Try

    lines(sort(x), exp(fit$fitted.values)[order(x)], col="red")
    

    Alternatively, sort your data by x before fitting the model.
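
The questions about a, b and the RMSE can be approached along the following lines. This is a minimal sketch, assuming the log-log fit from the question: since log(y) = log(a) + b*log(x), the intercept is log(a) and the slope is b. The names a, b, y_hat, rmse_log and rmse_orig are introduced here for illustration and are not part of the original post.

```r
# Data from the question
x <- c(2.08, 1.99, 2.03, 2.01, 2.10, 1.91, 1.84, 2.16, 2.04, 2.05, 2.04, 1.97, 2.03, 2.11, 2.06, 2.07, 2.12, 1.98, 2.13, 2.13, 1.97, 1.79, 2.11, 2.09, 2.19, 2.07, 1.99, 2.03, 2.12, 2.14)*100
y <- c(157.91, 138.47, 146.26, 142.81, 161.77, 123.76, 109.68, 175.48, 149.84, 151.99, 149.39, 134.55, 147.54, 164.49, 153.63, 154.44, 167.12, 136.43, 169.25, 168.22, 134.32, 101.56, 164.96, 160.17, 182.02, 154.95, 137.78, 147.75, 166.54, 171.11)

# Fit on the log-log scale: log(y) = log(a) + b * log(x)
fit <- lm(log(y) ~ log(x))

# Back-transform the coefficients to Y = a * X^b
a <- unname(exp(coef(fit)[1]))  # the intercept estimates log(a)
b <- unname(coef(fit)[2])       # the slope estimates b

# RMSE on the log scale (the scale the model was fitted on)
rmse_log <- sqrt(mean(residuals(fit)^2))

# RMSE on the original scale of y, using back-transformed predictions
y_hat <- a * x^b
rmse_orig <- sqrt(mean((y - y_hat)^2))

cat("a =", a, " b =", b, "\n")
cat("RMSE (log scale) =", rmse_log, "\n")
cat("RMSE (original scale) =", rmse_orig, "\n")
```

With the summary shown in the question, b is the log(x) estimate (2.91462) and a is exp(-10.49654). Note that the two RMSE values are not comparable: the first is in units of log(y), the second in units of y. Both divide by n, so the log-scale value is slightly smaller than the residual standard error that summary() reports, which divides by the residual degrees of freedom.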