I'm trying to do some regression on variables with the interval [0,1]. I'd like to include orthogonal quadratic and cubic components. The polynomials I am after are Shifted Legendre polynomials(Wikipedia).
I am baffled by the behaviour of the poly() function in R. Not only does it not return a vector in [0,1], the vector it does return varies with the length of the input vector.
This code generates an illustrative example. The code generates some first-order polynomials (lines) from x. The intervals of the resultant polynomials range from [-0.36,0.36] to [-0.017,0.017].
x <- seq(0,1,by = 0.1) # base variable
plot(x,x,col=2)
M<-matrix(nrow=6,ncol=4)
dfpoly<-data.frame(M)
v<- c(0.05,0.01,0.005,0.001,0.0005,0.0001) # This vector alters the length of x
for (i in 1:length(v)){
x <- seq(0,1,by = v[i])
y <- poly(x,degree = 3)[,1] #first order polynomial, should in my mind be the same as x
points(x,y)
dfpoly[i,1] <- length(x)
dfpoly[i,2] <- min(y)
dfpoly[i,3] <- max(y)
dfpoly[i,4] <- mean(diff(y)/diff(x))
}
names(dfpoly) <- c("x length","y min","y max","y slope")
dfpoly
graph of x vs generated first order polynomials
Summary of outputs:
x length y min y max y slope
1 21 -0.36037498508 0.36037498508 0.72074997016
2 101 -0.17064747029 0.17064747029 0.34129494057
3 201 -0.12156314064 0.12156314064 0.24312628128
4 1001 -0.05469022724 0.05469022724 0.10938045447
5 2001 -0.03870080906 0.03870080906 0.07740161813
6 10001 -0.01731791041 0.01731791041 0.03463582082
Now, I'd expect all the lines to inhabit the same interval [0,1] as x, and perfectly overlap with x (the red series of points) on the graph. But they don't. Nor do they have any pattern to them that I can identify by eye.
1. What is the cause of the weird interval behaviour with poly()?
2. Are there other techniques or functions I can use to coerce these polynomials to [0,1]?
The poly()
function returns a matrix, whose columns are the values of polynomials evaluated at the values of x
. From the help page ?poly
, the columns are mutually orthogonal, and are also orthogonal to the constant polynomial p(x) = 1
. Orthogonality is in the vector sense (i.e. $\sum x_i y_i = 0$).
I don't think the help page guarantees this, but in practice it appears that the columns are also unit length, i.e. $\sum x_i^2 = 1$.
The unit length condition explains your "weird interval behaviour". More terms means they have to be smaller to still have a sum of squares equal to 1.
To coerce the columns to the range [0,1]
, just subtract the minimum and divide by the range. This will lose both the orthogonality and unit length properties, but
will keep the degree and linear independence.
For example,
x <- seq(0,1,by = 0.1) # base variable
plot(x,x,col=2)
M<-matrix(nrow=6,ncol=4)
dfpoly<-data.frame(M)
v<- c(0.05,0.01,0.005,0.001,0.0005,0.0001) # This vector alters the length of x
for (i in 1:length(v)){
x <- seq(0,1,by = v[i])
y <- poly(x,degree = 3)[,1] #first order polynomial, should in my mind be the same as x
y <- (y - min(y))/diff(range(y))
points(x,y)
dfpoly[i,1] <- length(x)
dfpoly[i,2] <- min(y)
dfpoly[i,3] <- max(y)
dfpoly[i,4] <- mean(diff(y)/diff(x))
}
names(dfpoly) <- c("x length","y min","y max","y slope")
dfpoly
This prints
x length y min y max y slope
1 21 0 1 1
2 101 0 1 1
3 201 0 1 1
4 1001 0 1 1
5 2001 0 1 1
6 10001 0 1 1