I'm trying to compute OLS estimators manually in R for given vectors and matrices, but when I get the formula beta=(x'x)^-1(x'y), R tells me that the is a dimension issue, and I can't figure out why.
My code is
nr = 100
nc = 1000
x=matrix(rnorm(nr * nc, mean=1, sd=1), nrow = nr)
epsilon=matrix(rnorm(nr * nc, mean=0, sd=1), nrow = nr)
k=c(1,2,4,8)
eta1=((epsilon^1-mean(epsilon^1))/(mean(epsilon^(1*2))-mean(epsilon^1)^2)^(1/2))
eta2=((epsilon^2-mean(epsilon^2))/(mean(epsilon^(2*2))-mean(epsilon^2)^2)^(1/2))
eta4=((epsilon^4-mean(epsilon^4))/(mean(epsilon^(4*2))-mean(epsilon^4)^2)^(1/2))
eta8=((epsilon^8-mean(epsilon^8))/(mean(epsilon^(8*2))-mean(epsilon^8)^2)^(1/2))
y1=x+eta1
y2=x+eta2
y4=x+eta4
y8=x+eta8
beta1=inv(t(x)*x)*(t(x)*y1)
beta2=inv(t(x)*x)*(t(x)*y2)
beta4=inv(t(x)*x)*(t(x)*y4)
beta8=inv(t(x)*x)*(t(x)*y8)
Also, I feel that there should be a way to loop through the values of k to get this automated, instead of doing each eta by hand. So, a bit of help in this area would also be appreciated.
The ouput I'm looking for is to get a vector of beta for each of the different values of k.
You have got several issues. Firstly, you should have nx1 matrix for y and epsilon, but you have nxm matrix for them instead. Secondly, you should use matrix multiplication which is %*%
in R
. i.e. t(x)%*%y1
. However you use dot product (*
) instead.
For the sake of simplicity, lets create a matrix with 5 columns. My approach is creating a dependent variable which is related with x columns (independent variables or feature matrix in machine learning terminology)
nr = 100
nc = 5
x=matrix(rnorm(nr * nc, mean=1, sd=1), nrow = nr)
epsilon=matrix(rnorm(nr, mean=0, sd=1), nrow = nr) # it should be nx1
k=c(1,2,4,8)
eta1=((epsilon^1-mean(epsilon^1))/(mean(epsilon^(1*2))-mean(epsilon^1)^2)^(1/2))
eta2=((epsilon^2-mean(epsilon^2))/(mean(epsilon^(2*2))-mean(epsilon^2)^2)^(1/2))
eta4=((epsilon^4-mean(epsilon^4))/(mean(epsilon^(4*2))-mean(epsilon^4)^2)^(1/2))
eta8=((epsilon^8-mean(epsilon^8))/(mean(epsilon^(8*2))-mean(epsilon^8)^2)^(1/2))
To check the output we should create y values wisely. So, let's define some betas and create y values wrt them. At the end, we can compare the output with the inputs we defined. Note that, you should have 5 betas for 5 columns.
# made up betas
beta1_real <- 1:5
beta2_real <- -4:0
beta4_real <- 7:11
beta8_real <- seq(0.25,1.25,0.25)
To create the y values,
y1= 10 + x %*% matrix(beta1_real) + eta1
y2= 20 + x %*% matrix(beta2_real) + eta2
y4= 30 + x %*% matrix(beta4_real) + eta4
y8= 40 + x %*% matrix(beta8_real) + eta8
Here, I also added a constant term for each y values. To get the constant term at the end, we should add ones at the beginning of our x matrix like,
x <- cbind(matrix(1,nrow = nr),x)
The rest is almost same with yours. Only difference is I used solve
instead of inv
and also I used the matrix multiplication (%*%
)
beta1=solve(t(x)%*%x)%*%(t(x)%*%y1)
beta2=solve(t(x)%*%x)%*%(t(x)%*%y2)
beta4=solve(t(x)%*%x)%*%(t(x)%*%y4)
beta8=solve(t(x)%*%x)%*%(t(x)%*%y8)
If we compare the outputs,
beta1_real was,
# [1] 1 2 3 4 5
and the output of beta1 is,
# [,1]
# [1,] 10.0049631
# [2,] 0.9632124
# [3,] 1.8987402
# [4,] 2.9816673
# [5,] 4.2111817
# [6,] 4.9529084
The results are similar. 10 at the beginning is the constant term I added. The difference stems from the error term applied (etas).