I want to run a monte carlo simulation for a multiple regression model that predicts mpg and then evaluate how many times each car has a better performance (lower mpg) than the other. This is what I got so far
library(pacman)
pacman::p_load(data.table, fixest, stargazer, dplyr, magrittr)
df <- mtcars
fit <- lm(mpg~cyl + hp, data = df)
fit$coefficients[1]
beta_0 = fit$coefficients[1] # Intercept
beta_1 = fit$coefficients[2] # Slope (cyl)
beta_2 = fit$coefficients[3] # slope (hp)
set.seed(1) # Seed
n = 1000 # Sample size
M = 500 # Number of experiments/iterations
## Storage
slope_DT <- rep(0,M)
slope_DT_2 <- rep(0,M)
intercept_DT <- rep(0,M)
## Begin Monte Carlo
for (i in 1:M){ # M is the number of iterations
# Generate data
U_i = rnorm(n, mean = 0, sd = 2) # Error
X_i = rnorm(n, mean = 5, sd = 5) # Independent variable
Y_i = beta_0 + beta_1*X_i + beta_2*X_i +U_i # Dependent variable
# Formulate data.table
data_i = data.table(Y = Y_i, X = X_i)
# Run regressions
ols_i <- fixest::feols(data = data_i, Y ~ X)
# Extract slope coefficient and save
slope_DT_2[i] <- ols_i$coefficients[3]
slope_DT[i] <- ols_i$coefficients[2]
intercept_DT[i] <- ols_i$coefficients[1]
}
# Summary statistics
estimates_DT <- data.table(beta_2 = slope_DT_2,beta_1 = slope_DT, beta_0 = intercept_DT)
This code does not create any coefficients for hp I want to know how to add coefficients to the model and then predict results and test how many times one car has lower mpg than the other. For example how many times Mazda RX4 has a lower predicted mpg than Datsun 710.
Some idea on how can make this work?
Thank you
Like I've noted in my comment, you should use two independent variables. Moreover, I would like to suggest to you the lapply-function, which makes the code shorter, since you don't need the initialization/Storage part.
estimates_DT <- do.call("rbind",lapply(1:M, function(i) {
# Generate data
U_i = rnorm(n, mean = 0, sd = 2) # Error
X_i_1 = rnorm(n, mean = 5, sd = 5) # First independent variable
X_i_2 = rnorm(n, mean = 5, sd = 5) #Second ndependent variable
Y_i = beta_0 + beta_1*X_i_1 + beta_2*X_i_2 + U_i # Dependent variable
# Formulate data.table
data_i = data.table(Y = Y_i, X1 = X_i_1, X2 = X_i_2)
# Run regressions
ols_i <- fixest::feols(data = data_i, Y ~ X1 + X2)
ols_i$coefficients
}))
estimates_DT <- setNames(data.table(estimates_DT),c("beta_0","beta_1","beta_2"))
To compare the predictions of the two cars, define the following function taking the two car names you want to compare as arguments:
compareCarEstimations <- function(carname1="Mazda RX4",carname2="Datsun 710") {
car1data <- mtcars[rownames(mtcars) == carname1,c("cyl","hp")]
car2data <- mtcars[rownames(mtcars) == carname2,c("cyl","hp")]
predsCar1 <- estimates_DT[["beta_0"]] + car1data$cyl*estimates_DT[["beta_1"]]+car1data$hp*estimates_DT[["beta_2"]]
predsCar2 <- estimates_DT[["beta_0"]] + car2data$cyl*estimates_DT[["beta_1"]]+car2data$hp*estimates_DT[["beta_2"]]
list(
car1LowerCar2 = sum(predsCar1 < predsCar2),
car2LowerCar1 = sum(predsCar1 >= predsCar2)
)
}
Make sure the names provided as arguments are valid names, e.g., are in rownames(mtcars).