rregressionsimulationmontecarlo

How to run a montecarlo simulation for multple regression in R?


I want to run a monte carlo simulation for a multiple regression model that predicts mpg and then evaluate how many times each car has a better performance (lower mpg) than the other. This is what I got so far

library(pacman)
pacman::p_load(data.table, fixest, stargazer, dplyr, magrittr)

df <- mtcars
fit <- lm(mpg~cyl + hp, data = df)
fit$coefficients[1]

beta_0 = fit$coefficients[1] # Intercept 
beta_1 = fit$coefficients[2] # Slope (cyl)
beta_2 = fit$coefficients[3] # slope (hp)
set.seed(1)  # Seed
n = 1000     # Sample size
M = 500      # Number of experiments/iterations

## Storage 
slope_DT <- rep(0,M)
slope_DT_2 <- rep(0,M)
intercept_DT <- rep(0,M)

## Begin Monte Carlo

for (i in 1:M){ #  M is the number of iterations
  
  # Generate data
  U_i = rnorm(n, mean = 0, sd = 2) # Error
  X_i = rnorm(n, mean = 5, sd = 5) # Independent variable
  Y_i = beta_0 + beta_1*X_i + beta_2*X_i +U_i  # Dependent variable
  
  # Formulate data.table
  data_i = data.table(Y = Y_i, X = X_i)
  
  # Run regressions
  ols_i <- fixest::feols(data = data_i, Y ~ X)
  
  # Extract slope coefficient and save
  slope_DT_2[i] <- ols_i$coefficients[3]
  slope_DT[i] <- ols_i$coefficients[2]
  intercept_DT[i] <- ols_i$coefficients[1]
  
}


# Summary statistics
estimates_DT <- data.table(beta_2 = slope_DT_2,beta_1 = slope_DT, beta_0 = intercept_DT)

This code does not create any coefficients for hp I want to know how to add coefficients to the model and then predict results and test how many times one car has lower mpg than the other. For example how many times Mazda RX4 has a lower predicted mpg than Datsun 710. Some idea on how can make this work? Thank you


Solution

  • Like I've noted in my comment, you should use two independent variables. Moreover, I would like to suggest to you the lapply-function, which makes the code shorter, since you don't need the initialization/Storage part.

    estimates_DT <- do.call("rbind",lapply(1:M, function(i) {
      # Generate data
      U_i = rnorm(n, mean = 0, sd = 2) # Error
      X_i_1 = rnorm(n, mean = 5, sd = 5) # First independent variable
      X_i_2 = rnorm(n, mean = 5, sd = 5) #Second ndependent variable
      Y_i = beta_0 + beta_1*X_i_1 + beta_2*X_i_2 + U_i  # Dependent variable
    
      # Formulate data.table
      data_i = data.table(Y = Y_i, X1 = X_i_1, X2 = X_i_2)
      
      # Run regressions
      ols_i <- fixest::feols(data = data_i, Y ~ X1 + X2)  
      ols_i$coefficients
    }))
    
    estimates_DT <- setNames(data.table(estimates_DT),c("beta_0","beta_1","beta_2"))
    

    To compare the predictions of the two cars, define the following function taking the two car names you want to compare as arguments:

    compareCarEstimations <- function(carname1="Mazda RX4",carname2="Datsun 710") {
      car1data <- mtcars[rownames(mtcars) == carname1,c("cyl","hp")]
      car2data <- mtcars[rownames(mtcars) == carname2,c("cyl","hp")]
      
      predsCar1 <- estimates_DT[["beta_0"]] + car1data$cyl*estimates_DT[["beta_1"]]+car1data$hp*estimates_DT[["beta_2"]]
      predsCar2 <- estimates_DT[["beta_0"]] + car2data$cyl*estimates_DT[["beta_1"]]+car2data$hp*estimates_DT[["beta_2"]]
      
      list(
        car1LowerCar2 = sum(predsCar1 < predsCar2),
        car2LowerCar1 = sum(predsCar1 >= predsCar2)
      )
    }
    

    Make sure the names provided as arguments are valid names, e.g., are in rownames(mtcars).