[SOLVED] Using linear regression coefficients from table to compute values

I have a file with many variable names and coefficients. The task is to use those variable names and coefficients to create a linear regression formula and apply it to data. Here's a small example:

library(dplyr)  # provides tibble(), mutate() and %>%

coefs <- tibble(varname = c("(Intercept)", "dxaids", "abnormal_bun"),
                coef = c(-3.1, 0.1, 0.2))

data <- tibble(dxaids = c(0,0,1), abnormal_bun = c(1,0,0))

The goal is a new column, effectively

data %>% mutate(y = -3.1 + 0.1 * dxaids + 0.2 * abnormal_bun)

What I've done for the time being is manually write out the equation with about 25 variables.

Of course I can write an ugly loop for this, shown below, but is there any cleaner way with tidyverse tools? Perhaps this can be accomplished with a single matrix-vector multiply, but dplyr doesn't seem amenable to matrix operations.

# Start from the intercept
y <- coefs$coef[coefs$varname == "(Intercept)"]

# Add each remaining term, one variable at a time
for (i in seq_len(nrow(coefs))) {
  varname <- coefs$varname[i]
  coef <- coefs$coef[i]
  if (varname != "(Intercept)")
    y <- y + coef * data[[varname]]  # [[ ]] gives a vector; data[, varname] would give a tibble
}

Solution

You can avoid using a for loop if you use matrix multiplication:

coefs$coef[1] + (as.matrix(data) %*% coefs$coef[-1])
     [,1]
[1,] -2.9
[2,] -3.1
[3,] -3.0

Make sure the columns of data are in the same order as the coefficients in coefs$coef[-1]. If they are not, for example after the columns have been swapped, select the columns by name with coefs$varname[-1] so that they line up with the coefficients:

data <- data[, 2:1] # simulate a mismatch: the column order is changed
coefs$coef[1] + (as.matrix(data[, coefs$varname[-1]]) %*% coefs$coef[-1])
     [,1]
[1,] -2.9
[2,] -3.1
[3,] -3.0
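
If you want the result as a new column, as in the question, the name-based version above can be wrapped in a small function. This is only a sketch: linear_predictor is a made-up helper name, not something from dplyr or any other package, and it assumes the intercept row is labelled "(Intercept)" and that every other varname is a numeric column in data.

```r
library(dplyr)

coefs <- tibble(varname = c("(Intercept)", "dxaids", "abnormal_bun"),
                coef = c(-3.1, 0.1, 0.2))
data <- tibble(dxaids = c(0, 0, 1), abnormal_bun = c(1, 0, 0))

# Hypothetical helper (not part of any package): computes the linear
# predictor from a coefficient table, matching predictors to columns by name.
linear_predictor <- function(data, coefs) {
  is_intercept <- coefs$varname == "(Intercept)"
  intercept <- sum(coefs$coef[is_intercept])  # 0 if there is no intercept row
  vars  <- coefs$varname[!is_intercept]
  betas <- coefs$coef[!is_intercept]

  # Fail early if the coefficient table names a column that data lacks
  missing_vars <- setdiff(vars, names(data))
  if (length(missing_vars) > 0) {
    stop("Columns missing from data: ", paste(missing_vars, collapse = ", "))
  }

  # Selecting by name makes the column order in data irrelevant;
  # drop() turns the n x 1 matrix into a plain numeric vector.
  X <- as.matrix(data[, vars])
  intercept + drop(X %*% betas)
}

# y comes out as -2.9, -3.1, -3.0, matching the output above
result <- data %>% mutate(y = linear_predictor(data, coefs))
```

Because the columns are picked by name, extra columns in data are ignored and a missing one raises an error instead of silently giving wrong numbers, which seems worth having with 25 or so variables.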