I have a file with many variable names and coefficients. The task is to use those variable names and coefficients to create a linear regression formula and apply it to data. Here's a small example:
library(dplyr)  # provides tibble() and %>%
coefs <- tibble(varname = c("(Intercept)", "dxaids", "abnormal_bun"),
                coef    = c(-3.1, 0.1, 0.2))
data <- tibble(dxaids = c(0, 0, 1), abnormal_bun = c(1, 0, 0))
The goal is a new column y, computed as:
data %>% mutate(y = -3.1 + 0.1 * dxaids + 0.2 * abnormal_bun)
What I've done for the time being is manually write out the equation with about 25 variables.
Of course I can write an ugly loop for this, shown below, but is there any cleaner way with tidyverse tools? Perhaps this can be accomplished with a single matrix-vector multiply, but dplyr doesn't seem amenable to matrix operations.
# Start from the intercept, then add coef * column for every other term
y <- coefs$coef[coefs$varname == "(Intercept)"]
for (i in seq_len(nrow(coefs))) {
  varname <- coefs$varname[i]
  coef <- coefs$coef[i]
  if (varname != "(Intercept)")
    y <- y + coef * data[[varname]]  # [[ ]] returns a plain numeric vector
}
You can avoid the for loop by using matrix multiplication:
coefs$coef[1] + (as.matrix(data) %*% coefs$coef[-1])
[,1]
[1,] -2.9
[2,] -3.1
[3,] -3.0
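Just make sure the columns in data are in the same order as coefs$coef[-1]. This also assumes that "(Intercept)" is the first row of coefs. If the column order does not match, you can index data by name with coefs$varname[-1]. For example, after swapping the columns of data: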
data <- data[, 2:1]  # swap the columns to simulate a mismatched order
coefs$coef[1] + (as.matrix(data[, coefs$varname[-1]]) %*% coefs$coef[-1])
[,1]
[1,] -2.9
[2,] -3.1
[3,] -3.0
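Since the question asks for a tidyverse route, the same matrix product can also be computed inside mutate(). The sketch below assumes dplyr 1.1.0 or later (for pick()); the helper names intercept and slopes are introduced here for illustration only. It selects the columns by name and looks up the intercept by its "(Intercept)" label, so neither the column order of data nor the position of the intercept in coefs matters.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for pick()

coefs <- tibble(varname = c("(Intercept)", "dxaids", "abnormal_bun"),
                coef    = c(-3.1, 0.1, 0.2))
# Columns deliberately out of order relative to coefs
data <- tibble(abnormal_bun = c(1, 0, 0), dxaids = c(0, 0, 1))

# Hypothetical helpers: split the intercept from the slope terms by name
intercept <- coefs$coef[coefs$varname == "(Intercept)"]
slopes    <- filter(coefs, varname != "(Intercept)")

data <- data %>%
  mutate(y = intercept +
           # pick() returns the named columns in slopes order; drop() turns
           # the n x 1 matrix product into a plain numeric vector
           drop(as.matrix(pick(all_of(slopes$varname))) %*% slopes$coef))

data$y
```

If a name in coefs$varname has no matching column in data, all_of() raises an error instead of silently producing a wrong result.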