Suppose I have a data frame with three columns, A, B and C, each with 10 rows. I want to multiply every possible pair of columns, taking two at a time, so the new columns I want are A*B, A*C and B*C. How can I get these in R? I also want to use each of these product columns in turn in a multiple linear regression of the form Y ~ A + B + C + (A*B or A*C or B*C). The models have to be fitted in a for loop.
I have tried this code:
cc <- combn(data, 2, FUN = Reduce, f = `*`)
n = choose(ncol(data),2)
model <- list()
for (i in 1:n) {
  model[[i]] <- lm(Y ~ A+B+C+cc[,i])
}
The problem is that cc is a numeric (double) matrix holding all the pairwise products of the three columns, and its columns have no names. As a result, the interaction term appears in the model summary as "cc[, i]". I want it to be labelled "A*B", "A*C" or "B*C" instead. How can I do that?
See if this code structure works for you:
set.seed(7)
df <- data.frame(A = rexp(10, 1), B = rexp(10, 2), C = rexp(10, 3))
Y <- rnorm(10)
df <- data.frame(cbind(Y, df))
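# Each column of m is one pair of predictor indices: (1,2), (1,3), (2,3)
m <- combn(3, 2)
mylist <- list()
for (i in 1:3) {
  # Y is column 1 of df, so predictor j sits in column j + 1
  new_col <- df[ , m[1, i] + 1] * df[ , m[2, i] + 1]
  df2 <- cbind(df, new_col)
  mylist[[i]] <- lm(Y ~ ., data = df2)
}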
mylist
[[1]]
Call:
lm(formula = Y ~ ., data = df2)
Coefficients:
(Intercept)            A            B            C      new_col
    -1.9290       0.4046       2.0715       0.8587      -0.1156
[[2]]
Call:
lm(formula = Y ~ ., data = df2)
Coefficients:
(Intercept)            A            B            C      new_col
   -1.81765      0.33800      1.89724      0.79561      0.03353
[[3]]
Call:
lm(formula = Y ~ ., data = df2)
Coefficients:
(Intercept)            A            B            C      new_col
     -1.402        0.321        1.058       -1.445        5.114
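Note that the loop above labels every product term new_col, so the three fits cannot be told apart by their coefficient names. If you want the names from the question, one possible variation is to take the combinations over the column names instead of the column indices, and to build each formula from those names. This is only a sketch on the same simulated data: pair_names, models and fit_formula are names I made up for illustration, and the product columns are called AB, AC and BC because a bare * in a column name would need backticks inside a formula.

```r
# Same simulated data as above (same seed and same order of random draws).
set.seed(7)
df <- data.frame(A = rexp(10, 1), B = rexp(10, 2), C = rexp(10, 3))
df$Y <- rnorm(10)

predictors <- c("A", "B", "C")
# 2 x 3 character matrix; each column is one pair of predictor names.
pair_names <- combn(predictors, 2)

models <- list()
for (i in seq_len(ncol(pair_names))) {
  p  <- pair_names[, i]
  nm <- paste(p, collapse = "")            # "AB", "AC", "BC"
  d  <- df
  d[[nm]] <- d[[p[1]]] * d[[p[2]]]         # product column named after its pair
  # Build Y ~ A + B + C + AB (or AC, BC) from the names.
  f <- reformulate(c(predictors, nm), response = "Y")
  models[[nm]] <- lm(f, data = d)          # list element is named after the pair too
}

names(coef(models$AB))
# "(Intercept)" "A" "B" "C" "AB"

# For numeric predictors, the formula term A:B is the same elementwise product,
# so this fit has the same coefficients, with the term labelled "A:B".
fit_formula <- lm(Y ~ A + B + C + A:B, data = df)
all.equal(unname(coef(models$AB)), unname(coef(fit_formula)))
```

With this, summary(models$AC) should show the product term as AC rather than new_col or cc[, i]. If the A:B style of label is acceptable, the second form avoids creating the extra columns at all.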