
Calculate all possible interactions in model_matrix


I'm simulating data with a varying number of variables. As part of the simulation, I need to calculate a model matrix with all possible interactions. See the following reprex for an example. I can get all two-way interactions by specifying the formula as ~ .*.. However, this particular dataset has 3 variables (ndim <- 3). I can get all two- and three-way interactions by specifying the formula as ~ .^3. The issue is that there may be 4 or more variables, so I would like to generalize this. I have tried specifying the formula as ~ .^ndim, but this throws an error.

Is there a way to define the power in the formula with a variable?

library(tidyverse)
library(mvtnorm)
library(modelr)

ndim <- 3

data <- rmvnorm(100, mean = rep(0, ndim)) %>%
  as_tibble(.name_repair = ~ paste0("dim_", seq_len(ndim)))

model_matrix(data, ~ .*.)
#> # A tibble: 100 x 7
#>    `(Intercept)`  dim_1   dim_2    dim_3 `dim_1:dim_2` `dim_1:dim_3`
#>            <dbl>  <dbl>   <dbl>    <dbl>         <dbl>         <dbl>
#>  1             1 -0.775  0.214   0.111         -0.166       -0.0857 
#>  2             1  1.25  -0.0636  1.40          -0.0794       1.75   
#>  3             1  1.07  -0.361   0.976         -0.384        1.04   
#>  4             1  2.08   0.381   0.593          0.793        1.24   
#>  5             1 -0.197  0.382  -0.257         -0.0753       0.0506 
#>  6             1  0.266 -1.82    0.00411       -0.485        0.00109
#>  7             1  3.09   2.57   -0.612          7.96        -1.89   
#>  8             1  2.03   0.247   0.112          0.501        0.226  
#>  9             1 -0.397  0.204   1.55          -0.0810      -0.614  
#> 10             1  0.597  0.335   0.533          0.200        0.319  
#> # … with 90 more rows, and 1 more variable: `dim_2:dim_3` <dbl>

model_matrix(data, ~ .^3)
#> # A tibble: 100 x 8
#>    `(Intercept)`  dim_1   dim_2    dim_3 `dim_1:dim_2` `dim_1:dim_3`
#>            <dbl>  <dbl>   <dbl>    <dbl>         <dbl>         <dbl>
#>  1             1 -0.775  0.214   0.111         -0.166       -0.0857 
#>  2             1  1.25  -0.0636  1.40          -0.0794       1.75   
#>  3             1  1.07  -0.361   0.976         -0.384        1.04   
#>  4             1  2.08   0.381   0.593          0.793        1.24   
#>  5             1 -0.197  0.382  -0.257         -0.0753       0.0506 
#>  6             1  0.266 -1.82    0.00411       -0.485        0.00109
#>  7             1  3.09   2.57   -0.612          7.96        -1.89   
#>  8             1  2.03   0.247   0.112          0.501        0.226  
#>  9             1 -0.397  0.204   1.55          -0.0810      -0.614  
#> 10             1  0.597  0.335   0.533          0.200        0.319  
#> # … with 90 more rows, and 2 more variables: `dim_2:dim_3` <dbl>,
#> #   `dim_1:dim_2:dim_3` <dbl>

model_matrix(data, ~.^ndim)
#> Error in terms.formula(object, data = data): invalid power in formula

Created on 2019-02-15 by the reprex package (v0.2.1)


Solution

  • You can use as.formula with paste0 in model_matrix:

    model_matrix(data, as.formula(paste0("~ .^", ndim)))
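
The error most likely occurs because the formula is not evaluated before it is parsed, so ndim is seen as a symbol rather than as the number 3. The fix is to build the formula after the value is known. Below is a minimal sketch of two ways to do that, under some assumptions: it uses base R only, with model.matrix() standing in for modelr::model_matrix() and rnorm() standing in for rmvnorm(), and ndim <- 4 is an illustrative value that does not appear in the question.

```r
# Base R only: model.matrix() stands in for modelr::model_matrix() (assumption).
set.seed(1)
ndim <- 4  # illustrative number of variables, not from the original reprex

# Simulated data with columns dim_1 ... dim_<ndim>
data <- as.data.frame(matrix(rnorm(100 * ndim), ncol = ndim))
names(data) <- paste0("dim_", seq_len(ndim))

# Option 1: build the formula from a string, as in the answer above
f_paste <- as.formula(paste0("~ .^", ndim))

# Option 2: splice the value of ndim into the formula with bquote()
f_bquote <- eval(bquote(~ .^.(ndim)))

mm_paste  <- model.matrix(f_paste,  data = data)
mm_bquote <- model.matrix(f_bquote, data = data)

# Intercept plus one column per non-empty subset of the variables
ncol(mm_paste) == 2^ndim
identical(colnames(mm_paste), colnames(mm_bquote))
```

Both formulas should give the same columns. Either one can be passed to model_matrix(data, ...) in place of the hard-coded ~ .^3.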