r purrr modelr

Run a model for each row of a model-parameters (meta) data.frame


In the spirit of purrr, broom, and modelr, I am trying to create a "meta" data.frame in which each row denotes the dataset (data) and the model parameters (yvar, xvars, FEvars). For instance:

library(dplyr)  # %>% and mutate
library(purrr)  # map, map2, pmap

iris2 <- iris %>% mutate(Sepal.Length=Sepal.Length^2)
meta <- data.frame(n=1:4,
           yvar = c('Sepal.Length','Sepal.Length','Sepal.Length','Sepal.Length'),
           xvars= I(list(c('Sepal.Width'),
                         c('Sepal.Width','Petal.Length'),
                         c('Sepal.Width'),
                         c('Sepal.Width','Petal.Length'))),
           data= I(list(iris,iris,iris2,iris2)) )

Now, I would like to run a model for each row of "meta" and then add a list column "model" holding the model output object. To run the model I use an auxiliary function that takes a dataset, a y variable and a vector of x variables:

OLS_help <- function(d,y,xvars){
  paste(y, paste(xvars, collapse=" + "), sep=" ~ ") %>% as.formula %>% 
    lm(d)
}
y <- 'Sepal.Length'
xvars <- c('Sepal.Width','Petal.Length')
OLS_help(iris,y,xvars)

How can I execute OLS_help for every row of meta and add its output as a list column in meta? I tried the following code, but it did not work:

meta %>% mutate(model = map2(d,yvar,xvars,OLS_help) )
Error: Can't convert a `AsIs` object to function
Call `rlang::last_error()` to see a backtrace

Note: when only the "data" (nested) list column is involved, the solution (covered in Hadley's book) is:

by_country <- gapminder %>% group_by(country, continent) %>% nest()
country_model <- function(df) {  lm(lifeExp ~ year, data = df) }
by_country <- by_country %>% mutate(model = map(data, country_model)) 

Solution

  • We can use pmap in the following way

    df <- meta %>%
        as_tibble() %>%
        mutate_if(is.factor, as.character) %>%
        mutate(fit = pmap(
            list(yvar, xvars, data),
            function(y, x, df) lm(reformulate(x, response = y), data = df)))
    ## A tibble: 4 x 5
    #      n yvar         xvars     data               fit
    #  <int> <chr>        <I<list>> <I<list>>          <list>
    #1     1 Sepal.Length <chr [1]> <df[,5] [150 × 5]> <lm>
    #2     2 Sepal.Length <chr [2]> <df[,5] [150 × 5]> <lm>
    #3     3 Sepal.Length <chr [1]> <df[,5] [150 × 5]> <lm>
    #4     4 Sepal.Length <chr [2]> <df[,5] [150 × 5]> <lm>
    

    Explanation: pmap iterates over multiple arguments simultaneously (similar to base R's Map); here we loop through the entries of columns yvar, xvars and data in parallel, then use reformulate to construct the formula passed to lm. We store the lm fit object in column fit.
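
    Since the explanation compares pmap to base R's Map, here is a minimal sketch of the same row-wise fit written with Map only, assuming no packages beyond base R. The names meta2, fit_one and n_coef are hypothetical and not part of the original question; the list columns are added with `$<-` instead of I(list(...)).

```r
# Base R sketch; iris ships with R. meta2, fit_one, n_coef are illustrative names.
iris2 <- transform(iris, Sepal.Length = Sepal.Length^2)

# One row per model: response name, predictor names, dataset
meta2 <- data.frame(n = 1:4,
                    yvar = rep("Sepal.Length", 4),
                    stringsAsFactors = FALSE)
meta2$xvars <- list("Sepal.Width",
                    c("Sepal.Width", "Petal.Length"),
                    "Sepal.Width",
                    c("Sepal.Width", "Petal.Length"))
meta2$data <- list(iris, iris, iris2, iris2)

# Build the formula from character vectors, then fit on that row's data
fit_one <- function(y, x, df) lm(reformulate(x, response = y), data = df)

# Map walks the three columns in parallel, one model per row
meta2$fit <- Map(fit_one, meta2$yvar, meta2$xvars, meta2$data)

# Number of estimated coefficients per model (intercept + predictors)
n_coef <- sapply(meta2$fit, function(m) length(coef(m)))
```

    Map names its result after the first character argument, so the elements of meta2$fit carry the response name; wrap the call in unname() if that is unwanted.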