rmodelpurrrpredictmodelr

Using predict() across multiple models to generate confidence intervals in R


My goal is to create multiple models from a dataframe and then generate confidence intervals around the fitted values that correspond to those different models.

Pulling in libraries:

library(purrr)
library(dplyr)
library(modelr)

Assigning data_1 as the DNase data set from R:

data_1 <- DNase

Creating one unique model for each run:

model_dna <- data_1 %>% group_by(Run) %>% 
  do(model_dna = lm(conc ~ density, data = .)) %>% ungroup()

I then want to predict the fit and 95% confidence interval for that same set of data, but do so for each model individually. When including the interval = "confidence", the resulting table should produce a "fit" column of fitted values, as well as an "upr" and "lwr" column, representing the range of confidence around the fitted value. I tried this, as spread_predictions has worked before in helping to spread fitted values across multiple sets of data:

data_2 <- map(model_dna$model_dna, ~ spread_predictions(data = data_1, models = .x, interval = "confidence"))

However, the following error is generated:

Error in UseMethod("predict") : 
  no applicable method for 'predict' applied to an object of class "character"

Does anyone know what the best way to produce these numbers are? Do I have to change the way this function is processing my data? Or is there a better function to use, i.e. somehow using predict() directly, which certainly takes interval as an argument (http://www.sthda.com/english/articles/40-regression-analysis/166-predict-in-r-model-predictions-and-confidence-intervals/)?


Solution

  • We can use invoke, specify the data as 'data_1' and the models as a list (the model_dna column in the model_dna is a list)

    library(purrr)
    out <- invoke(spread_predictions, data = data_1, model_dna$model_dna)
    

    -output

    head(out)
    #Grouped Data: density ~ conc | Run
    #  Run       conc density       <lm>
    #1   1 0.04882812   0.017 -1.5759940
    #2   1 0.04882812   0.018 -1.5693389
    #3   1 0.19531250   0.121 -0.8838652
    #4   1 0.19531250   0.124 -0.8639000
    #5   1 0.39062500   0.206 -0.3181831
    #6   1 0.39062500   0.215 -0.2582873
    

    If it is to get the confidence interval from the model,

    library(broom)
    library(tidyr)
    data_1 %>%
       nest_by(Run) %>% 
       mutate(model_dna = list(lm(conc ~ density, data = data) %>%
       tidy(., conf.int = TRUE))) %>%
       select(Run, model_dna) %>%
       ungroup %>% 
       unnest(c(model_dna))
    # A tibble: 22 x 8
    #   Run   term        estimate std.error statistic     p.value conf.low conf.high
    #   <ord> <chr>          <dbl>     <dbl>     <dbl>       <dbl>    <dbl>     <dbl>
    # 1 10    (Intercept)    -1.69     0.670     -2.52 0.0245         -3.13   -0.251 
    # 2 10    density         6.66     0.733      9.08 0.000000306     5.08    8.23  
    # 3 11    (Intercept)    -1.58     0.629     -2.51 0.0249         -2.93   -0.231 
    # 4 11    density         6.60     0.690      9.57 0.000000161     5.12    8.08  
    # 5 9     (Intercept)    -1.47     0.646     -2.28 0.0388         -2.86   -0.0876
    # 6 9     density         6.49     0.708      9.16 0.000000272     4.97    8.01  
    # 7 1     (Intercept)    -1.30     0.588     -2.21 0.0440         -2.56   -0.0402
    # 8 1     density         6.51     0.659      9.88 0.000000108     5.10    7.92  
    # 9 4     (Intercept)    -1.22     0.583     -2.09 0.0550         -2.47    0.0300
    #10 4     density         6.36     0.645      9.86 0.000000111     4.97    7.74  
    # … with 12 more rows