My goal is to create multiple models from a dataframe and then generate confidence intervals around the fitted values that correspond to those different models.
Pulling in libraries:
library(purrr)
library(dplyr)
library(modelr)
Assigning data_1 as the DNase data set from R:
data_1 <- DNase
Creating one unique model for each run:
model_dna <- data_1 %>% group_by(Run) %>%
do(model_dna = lm(conc ~ density, data = .)) %>% ungroup()
I then want to predict the fit and 95% confidence interval for that same set of data, but do so for each model individually. When including the interval = "confidence", the resulting table should produce a "fit" column of fitted values, as well as an "upr" and "lwr" column, representing the range of confidence around the fitted value. I tried this, as spread_predictions has worked before in helping to spread fitted values across multiple sets of data:
data_2 <- map(model_dna$model_dna, ~ spread_predictions(data = data_1, models = .x, interval = "confidence"))
However, the following error is generated:
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "character"
Does anyone know what the best way to produce these numbers are? Do I have to change the way this function is processing my data? Or is there a better function to use, i.e. somehow using predict() directly, which certainly takes interval as an argument (http://www.sthda.com/english/articles/40-regression-analysis/166-predict-in-r-model-predictions-and-confidence-intervals/)?
We can use invoke
, specify the data
as 'data_1' and the models as a list
(the model_dna column in the model_dna is a list
)
library(purrr)
out <- invoke(spread_predictions, data = data_1, model_dna$model_dna)
-output
head(out)
#Grouped Data: density ~ conc | Run
# Run conc density <lm>
#1 1 0.04882812 0.017 -1.5759940
#2 1 0.04882812 0.018 -1.5693389
#3 1 0.19531250 0.121 -0.8838652
#4 1 0.19531250 0.124 -0.8639000
#5 1 0.39062500 0.206 -0.3181831
#6 1 0.39062500 0.215 -0.2582873
If it is to get the confidence interval from the model,
library(broom)
library(tidyr)
data_1 %>%
nest_by(Run) %>%
mutate(model_dna = list(lm(conc ~ density, data = data) %>%
tidy(., conf.int = TRUE))) %>%
select(Run, model_dna) %>%
ungroup %>%
unnest(c(model_dna))
# A tibble: 22 x 8
# Run term estimate std.error statistic p.value conf.low conf.high
# <ord> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 10 (Intercept) -1.69 0.670 -2.52 0.0245 -3.13 -0.251
# 2 10 density 6.66 0.733 9.08 0.000000306 5.08 8.23
# 3 11 (Intercept) -1.58 0.629 -2.51 0.0249 -2.93 -0.231
# 4 11 density 6.60 0.690 9.57 0.000000161 5.12 8.08
# 5 9 (Intercept) -1.47 0.646 -2.28 0.0388 -2.86 -0.0876
# 6 9 density 6.49 0.708 9.16 0.000000272 4.97 8.01
# 7 1 (Intercept) -1.30 0.588 -2.21 0.0440 -2.56 -0.0402
# 8 1 density 6.51 0.659 9.88 0.000000108 5.10 7.92
# 9 4 (Intercept) -1.22 0.583 -2.09 0.0550 -2.47 0.0300
#10 4 density 6.36 0.645 9.86 0.000000111 4.97 7.74
# … with 12 more rows