rtidyverse

How to write a function to estimate confidence intervals


I am trying to write a function to estimate confidence intervals around the mean. I want to estimate intervals either by wk or by month using pd as my var, however, after trying for a while I am getting NAs for all my wk and besides the function ignores the month and returns by wk only. Could someone point out what I am doing wrong? What should I group_by inside the function to be able to switch between wk and month when usig the ci function? Data and code below:

x <-  structure(list(wk = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 
2, 3, 3, 3, 3, 3, 3), month = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1), xd = c(0.0370098838217444, 0.0391215970961887, 
0.0409865246890313, NA, NA, NA, NA, 0.015188, 0.007513, 0.012716, 
0.013962, 0.014259, 0.023553, 0.032122, 0.03953, 0.028946, 0.030769, 
0.030815, 0.029187, 0.022604), td = c(0.02701157274146, 0.0284158620689655, 
0.0296560389182058, NA, NA, NA, NA, 0.0125, 0.007396, 0.010856, 
0.011685, 0.011882, 0.018063, 0.023761, 0.028687, 0.021649, 0.022861, 
0.022892, 0.021809, 0.017432), pd = c(317.308439683869, 0, 126.719553152898, 
NA, NA, NA, NA, 2671.6, 3540.6976744186, 1270.35740604274, 1067.69362430466, 
688.099646524154, 317.444499806234, 420.941879550524, 280.475476696762, 
250.681324772507, 159.048160622895, 258.125109208457, 450.868907331836, 
0), year = c(2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 
2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 
2007)), class = "data.frame", row.names = c(NA, -20L))

x

library(tidyverse)
ci <- function(dat, by, var, conf.int = 0.90,...) {
dat <- na.omit(dat)
 dat %>%
  group_by(wk) %>%   #Should I group by wk here?
  do(data.frame(rbind(smean.cl.normal(.$var, conf.int = conf.int))))
}

ci(dat = x, by = month, var = pd) # Returns NAs

Solution

  • I would use summarise instead of do and then unnest_wider. And you need to put the "by" argument in curly curly braces. If you group_by(wk) then your function will group by the wk variable no matter what you specify as the "by" argument. You also don't need to omit the NA because smean.cl.normal omits them by default.

    library(Hmisc)
    library(dplyr)
    
    ci <- function(dat, by, var, conf.int = 0.90,...) {
      #dat <- na.omit(dat)
      dat %>%
        group_by({{by}}) %>%   
        dplyr::summarise(mean=list(smean.cl.normal({{var}}, conf.int=conf.int))) %>%
        unnest_wider(mean)
    }
    

    ci(dat = x, by = month, var = pd)
      month  Mean Lower Upper
      <dbl> <dbl> <dbl> <dbl>
    1     1  739.  300. 1177.
    

    ci(dat = x, by = wk, var = pd)
    # A tibble: 3 x 4
         wk  Mean Lower Upper
      <dbl> <dbl> <dbl> <dbl>
    1     1  148. -121.  417.
    2     2 1425.  528. 2323.
    3     3  233.  111.  355.