I am trying to write a function to estimate confidence intervals around the mean.
I want to estimate intervals either by wk
or by month
using pd
as my var
, however,
after trying for a while I am getting NAs for all my wk
and besides the function ignores
the month
and returns by wk
only. Could someone point out what I am doing wrong?
What should I group_by inside the function to be able to switch between wk
and month
when
usig the ci
function? Data and code below:
x <- structure(list(wk = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
2, 3, 3, 3, 3, 3, 3), month = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1), xd = c(0.0370098838217444, 0.0391215970961887,
0.0409865246890313, NA, NA, NA, NA, 0.015188, 0.007513, 0.012716,
0.013962, 0.014259, 0.023553, 0.032122, 0.03953, 0.028946, 0.030769,
0.030815, 0.029187, 0.022604), td = c(0.02701157274146, 0.0284158620689655,
0.0296560389182058, NA, NA, NA, NA, 0.0125, 0.007396, 0.010856,
0.011685, 0.011882, 0.018063, 0.023761, 0.028687, 0.021649, 0.022861,
0.022892, 0.021809, 0.017432), pd = c(317.308439683869, 0, 126.719553152898,
NA, NA, NA, NA, 2671.6, 3540.6976744186, 1270.35740604274, 1067.69362430466,
688.099646524154, 317.444499806234, 420.941879550524, 280.475476696762,
250.681324772507, 159.048160622895, 258.125109208457, 450.868907331836,
0), year = c(2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007,
2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007,
2007)), class = "data.frame", row.names = c(NA, -20L))
x
library(tidyverse)
ci <- function(dat, by, var, conf.int = 0.90,...) {
dat <- na.omit(dat)
dat %>%
group_by(wk) %>% #Should I group by wk here?
do(data.frame(rbind(smean.cl.normal(.$var, conf.int = conf.int))))
}
ci(dat = x, by = month, var = pd) # Returns NAs
I would use summarise
instead of do
and then unnest_wider
. And you need to put the "by" argument in curly curly braces. If you group_by(wk)
then your function will group by the wk
variable no matter what you specify as the "by" argument. You also don't need to omit the NA because smean.cl.normal
omits them by default.
library(Hmisc)
library(dplyr)
ci <- function(dat, by, var, conf.int = 0.90,...) {
#dat <- na.omit(dat)
dat %>%
group_by({{by}}) %>%
dplyr::summarise(mean=list(smean.cl.normal({{var}}, conf.int=conf.int))) %>%
unnest_wider(mean)
}
ci(dat = x, by = month, var = pd)
month Mean Lower Upper
<dbl> <dbl> <dbl> <dbl>
1 1 739. 300. 1177.
ci(dat = x, by = wk, var = pd)
# A tibble: 3 x 4
wk Mean Lower Upper
<dbl> <dbl> <dbl> <dbl>
1 1 148. -121. 417.
2 2 1425. 528. 2323.
3 3 233. 111. 355.