I have a data frame:
> dput(df)
structure(list(id = c(1, 2, 3, 4, 1, 2, 3, 4), level = structure(c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("g01", "g02"), class = "factor"),
m_col = c(1, 2, 3, 4, 11, 22, 33, 44), u_col = c(11, 12,
13, 14, 21, 22, 23, 24), group = c(0, 0, 1, 1, 0, 0, 1, 1
)), row.names = c(NA, -8L), class = "data.frame")
which looks like this:
  id level m_col u_col group
1  1   g01     1    11     0
2  2   g01     2    12     0
3  3   g01     3    13     1
4  4   g01     4    14     1
5  1   g02    11    21     0
6  2   g02    22    22     0
7  3   g02    33    23     1
8  4   g02    44    24     1
I want to perform a weighted binomial test within each 'level' (essentially, I need to compare u_col and m_col for each id). Using tidyverse and broom, I can do the following:
library(tidyverse)
library(broom)

res <- df %>%
  group_by(level) %>%
  do(tidy(glm(cbind(.$m_col, .$u_col) ~ .$group, family = "binomial"))) %>%
  filter(term == ".$group")
Which gives me some p-values for each level:
> res
# A tibble: 2 x 6
# Groups:   level [2]
  level term    estimate std.error statistic p.value
  <fct> <chr>      <dbl>     <dbl>     <dbl>   <dbl>
1 g01   .$group    0.687     0.746     0.921  0.357
2 g02   .$group    0.758     0.296     2.56   0.0105
I can then ask how many p-values are below 0.05:
length(which(res$p.value < 0.05))
I would now like to permute the data, repeat the binomial test, count how many p-values are below 0.05, store that count, and then repeat this 999 more times.
However, the permutation needs to shuffle the 'group' column within each 'level', and I'm struggling to find a way to do this. For example, one permutation would look like this:
  id level m_col u_col group
1  1   g01     1    11     1
2  2   g01     2    12     0
3  3   g01     3    13     1
4  4   g01     4    14     0
5  1   g02    11    21     1
6  2   g02    22    22     0
7  3   g02    33    23     1
8  4   g02    44    24     0
A second would look like this:
  id level m_col u_col group
1  1   g01     1    11     0
2  2   g01     2    12     1
3  3   g01     3    13     1
4  4   g01     4    14     0
5  1   g02    11    21     0
6  2   g02    22    22     1
7  3   g02    33    23     1
8  4   g02    44    24     0
etc.
Having the test rely on 2 columns limits the shuffle options and I'm stumped. I would appreciate any advice.
If you want a data frame, you may try this:
library(tidyverse)
library(broom)  # tidy() comes from broom, which library(tidyverse) does not attach

map_dfr(1:1000, ~ df %>%
  group_by(level) %>%
  mutate(group = group[sample(row_number())]) %>%  # shuffle 'group' within each 'level'
  do(tidy(glm(cbind(.$m_col, .$u_col) ~ .$group, family = "binomial"))) %>%
  filter(term == ".$group") %>%
  ungroup() %>%
  summarise(sum(p.value < 0.05)))  # count how many p < 0.05
And if you want a vector:
map_dbl(1:1000, ~ df %>%
  group_by(level) %>%
  mutate(group = group[sample(row_number())]) %>%  # shuffle 'group' within each 'level'
  do(tidy(glm(cbind(.$m_col, .$u_col) ~ .$group, family = "binomial"))) %>%
  filter(term == ".$group") %>%
  ungroup() %>%
  summarise(sum(p.value < 0.05)) %>%  # count how many p < 0.05
  pull())
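To see what the shuffle step does on its own, here is a minimal sketch that applies only the within-level permutation to the example data and checks the result. The data frame is rebuilt from the question's dput output; the set.seed(1) call and the name shuffled are my own additions for illustration and are not needed in the real loop.

```r
library(dplyr)

# Example data, rebuilt from the dput() output in the question
df <- data.frame(
  id    = rep(1:4, times = 2),
  level = factor(rep(c("g01", "g02"), each = 4)),
  m_col = c(1, 2, 3, 4, 11, 22, 33, 44),
  u_col = c(11, 12, 13, 14, 21, 22, 23, 24),
  group = rep(c(0, 0, 1, 1), times = 2)
)

set.seed(1)  # assumption: fixed seed only so this illustration is reproducible

# 'shuffled' is a hypothetical name; only 'group' is reordered, and only
# among the rows that share the same 'level'
shuffled <- df %>%
  group_by(level) %>%
  mutate(group = group[sample(row_number())]) %>%
  ungroup()

# Each level should still contain two 0s and two 1s
table(shuffled$level, shuffled$group)
```

Indexing with sample(row_number()) rather than calling sample(group) directly is a deliberate choice: if a level ever had a single row, sample() on a length-one number would draw from 1:x instead of returning that value unchanged. Because m_col and u_col stay attached to their ids, the two-column response does not restrict the shuffle.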