I have a dataset in long format with multiple groups, and I need to run a pre- vs. post-intervention hypothesis test for each group.
I'm trying to do this by grouping on the group column and running the test on value by time point, but the p-values I get don't make sense: they are all the same. See the example below:
# Load the required library
library(dplyr)
# Set seed for reproducibility
set.seed(123)
# Create a data frame with ids, time points, groups, and values
data <- data.frame(
  id = rep(1:10, each = 2), # Increased sample size
  timepoint = rep(c("before", "after"), times = 100),
  group = rep(c("A", "B", "C", "D", "E"), each = 40), # Adjusted for larger sample size
  value = rnorm(200) # Generating random values for illustration
)
# Perform a paired Wilcoxon test for each group
result <- data %>%
  group_by(group) %>%
  summarise(
    p_value = wilcox.test(value ~ timepoint, data = ., paired = TRUE)$p.value
  )
# Print the results
print(result)
For example, if I filter to a single group as below, I get a different and presumably accurate p-value.
I guess there's some issue with how I am grouping them?
# Perform a paired Wilcoxon test for group B only
result <- data %>%
  filter(group == "B") %>%
  summarise(
    p_value = wilcox.test(value ~ timepoint, data = ., paired = TRUE)$p.value
  )
# Print the results
print(result)
Can anyone identify the issue here or suggest a better way to achieve this?
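wilcox.test() ignores tibble grouping: inside summarise(), the dot in data = . refers to the whole data frame piped in, not to the current group. Your code therefore computes this for every group: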
wilcox.test(value ~ timepoint, data=data, paired=T)$p.value
# [1] 0.4340859
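The claim above can be checked with a small sketch that rebuilds the question's data and counts rows two ways inside summarise(). The column names rows_in_dot and rows_in_group are made up for illustration.

```r
library(dplyr)

# Same layout as the question's data: 5 groups x 40 rows
set.seed(123)
data <- data.frame(
  id        = rep(1:10, each = 2),
  timepoint = rep(c("before", "after"), times = 100),
  group     = rep(c("A", "B", "C", "D", "E"), each = 40),
  value     = rnorm(200)
)

check <- data %>%
  group_by(group) %>%
  summarise(
    rows_in_dot   = nrow(.), # the dot is the whole piped data frame
    rows_in_group = n()      # n() is evaluated once per group
  )

print(check)
# rows_in_dot is 200 in every row, rows_in_group is 40
```

Because the dot always carries all 200 rows, any function given data = . sees the same input in every group, which is why the p-values come out identical.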
You can achieve what you want by applying wilcox.test() to the data subsets like this:
sapply(split(data, ~ group),
       \(gr) wilcox.test(value ~ timepoint, data = gr, paired = TRUE)$p.value)
#         A         B         C         D         E
# 0.3883762 0.8123550 0.5458755 0.2773552 0.6215134
With dplyr, we can use group_modify() to iterate over the groups:
data %>%
  group_by(group) %>%
  group_modify(~ {
    wilcox.test(value ~ timepoint, data = ., paired = TRUE)$p.value %>%
      data.frame()
  }) %>%
  setNames(c("group", "p_value")) # from stats; set_names() would need purrr or magrittr loaded
# # A tibble: 5 × 2
# # Groups:   group [5]
#   group p_value
#   <chr>   <dbl>
# 1 A       0.388
# 2 B       0.812
# 3 C       0.546
# 4 D       0.277
# 5 E       0.622
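Another option, as a hedged sketch rather than a definitive recipe: the x/y interface of wilcox.test() can be called directly inside summarise(), because bare column names there are evaluated per group. This also avoids combining the formula interface with paired = TRUE, which newer R releases (4.4.0 onward, as far as I know) reject with an error. The data frame toy below is hypothetical and built so that each id has exactly one "before" and one "after" row within a group. Sorting by id is what keeps the pairs aligned.

```r
library(dplyr)

set.seed(123)
# Hypothetical data: 5 groups, 20 subjects each,
# one "before" and one "after" row per subject
toy <- expand.grid(
  timepoint = c("before", "after"),
  id        = 1:20,
  group     = c("A", "B", "C", "D", "E"),
  stringsAsFactors = FALSE
)
toy$value <- rnorm(nrow(toy))

result <- toy %>%
  arrange(group, id) %>% # align before/after values by subject id
  group_by(group) %>%
  summarise(
    p_value = wilcox.test(
      value[timepoint == "before"], # evaluated within the current group
      value[timepoint == "after"],
      paired = TRUE
    )$p.value
  )

print(result)
```

The pairing here relies on row order, so it only holds if every id appears once per time point in each group. If some subjects are missing a measurement, the data would need to be reshaped or filtered first.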