I have a dataframe as follows:
df = structure(list(aa = c(1L, 5L, 8L, 10L, 1L, 10L, 8L, 6L, 7L, 4L,
1L, 5L, 7L, 7L, 5L, 8L), bb = c(2L, 9L, 1L, 10L, 8L, 7L, 10L,
8L, 1L, 7L, 2L, 10L, 3L, 5L, 2L, 10L), cc = c(1L, 5L, 9L, 4L,
9L, 1L, 8L, 3L, 2L, 2L, 2L, 5L, 7L, 2L, 2L, 3L), dd = c(10L,
5L, 8L, 10L, 6L, 8L, 7L, 5L, 2L, 9L, 10L, 6L, 5L, 3L, 7L, 8L),
ee = c(5L, 7L, 5L, 1L, 8L, 4L, 5L, 2L, 10L, 6L, 8L, 10L,
6L, 5L, 10L, 6L), Group = c("High", "High", "High", "High",
"High", "High", "High", "High", "Low", "Low", "Low", "Low",
"Low", "Low", "Low", "Low")), class = "data.frame", row.names = c(NA,
-16L))
I want to calculate pvalue for each column based on the Group
mentioned in the table.
my expected output is:
values pvalue t mean in High mean in Low
aa 0.08 0.41523 6.8 5
bb 0.89 1.41523 6.8 4
cc 0.088 2.41523 2.3 8
dd 0.89 3.41523 9.6 2
ee 0.76 4.41523 4.3 5
I tried following code to generate the pvalue:
# Compute t-test
res <- t.test(aa ~ Group, data = df)
res
It results as:
Welch Two Sample t-test
data: aa by Group
t = 0.41523, df = 11.794, p-value = 0.6854
alternative hypothesis: true difference in means between group High and group Low is not equal to 0
95 percent confidence interval:
-2.660919 3.910919
sample estimates:
mean in group High mean in group Low
6.125 5.500
want <- c('p.value','estimate', 'statistic')
t(sapply(head(names(df),-1),\(x)unlist(t.test(reformulate('Group', x), df)[want])))
p.value estimate.mean in group High estimate.mean in group Low statistic.t
aa 0.6854296 6.125 5.500 0.4152274
bb 0.3093107 6.875 5.000 1.0550233
cc 0.1938533 5.000 3.125 1.3833764
dd 0.3738283 7.375 6.250 0.9219951
ee 0.0177543 4.625 7.625 -2.6880860
pivot_longer(df,-Group) %>%
group_by(name)%>%
summarise(mod = list(unlist(t.test(value~Group)[want])))%>%
unnest_wider(mod)
# A tibble: 5 × 5
name p.value `estimate.mean in group High` `estimate.mean in group Low` statistic.t
<chr> <dbl> <dbl> <dbl> <dbl>
1 aa 0.685 6.12 5.5 0.415
2 bb 0.309 6.88 5 1.06
3 cc 0.194 5 3.12 1.38
4 dd 0.374 7.38 6.25 0.922
5 ee 0.0178 4.62 7.62 -2.69