I have a very large data frame consisting of two variables (here A and B) and 134,000 observations (134000/4 = 33500 groups).
I'm unsure how to run a paired wilcox.test on every block of four rows. In the example data below, I want to compare A vs B using rows 1:4 for the first result, rows 5:8 for the second, and rows 9:12 for the third.
df1 <- data.frame(
  A = c(0.67, 0.45, 0.76, 0.67, 0.56, 0.88, 0.34, 0.56, 0.35, 0.45, 0.67, 0.87),
  B = c(0.45, 0.54, 0.67, 0.86, 0.23, 0.56, 0.34, 0.66, 0.21, 0.55, 0.56, 0.45)
)
## for one group only (rows 1:4 of both columns)
check <- wilcox.test(df1[1:4, "A"], df1[1:4, "B"], paired = TRUE)
I can see there are examples where the data frame is in wide format (so the columns would be A1, A2, A3, A4, B1, B2, B3, B4), such as "Run wilcoxon rank sum test on each row of a data frame", but I would prefer to keep it in long format if possible.
Any guidance would be greatly appreciated.
We could split the data by a grouping variable created with gl and apply wilcox.test to each element of the resulting list:
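# gl() yields the group ids 1,1,1,1,2,2,2,2,...; paired = TRUE requests the paired test asked for
lapply(split(df1, as.integer(gl(nrow(df1), 4, nrow(df1)))),
       function(x) wilcox.test(x$A, x$B, paired = TRUE))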
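With 33,500 groups, a list of htest objects is awkward to inspect, so it may help to collect the statistic and p-value into a data frame with one row per group. The following is a minimal sketch under some assumptions: the names grp, tests and res are illustrative, the group index is built with integer division instead of gl (it should give the same blocks of four), and suppressWarnings is used only to silence the "cannot compute exact p-value" warnings that ties and zero differences trigger in such small groups.

```r
# Example data from the question
df1 <- data.frame(
  A = c(0.67, 0.45, 0.76, 0.67, 0.56, 0.88, 0.34, 0.56, 0.35, 0.45, 0.67, 0.87),
  B = c(0.45, 0.54, 0.67, 0.86, 0.23, 0.56, 0.34, 0.66, 0.21, 0.55, 0.56, 0.45)
)

# Group index: 1,1,1,1,2,2,2,2,3,... (one id per block of four rows)
grp <- (seq_len(nrow(df1)) - 1) %/% 4 + 1

# Run the paired test on each block of four rows
tests <- lapply(split(df1, grp), function(x) {
  suppressWarnings(wilcox.test(x$A, x$B, paired = TRUE))
})

# Collect one row per group (statistic is the signed-rank V)
res <- data.frame(
  group     = as.integer(names(tests)),
  statistic = vapply(tests, function(tst) unname(tst$statistic), numeric(1)),
  p_value   = vapply(tests, function(tst) tst$p.value, numeric(1))
)
res
```

The data stays in long format throughout, and res can then be filtered or passed to p.adjust if a multiple-testing correction is wanted.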