I need to compare the values in every block of four rows to one value (as mu) using a Wilcoxon signed-rank test. For example, if my data looks like this:
df1 <- c(0.205346764819837, 0.260927758796802, 0.243880102849495, 0.244549329012715,
0.122609277587968, 0.19381141911169, 0.0617801415941672, 0.217762671269064,
0.0513190799901377, 0.293455672572294, 0.222447254411609, 0.271001373674756,
0.00119756260786869, 0.119069423408827, -0.0164312634285513,
0.0446268183579303)
df2 <- c(0.23340509, 0.05959987, 0.17380963, 0.14517836)
I am using wilcox.test to compare each group of four values from df1 with one value from df2 as mu. Considering a data frame with just the first four rows, it would be:
wilcox.test(dfnew$A, mu=0.23340509)$p.value
I realise I could group every four rows using:
split(df, as.integer(gl(nrow(df), 4, nrow(df))))
I was hoping to adapt this for use in mapply (so I could parallelise with future.apply, given the actual size of my data frame). However, I am unsure how to specify that every four rows should be compared to one value (in a separate data frame) as mu.
You can create your grouping variable using rep() and apply your function by group:
library(data.table)
setDT(dfnew)[, grp:=rep(1:(.N/4), each=4, length.out=.N)]
dfnew[, .(pval = wilcox.test(A, mu=df2[.BY$grp])$p.value), grp]
Output:
grp pval
<int> <num>
1: 1 0.875
2: 2 0.125
3: 3 0.875
4: 4 0.125
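To make the rep() grouping concrete, here is a minimal base R sketch of the index it builds. It assumes dfnew is a one-column data frame made from the question's df1 vector (the question does not show how dfnew was created), and grp_alt is only an illustrative alternative, not part of the answer above.

```r
# Assumption: dfnew holds the question's df1 values in a column named A.
df1 <- c(0.205346764819837, 0.260927758796802, 0.243880102849495, 0.244549329012715,
         0.122609277587968, 0.19381141911169, 0.0617801415941672, 0.217762671269064,
         0.0513190799901377, 0.293455672572294, 0.222447254411609, 0.271001373674756,
         0.00119756260786869, 0.119069423408827, -0.0164312634285513,
         0.0446268183579303)
dfnew <- data.frame(A = df1)

n <- nrow(dfnew)

# Same expression as in the data.table call: one index per block of four rows.
grp <- rep(1:(n / 4), each = 4, length.out = n)
print(grp)
# [1] 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4

# Illustrative alternative that gives the same index here.
grp_alt <- ceiling(seq_len(n) / 4)
stopifnot(all(grp == grp_alt))
```

Note that if the row count were not a multiple of four, length.out would recycle the index, so leftover rows would be assigned to group 1 again. The ceiling() form would instead put them in a new, shorter group, which may or may not be what you want.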
Similarly, using dplyr:
library(dplyr)
dfnew %>%
  group_by(grp = rep(1:(n()/4), each=4, length.out=n())) %>%
  summarize(pval = wilcox.test(A, mu = df2[cur_group()$grp])$p.value)
Output:
grp pval
<int> <dbl>
1 1 0.875
2 2 0.125
3 3 0.875
4 4 0.125
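Another approach that you might find interesting is to group directly by the repeated mu values (this assumes the values in df2 are unique, otherwise blocks sharing a mu would be merged into one group):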
setDT(dfnew)[, .(pval = wilcox.test(A, mu=.BY$mu)$p.value), .(mu = rep(df2, each=4))]
Output:
mu pval
<num> <num>
1: 0.23340509 0.875
2: 0.05959987 0.125
3: 0.17380963 0.875
4: 0.14517836 0.125
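For the mapply route raised in the question, a minimal base R sketch could look like the following. It assumes dfnew is data.frame(A = df1) and that df2 is the plain numeric vector shown in the question; the names chunks and pvals are purely illustrative.

```r
# Assumption: dfnew holds the question's df1 values in a column named A.
df1 <- c(0.205346764819837, 0.260927758796802, 0.243880102849495, 0.244549329012715,
         0.122609277587968, 0.19381141911169, 0.0617801415941672, 0.217762671269064,
         0.0513190799901377, 0.293455672572294, 0.222447254411609, 0.271001373674756,
         0.00119756260786869, 0.119069423408827, -0.0164312634285513,
         0.0446268183579303)
df2 <- c(0.23340509, 0.05959987, 0.17380963, 0.14517836)
dfnew <- data.frame(A = df1)

# Split column A into consecutive blocks of four values.
chunks <- split(dfnew$A, ceiling(seq_along(dfnew$A) / 4))

# There must be exactly one mu per block.
stopifnot(length(chunks) == length(df2))

# mapply pairs block i with df2[i] and returns one p-value per block.
pvals <- mapply(function(x, m) wilcox.test(x, mu = m)$p.value, chunks, df2)
print(pvals)
```

This should give the same four p-values as the tables above (0.875, 0.125, 0.875, 0.125). Since future.apply provides future_mapply() as a counterpart to mapply(), swapping it in after setting a plan() is the likely way to parallelise this, though whether it pays off depends on how large the real data is.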