I'm trying to make this function work, but am failing. What I need is a function that reads the names from a dataframe columns and uses them to perform a Wilcoxon test on each of those columns. "result" would be the main final product, a table with the genus names and their p-values on each row. I've added also a plotting feature for visualizing the values among groups for each column, that I would save naming them after the corresponding genus.
library("dplyr")
library("ggpubr")
library(PairedData)
library(tidyr)
process <- function(data, genus){
group_by(data,group) %>%summarise(
count = n(),
median = median(genus, na.rm = TRUE),
IQR = IQR(genus, na.rm = TRUE)
)
# Subset data before and after treatment
T0 <- subset(data, group == "T0", genus,drop = TRUE)
T2 <- subset(data, group == "T2", genus,drop = TRUE)
#Wilcoxon test for paired data, I want a table of names and corresponding p-values
res <- wilcox.test(T0, T2, paired = TRUE)
res$p.value
result <- spread(genus,res$p.value)
# Plot paired data, with title depending on the data and its p-value (this last one could be optional)
pd <- paired(T0, T2)
tiff(genus".tiff", width = 600, height = 400)
plot(pd, type = "profile") + labs(title=print(data[,genus]", paired p-value="res[,p.value]) +theme_bw()
dev.off()
}
l <- length(my_data)
glist <- list(colnames(my_data[3:l])) #bacteria start at col 3
wilcoxon <- process(data = my_data, genus = glist)
A reproducible dataset could be
my_data
Patient group Subdoligranulum Agathobacter
pt_10T0 T0 0.02 0.00
pt_10T2 T2 10.71 19.89
pt_15T0 T0 29.97 0.28
pt_15T2 T2 16.10 7.70
pt_20T0 T0 2.39 0.44
pt_20T2 T2 20.48 3.35
pt_32T0 T0 12.23 0.17
pt_32T2 T2 37.11 1.87
pt_36T0 T0 0.64 0.03
pt_36T2 T2 0.02 0.08
pt_39T0 T0 0.04 0.01
pt_39T2 T2 0.36 0.05
pt_3t0 T0 13.23 1.34
pt_3T2 T2 19.22 1.51
pt_9T0 T0 11.69 0.57
pt_9T2 T2 34.56 3.52
I'm not very familiar with functions, and haven't found yet a good tutorial on how to make them from a dataframe... so this is my best attempt, I hope some of you can make it work. Thank you for the help!
Simply, return
the needed value at end of processing. Below does not test the plot step (with unknown packages) but adjusted for proper R grammar:
proc_wilcox <- function(data, genus){
# Subset data before and after treatment
T0 <- data[[genus]][data$group == "T0"]
T2 <- data[[genus]][data$group == "T2"]
# Wilcoxon test for paired data
res <- wilcox.test(T0, T2, paired = TRUE)
# Plot paired data, with title depending on the data and its p-value
# pd <- paired(T0, T2)
# tiff(paste0(genus, ".tiff"), width = 600, height = 400)
# plot(pd, type = "profile") +
# labs(title=paste0(genus, " paired p-value= ", res$p.value)) +
# theme_bw()
# dev.off()
return(res$p.value)
}
Then, call the method with an apply function such as sapply
or slightly faster vapply
designed to process across iterables and return same length.
# VECTOR OF RESULTS (USING sapply)
wilcoxon_results <- sapply(
names(my_data)[3:ncol(my_data)],
function(col) proc_wilcox(my_data, col)
)
# VECTOR OF RESULTS (USING vapply)
wilcoxon_results <- vapply(
names(my_data)[3:ncol(my_data)],
function(col) proc_wilcox(my_data, col),
numeric(1)
)
wilcoxon_results
# Subdoligranulum Agathobacter
# 0.1484375 0.0078125
wilcoxon_df <- data.frame(wilcoxon_results)
wilcoxon_df
# wilcoxon_results
# Subdoligranulum 0.1484375
# Agathobacter 0.0078125