I have a data frame where, for each site, I have tumor and normal count data. I want to run Fisher's exact test on count_unmethylated and count_methylated for tumor versus normal at each position (chromosome, start, end).
So for the first position:
chromosome start end
1 10469 10469
I want to conduct Fisher's exact test on this table:
count_unmethylated count_methylated
norm 0 2
tum 1 3
and do the same for the rest of the loci (chromosome, start, end).
I tried adapting the solution from a previous question, "Row-wise Fisher Exact Test, grouped by samples in R", but it didn't work:
head(tumNorm_dt_merged_long) %>%
group_by(chromosome, start, end) %>%
summarise(data = list(row_wise_fisher_test(as.matrix(select(cur_data(),
starts_with('count_'))), p.adjust.method = "BH"), ncol=2)) %>%
unnest_wider(data) %>%
unnest(c(group:p.adj.signif)) -> Fisher_result
My data looks like this:
dput(head(tumNorm_dt_merged_long))
structure(list(chromosome = c("1", "1", "1", "1", "1", "1"),
start = c(10469L, 10469L, 10470L, 10470L, 10471L, 10471L),
end = c(10469L, 10469L, 10470L, 10470L, 10471L, 10471L),
group = c("norm", "tum", "norm", "tum", "norm", "tum"), count_methylated = c(2,
3, 3, 2, 1, 2), count_unmethylated = c(0, 1, 0, 0, 1, 2),
methylation_percentage = c(100, 75, 100, 100, 50, 50)), row.names = c(NA,
-6L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x130baa0>, sorted = c("chromosome",
"start", "end", "group"))
Here is a solution using base R. Split the data frame on the start column (this assumes exactly two rows, norm and tum, per unique start value), then use lapply to run Fisher's exact test on columns 5 and 6 (count_methylated and count_unmethylated).
tumNorm_dt_merged_long <- structure(list(chromosome = c("1", "1", "1", "1", "1", "1"),
start = c(10469L, 10469L, 10470L, 10470L, 10471L, 10471L),
end = c(10469L, 10469L, 10470L, 10470L, 10471L, 10471L),
group = c("norm", "tum", "norm", "tum", "norm", "tum"),
count_methylated = c(2, 3, 3, 2, 1, 2),
count_unmethylated = c(0, 1, 0, 0, 1, 2),
methylation_percentage = c(100, 75, 100, 100, 50, 50)),
row.names = c(NA, -6L), class = c("data.table", "data.frame"), sorted = c("chromosome", "start", "end", "group"))
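# Split into one data frame per unique start position (expects 2 rows each: norm, tum)
dflist <- split(tumNorm_dt_merged_long, tumNorm_dt_merged_long$start)
# Run Fisher's exact test on each 2 x 2 table of counts
output <- lapply(dflist, function(x) {
  print(x)
  # Columns 5 and 6 are count_methylated and count_unmethylated
  results <- fisher.test(x[, c(5, 6)])
  print(results)
  results
})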
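If you need one row per locus rather than a list of printed test objects, the same split-and-apply idea can collect the p-values into a data frame. The version below is only a sketch under a few assumptions of mine: it rebuilds the example as a plain data.frame called df, splits on all three key columns (in case the same start occurs on more than one chromosome), and adds a Benjamini-Hochberg adjustment because the question asked for "BH". The names df, loci and fisher_by_locus are made up for illustration.

```r
# Example rows copied from the question, as a plain data.frame (assumption: no data.table needed)
df <- data.frame(
  chromosome = rep("1", 6),
  start = c(10469L, 10469L, 10470L, 10470L, 10471L, 10471L),
  end   = c(10469L, 10469L, 10470L, 10470L, 10471L, 10471L),
  group = rep(c("norm", "tum"), 3),
  count_methylated   = c(2, 3, 3, 2, 1, 2),
  count_unmethylated = c(0, 1, 0, 0, 1, 2)
)

# One piece per locus: split on chromosome, start and end; drop combinations that never occur
loci <- split(df, list(df$chromosome, df$start, df$end), drop = TRUE)

# Run Fisher's exact test per locus and keep the locus keys next to the p-value
fisher_by_locus <- do.call(rbind, lapply(loci, function(x) {
  # 2 x 2 table: rows are norm/tum, columns are unmethylated/methylated
  tab <- as.matrix(x[, c("count_unmethylated", "count_methylated")])
  rownames(tab) <- x$group
  data.frame(
    chromosome = x$chromosome[1],
    start      = x$start[1],
    end        = x$end[1],
    p.value    = fisher.test(tab)$p.value
  )
}))
rownames(fisher_by_locus) <- NULL

# Benjamini-Hochberg adjustment across all loci
fisher_by_locus$p.adj <- p.adjust(fisher_by_locus$p.value, method = "BH")
fisher_by_locus
```

Selecting the count columns by name instead of by position (5 and 6) keeps this working if the column order changes. Like the code above, it assumes exactly two rows (norm and tum) per locus.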