I have some bulk-RNA sequencing data that I need to do differential expression significance testing on. I have two conditions, WT and KO, with two replicates each, giving me a dataframe that looks like the following (the columns are in counts):
        WT1    WT2    KO1    KO2
gene1   1.3    1.23   3.42   3.45
gene2   2.6    2.54   1.22   1.21
gene3   5.54   2.32   1.21   1.10
My questions are, how do I get a column on the right with a p-value for each gene so that I can construct a Volcano plot of the data? Basically, what statistical test do I need to use to generate that column, and what function do I use in R to do so? I'm sorry if this isn't technically a question that I'm supposed to ask here, but frankly I didn't know where else to post. Thanks in advance!
Just in case someone ends up caring about this question and I'm not just screaming into the ether (per the usual), I figured this out. For this kind of data I can use either a one-way ANOVA or a two-tailed t-test. With only two groups, the ANOVA gives the same p-value as the equal-variance (Student's) t-test. Note that t.test() in R defaults to the Welch test, which does not assume equal variances, so its p-values can differ somewhat from the ANOVA unless you pass var.equal = TRUE. I decided to go with the t.test() function, as it's a little easier to understand (at least if you're not super familiar with statistics in R). Normally, the t.test() function prints a summary that looks like this:
Welch Two Sample t-test
data: bulk_data[1, 1:2] and bulk_data[1, 3:4]
t = -0.93364, df = 1.1978, p-value = 0.5002
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.3807992 0.3068266
sample estimates:
mean of x mean of y
0.09525708 0.13224335
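As a quick sanity check of the ANOVA/t-test relationship mentioned above, here is a minimal sketch using the gene1 values from the question. The variable names (wt, ko, values, group) are just for illustration and are not part of my actual script.

```r
# gene1 values from the example table in the question
wt <- c(1.3, 1.23)
ko <- c(3.42, 3.45)

# Long format for aov(): one value per observation plus a group label
values <- c(wt, ko)
group <- factor(rep(c("WT", "KO"), each = 2))

# Student's t-test (equal variances assumed) and the default Welch test
p_student <- t.test(wt, ko, var.equal = TRUE)$p.value
p_welch <- t.test(wt, ko)$p.value

# One-way ANOVA with two groups; the p-value sits in the "Pr(>F)" column
p_anova <- summary(aov(values ~ group))[[1]][["Pr(>F)"]][1]

# p_student and p_anova should agree; p_welch uses fewer degrees of freedom
print(c(student = p_student, welch = p_welch, anova = p_anova))
```

So the "same thing" statement holds for the equal-variance t-test, while the Welch default is a close relative rather than an exact match.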
I needed to extract the p-value from the test result and add it as the fifth column of the data frame, so I used this loop:
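for (i in seq_len(nrow(bulk_data))) {
  # Welch t-test for gene i: columns 1-2 are WT, columns 3-4 are KO
  res <- t.test(x = as.numeric(bulk_data[i, 1:2]), y = as.numeric(bulk_data[i, 3:4]), alternative = "two.sided")
  bulk_data[i, 5] <- res$p.value  # p-value goes into the fifth column
}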
This gave me a very nice list of p-values in the fifth column.
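For the volcano plot itself, a rough sketch of how the pieces might fit together is below. It rebuilds the toy table from the question, runs the same per-gene Welch t-test without an explicit loop, and adds a log2 fold change for the x-axis. The column names log2_fc, p_value and p_adj and the results data frame are my own choices for illustration, and the Benjamini-Hochberg adjustment via p.adjust() is an optional extra that I'm assuming is wanted because many genes are tested at once. Keep in mind that with only two replicates per condition a t-test has very little power, so treat the p-values with caution.

```r
# Toy data copied from the question; a real data set would be read from file
bulk_data <- data.frame(
  WT1 = c(1.3, 2.6, 5.54),
  WT2 = c(1.23, 2.54, 2.32),
  KO1 = c(3.42, 1.22, 1.21),
  KO2 = c(3.45, 1.21, 1.10),
  row.names = c("gene1", "gene2", "gene3")
)

# Split the replicates by condition
wt <- as.matrix(bulk_data[, c("WT1", "WT2")])
ko <- as.matrix(bulk_data[, c("KO1", "KO2")])

# One Welch t-test per gene (row), keeping only the p-value.
# Note: t.test() stops with an error if both groups are constant.
p_value <- vapply(
  seq_len(nrow(bulk_data)),
  function(i) t.test(wt[i, ], ko[i, ])$p.value,
  numeric(1)
)

# log2 fold change of KO relative to WT (assumes all group means are > 0)
log2_fc <- log2(rowMeans(ko) / rowMeans(wt))

results <- data.frame(
  bulk_data,
  log2_fc = log2_fc,
  p_value = p_value,
  p_adj = p.adjust(p_value, method = "BH")  # optional multiple-testing correction
)

# Basic volcano plot: effect size on x, significance on y
plot(
  results$log2_fc, -log10(results$p_value),
  xlab = "log2 fold change (KO / WT)",
  ylab = "-log10(p-value)",
  pch = 19
)
```

Genes far to the left or right and high up on the plot are the ones with large, consistent differences between WT and KO.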