Tags: r, sequencing, statistical-test

Trouble Deciding How to Test for Variance in Bulk RNA-Sequencing Data


I have some bulk RNA-sequencing data that I need to run differential expression significance testing on. I have two conditions, WT and KO, with two replicates each, giving me a data frame that looks like the following (the values are counts):

       WT1   WT2   KO1   KO2
 gene1 1.3   1.23  3.42  3.45
 gene2 2.6   2.54  1.22  1.21
 gene3 5.54  2.32  1.21  1.10 
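For anyone who wants to try this, the table above can be typed into R as a data frame. This is just a sketch: the object name bulk_data is the one used in the solution, and putting the gene names in the row names is my own choice.

```r
# Example data from the table: two WT and two KO replicates per gene
bulk_data <- data.frame(
  WT1 = c(1.3, 2.6, 5.54),
  WT2 = c(1.23, 2.54, 2.32),
  KO1 = c(3.42, 1.22, 1.21),
  KO2 = c(3.45, 1.21, 1.10),
  row.names = c("gene1", "gene2", "gene3")  # gene names become row names
)

print(bulk_data)
```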

My question is: how do I get a column on the right with a p-value for each gene so that I can construct a volcano plot of the data? Basically, which statistical test should I use to generate that column, and which R function does it? I'm sorry if this isn't technically the kind of question I'm supposed to ask here, but frankly I didn't know where else to post. Thanks in advance!


Solution

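  • Just in case someone ends up caring about this question and I'm not just screaming into the ether (per the usual), I figured this out. For this kind of two-group comparison I can use either a one-way ANOVA or a two-tailed t-test. With only two groups these are equivalent as long as the t-test assumes equal variances (Student's t-test): the ANOVA F statistic is the square of the t statistic, and the p-values are identical. I went with the t.test() function in R, since it's a bit easier to understand if you're not super familiar with statistics in R. Note that t.test() runs Welch's t-test by default, which does not assume equal variances, so its p-values can differ from the ANOVA ones unless you pass var.equal = TRUE. Normally, the t.test function produces a summary that looks like this: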

     Welch Two Sample t-test
    
     data:  bulk_data[1, 1:2] and bulk_data[1, 3:4]
     t = -0.93364, df = 1.1978, p-value = 0.5002
     alternative hypothesis: true difference in means is not equal to 0
     95 percent confidence interval:
      -0.3807992  0.3068266
     sample estimates:
      mean of x  mean of y 
     0.09525708 0.13224335 
    

    I needed to pull the p-value out of the returned object (it is stored in its p.value element) and put it in a fifth column of the data frame, so I used this loop:

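      bulk_data$p.value <- NA_real_  # new fifth column, filled in below
      for (i in seq_len(nrow(bulk_data))) {
        # Welch two-sample t-test: WT replicates (columns 1-2) vs KO replicates (columns 3-4)
        res <- t.test(x = as.numeric(bulk_data[i, 1:2]),
                      y = as.numeric(bulk_data[i, 3:4]),
                      alternative = "two.sided")
        bulk_data$p.value[i] <- res$p.value
      }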
    

    This gave me a nice column of p-values, one per gene, in the fifth column.
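The same per-gene test can also be written without an explicit loop. The sketch below is one way to do it, not the only one: it uses apply() to run the same Welch t-test on every row, then computes the two quantities a volcano plot is drawn from, the log2 fold change (x-axis) and -log10 of the p-value (y-axis). The column names p.value, log2FC and neg_log10_p and the helper vectors wt_cols and ko_cols are my own choices, and the fold change assumes the values are on a linear scale; if they are already log2-transformed, the difference of the group means should be used instead.

```r
# Example data from the question (WT1, WT2 = wild type; KO1, KO2 = knockout)
bulk_data <- data.frame(
  WT1 = c(1.3, 2.6, 5.54),
  WT2 = c(1.23, 2.54, 2.32),
  KO1 = c(3.42, 1.22, 1.21),
  KO2 = c(3.45, 1.21, 1.10),
  row.names = c("gene1", "gene2", "gene3")
)

wt_cols <- c("WT1", "WT2")
ko_cols <- c("KO1", "KO2")

# Per-gene Welch two-sample t-test, same test as the loop;
# apply() coerces the data frame to a matrix and passes each row as a named numeric vector
bulk_data$p.value <- apply(bulk_data[, c(wt_cols, ko_cols)], 1, function(row) {
  t.test(x = row[wt_cols], y = row[ko_cols], alternative = "two.sided")$p.value
})

# Volcano-plot coordinates, ASSUMING the values are on a linear (not log) scale:
# x = log2 fold change of KO over WT, y = -log10(p-value)
bulk_data$log2FC <- log2(rowMeans(bulk_data[, ko_cols]) / rowMeans(bulk_data[, wt_cols]))
bulk_data$neg_log10_p <- -log10(bulk_data$p.value)

print(bulk_data)
```

From there, plot(bulk_data$log2FC, bulk_data$neg_log10_p) draws a basic volcano plot, with genes further from zero on the x-axis and higher on the y-axis being the more strongly and more significantly changed ones.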
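The point about a one-way ANOVA and a t-test being interchangeable for two groups can be checked directly. In this sketch (using the gene1 values from the table; the variable names are illustrative), oneway.test() with var.equal = TRUE, which is an ordinary one-way ANOVA, gives the same p-value as t.test() with var.equal = TRUE, and its F statistic is the square of the t statistic. The default Welch test from t.test() is printed for comparison; since it does not assume equal variances, its p-value generally differs.

```r
# gene1 values from the question's table
wt <- c(1.3, 1.23)
ko <- c(3.42, 3.45)

# Long format for the ANOVA: one value vector plus a group label per value
values <- c(wt, ko)
group <- factor(rep(c("WT", "KO"), each = 2))

# Student's (pooled-variance) t-test
student <- t.test(wt, ko, var.equal = TRUE)

# Classic one-way ANOVA (equal variances assumed)
anova_fit <- oneway.test(values ~ group, var.equal = TRUE)

# Welch's t-test, which is what t.test() runs by default
welch <- t.test(wt, ko)

cat("Student t-test p:", student$p.value, "\n")
cat("One-way ANOVA p: ", anova_fit$p.value, "\n")
cat("Welch t-test p:  ", welch$p.value, "\n")

# For two groups the ANOVA F statistic equals the squared t statistic
cat("t^2 =", unname(student$statistic)^2, " F =", unname(anova_fit$statistic), "\n")
```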