Tags: r, bioinformatics, rna-seq

Questions about DESeq2 and setting up logical vectors to select a result file


I am very new to using DESeq2 and the R language for processing sequencing data. I have installed R and RStudio successfully, together with all the necessary packages. But when I followed a tutorial to understand designs and contrasts, I got lost. Here is the link to the tutorial: https://www.r-bloggers.com/2024/05/a-guide-to-designs-and-contrasts-in-deseq2/

My specific question: following the tutorial, I got a results object called res.

res <- results(dds1, contrast = list("conditiontreatment", "conditioncontrol"))

This gives res with dimensions 1000 x 6 (1000 rows, one per gene, and 6 columns).

I want to select the genes in res that have a p-value < 0.05, so I set up the following logical vector:

significant_p <- res$pvalue < 0.05

When I tried to use this logical vector to subset res, R complained that res contains NA values, so I dropped all the NA values:

filtered_res <- na.omit(res)

I then tried to use the logical vector on filtered_res to select the genes with a p-value < 0.05. However, R returned the following error:

filtered_res[significant_p]
Error: subscript is a logical vector with out-of-bounds TRUE values

But when I used the same logical vector on dds1, it worked:

dds1[significant_p]
class: DESeqDataSet 
dim: 465 6 
metadata(1): version
assays(4): counts mu H cooks
rownames(465): gene1 gene2 ... gene996 gene997
rowData names(25): trueIntercept trueBeta ... deviance maxCooks
colnames(6): sample1 sample2 ... sample5 sample6
colData names(2): condition sizeFactor

I don't understand the data structures here very well. What I want is for the logical vector to work on res, not on the original object dds1. Can anyone explain what is going on?

Thank you very much!


Solution

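  • The problem is that filtered_res[significant_p] subsets columns, not rows as you intend. A single index with no comma, filtered_res[significant_p], is equivalent to filtered_res[, significant_p], but you need row-wise subsetting, which is filtered_res[significant_p, ]. Because the mask has one entry per row and there are far more rows than columns, using it to select columns runs out of bounds.
  • The mask also has to match the object you subset. significant_p was computed from res (1000 entries, including NAs), while filtered_res has fewer rows after na.omit(), so build the mask from filtered_res itself, e.g. filtered_res[filtered_res$pvalue < 0.05, ], or skip na.omit() and use res[which(res$pvalue < 0.05), ], where which() keeps only the TRUE positions and so drops the NAs.
  • dds1[significant_p] worked because for a DESeqDataSet a single index selects rows (genes), which is why that result has 465 rows and still all 6 samples.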
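Here is a minimal, self-contained sketch of the same indexing rules that runs without DESeq2. It uses a plain base-R data.frame as a stand-in for the results object; the table, its values and the gene names are made up for illustration. A real DESeqResults object is a Bioconductor DataFrame rather than a data.frame, so the exact error messages differ (for example with NAs in a row mask), but the comma and which() idioms carry over.

```r
# Hypothetical stand-in for a results table: 10 genes, p-values with two NAs.
res_df <- data.frame(
  baseMean       = c(120, 35, 8, 560, 90, 4, 310, 75, 240, 60),
  log2FoldChange = c(1.8, -0.2, 0.1, -1.1, 0.3, 0.0, 2.2, -0.1, 0.9, 0.7),
  pvalue         = c(0.001, 0.2, NA, 0.04, 0.5, NA, 0.01, 0.9, 0.03, 0.06),
  row.names      = paste0("gene", 1:10)
)

# Mask built on the full table: one entry per row, NA where pvalue is NA.
significant_p <- res_df$pvalue < 0.05

# Option 1: row-wise subsetting (mask before the comma).
# which() returns the indices of the TRUE entries, so NA entries are dropped.
sig_rows <- res_df[which(significant_p), ]

# Option 2: drop NA rows first, then build the mask from the SAME object,
# so the mask length equals nrow(filtered_res).
filtered_res <- na.omit(res_df)
sig_filtered <- filtered_res[filtered_res$pvalue < 0.05, ]

# For contrast: a single index with no comma selects columns, not rows.
first_col_only <- res_df[c(TRUE, FALSE, FALSE)]

print(rownames(sig_rows))
```

Both options return gene1, gene4, gene7 and gene9 here. One caveat when applying option 2 to real DESeq2 output: na.omit() drops a row if any column is NA, so it can also remove genes whose padj is NA even though their pvalue is not; the which() form only looks at pvalue.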