I am very new to using DESeq2 and R for processing sequencing data. I have R and RStudio installed, together with all the necessary packages. But when I followed a tutorial to understand designs and contrasts, I got lost. Here is the link to the tutorial: https://www.r-bloggers.com/2024/05/a-guide-to-designs-and-contrasts-in-deseq2/
My specific question: following the tutorial, I got a results object called `res`:

```r
res <- results(dds1, contrast = list("conditiontreatment", "conditioncontrol"))
```

`res` has dimensions (1000, 6).
I want to select the genes in `res` that have a p-value < 0.05, so I created the following logical vector:

```r
significant_p <- res$pvalue < 0.05
```
When I tried to use this logical vector to subset `res`, R said that there are NA values in `res`, so I dropped all the NA values like this:

```r
filtered_res <- na.omit(res)
```
I expected to be able to use the logical vector on `filtered_res` to select the genes with a p-value < 0.05. However, R returned the following error:

```r
filtered_res[significant_p]
```

```
Error: subscript is a logical vector with out-of-bounds TRUE values
```
But when I used the same logical vector on `dds1`, it worked:

```r
dds1[significant_p]
```

```
class: DESeqDataSet
dim: 465 6
metadata(1): version
assays(4): counts mu H cooks
rownames(465): gene1 gene2 ... gene996 gene997
rowData names(25): trueIntercept trueBeta ... deviance maxCooks
colnames(6): sample1 sample2 ... sample5 sample6
colData names(2): condition sizeFactor
```
I don't understand the data structures here very well. What I want is for the logical vector to work on the results object `res`, not on the original dataset `dds1`. Can anyone explain what is going on?
Thank you very much!
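The problem is that `filtered_res[significant_p]` subsets columns, not rows as you intend. `filtered_res[significant_p]` is equivalent to `filtered_res[, significant_p]`, but you need row-wise subsetting, which would be `filtered_res[significant_p, ]`. Your logical vector has one element per gene (1000), but the results table has only 6 columns, so using row indices to select columns runs out of bounds.

There is a second mismatch: `significant_p` was computed from `res` before `na.omit()`, so it still has 1000 elements, while `filtered_res` has fewer rows. Recompute the vector from the object you are subsetting (`filtered_res$pvalue < 0.05`); otherwise its elements no longer line up with the rows, and `filtered_res[significant_p, ]` can fail with the same out-of-bounds error.

`dds1[significant_p]` worked because for a `DESeqDataSet` a single index selects rows (genes), as the `dim: 465 6` output shows, and `dds1` still has all 1000 genes, the same length as `significant_p`.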
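A minimal sketch of the fix, assuming DESeq2 is installed. Since the tutorial's `dds1` isn't available here, it simulates data with `makeExampleDESeqDataSet()`, renames the condition levels to `control`/`treatment` and sets the design to `~ 0 + condition` so that the question's contrast works. That setup is an assumption for illustration, not necessarily what the tutorial does.

```r
library(DESeq2)

set.seed(1)
# Hypothetical setup: 1000 simulated genes x 6 samples stand in for dds1.
# Levels are renamed so the coefficient names match the question's contrast.
dds1 <- makeExampleDESeqDataSet(n = 1000, m = 6)
levels(dds1$condition) <- c("control", "treatment")
design(dds1) <- ~ 0 + condition   # coefficients: conditioncontrol, conditiontreatment
dds1 <- DESeq(dds1, quiet = TRUE)

res <- results(dds1, contrast = list("conditiontreatment", "conditioncontrol"))
dim(res)   # rows are genes, columns are baseMean ... padj

# Drop every row that has an NA in any column (pvalue or padj)
filtered_res <- na.omit(res)

# Build the logical vector from the SAME object you subset,
# so it has exactly one element per row of filtered_res
significant_p <- filtered_res$pvalue < 0.05
stopifnot(length(significant_p) == nrow(filtered_res))

# Row index goes BEFORE the comma; the empty slot after it keeps all columns
sig_res <- filtered_res[significant_p, ]

# Alternative without na.omit(): which() returns only the TRUE positions,
# so NA p-values are skipped instead of causing an error
sig_res2 <- res[which(res$pvalue < 0.05), ]
```

The two results can differ slightly: `na.omit()` also drops genes whose `padj` is NA even when `pvalue` is present, whereas the `which()` version only looks at `pvalue`. For significance calls, the DESeq2 vignette works with the adjusted p-value column `padj` rather than `pvalue`.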