r dataframe dplyr filter rbind

How to select rows that have a nonzero value in more than one column


I have a dataframe:

dput(gene1[1:5,1:5])
structure(list(en_Adipose_Subcutaneous.db = c(0.0531016390078734, 
-0.00413407782001034, -0.035434632568444, 0.00968736935965742, 
0.0523714252287003), en_Adipose_Visceral_Omentum.db = c(0, 0, 
0, 0, 0), en_Adrenal_Gland.db = c(0, 0, 0, 0, 0), en_Artery_Aorta.db = c(0, 
0, 0, 0, 0), en_Artery_Coronary.db = c(0, 0, 0, 0, 0)), row.names = c("rs1041770", 
"rs12628452", "rs915675", "rs11089130", "rs36061596"), class = "data.frame")

I want to keep only the rows that have a nonzero value in at least two columns, and remove the rows that have a value in only one column. I wrote this code:

one_tissueonly <- NULL
for(i in 1:552){
y <- which(gene1[i,]!=0)  ## >1 means more than one col 
if(length(y)>1){  ##select only for one col:
  value <- gene1[i,]
}
one_tissueonly <- rbind(one_tissueonly,value)
}

But it generates duplicated rows. For example, the first row is repeated by rbind:

dput(one_tissueonly[1:5,1:5])
structure(list(en_Adipose_Subcutaneous.db = c(0.0531016390078734, 
0.0531016390078734, 0.0531016390078734, 0.00968736935965742, 
0.0523714252287003), en_Adipose_Visceral_Omentum.db = c(0, 0, 
0, 0, 0), en_Adrenal_Gland.db = c(0, 0, 0, 0, 0), en_Artery_Aorta.db = c(0, 
0, 0, 0, 0), en_Artery_Coronary.db = c(0, 0, 0, 0, 0)), row.names = c("rs1041770", 
"rs10417701", "rs10417702", "rs11089130", "rs36061596"), class = "data.frame")

Does anyone know how to solve this? Thank you.


Solution

  • Following Gregor Thomas's advice (modified, since you want rows where two or more tissues show a value for the same marker):

    # using brackets: keep rows with a nonzero value in more than one column
    gene1 <- gene1[rowSums(gene1 != 0) > 1, ]
    # using subset(): same filter
    gene1 <- subset(gene1, rowSums(gene1 != 0) > 1)
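As for why the original loop repeats rows: `rbind()` sits outside the `if`, so whenever a row fails the test, `value` still holds the last row that passed and gets appended again. `rbind()` then renames the repeated row names to keep them unique, which would explain names such as "rs10417701" in the output. The sketch below uses a small made-up data frame (`toy`, with hypothetical tissue columns and marker names, not the real `gene1`) to show the `rowSums()` filter next to a loop that only appends inside the `if`.

```r
# Toy data (hypothetical values): rows are markers, columns are tissues.
toy <- data.frame(
  tissue_a = c(0.05, -0.004, 0, 0.01),
  tissue_b = c(0,     0.2,   0, 0.03),
  tissue_c = c(0,     0,     0, 0.5),
  row.names = c("rs1", "rs2", "rs3", "rs4")
)

# Number of nonzero entries in each row
n_nonzero <- rowSums(toy != 0)

# Vectorised filter: keep rows with a nonzero value in more than one column
kept <- toy[n_nonzero > 1, ]

# Loop version with the bug fixed: rbind() is called only when the row passes
kept_loop <- NULL
for (i in seq_len(nrow(toy))) {
  if (sum(toy[i, ] != 0) > 1) {
    kept_loop <- rbind(kept_loop, toy[i, ])
  }
}

print(rownames(kept))       # "rs2" "rs4"
print(rownames(kept_loop))  # "rs2" "rs4"
```

The vectorised form is usually preferable, since growing a data frame with `rbind()` inside a loop gets slow as the number of rows increases. If the data may contain `NA`, `rowSums(toy != 0, na.rm = TRUE)` is one way to count only the known nonzero entries.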