r dplyr filter r-rownames

How to filter a data frame based on row names using strsplit


I have a dataframe:

dput(gene_exp[1:5,1:5])
structure(list(en_Adipose_Subcutaneous.db = c(0.0531016390078734, 
-0.00413407782001034, -0.035434632568444, 0.00968736935965742, 
0.0523714252287003), en_Adipose_Visceral_Omentum.db = c(0, 0, 
0, 0, 0), en_Adrenal_Gland.db = c(0, 0, 0, 0, 0), en_Artery_Aorta.db = c(0, 
0, 0, 0, 0), en_Artery_Coronary.db = c(0, 0, 0, 0, 0)), row.names = c("rs1041770_ENSG00000283633.1", 
"rs12628452_ENSG00000283633.1", "rs915675_ENSG00000283633.1", 
"rs11089130_ENSG00000283633.1", "rs36061596_ENSG00000283633.1"
), class = "data.frame")

I want to filter this data frame to keep only the rows for one gene (ENSG00000283633.1). I wrote this code:

gene <- gene_exp %>% filter(unlist(strsplit(rownames(gene_exp), "_")) %in% "ENSG00000283633.1")
Error in `filter()`:
ℹ In argument: `unlist(strsplit(rownames(gene_exp), "_")) %in%
  "ENSG00000283633.1"`.
Caused by error:

! `..1` must be of size 5956 or 1, not size 11902.
Run `rlang::last_trace()` to see where the error occurred.

Is there another way to do this? Thank you.


Solution

  • I don't recommend relying on row names if you are using the tidyverse, since one of the tidy principles is that data should live in columns rather than in other attributes such as row names. I would add the gene name to the data as a proper column and then filter on that. For example:

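    library(tidyverse)
    
    gene_exp %>% 
      # move the row names into a regular column (named "rowname" by default)
      rownames_to_column() %>% 
      # split "rsID_geneID" into two columns at the underscore
      separate(rowname, into = c('rs', 'gene_name'), sep = '_') %>% 
      # keep only the rows for the gene of interest
      filter(gene_name == 'ENSG00000283633.1')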
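
As for why the original call fails: `unlist(strsplit(...))` returns one element per piece of every row name rather than one element per row, so the condition passed to `filter()` no longer has the same length as the data frame. If you would rather keep the row names, here is a minimal base R sketch that extracts exactly one gene ID per row. It assumes the gene ID is always the part after the last underscore; the `toy` data frame, its column `x`, and the second gene ID are made up for illustration.

```r
# Toy stand-in for gene_exp (hypothetical values; row names follow rsID_geneID)
toy <- data.frame(
  x = c(0.1, 0.2, 0.3),
  row.names = c("rs1_ENSG00000283633.1",
                "rs2_ENSG00000283633.1",
                "rs3_ENSG00000000001.1")  # made-up second gene ID
)

# The problem: splitting and unlisting yields more elements than there are rows
n_pieces <- length(unlist(strsplit(rownames(toy), "_")))
n_rows <- nrow(toy)

# One gene ID per row: strip everything up to and including the last underscore
gene_id <- sub("^.*_", "", rownames(toy))

# Logical index with one element per row; drop = FALSE keeps a data frame
gene <- toy[gene_id == "ENSG00000283633.1", , drop = FALSE]
gene
```

Because `gene_id` has one element per row, the same vector could also be used as the condition inside `filter()`.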