Tags: r, scale, rows, bioinformatics, normalizing

How to exclude certain rows from scale() normalizing calculations in R?


I am trying to graph some sequencing data and want to exclude Chromosome 4 data (rows where the first column is '4') from the scaling calculation only. Chromosome 4 may skew the mean/SD used for normalizing, so I want to leave it out of my scale() call. Is there any way to do that? Right now, I have:

preMBT_RT <- preMBT_RT %>% mutate_each_(funs(scale(.) %>% as.vector), vars = c("Timing"))

Is there any way to tell that function to exclude rows with '4' in the first column? I still want the new data frame to contain scaled values for the '4' rows; I just want the mean and SD that scale() computes to ignore the Chromosome 4 data. Any help is much appreciated, thanks!

Here is a sample of what the data frame looks like in brief:

Chromosome     Location     Replication Timing
1              3748         -0.0001
4              1847101      0.000302   <-row I would want to exclude
20             1234         0.000102
...            ...          ...

Solution

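  • We could set the 'Timing' values that correspond to 'Chromosome' 4 to NA and then call scale(), which ignores NA when computing the mean and SD. The chromosome 4 rows come out as NA in the result:

    preMBT_RT %>%
           mutate(Timing = as.vector(scale(Timing * NA^(Chromosome == "4"))))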
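
The NA^(...) idiom can be hard to read, so here is a small base-R sketch of what it does step by step. The vectors `chromosome` and `timing` hold made-up toy values, and all variable names are illustrative only.

```r
# Toy vectors standing in for the Chromosome and Timing columns (values invented).
chromosome <- c(1, 4, 20, 4)
timing     <- c(-0.0001, 0.000302, 0.000102, 0.0005)

# TRUE where the row belongs to chromosome 4.
is_chr4 <- chromosome == "4"

# NA^TRUE is NA^1, which is NA; NA^FALSE is NA^0, which is 1.
mask <- NA^is_chr4

# Multiplying turns the chromosome 4 values into NA and leaves the rest unchanged.
masked <- timing * mask

# scale() drops NAs when computing the mean and SD and returns a one-column
# matrix; as.vector() removes the matrix structure.
scaled <- as.vector(scale(masked))
```

The chromosome 4 positions in `scaled` stay NA, so this variant removes those rows from the result as well as from the calculation.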
    

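    If we need to leave chromosome 4 out of the scale() calculation while keeping its original 'Timing' values, scale only the other rows and write the results back into their positions:

    preMBT_RT %>%
       mutate(Timing = replace(Timing, Chromosome != "4",
                               as.vector(scale(Timing[Chromosome != "4"]))))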
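
The question also asks for the chromosome 4 rows to be scaled, just with a mean and SD that were computed without them. One possible way to get that is to compute the two statistics on the other rows and apply them to every row. This is a minimal base-R sketch, assuming columns named `Chromosome` and `Timing` as in the question; the data values and the `Timing_scaled` column name are invented for illustration.

```r
# Toy data modelled on the sample in the question (values are made up).
preMBT_RT <- data.frame(
  Chromosome = c(1, 4, 20, 1, 4, 20),
  Location   = c(3748, 1847101, 1234, 5000, 99, 777),
  Timing     = c(-0.0001, 0.000302, 0.000102, 0.0002, 0.0005, -0.0003)
)

# Rows that contribute to the mean and SD: everything except chromosome 4.
keep <- preMBT_RT$Chromosome != 4

# Reference statistics taken from the retained rows only.
ref_mean <- mean(preMBT_RT$Timing[keep], na.rm = TRUE)
ref_sd   <- sd(preMBT_RT$Timing[keep], na.rm = TRUE)

# Apply those statistics to every row, chromosome 4 included.
preMBT_RT$Timing_scaled <- (preMBT_RT$Timing - ref_mean) / ref_sd
```

The same expression should also work inside a dplyr `mutate()` call, for example `mutate(Timing = (Timing - mean(Timing[Chromosome != 4])) / sd(Timing[Chromosome != 4]))`. Alternatively, the two statistics can be passed to `scale()` through its `center` and `scale` arguments.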