
R - adding a new column based on binary data across many columns


I cannot add an additional column to my data frame. I have reviewed many Stack Overflow questions; here is a subset: "Adding a new column in a matrix in R", "adding new column to data frame in R", "new column not added to dataframe in R", "R: complete a dataset with a new column added", and "R: add a new column to dataframes from a function".

I need a single column that tells me, for each row, whether there is a positive ("1") in any of the viral PCR columns I have.

I am trying to determine probability and from what I see, I will need this column to do further calculations, so please help if able!

Sample data

Filovirus (MOD) PCR   :    Phlebo (Sanchez-Seco) PCR
0                          0         
0                          1            
0                          0            
0                          0        
0                          0         
0                          0        
0                          0       
0                          0         
0                          0        
0                          0   


species code  forest site
<fctr>  <dbl> <fctr>
SM      1     UMNP-mangabey
SM      1     UMNP-mangabey
RC      9     UMNP-hondohondoc
BWC     9     UMNP-hondohondod
BWC     9     UMNP-hondohondod
BWC     9     UMNP-hondohondod
BWC     9     UMNP-hondohondod
BWC     9     UMNP-hondohondod
BWC     9     UMNP-hondohondod
BWC     9     UMNP-hondohondod

The closest I have gotten is using base R to identify which rows contain a positive value.

I followed the solution here but have yet to get it to work for me.

tmp=which(data==1,arr.ind=T)    
tmp=tmp[order(tmp[,"row"]),]
c("positive","negative")[tmp[,"col"]] -> data$new

Any advice is greatly appreciated.

Dput

structure(list(`Filovirus (MOD) PCR` = c("0", "0", "0", "0", 
"0", "0", "0", "0", "0", "0"), `Filovirus (A) PCR` = c("0", "0", 
"0", "0", "0", "0", "0", "0", "0", "0"), `Filovirus (B) PCR` = c("0", 
"0", "0", "0", "0", "0", "0", "0", "0", "0"), `Filo C PCR` = c("0", 
"0", "0", "0", "0", "0", "0", "0", "0", "0"), `Filovirus (D) PCR` = c("0", 
"0", "0", "0", "0", "0", "0", "0", "0", "0"), `Coronavirus   (Quan) PCR` = c("0", 
"0", "0", "0", "0", "0", "0", "0", "0", "0"), `Coronavirus (Watanabe) PCR` = c("0", 
"0", "0", "0", "0", "0", "0", "0", "0", "0"), `Paramyxo  (Tong)  PCR` = c("0", 
"0", "0", "0", "0", "0", "0", "0", "0", "0"), `Flavivirus Moureau PCR` = c("0", 
"0", "0", "0", "0", "0", "0", "0", "0", "0"), `Flavivirus  Sanchez-seco PCR` = c("0", 
"0", "0", "0", "0", "0", "0", "0", "0", "0"), `Arena Lozano 1 PCR` = c("0", 
"0", "0", "0", "0", "0", "0", "0", "0", "0"), `Retrovirus Courgnard PCR` = c("0", 
"0", "0", "0", "0", "0", "0", "0", "0", "0"), `Simian Foamy Goldberg (Pol) PCR` = c("0", 
"0", "0", "0", "0", "0", "0", "0", "0", "0"), `Simian Foamy Goldberg (LTR Region) PCR` = c("0", 
"0", "0", "0", "0", "0", "0", "0", "0", "0"), `Influenza (Anthony) PCR` = c("0", 
"0", "0", "0", "0", "0", "0", "0", "0", "0"), `Influenza (Liang) PCR` = c("0", 
"0", "0", "0", "0", "0", "0", "0", "0", "0"), `Rhabdo (CII) PCR` = c("0", 
"0", "0", "0", "0", "0", "0", "0", "0", "0"), `Enterovirus CII I PCR` = c("0", 
"0", "0", "0", "0", "0", "0", "0", "0", "0"), `Enterovirus CII-II PCR` = c("0", 
"0", "0", "0", "0", "0", "0", "0", "0", "0"), `Alphav   (Sanchez-Seco) PCR` = c("0", 
"0", "0", "0", "0", "0", "0", "0", "0", "0"), `Lyssavirus (Vasquez-Moron) PCR` = c("0", 
"0", "0", "0", "0", "0", "0", "0", "0", "0"), `Seadornavirus (CII) PCR` = c("0", 
"0", "0", "0", "0", "0", "0", "0", "0", "0"), `Hantavirus (Raboni) PCR` = c("0", 
"0", "0", "0", "0", "0", "0", "0", "0", "0"), `Hantavirus (Klempa) PCR` = c("0", 
"0", "0", "0", "0", "0", "0", "0", "0", "0"), `Nipah (Wacharapleusadee) PCR` = c("0", 
"0", "0", "0", "0", "0", "0", "0", "0", "0"), `Henipa (Feldman) PCR` = c("0", 
"0", "0", "0", "0", "0", "0", "0", "0", "0"), `Bunya S (Briese) PCR` = c("0", 
"0", "0", "0", "0", "0", "0", "0", "0", "0"), `Bunya L (Briese) PCR` = c("0", 
"0", "0", "0", "0", "0", "0", "0", "0", "0"), `Phlebo (Sanchez-Seco) PCR` = c("0", 
"0", "0", "0", "0", "0", "0", "0", "0", "0"), species = structure(c(3L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("SM", "SY", "BWC", 
"YB", "RC"), class = "factor"), code = c(2, 5, 5, 5, 5, 5, 5, 
5, 5, 5), forestsite = structure(c(3L, 14L, 14L, 14L, 14L, 14L, 
14L, 14L, 14L, 14L), .Label = c("Magombera1", "Magombera2", "NDUFR", 
"Ndundulu1", "Ndundulu2", "Ndundulu3", "Nyumbanitu", "UMNP-campsite3", 
"UMNP-hondohondoa", "UMNP-hondohondob", "UMNP-hondohondoc", "UMNP-hondohondod", 
"UMNP-hondohondoe", "UMNP-HQ", "MamaGoti", "UMNP-mangabey", "UMNP-njokamoni", 
"UMNP-Sanje1", "UMNP-Sanje2", "UMNP-Sanje3", "Sonjo", "SonjoRoad"
), class = "factor")), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

Solution

  • Update: Your 0 and 1 values are stored as character. Converting them to numbers with type.convert(as.is = TRUE) makes the code work:

    library(dplyr)
    
    df %>%
      type.convert(as.is=TRUE) %>% 
      mutate(new_column = if_else(rowSums(select(., contains("PCR"))) > 0, "positive", "negative"))
    
       Filovirus (…¹ Filov…² Filov…³ Filo …⁴ Filov…⁵ Coron…⁶ Coron…⁷ Param…⁸ Flavi…⁹ Flavi…˟ Arena…˟ Retro…˟ Simia…˟ Simia…˟ Influ…˟
               <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>
     1             0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
     2             0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
     3             0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
     4             0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
     5             0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
     6             0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
     7             0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
     8             0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
     9             0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
    10             0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
    # … with 18 more variables: `Influenza (Liang) PCR` <int>, `Rhabdo (CII) PCR` <int>, `Enterovirus CII I PCR` <int>,
    #   `Enterovirus CII-II PCR` <int>, `Alphav   (Sanchez-Seco) PCR` <int>, `Lyssavirus (Vasquez-Moron) PCR` <int>,
    #   `Seadornavirus (CII) PCR` <int>, `Hantavirus (Raboni) PCR` <int>, `Hantavirus (Klempa) PCR` <int>,
    #   `Nipah (Wacharapleusadee) PCR` <int>, `Henipa (Feldman) PCR` <int>, `Bunya S (Briese) PCR` <int>,
    #   `Bunya L (Briese) PCR` <int>, `Phlebo (Sanchez-Seco) PCR` <int>, species <chr>, code <int>, forestsite <chr>,
    #   new_column <chr>, and abbreviated variable names ¹​`Filovirus (MOD) PCR`, ²​`Filovirus (A) PCR`, ³​`Filovirus (B) PCR`,
    #   ⁴​`Filo C PCR`, ⁵​`Filovirus (D) PCR`, ⁶​`Coronavirus   (Quan) PCR`, ⁷​`Coronavirus (Watanabe) PCR`, …
    # ℹ Use `colnames()` to see all variable names
    

    First answer: The dplyr equivalent would be the following (data taken from @langtang, many thanks):

    library(dplyr)
    
    df %>%
      mutate(new_column = if_else(rowSums(select(., contains("PCR"))) > 0, "positive", "negative"))
    
    
       species code      forest_site Filovirus (MOD) PCR Phlebo (Sanchez-Seco) PCR
    1       SM    1    UMNP-mangabey            negative                  negative
    2       SM    1    UMNP-mangabey            negative                  positive
    3       RC    9 UMNP-hondohondoc            negative                  negative
    4      BWC    9 UMNP-hondohondod            negative                  negative
    5      BWC    9 UMNP-hondohondod            negative                  negative
    6      BWC    9 UMNP-hondohondod            negative                  negative
    7      BWC    9 UMNP-hondohondod            negative                  negative
    8      BWC    9 UMNP-hondohondod            negative                  negative
    9      BWC    9 UMNP-hondohondod            negative                  negative
    10     BWC    9 UMNP-hondohondod            negative                  negative
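
For comparison, the same idea can be sketched in base R without dplyr. This is only a minimal illustration: the three-row `df` below is made-up toy data that mimics the question's layout (character "0"/"1" values in columns whose names contain "PCR"), and the names `pcr_cols`, `new_column`, and `prop_positive` are chosen here for the example. Since the question mentions probability, the last line also shows one possible way to get the share of positive rows.

```r
# Toy data (hypothetical values) mimicking the question's structure:
# PCR results stored as character strings, plus a non-PCR column.
df <- data.frame(
  `Filovirus (MOD) PCR`       = c("0", "0", "0"),
  `Phlebo (Sanchez-Seco) PCR` = c("0", "1", "0"),
  species                     = c("SM", "SM", "RC"),
  check.names = FALSE,       # keep the column names with spaces and brackets
  stringsAsFactors = FALSE
)

# Pick every column whose name contains "PCR"
pcr_cols <- grep("PCR", names(df), fixed = TRUE, value = TRUE)

# Convert the character "0"/"1" values to numbers so they can be summed
df[pcr_cols] <- lapply(df[pcr_cols], as.numeric)

# A row is "positive" if at least one PCR column is greater than 0
row_totals <- unname(rowSums(df[pcr_cols], na.rm = TRUE))
df$new_column <- ifelse(row_totals > 0, "positive", "negative")

# Share of rows with at least one positive result
prop_positive <- mean(df$new_column == "positive")

print(df)
```

With `na.rm = TRUE`, a missing PCR result is treated as not positive; drop that argument if a row with missing results should instead come out as `NA`.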