r replace aggregate

Replace values from one vector in dataframe with other values, based on a condition in another vector, in R


I have a dataframe, "grp_wqdata", that is grouped by the column "CHEMICAL_NAME":

structure(list(SAMPLE_ID = c("Sampe 1", "Sampe 2", "Sampe 3", 
"Sampe 4", "Sampe 5", "Sampe 6", "Sampe 7", "Sampe 8", "Sampe 9", 
"Sampe 10", "Sampe 11", "Sampe 12"), CHEMICAL_NAME = c("LEAD", 
"LEAD", "LEAD", "LEAD", "LEAD", "LEAD", "LEAD", "LEAD", "LEAD", 
"LEAD", "LEAD", "LEAD"), REPORT_RESULT_VALUE = c(0.769, 0.512, 
5, 5, 5, 5, 5, 5, 5, 5, 5, 5), REPORT_METHOD_DETECTION_LIMIT = c("0.0100", 
"0.0100", "0.500", "0.500", "0.500", "0.500", "0.500", "0.500", 
"0.500", "0.500", "0.500", "0.500"), DETECT_FLAG = c("Y", "Y", 
"N", "N", "N", "N", "N", "N", "N", "N", "N", "N")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -12L))

If the value in the DETECT_FLAG column is "N", then I would like to replace the value in REPORT_RESULT_VALUE on that row with the value in REPORT_METHOD_DETECTION_LIMIT on that row.

I have tried the following:

ifelse("N" %in% grp_wqdata$DETECT_FLAG,
       replace(grp_wqdata$REPORT_RESULT_VALUE, 
               grp_wqdata$REPORT_RESULT_VALUE[match(grp_wqdata$DETECT_FLAG == "N")],
               grp_wqdata$REPORT_RESULT_LIMIT[match(grp_wqdata$DETECT_FLAG == "N")]),
       -999)#end of ifelse

but I continue to get the error

Error in match(grp_wqdata$DETECT_FLAG == "N") : 
  argument "table" is missing, with no default

Does anyone have any suggestions for doing this without a for loop?


Solution

    grp_wqdata <- structure(list(SAMPLE_ID = c("Sampe 1", "Sampe 2", "Sampe 3", 
    "Sampe 4", "Sampe 5", "Sampe 6", "Sampe 7", "Sampe 8", "Sampe 9", 
    "Sampe 10", "Sampe 11", "Sampe 12"), CHEMICAL_NAME = c("LEAD", 
    "LEAD", "LEAD", "LEAD", "LEAD", "LEAD", "LEAD", "LEAD", "LEAD", 
    "LEAD", "LEAD", "LEAD"), REPORT_RESULT_VALUE = c(0.769, 0.512, 
    5, 5, 5, 5, 5, 5, 5, 5, 5, 5), REPORT_METHOD_DETECTION_LIMIT = c("0.0100", 
    "0.0100", "0.500", "0.500", "0.500", "0.500", "0.500", "0.500", 
    "0.500", "0.500", "0.500", "0.500"), DETECT_FLAG = c("Y", "Y", 
    "N", "N", "N", "N", "N", "N", "N", "N", "N", "N")), class = c("tbl_df", 
    "tbl", "data.frame"), row.names = c(NA, -12L))
    
    
    grp_wqdata$REPORT_RESULT_VALUE <- ifelse(
         grp_wqdata$DETECT_FLAG == 'N',
         grp_wqdata$REPORT_METHOD_DETECTION_LIMIT, 
         grp_wqdata$REPORT_RESULT_VALUE)
    
       SAMPLE_ID CHEMICAL_NAME REPORT_RESULT_VALUE REPORT_METHOD_DETECTION_LIMIT DETECT_FLAG
       <chr>     <chr>         <chr>               <chr>                         <chr>      
     1 Sampe 1   LEAD          0.769               0.0100                        Y          
     2 Sampe 2   LEAD          0.512               0.0100                        Y          
     3 Sampe 3   LEAD          0.500               0.500                         N          
     4 Sampe 4   LEAD          0.500               0.500                         N          
     5 Sampe 5   LEAD          0.500               0.500                         N          
     6 Sampe 6   LEAD          0.500               0.500                         N          
     7 Sampe 7   LEAD          0.500               0.500                         N          
     8 Sampe 8   LEAD          0.500               0.500                         N          
     9 Sampe 9   LEAD          0.500               0.500                         N          
    10 Sampe 10  LEAD          0.500               0.500                         N          
    11 Sampe 11  LEAD          0.500               0.500                         N          
    12 Sampe 12  LEAD          0.500               0.500                         N  
    
    

    Here's how to understand what's going on. ifelse checks the condition for each row separately:

    If DETECT_FLAG == 'N': Then take the value from REPORT_METHOD_DETECTION_LIMIT.

    Otherwise: Take the value from REPORT_RESULT_VALUE.
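The row-by-row behaviour described above can be seen on three short vectors. This is only a minimal sketch: the names flag, value and limit are made up for illustration and are not columns of grp_wqdata.

```r
# Hypothetical stand-in vectors (not part of grp_wqdata).
flag  <- c("Y", "N", "N")
value <- c(0.769, 5, 5)
limit <- c(0.01, 0.5, 0.5)

# ifelse() tests each element of the condition in turn:
# where flag is "N" it takes the matching element of limit,
# otherwise it takes the matching element of value.
result <- ifelse(flag == "N", limit, value)

# result now holds 0.769, 0.5, 0.5
```

The first element keeps its original value because its flag is "Y"; the other two are replaced by the limit at the same position.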

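    Also, you'll probably want to convert REPORT_RESULT_VALUE and REPORT_METHOD_DETECTION_LIMIT to numeric vectors. REPORT_METHOD_DETECTION_LIMIT is stored as character, so ifelse coerces its result to character as well, which is why REPORT_RESULT_VALUE shows up as <chr> in the output above.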
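One way to handle the conversion, as a sketch, is to make the limit column numeric before calling ifelse. The data frame wq below is a hypothetical four-row stand-in that reuses the column names from grp_wqdata. It assumes every limit string parses cleanly as a number.

```r
# Hypothetical small stand-in for grp_wqdata (same column names, fewer rows).
wq <- data.frame(
  REPORT_RESULT_VALUE = c(0.769, 0.512, 5, 5),
  REPORT_METHOD_DETECTION_LIMIT = c("0.0100", "0.0100", "0.500", "0.500"),
  DETECT_FLAG = c("Y", "Y", "N", "N"),
  stringsAsFactors = FALSE
)

# Convert the character limit column to numeric first, so that ifelse()
# combines two numeric vectors and does not coerce the result to character.
wq$REPORT_METHOD_DETECTION_LIMIT <- as.numeric(wq$REPORT_METHOD_DETECTION_LIMIT)

wq$REPORT_RESULT_VALUE <- ifelse(
  wq$DETECT_FLAG == "N",
  wq$REPORT_METHOD_DETECTION_LIMIT,
  wq$REPORT_RESULT_VALUE
)

# Check the column type after the replacement.
is.numeric(wq$REPORT_RESULT_VALUE)
```

Note that as.numeric returns NA (with a warning) for any string it cannot parse, so it is worth checking for new NA values after converting real data.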

    Also, as Jilber pointed out in the comments, we can use with to avoid having to use grp_wqdata$ over and over.

    grp_wqdata$REPORT_RESULT_VALUE <- with(grp_wqdata, 
                                           ifelse(
                                             DETECT_FLAG == 'N',
                                             REPORT_METHOD_DETECTION_LIMIT, 
                                             REPORT_RESULT_VALUE))
    

    When you pass a data.frame as the first argument to with, you can then refer to the columns in your data.frame without having to prefix them with the data.frame name and $ or [[.
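For comparison, the replace() approach attempted in the question can also be made to work if it is given a logical index instead of match(). This is an alternative to the ifelse solution above, not part of it. The data frame wq and the helper name is_nd are hypothetical, and the limit column is assumed to be numeric already.

```r
# Hypothetical small stand-in for grp_wqdata with a numeric limit column.
wq <- data.frame(
  REPORT_RESULT_VALUE = c(0.769, 0.512, 5, 5),
  REPORT_METHOD_DETECTION_LIMIT = c(0.01, 0.01, 0.5, 0.5),
  DETECT_FLAG = c("Y", "Y", "N", "N"),
  stringsAsFactors = FALSE
)

# Logical vector marking the non-detect rows.
is_nd <- wq$DETECT_FLAG == "N"

# replace(x, list, values) returns a copy of x in which the elements
# selected by `list` are overwritten with `values`.
wq$REPORT_RESULT_VALUE <- replace(
  wq$REPORT_RESULT_VALUE,
  is_nd,
  wq$REPORT_METHOD_DETECTION_LIMIT[is_nd]
)
```

The second argument of replace() is an index (here a logical vector), not the values to be replaced, which is where the original attempt went wrong in addition to calling match() without its table argument.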