
How can I change certain values in a column based on multiple conditions, leaving others as they are if the conditions aren't met?


I have a dataset that I frequently pull as it is updated and analyze in R. I am trying to figure out how I can change values in a column if they meet 2 conditions and otherwise keep the existing value. The dataset is "visitdata" and the two columns I'm working with for this issue are: od_witnessed (did they witness an overdose, yes/no/unknown/declined) and num_witnessed (the number of overdoses witnessed). Those entering data were under the impression they only needed to enter a number witnessed if it was >1.

So, if od_witnessed = yes AND num_witnessed = NA, then I want the value in num_witnessed to be 1. If those two conditions are not met, I want the existing number to remain.

Example of current data:

od_witnessed  num_witnessed
Yes           3
Yes           NA
No            NA
Yes           NA

Desired output:

od_witnessed  num_witnessed
Yes           3
Yes           1
No            NA
Yes           1

Ultimately, I want to be able to sum the num_witnessed column to get the total number of overdoses witnessed.

Here are some things I have tried (I'm relatively new to R). Of note, the od_witnessed column is a factor with 4 levels ("no", "yes", "not asked", "declined") and the num_witnessed column is numeric.

visitdata %>%
mutate(od_witnessed = case_when(od_witnessed = 1 & is.na(num_witnessed) ~ 1))

This resulted in the error: "error in 'mutate()': in argument: num_witnessed = case_when(od_witnessed = 1 & is.na(num_witnessed))) caused by error in 'case_when()': case 1 ('1 & is.na(num_witnessed)') must be a two-sided formula, not a logical vector"

visitdata[which(
visitdata$od_witnessed == "Yes" & visitdata$num_witnessed(is.na)), c(visitdata$num_witnessed == 1)]

error: attempt to apply non-function

if(visitdata$od_witnessed = 'yes'& visitdata$num_witnessed = 'NA') {visitdata$num_witnessed = 1}

error: unexpected '=' in "if(visitdata$od_witnessed ="

ifelse(visitdata$od_witnessed == "Yes" & is.na(visitdata$num_witnessed), 1, visitdata$num_witnessed)

code runs, but the NAs are still NAs

I also tried the code from this post: "dplyr replacing na values in a column based on multiple conditions". None of it worked; it just produced a tibble with NAs highlighted red for EVERY column in the dataset. See the attached image, but note that the variable names are different in my actual code; I simplified them in this question for easier understanding.

Any advice or known code would be greatly appreciated!


Solution

  • Several options

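    Using base R with ifelse and transform. Note that transform returns a modified copy of the data, so assign the result (for example back to visitdata) if you want to keep the change.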

    transform(visitdata, num_witnessed = 
        ifelse(od_witnessed == "Yes" & is.na(num_witnessed), 1, num_witnessed))
      od_witnessed num_witnessed
    1          Yes             3
    2          Yes             1
    3           No            NA
    4          Yes             1
    

    Using replace, which is a bit more performant than ifelse:

    transform(visitdata, num_witnessed = 
        replace(num_witnessed, od_witnessed == "Yes" & is.na(num_witnessed), 1))
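
    The same idea can also be written as a plain logical-index assignment, which may be easier to read when you are new to R. This is only a sketch on toy data shaped like the example in the question; the helper name fill_rows is made up for illustration.

    ```r
    # Toy data matching the example in the question
    visitdata <- data.frame(
      od_witnessed  = c("Yes", "Yes", "No", "Yes"),
      num_witnessed = c(3, NA, NA, NA)
    )

    # Rows where an overdose was witnessed but no count was recorded
    # (fill_rows is a hypothetical name, not part of the original answer)
    fill_rows <- visitdata$od_witnessed == "Yes" & is.na(visitdata$num_witnessed)

    # Overwrite only those rows; every other value is left untouched
    visitdata$num_witnessed[fill_rows] <- 1

    visitdata$num_witnessed
    ```

    Keeping the condition in its own variable also makes it easy to check how many rows will change (sum(fill_rows)) before overwriting anything.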
    

    Or using with and overwriting the variable directly in place:

    visitdata$num_witnessed <- with(visitdata, 
      ifelse(od_witnessed == "Yes" & is.na(num_witnessed), 1, num_witnessed))
    
    visitdata
      od_witnessed num_witnessed
    1          Yes             3
    2          Yes             1
    3           No            NA
    4          Yes             1
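
    This is the same ifelse expression as the attempt in the question that "ran, but the NAs are still NAs". One likely reason is that the result there was never assigned back to the column. Another possible reason, which is an assumption based on the lowercase factor levels described in the question, is that comparing against "Yes" never matches a level spelled "yes". A minimal sketch for checking both, using made-up data that mimics that description:

    ```r
    # Hypothetical data: a factor with the lowercase levels listed in the question
    visitdata <- data.frame(
      od_witnessed  = factor(c("yes", "yes", "no", "yes"),
                             levels = c("no", "yes", "not asked", "declined")),
      num_witnessed = c(3, NA, NA, NA)
    )

    # Inspect the exact labels before comparing against them
    levels(visitdata$od_witnessed)

    # "Yes" matches no level here, so this counts zero rows
    sum(visitdata$od_witnessed == "Yes")

    # Compare case-insensitively (is_yes is an illustrative name)
    is_yes <- tolower(as.character(visitdata$od_witnessed)) == "yes"

    # Assign the result back; without the assignment nothing is stored
    visitdata$num_witnessed <- ifelse(is_yes & is.na(visitdata$num_witnessed),
                                      1, visitdata$num_witnessed)

    visitdata$num_witnessed
    ```

    If levels() shows "Yes" in your real data, the comparison is fine as written and only the missing assignment needs fixing.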
    

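    Or with case_when from dplyr. The .default argument requires dplyr 1.1.0 or later; with older versions, use TRUE ~ num_witnessed as the final case instead.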

    library(dplyr)
    
    visitdata %>% 
      mutate(num_witnessed = 
        case_when(od_witnessed == "Yes" & is.na(num_witnessed) ~ 1, 
                  .default = num_witnessed))
      od_witnessed num_witnessed
    1          Yes             3
    2          Yes             1
    3           No            NA
    4          Yes             1
    
    Data
    visitdata <- structure(list(od_witnessed = c("Yes", "Yes", "No", "Yes"),
    num_witnessed = c(3L, NA, NA, NA)), class = "data.frame", 
    row.names = c(NA, -4L))
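
    For the stated goal of totalling the overdoses witnessed, keep in mind that sum returns NA if any value in the column is still NA (such as the "No" rows), so na.rm = TRUE is needed. A short sketch on the same toy data, assuming the replacement is stored back in the column first; total_witnessed is just an illustrative name.

    ```r
    # Same toy data as above
    visitdata <- data.frame(
      od_witnessed  = c("Yes", "Yes", "No", "Yes"),
      num_witnessed = c(3L, NA, NA, NA)
    )

    # Fill in 1 where an overdose was witnessed but no count was entered
    visitdata$num_witnessed <- with(visitdata,
      replace(num_witnessed, od_witnessed == "Yes" & is.na(num_witnessed), 1))

    # Remaining NAs (e.g. the "No" row) are dropped from the total
    total_witnessed <- sum(visitdata$num_witnessed, na.rm = TRUE)
    total_witnessed
    # [1] 5
    ```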