I have a dataset that I frequently pull as it is updated and analyze in R. I am trying to figure out how I can change values in a column if they meet 2 conditions and otherwise keep the existing value. The dataset is "visitdata" and the two columns I'm working with for this issue are: od_witnessed (did they witness an overdose, yes/no/unknown/declined) and num_witnessed (the number of overdoses witnessed). Those entering data were under the impression they only needed to enter a number witnessed if it was >1.
So, if od_witnessed = yes AND num_witnessed = NA, then I want the value in num_witnessed to be 1. If those two conditions are not met, I want the existing number to remain.
Example of current data:
od_witnessed | num_witnessed |
---|---|
Yes | 3 |
Yes | NA |
No | NA |
Yes | NA |
Desired output:
od_witnessed | num_witnessed |
---|---|
Yes | 3 |
Yes | 1 |
No | NA |
Yes | 1 |
Ultimately, I want to be able to sum the num_witnessed column to get the total number of overdoses witnessed.
Here are some things I have tried (I'm relatively new to R). Of note, the od_witnessed column is a factor with 4 levels ("no", "yes", "not asked", "declined") and the num_witnessed column is numeric.
visitdata %>%
mutate(od_witnessed = case_when(od_witnessed = 1 & is.na(num_witnessed) ~ 1))
Resulted in error: "error in 'mutate()': in argument: num_witnessed = case_when(od_witnessed = 1 & is.na(num_witnessed))) caused by error in 'case_when()': case 1 ('1 & is.na(num_witnessed)') must be a two-sided formula, not a logical vector"
visitdata[which(
visitdata$od_witnessed == "Yes" & visitdata$num_witnessed(is.na)), c(visitdata$num_witnessed == 1)]
error attempt to apply non-function
if(visitdata$od_witnessed = 'yes'& visitdata$num_witnessed = 'NA') {visitdata$num_witnessed = 1}
error: unexpected '=' in "if(visitdata$od_witnessed ="
ifelse(visitdata$od_witnessed == "Yes" & is.na(visitdata$num_witnessed), 1, visitdata$num_witnessed)
code runs, but the NAs are still NAs
I also tried the code from this post: "dplyr replacing na values in a column based on multiple conditions". None of it worked; it just produced a tibble with NAs highlighted in red for EVERY column in the dataset. See attached image, but note that the variable names are different in my actual code; I simplified them in this question for easier understanding.
Any advice or known code would be greatly appreciated!
Several options.

Using base R with ifelse and transform (this operates on a copy of the data, so assign the result if you want to keep it):
transform(visitdata, num_witnessed =
  ifelse(od_witnessed == "Yes" & is.na(num_witnessed), 1, num_witnessed))
od_witnessed num_witnessed
1 Yes 3
2 Yes 1
3 No NA
4 Yes 1
Using replace, which is a bit more performant than ifelse:
transform(visitdata, num_witnessed =
  replace(num_witnessed, od_witnessed == "Yes" & is.na(num_witnessed), 1))
Or using with and overwriting the variable directly in place:
visitdata$num_witnessed <- with(visitdata,
  ifelse(od_witnessed == "Yes" & is.na(num_witnessed), 1, num_witnessed))
visitdata
od_witnessed num_witnessed
1 Yes 3
2 Yes 1
3 No NA
4 Yes 1
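The same in-place update can also be written as a plain indexed assignment. This is only a minimal sketch, assuming data shaped like the sample at the end of this answer; the name needs_fill is made up for illustration.

```r
# Sample data with the same structure as the example in the question
visitdata <- data.frame(
  od_witnessed  = c("Yes", "Yes", "No", "Yes"),
  num_witnessed = c(3L, NA, NA, NA)
)

# Rows where an overdose was witnessed but no count was entered
# (needs_fill is a hypothetical helper name)
needs_fill <- visitdata$od_witnessed == "Yes" & is.na(visitdata$num_witnessed)

# Overwrite only those rows; every other value is left untouched
visitdata$num_witnessed[needs_fill] <- 1L

visitdata$num_witnessed
```

Because only the selected rows are assigned, the existing counts and the NA in the "No" row stay as they were.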
Or with case_when from dplyr (the .default argument requires dplyr 1.1.0 or later):
library(dplyr)
visitdata %>%
  mutate(num_witnessed =
    case_when(od_witnessed == "Yes" & is.na(num_witnessed) ~ 1,
              .default = num_witnessed))
od_witnessed num_witnessed
1 Yes 3
2 Yes 1
3 No NA
4 Yes 1
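Since the end goal in the question is a total, here is a minimal sketch of summing the column once the missing counts are filled in. It assumes the same sample data as this answer; the names fixed and total_witnessed are made up for illustration. Rows that legitimately stay NA (the "No" row here) have to be dropped with na.rm = TRUE, otherwise sum returns NA.

```r
# Sample data with the same structure as the example in the question
visitdata <- data.frame(
  od_witnessed  = c("Yes", "Yes", "No", "Yes"),
  num_witnessed = c(3L, NA, NA, NA)
)

# Fill in 1 where an overdose was witnessed but no count was entered,
# and keep the result (transform returns a modified copy)
fixed <- transform(visitdata, num_witnessed =
  replace(num_witnessed, od_witnessed == "Yes" & is.na(num_witnessed), 1))

# Without na.rm = TRUE the remaining NA would make the whole sum NA
total_witnessed <- sum(fixed$num_witnessed, na.rm = TRUE)
total_witnessed
```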
Data used in the examples above:
visitdata <- structure(list(od_witnessed = c("Yes", "Yes", "No", "Yes"),
  num_witnessed = c(3L, NA, NA, NA)), class = "data.frame",
  row.names = c(NA, -4L))
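A note on the ifelse attempt in the question that ran but left the NAs in place. Two likely causes, neither of which I can confirm without the real data: the result was printed rather than assigned back to the column, and the question says the factor levels are lowercase ("yes"), in which case a comparison against "Yes" never matches. The sketch below uses made-up data that mimics that description.

```r
# Hypothetical data mimicking the question: a factor with lowercase levels
visitdata <- data.frame(
  od_witnessed  = factor(c("yes", "yes", "no", "yes"),
                         levels = c("no", "yes", "not asked", "declined")),
  num_witnessed = c(3, NA, NA, NA)
)

# Check the exact labels before writing the comparison
levels(visitdata$od_witnessed)

# "Yes" matches no label here, so no row would be selected
any(visitdata$od_witnessed == "Yes")   # FALSE

# Compare against the actual label and assign the result back to the column
visitdata$num_witnessed <- with(visitdata,
  ifelse(od_witnessed == "yes" & is.na(num_witnessed), 1, num_witnessed))

visitdata$num_witnessed
```

If the labels in the real data are capitalised differently, adjust the string in the comparison to match what levels() reports.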