Suppose I have a table of around 1M rows with the following format:
id paid_2000 paid_2001 paid_2002 paid_2003 censor_yr
1 10 20 10 20 2001
2 15 25 15 15 2003
What would be an efficient way to set each observation to NA where the paid year is greater than or equal to the censor year? In particular, I would like the table to look like this:
id paid_2000 paid_2001 paid_2002 paid_2003 censor_yr
1 10 NA NA NA 2001
2 15 25 15 NA 2003
Using dplyr
:
library(dplyr)
df %>%
gather(paid_yr, value, grep("paid", names(.))) %>%
mutate(value = ifelse(as.numeric(gsub(".*_", "", paid_yr)) >= censor_yr,
NA, value)) %>%
spread(paid_yr, value)
Throw a %>% select
chain at the bottom to move censor_yr
back to the end.
Can explain how it works if you'd like. May or may not be easier to read than akrun's answers.