I have a wide
df
with columns
representing the months
of many given years
and the changes of colour in each month
:
df <- data.frame(id = as.integer(c(123,124,125,126)),
no_change = as.character(c("May.2010", NA, NA, "Sep.2010")),
`Jan.2010` = as.character(c("green", "black", "pink", "grey")),
`Feb.2010` = as.character(c("green", "black", "pink", "grey")),
`Mar.2010` = as.character(c("green", "red", "pink", "grey")),
`Apr.2010` = as.character(c("green", "red", "pink", "grey")),
`May.2010` = as.character(c("green", "red", "pink", "grey")),
`Jun.2010` = as.character(c("green", "red", "pink", "grey")),
`Jul.2010` = as.character(c("green", "white", "pink", "grey")),
`Ago.2010` = as.character(c("red", "white", "pink", "grey")),
`Sep.2010` = as.character(c("red", "white", "pink", "grey")),
`Oct.2010` = as.character(c("red", "white", "pink", "grey")),
`Nov.2010` = as.character(c("red", "white", "pink", "grey")),
`Dez.2010` = as.character(c("red", "white", "grey", "blue"))
)
df
id no_change Jan.2010 Feb.2010 Mar.2010 Apr.2010 May.2010 Jun.2010 Jul.2010 Ago.2010 Sep.2010 Oct.2010 Nov.2010 Dez.2010
1 123 May.2010 green green green green green green green red red red red red
2 124 <NA> black black red red red red white white white white white white
3 125 <NA> pink pink pink pink pink pink pink pink pink pink pink grey
4 126 Sep.2010 grey grey grey grey grey grey grey grey grey grey grey blue
I want to apply NA
to each column
that contains a month
equal to and above that specified in column
'no_change'. This is the desired output
:
id no_change Jan.2010 Feb.2010 Mar.2010 Apr.2010 May.2010 Jun.2010 Jul.2010 Ago.2010 Sep.2010 Oct.2010 Nov.2010 Dez.2010
1 123 May.2010 green green green green NA NA NA NA NA NA NA NA
2 124 <NA> black black red red red red white white white white white white
3 125 <NA> pink pink pink pink pink pink pink pink pink pink pink grey
4 126 Sep.2010 grey grey grey grey grey grey grey grey NA NA NA NA
You can pivot
the format into a "long" format, and find out which rows should be turned into NA
.
library(tidyverse)
df %>%
pivot_longer(ends_with("2010")) %>%
group_by(id) %>%
mutate(value = ifelse(cumsum(name == no_change & !is.na(no_change)), NA, value)) %>%
pivot_wider() %>%
ungroup()
# A tibble: 4 × 14
id no_change Jan.2010 Feb.2010 Mar.2010 Apr.2010 May.2010 Jun.2010 Jul.2010 Ago.2010 Sep.2010 Oct.2010 Nov.2010 Dez.2010
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 123 May.2010 green green green green NA NA NA NA NA NA NA NA
2 124 NA black black red red red red white white white white white white
3 125 NA pink pink pink pink pink pink pink pink pink pink pink grey
4 126 Sep.2010 grey grey grey grey grey grey grey grey NA NA NA NA