Tags: r, dataframe, if-statement, dummy-variable

Create a dummy variable based on two variables x1 and x2 (dummy=x1 only if at least one adjacent x2=yes)


Below is the code to generate the dataframe for demonstration:

d<-data.frame(x1=c(rep("no",5),rep("yes",4),rep("no",2),rep("yes",3),rep("no",2),rep("yes",3)),
              x2=c(rep("no",6),rep("yes",1),rep("no",9),rep("yes",2),rep("no",1)),
              dummy=c(rep(0,5),rep(1,4),rep(0,5),rep(0,2),rep(1,3)))

I have two variables, x1 and x2, and I want a dummy variable named 'dummy' based on both. Specifically, dummy should equal 1 for every row in a run of consecutive x1=yes values, provided at least one row in that run has x2=yes. If x1=yes but no row in its run has x2=yes, dummy should be 0.

It is easy to create a dummy variable that takes the value 1 when both x1 and x2 equal 'yes', using

d$dummy=ifelse(d$x1=="yes" & d$x2=="yes",1,0)

But this does not capture the whole cluster of consecutive x1=yes rows, which is what I want.
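To make the shortfall concrete, here is a small check of which rows the row-wise condition flags. It is only an illustrative sketch: the vector name `naive` is introduced for this example, and the data frame is rebuilt without the dummy column so the snippet runs on its own.

```r
# Rebuild the example data (x1 and x2 only)
d <- data.frame(
  x1 = c(rep("no", 5), rep("yes", 4), rep("no", 2), rep("yes", 3), rep("no", 2), rep("yes", 3)),
  x2 = c(rep("no", 6), rep("yes", 1), rep("no", 9), rep("yes", 2), rep("no", 1))
)

# Row-wise condition: 1 only where x1 and x2 are both "yes" on the same row
naive <- ifelse(d$x1 == "yes" & d$x2 == "yes", 1, 0)

# Rows flagged by the row-wise condition
which(naive == 1)
# [1]  7 17 18
```

Only rows 7, 17 and 18 are flagged, whereas the desired dummy is 1 for rows 6 to 9 and 17 to 19.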

The desired output is the dummy column in the data frame above.

Any idea how this could be done?


Solution

  • You can group_by consecutive_id(x1), so that each run of identical consecutive x1 values forms its own group, and then set dummy to 1 if x1 == "yes" and any x2 within that run is "yes".

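    library(dplyr) # consecutive_id() requires dplyr 1.1.0+
    d %>% 
      # one group per run of identical consecutive x1 values
      group_by(cons = consecutive_id(x1)) %>% 
      # unary + turns TRUE/FALSE into 1/0; any() is evaluated per run
      mutate(dummy = +(x1 == "yes" & any(x2 == "yes"))) %>%
      ungroup()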
    
       x1    x2    dummy  cons
       <chr> <chr> <int> <int>
     1 no    no        0     1
     2 no    no        0     1
     3 no    no        0     1
     4 no    no        0     1
     5 no    no        0     1
     6 yes   no        1     2
     7 yes   yes       1     2
     8 yes   no        1     2
     9 yes   no        1     2
    10 no    no        0     3
    11 no    no        0     3
    12 yes   no        0     4
    13 yes   no        0     4
    14 yes   no        0     4
    15 no    no        0     5
    16 no    no        0     5
    17 yes   yes       1     6
    18 yes   yes       1     6
    19 yes   no        1     6
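If dplyr is not available, the same idea can be sketched in base R with rle() and tapply(). This is an alternative sketch, not part of the answer above. It assumes that "adjacent" means "within the same run of consecutive x1 values", and the names `runs`, `run_id` and `run_has_yes` are introduced only for illustration.

```r
# Rebuild the example data (x1 and x2 only)
d <- data.frame(
  x1 = c(rep("no", 5), rep("yes", 4), rep("no", 2), rep("yes", 3), rep("no", 2), rep("yes", 3)),
  x2 = c(rep("no", 6), rep("yes", 1), rep("no", 9), rep("yes", 2), rep("no", 1))
)

# Run-length encode x1; as.character() guards against factor columns
runs <- rle(as.character(d$x1))

# Give every row the index of the run it belongs to (1, 1, ..., 2, 2, ...)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# For each run, does it contain at least one x2 == "yes"?
run_has_yes <- tapply(d$x2 == "yes", run_id, any)

# dummy is 1 when the row has x1 == "yes" and its run contains an x2 == "yes"
d$dummy <- as.integer(d$x1 == "yes" & run_has_yes[run_id])
```

Here `run_id` plays the same role as `consecutive_id(x1)`, and indexing `run_has_yes` by `run_id` spreads the per-run result back over the rows.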