
Conditioning previous values within groups in R


I'm trying to write code that creates a flag variable within each group of name, depending on the value of the closest earlier record of the column poped in the following data.frame:

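    library(tidyverse)

    name  <- c("AAA","AAA","AAA","AAA","AAA","AAA","AAA")
    poped <- c(NA,1,NA,NA,1,NA,NA)
    order <- c(1:7)
    tag   <- c("X","Y","X","X","Y","X","X")
    # assemble the vectors into the data.frame printed below
    df    <- data.frame(name, order, tag, poped)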

>   df
  name order tag poped
1  AAA     1   X    NA
2  AAA     2   Y     1
3  AAA     3   X    NA
4  AAA     4   X    NA
5  AAA     5   Y     1
6  AAA     6   X    NA
7  AAA     7   X    NA

I want to create two new variables named CHECK and POS.

CHECK will take on the values:

    1  = if the closest row above with tag = Y has poped = 1
    0  = if the closest row above with tag = Y has poped = 0
    2  = if the current row has tag = Y
    NA = otherwise

POS will take on the value of the closest (above) row number where the tag column is Y and poped is 1, and NA otherwise.
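
To make the rules above concrete, here is a minimal base R sketch that walks the rows of a single group in order. The function name check_pos_reference and its internals are hypothetical (they are not part of the original post), and it reads the POS rule literally: the closest row above with tag = Y and poped = 1. It is meant as a readable reference for one group, not as the recommended solution.

```r
# Hypothetical reference helper: applies the CHECK/POS rules row by row
# for the rows of ONE group, assumed to be already sorted by `order`.
check_pos_reference <- function(tag, poped, order) {
  n <- length(tag)
  CHECK <- rep(NA_real_, n)
  POS   <- rep(NA_integer_, n)
  last_y  <- NA_integer_  # closest row so far with tag == "Y"
  last_y1 <- NA_integer_  # closest row so far with tag == "Y" and poped == 1

  for (i in seq_len(n)) {
    if (tag[i] == "Y") {
      CHECK[i] <- 2                      # rule: current row has tag = Y
      last_y <- i
      if (!is.na(poped[i]) && poped[i] == 1) last_y1 <- i
    } else {
      # rules 1 and 0: copy poped from the closest "Y" row above, if it is 0 or 1
      if (!is.na(last_y) && poped[last_y] %in% c(0, 1)) CHECK[i] <- poped[last_y]
      # POS: order value of the closest "Y" row above that has poped == 1
      if (!is.na(last_y1)) POS[i] <- order[last_y1]
    }
  }
  data.frame(CHECK = CHECK, POS = POS)
}

tag   <- c("X", "Y", "X", "X", "Y", "X", "X")
poped <- c(NA, 1, NA, NA, 1, NA, NA)
res   <- check_pos_reference(tag, poped, order = 1:7)
res$CHECK  # NA 2 1 1 2 1 1
res$POS    # NA NA 2 2 NA 5 5
```

With several names, the same function would have to be applied to each group separately.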

My desired output will be:

>   df
  name order tag poped CHECK POS                                                            why
1  AAA     1   X    NA    NA  NA                                      There is no previous data
2  AAA     2   Y     1    NA  NA                                                current tag = Y
3  AAA     3   X    NA     1   2 the closest value above where tag=Y is in row 2 and poped is 1
4  AAA     4   X    NA     1   2 the closest value above where tag=Y is in row 2 and poped is 1
5  AAA     5   Y     1    NA  NA                                                current tag = Y
6  AAA     6   X    NA     1   5 the closest value above where tag=Y is in row 5 and poped is 1
7  AAA     7   X    NA     1   5 the closest value above where tag=Y is in row 5 and poped is 1

How can I do this, ideally using the tidyverse?


Solution

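    df %>%
      group_by(name) %>%   # keep fill() and lag() from crossing between names
      # helper columns: populated only on tag == "Y" rows, NA elsewhere
      mutate(ctag = if_else(tag == "Y", tag, NA_character_),
             cpop = if_else(tag == "Y", poped, NA_real_),
             maxr = if_else(tag == "Y" & poped == 1, order, NA_integer_)) %>%
      # carry the most recent "Y" information down to the rows below it
      fill(ctag, cpop, maxr) %>%
      mutate(
        CHECK = case_when(
          tag == "Y" ~ 2,
          lag(ctag) == "Y" & lag(cpop) == 1 ~ 1,
          lag(ctag) == "Y" & lag(cpop) == 0 ~ 0,
          TRUE ~ NA_real_),
        POS = if_else(tag == "Y", NA_integer_, maxr)
      ) %>%
      ungroup() %>%
      select(!ctag:maxr)   # drop the helper columns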
    

    Output:

      name  order tag   poped CHECK   POS
      <chr> <int> <chr> <dbl> <dbl> <int>
    1 AAA       1 X        NA    NA    NA
    2 AAA       2 Y         1     2    NA
    3 AAA       3 X        NA     1     2
    4 AAA       4 X        NA     1     2
    5 AAA       5 Y         1     2    NA
    6 AAA       6 X        NA     1     5
    7 AAA       7 X        NA     1     5
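
The sample data contains a single name, so grouping makes no visible difference there. With several names, fill() carries values from one group into the next unless the data is grouped first. A minimal sketch of that effect, using a hypothetical two-group tibble df2 and a helper column ypop that are not part of the original post:

```r
library(dplyr)
library(tidyr)

# Hypothetical two-group data: only group "AAA" has a tag == "Y" row
df2 <- tibble(
  name  = c("AAA", "AAA", "BBB", "BBB"),
  order = 1:4,
  tag   = c("Y", "X", "X", "X"),
  poped = c(1, NA, NA, NA)
)

# keep poped only on "Y" rows, as the helper columns in the solution do
flagged <- df2 %>%
  mutate(ypop = if_else(tag == "Y", poped, NA_real_))

# without grouping, the value from "AAA" is carried into "BBB"
ungrouped <- flagged %>% fill(ypop)

# with grouping, fill() is applied within each name
grouped <- flagged %>% group_by(name) %>% fill(ypop) %>% ungroup()

ungrouped$ypop  # 1 1 1 1
grouped$ypop    # 1 1 NA NA
```

The same reasoning applies to lag(): on grouped data it looks back only within the current group.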