rtidyversedata-wrangling

How to evaluate conditions met on previous values within a group in a data.frame in R


I have a dataset that looks like this:

library(tidyverse)   
 name<-c("AAA","AAA","AAA")
    value<-c(1:3)
    order<-c(2,3,6)
    tag<-c(0,0,0)
    part_a<-data.frame(name,value,order,tag)
    name<-c("AAA","AAA","AAA")
    value<-c(1:3)
    order<-c(1,5,4)
    key<-c(1,1,1)
    part_b<-data.frame(name,value,order,key)
    df<-bind_rows(part_a,part_b)
    df %>% group_by(name) %>% arrange(order)


    Groups:   name [1]
      name  value order   tag   key
      <chr> <int> <dbl> <dbl> <dbl>
    1 AAA       1     1    NA     1
    2 AAA       1     2     0    NA
    3 AAA       2     3     0    NA
    4 AAA       3     4    NA     1
    5 AAA       2     5    NA     1
    6 AAA       3     6     0    NA

I am trying to write a code that will allow me to create two new variables named CHECK_1 and CHECK_2 , CHECK_1 needs to have two values it will be 1 if the previous value of the column tag is NA and the previous value of the column key is not NA and 0 otherwise, the variable CHECK_2 needs to have only two value it should be "out of range" if the difference between the current and previous value of the column order is greater than 3 and "within range" otherwise.

This operations must be done within groups in a data frame, I know I can make use of the base R function lag() but I have tried to incorporate the criteria in a if_else statement with no luck.

My desired output will be:

name value order tag key check_1 check_2
AAA 1 1 NA 1 1 "within range"
AAA 1 2 0 NA 0 "within range"
AAA 2 3 0 NA 0 "within range"
AAA 3 4 NA 1 1 "within range"
AAA 2 5 NA 1 1 "within range"
AAA 3 6 0 NA 0 "within range"

Solution

  • You could do something like this with tidyverse. However, I am unsure what you want to do with the first row of values for the group.

    library(tidyverse)
    
    df %>% 
      group_by(name) %>% 
      arrange(order) %>% 
      mutate(check_1 = ifelse(is.na(lag(tag)) & !is.na(lag(key)), 1, 0),
             check_2 = ifelse(order - lag(order) > 3, "out of range", "within range"))
    

    Output

      name  value order   tag   key check_1 check_2     
      <chr> <int> <dbl> <dbl> <dbl>   <dbl> <chr>       
    1 AAA       1     1    NA     1       0 NA          
    2 AAA       1     2     0    NA       1 within range
    3 AAA       2     3     0    NA       0 within range
    4 AAA       3     4    NA     1       0 within range
    5 AAA       2     5    NA     1       1 within range
    6 AAA       3     6     0    NA       1 within range
    

    If you are wanting to default the first row values to 1 and "within range", then we could use case_when to make an extra ifelse condition.

    df %>% 
      group_by(name) %>% 
      arrange(order) %>% 
      mutate(check_1 = case_when(row_number() == 1 ~ 1,
                                 is.na(lag(tag)) & !is.na(lag(key)) ~ 1, 
                                 TRUE ~ 0),
             check_2 = case_when(row_number() == 1 ~ "within range",
                                 order - lag(order) > 3 ~ "out of range", 
                                 TRUE ~ "within range"))
    
      name  value order   tag   key check_1 check_2     
      <chr> <int> <dbl> <dbl> <dbl>   <dbl> <chr>       
    1 AAA       1     1    NA     1       1 within range
    2 AAA       1     2     0    NA       1 within range
    3 AAA       2     3     0    NA       0 within range
    4 AAA       3     4    NA     1       0 within range
    5 AAA       2     5    NA     1       1 within range
    6 AAA       3     6     0    NA       1 within range