rdplyr

Enumerate events where n consecutive values are not NA


I would like to compute a new column in a dataframe that finds n consecutive values not being null and adds an enumeration to it, per user group. Let df be our sample dataframe:

n <- 4

# Sample DataFrame
df <- tibble(
  user_id = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2),
  date = as.Date('2024-01-01') + 0:19,
  value1 = c(NA, 5, 6, 7, 8, NA, NA, 2, 3, 4, 5, NA, 1, 2, 3, NA, 1, 2, 3, 4)
)

Expected result:

# A tibble: 20 × 4
   user_id date       value1 expected_result
     <dbl> <date>      <dbl>           <dbl>
 1       1 2024-01-01     NA              NA
 2       1 2024-01-02      5               1
 3       1 2024-01-03      6               2
 4       1 2024-01-04      7               3
 5       1 2024-01-05      8               4
 6       1 2024-01-06     NA              NA
 7       1 2024-01-07     NA              NA
 8       1 2024-01-08      2               5
 9       1 2024-01-09      3               6
10       1 2024-01-10      4               7
11       1 2024-01-11      5               8
12       1 2024-01-12     NA              NA
13       2 2024-01-13      1              NA
14       2 2024-01-14      2              NA
15       2 2024-01-15      3              NA
16       2 2024-01-16     NA              NA
17       2 2024-01-17      1               1
18       2 2024-01-18      2               2
19       2 2024-01-19      3               3
20       2 2024-01-20      4               4


Solution

  • You can use cumsum

    df %>%
      mutate(gp=value1/value1 * cumsum(is.na(value1)), .by=user_id) %>%
      mutate(n=(n()>=n & !is.na(value1)), .by=c(user_id, gp)) %>%
      mutate(expected=na_if(value1/value1 * cumsum(!is.na(value1) & n), 0), .by=user_id) %>%
      select(-c(gp, n))
    

    Gives

    # A tibble: 20 × 4
       user_id date       value1 expected
         <dbl> <date>      <dbl>    <dbl>
     1       1 2024-01-01     NA       NA
     2       1 2024-01-02      5        1
     3       1 2024-01-03      6        2
     4       1 2024-01-04      7        3
     5       1 2024-01-05      8        4
     6       1 2024-01-06     NA       NA
     7       1 2024-01-07     NA       NA
     8       1 2024-01-08      2        5
     9       1 2024-01-09      3        6
    10       1 2024-01-10      4        7
    11       1 2024-01-11      5        8
    12       1 2024-01-12     NA       NA
    13       2 2024-01-13      1       NA 
    14       2 2024-01-14      2       NA 
    15       2 2024-01-15      3       NA  
    16       2 2024-01-16     NA       NA
    17       2 2024-01-17      1        1
    18       2 2024-01-18      2        2
    19       2 2024-01-19      3        3
    20       2 2024-01-20      4        4