I would like to compute a new column in a dataframe that finds n consecutive values not being null and adds an enumeration to it, per user group. Let df be our sample dataframe:
n <- 4
# Sample DataFrame
df <- tibble(
user_id = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2),
date = as.Date('2024-01-01') + 0:19,
value1 = c(NA, 5, 6, 7, 8, NA, NA, 2, 3, 4, 5, NA, 1, 2, 3, NA, 1, 2, 3, 4)
)
Expected result:
# A tibble: 20 × 4
user_id date value1 expected_result
<dbl> <date> <dbl> <dbl>
1 1 2024-01-01 NA NA
2 1 2024-01-02 5 1
3 1 2024-01-03 6 2
4 1 2024-01-04 7 3
5 1 2024-01-05 8 4
6 1 2024-01-06 NA NA
7 1 2024-01-07 NA NA
8 1 2024-01-08 2 5
9 1 2024-01-09 3 6
10 1 2024-01-10 4 7
11 1 2024-01-11 5 8
12 1 2024-01-12 NA NA
13 2 2024-01-13 1 NA
14 2 2024-01-14 2 NA
15 2 2024-01-15 3 NA
16 2 2024-01-16 NA NA
17 2 2024-01-17 1 1
18 2 2024-01-18 2 2
19 2 2024-01-19 3 3
20 2 2024-01-20 4 4
You can use cumsum
df %>%
mutate(gp=value1/value1 * cumsum(is.na(value1)), .by=user_id) %>%
mutate(n=(n()>=n & !is.na(value1)), .by=c(user_id, gp)) %>%
mutate(expected=na_if(value1/value1 * cumsum(!is.na(value1) & n), 0), .by=user_id) %>%
select(-c(gp, n))
Gives
# A tibble: 20 × 4
user_id date value1 expected
<dbl> <date> <dbl> <dbl>
1 1 2024-01-01 NA NA
2 1 2024-01-02 5 1
3 1 2024-01-03 6 2
4 1 2024-01-04 7 3
5 1 2024-01-05 8 4
6 1 2024-01-06 NA NA
7 1 2024-01-07 NA NA
8 1 2024-01-08 2 5
9 1 2024-01-09 3 6
10 1 2024-01-10 4 7
11 1 2024-01-11 5 8
12 1 2024-01-12 NA NA
13 2 2024-01-13 1 NA
14 2 2024-01-14 2 NA
15 2 2024-01-15 3 NA
16 2 2024-01-16 NA NA
17 2 2024-01-17 1 1
18 2 2024-01-18 2 2
19 2 2024-01-19 3 3
20 2 2024-01-20 4 4