I have 2 kinds of posts (sponsored and non sponsored) for many users. I know the date when each post happened. Now I need to know for each post of a user when the last sponsored post happened (in full days) see the column "Days since last sponsored". Thank you for your help! I want to solve it in R.
Date | User | Sponsored | Days Since Last Sponsored |
---|---|---|---|
08.07.2022 | YYY | 0 | NA |
24.07.2022 | YYY | 1 | NA |
08.08.2022 | YYY | 1 | 15 |
15.08.2022 | YYY | 0 | 7 |
20.08.2022 | YYY | 0 | 12 |
14.05.2022 | ZZZ | 1 | NA |
15.05.2022 | ZZZ | 0 | 1 |
20.06.2022 | ZZZ | 0 | 36 |
20.06.2022 | ZZZ | 0 | 36 |
22.06.2022 | ZZZ | 1 | 38 |
22.06.2022 | ZZZ | 0 | 0 |
I have tried to group by user and use diff time. However the sponsored / non-sponsored gives me a headache.
Here's an option with tidyverse
:
The calculation at the end is a bit off - I spot checked one of my values and it seems correct, but let me know if there's some other logic used.
library(tidyverse)
df <- tibble::tribble(
~Date, ~User, ~Sponsored, ~"Days Since Last Sponsored",
"08.07.2022", "YYY", 0L, NA,
"24.07.2022", "YYY", 1L, NA,
"08.08.2022", "YYY", 1L, 15L,
"15.08.2022", "YYY", 0L, 7L,
"20.08.2022", "YYY", 0L, 12L,
"14.05.2022", "ZZZ", 1L, NA,
"15.05.2022", "ZZZ", 0L, 1L,
"20.06.2022", "ZZZ", 0L, 36L,
"20.06.2022", "ZZZ", 0L, 36L,
"22.06.2022", "ZZZ", 1L, 38L,
"22.06.2022", "ZZZ", 0L, 0L
)
df %>%
mutate(Date = lubridate::dmy(Date),
sponsored_date = ifelse(Sponsored == 1, as.character.Date(Date), NA_character_),
sponsored_date = lag(sponsored_date)) %>%
group_by(User) %>%
fill(sponsored_date, .direction = "down") %>%
mutate(sponsored_date = lubridate::ymd(sponsored_date),
since_last = as.numeric(Date - sponsored_date)) %>%
select(Date, User, Sponsored, "Days Since Last Sponsored", since_last)
#> # A tibble: 11 × 5
#> # Groups: User [2]
#> Date User Sponsored `Days Since Last Sponsored` since_last
#> <date> <chr> <int> <int> <dbl>
#> 1 2022-07-08 YYY 0 NA NA
#> 2 2022-07-24 YYY 1 NA NA
#> 3 2022-08-08 YYY 1 15 15
#> 4 2022-08-15 YYY 0 7 7
#> 5 2022-08-20 YYY 0 12 12
#> 6 2022-05-14 ZZZ 1 NA NA
#> 7 2022-05-15 ZZZ 0 1 1
#> 8 2022-06-20 ZZZ 0 36 37
#> 9 2022-06-20 ZZZ 0 36 37
#> 10 2022-06-22 ZZZ 1 38 39
#> 11 2022-06-22 ZZZ 0 0 0