Expand existing table into a "versioned up-to-date statues" table

I want to generate a "tracked history" or "growing history" table, based on a source table. Consider this example:

library(tibble)

df_source <- 
  tibble::tribble(
  ~user_id, ~iteration,     ~animal, ~foo_val, ~bar_val,
     "xyz",         1L,     "zebra",      12L,       2L,
     "xyz",         1L,    "monkey",       3L,       4L,
     "xyz",         1L,  "elephant",       4L,      13L,
     "xyz",         2L,     "horse",      11L,       4L,
     "xyz",         2L,       "dog",       5L,      12L,
     "xyz",         2L,       "cow",       2L,      12L,
     "tty",         1L,   "giraffe",       8L,       4L,
     "tty",         1L,  "kangaroo",       6L,       1L,
     "tty",         1L,   "penguin",      12L,       5L,
     "tty",         2L, "crocodile",      10L,       7L,
     "tty",         2L,       "owl",       7L,       8L,
     "tty",         2L,  "kangaroo",      14L,       2L
  )

I have two users here. For each user I know what animals he/she received in each "iteration".

What I want is to "expand" this table such that it will look like a history table, that keeps track of the status of each user, as it was relevant for each iteration. This means that in iteration 1, we knew little - only about iteration 1. But in iteration 2, we knew about iteration 1 and 2, so the data is richer in iteration 2.

In fact, if there is a conflict between iteration 1's data and iteration 2's data (say the same animal but different values), the for iteration 2, the conflict is resolved by considering iteration 2's version over 1's.

So in this case, the result table should be:

user_id	status_as_of_iteration	animal	foo_status	bar_status	my_comment
xyz	1	zebra	12	2	original iteration 1 remains
xyz	1	monkey	3	4	original iteration 1 remains
xyz	1	elephant	4	13	original iteration 1 remains
tty	1	giraffe	8	4	original iteration 1 remains
tty	1	kangaroo	6	1	original iteration 1 remains
tty	1	penguin	12	5	original iteration 1 remains
xyz	2	zebra	12	2	preserved history from iteration 1, carried over to iteration 2, nothing to override
xyz	2	monkey	3	4	preserved history from iteration 1, carried over to iteration 2, nothing to override
xyz	2	elephant	4	13	preserved history from iteration 1, carried over to iteration 2, nothing to override
xyz	2	horse	11	4	original iteration 2 remains
xyz	2	dog	5	12	original iteration 2 remains
xyz	2	cow	2	12	original iteration 2 remains
tty	1	giraffe	8	4	preserved history from iteration 1, carried over to iteration 2, nothing to override
tty	1	penguin	12	5	preserved history from iteration 1, carried over to iteration 2, nothing to override
tty	2	crocodile	10	7	original iteration 2
tty	2	owl	7	8	original iteration 2
tty	2	kangaroo	14	2	NOTE! original iteration 2, values of iteration 2 overrode iteration 1's vals

so it actually means to:

replicate the previous iteration records per user, but increment their number for status_as_of_iteration
keep the most recent iteration records, per user, as is.

but please note how user_id == tty in status_as_of_iteration == 2 has only a single record for animal == kangaroo, and this is because we preserve the most recent/up-to-date "status" for each animal in each iteration.

Does anyone know of a "simple" way to do it with dplyr or tidyr?

Solution

Using some other tidy package you can try this

library(dplyr)
library(tidyr)
library(purrr)

df_source %>% 
  nest(data=c(animal, foo_val, bar_val)) %>% 
  mutate(data = accumulate(data, rows_upsert, by="animal"), .by=user_id) %>% 
  unnest(cols=data)

which returns

   user_id iteration animal    foo_val bar_val
   <chr>       <int> <chr>       <int>   <int>
 1 xyz             1 zebra          12       2
 2 xyz             1 monkey          3       4
 3 xyz             1 elephant        4      13
 4 xyz             2 zebra          12       2
 5 xyz             2 monkey          3       4
 6 xyz             2 elephant        4      13
 7 xyz             2 horse          11       4
 8 xyz             2 dog             5      12
 9 xyz             2 cow             2      12
10 tty             1 giraffe         8       4
11 tty             1 kangaroo        6       1
12 tty             1 penguin        12       5
13 tty             2 giraffe         8       4
14 tty             2 kangaroo       14       2
15 tty             2 penguin        12       5
16 tty             2 crocodile      10       7
17 tty             2 owl             7       8

We nest the data, and then iteratively rows_upsert to update the the values in each group.