rdplyrnalagmutated

R: Fill in NAs by multiple groups summing up lagges of two variables


I try to fill in the NA values of column N based on the lag of N and the lag of an additional column. The problem is that it fills in the second row, but not all of them.

Here is my code.

```{r}
library(tidyverse)
library(zoo)

# toy data ----
df <- data.frame('Group.1' = c("a", "a", "a"),
                'Group.2' = c("A", "A", "A"),
                'monthyr' = as.yearmon(c("2018-01-01", "2018-02-01", "2018-03-01")),
                'x' = c(5, 7, 8),
                'y' = c(10, 18, 9),
                'N' = c(100, NA, NA)
                )
df$net <- df$x - df$y

df

# what i get ----
df_attempt <- df %>%
  group_by(Group.1, Group.2) %>%
  arrange(Group.1, Group.2, monthyr) %>%
  mutate(N = ifelse(is.na(N), lag(N) + lag(net), N))

# what i want to get ----
df_expected <- data.frame('Group.1' = c("a", "a", "a"),
                          'Group.2' = c("A", "A", "A"),
                          'monthyr' = as.yearmon(c("2018-01-01", "2018-02-01", "2018-03-01")),
                          'x' = c(5, 7, 8),
                          'y' = c(10, 18, 9),
                          'N' = c(100, 95, 84)
)
```

Solution

  • One approach to achieve the kind of "recursive" fill you are trying to achieve may look like so:

    library(tidyverse)
    library(zoo)
    
    df %>%
      group_by(Group.1, Group.2) %>%
      arrange(monthyr) %>%
      fill(N) %>%
      mutate(N = N + cumsum(lag(net, default = 0)))
    #> # A tibble: 3 × 7
    #> # Groups:   Group.1, Group.2 [1]
    #>   Group.1 Group.2 monthyr       x     y     N   net
    #>   <chr>   <chr>   <yearmon> <dbl> <dbl> <dbl> <dbl>
    #> 1 a       A       Jan 2018      5    10   100    -5
    #> 2 a       A       Feb 2018      7    18    95   -11
    #> 3 a       A       Mär 2018      8     9    84    -1