Tags: r, performance, dplyr, tidyr, na

Fill first (only one) NA with next non-NA value by group using dplyr/tidyr


I have a problem where I need to fill NAs by group, but only the one NA directly before a non-NA value (direction = "up"). I know tidyr's fill() function, but I do not know how to use it conditionally, i.e. to fill only the single NA directly above a non-NA value.

df <- data.frame(g=c(1,1,1,1,1,1, 2,2,2,2,2,2,2),v=c(NA,NA,5,NA,8,NA, 1,1,NA,2,NA,NA,3))

Data (with group "g" and values "v"):

   g  v
1  1 NA
2  1 NA
3  1  5
4  1 NA
5  1  8
6  1 NA
7  2  1
8  2  1
9  2 NA
10 2  2
11 2 NA
12 2 NA
13 2  3

should become...

   g  v
1  1 NA
2  1  5
3  1  5
4  1  8
5  1  8
6  1 NA
7  2  1
8  2  1
9  2  2
10 2  2
11 2 NA
12 2  3
13 2  3

Solution

  • You can always create your own solution:

    fill_one_up <- \(x) {
      n <- length(x)
      if (n <= 1L) return(x)
      for (i in 2L:n) {
        if (!is.na(x[i]) && is.na(x[i-1L])) {
          x[i-1L] <- x[i]
        }
      }
      return(x)
    }
    
    library(dplyr) # for mutate(); the .by argument needs dplyr >= 1.1.0
    df <- df |> mutate(v2 = fill_one_up(v), .by = g)
    
    #    g  v v2
    # 1  1 NA NA
    # 2  1 NA  5
    # 3  1  5  5
    # 4  1 NA  8
    # 5  1  8  8
    # 6  1 NA NA
    # 7  2  1  1
    # 8  2  1  1
    # 9  2 NA  2
    # 10 2  2  2
    # 11 2 NA NA
    # 12 2 NA  3
    # 13 2  3  3
    
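    The loop amounts to "where v is NA, take the next value in the same group". As a sketch that is not part of the original answer, the same rule can probably be written without a custom function using dplyr's coalesce() and lead(). group_by() is used here only so that the snippet also runs on dplyr versions older than 1.1.0:

    ```r
    library(dplyr)

    # Sample data from the question
    df <- data.frame(g = c(1,1,1,1,1,1, 2,2,2,2,2,2,2),
                     v = c(NA,NA,5,NA,8,NA, 1,1,NA,2,NA,NA,3))

    res <- df |>
      group_by(g) |>
      # keep v where it is present, otherwise take the next value in the group
      mutate(v2 = coalesce(v, lead(v))) |>
      ungroup()
    ```

    Because lead() looks only one position ahead, an NA that is followed by another NA stays NA, which is the "fill only one" behaviour asked for.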

    If you need more speed you can translate the logic to Rcpp:

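    Rcpp::cppFunction("NumericVector fill_one_up_cpp(NumericVector x) {
      // Work on a copy: Rcpp vectors share memory with the R object,
      // so writing to x directly would also change the caller's vector
      NumericVector out = clone(x);
      int n = out.size();
      for (int i = 1; i < n; i++) {
        // fill an NA only if the element right after it is non-NA
        if (ISNA(x[i-1]) && !ISNA(x[i])) {
          out[i-1] = x[i];
        }
      }
      return out;
    }")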
    

    EDIT

    Using ThomasIsCoding's idea of first locating all the NAs, fill_one_up() can be further simplified:

    fill_one_up2 <- \(x) {
      idx <- which(is.na(x))
      idx <- idx[which(!is.na(x[idx + 1]))]
      x[idx] <- x[idx+1]
      x
    } 
    
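    As a quick sanity check, the vectorised version can be run on the sample data from the question and compared with the desired result. This is only a sketch: it uses base R's ave() for the grouping instead of dplyr, and the names expected and v2 are introduced here for illustration:

    ```r
    fill_one_up2 <- \(x) {
      idx <- which(is.na(x))                 # positions of all NAs
      idx <- idx[which(!is.na(x[idx + 1]))]  # keep those followed by a non-NA
      x[idx] <- x[idx + 1]
      x
    }

    df <- data.frame(g = c(1,1,1,1,1,1, 2,2,2,2,2,2,2),
                     v = c(NA,NA,5,NA,8,NA, 1,1,NA,2,NA,NA,3))

    # Desired column from the question
    expected <- c(NA, 5, 5, 8, 8, NA, 1, 1, 2, 2, NA, 3, 3)

    # Apply the function within each group g
    v2 <- ave(df$v, df$g, FUN = fill_one_up2)
    stopifnot(identical(v2, expected))
    ```

    The last element of a group is never filled, because x[idx + 1] is NA once idx + 1 runs past the end of the vector.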

    This can apparently be made faster with collapse:

    library(collapse)
    fill_one_up4 <- \(x) {
      idx <- whichNA(x)
      idx <- fsubset(idx, whichNA(x[idx + 1], TRUE))
      copyv(x, idx, R = x[idx+1], vind1 = TRUE) # Could play with setv() with care
    }
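
    To see how much the variants differ on your own data, a rough timing sketch in base R might look like the following. The test vector x and its size are assumptions for illustration, and system.time() gives only a coarse single-run measurement; the Rcpp and collapse versions can be timed the same way once they are defined:

    ```r
    fill_one_up <- \(x) {
      n <- length(x)
      if (n <= 1L) return(x)
      for (i in 2L:n) {
        if (!is.na(x[i]) && is.na(x[i - 1L])) x[i - 1L] <- x[i]
      }
      x
    }

    fill_one_up2 <- \(x) {
      idx <- which(is.na(x))
      idx <- idx[which(!is.na(x[idx + 1]))]
      x[idx] <- x[idx + 1]
      x
    }

    # Hypothetical test vector: roughly 10% NAs
    set.seed(1)
    x <- sample(c(NA, 1:9), 1e5, replace = TRUE)

    # Both versions should agree before timings are compared
    stopifnot(identical(fill_one_up(x), fill_one_up2(x)))

    t_loop <- system.time(fill_one_up(x))[["elapsed"]]
    t_vec  <- system.time(fill_one_up2(x))[["elapsed"]]
    print(c(loop = t_loop, vectorised = t_vec))
    ```

    For more reliable numbers, repeat the measurement several times or use a dedicated benchmarking package.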