rdataframedata.tablena

How can I fill the empty cells in each row of a dataframe with the value in the last non-empty cell in that row?


I have a dataframe (or datatable, if that's easier) with incomplete rows:

Input

ID Var1 Var2 Var3
1     2    5    1
2    12    3
3     8
4     4

Code

d <- data.frame(
  ID = 1:4,
  Var1 = c(2, 12, 8, 4),
  Var2 = c(5, 3, NA, NA),
  Var3 = c(1, NA, NA, NA)
)

library(data.table)
d <- fread("
  ID Var1 Var2 Var3
  1 2 5 1
  2 12 3 NA
  3 8 NA NA
  4 4 NA NA
")

The empty cells are always at the end of a row.

I would like to fill the empty cells in each row with the value in the last non-empty cell in that row, e.g.:

ID Var1 Var2 Var3
1     2    5    1
2    12    3 -> 3
3     8 -> 8 -> 8
4     4 -> 4 -> 4

How do I do that?


I don't want to use dplyr and I don't want to fill columns.


Solution

  • Another answer using the collapse package, which has the advantage of being particularly fast(er than data.table):

    library(collapse)
    dapply(d, na_locf, MARGIN = 1)
    
    #   ID Var1 Var2 Var3
    # 1  1    2    5    1
    # 2  2   12    3    3
    # 3  3    8    8    8
    # 4  4    4    4    4
    

    Microbenchmark:

    # Unit: microseconds
    #        expr     min       lq      mean   median      uq     max neval
    #    collapse    69.5   112.95   244.847   135.45   161.7  9964.4   100
    #          dt   592.9   788.70  1237.643   874.70  1186.6 14563.1   100
    #       tidyr 32283.2 36170.80 41293.420 40501.55 43809.1 75417.8   100
    #   Reduce_dt   645.0   803.70  1083.373   954.05  1222.6  2367.5   100
    #  Reduce_TiC   383.9   499.25   661.475   586.40   687.6  5179.1   100
    

    Code for benchmark:

    microbenchmark::microbenchmark(
      collapse = dapply(d, na_locf, MARGIN = 1),
      dt = data.table::transpose(
        setnafill(data.table::transpose(d, keep.names = "ID"), 
                  type = "locf", cols = 1:nrow(d) + 1), make.names = "ID"),
      tidyr = d %>% 
        pivot_longer(-ID) %>% 
        fill(value, .direction = "down") %>% 
        pivot_wider(),
      Reduce_dt = d[, (sprintf("Var%d", 1:3)) := Reduce(\(x, y) ifelse(is.na(y), x, y), .SD, accumulate = TRUE), .SDcols = sprintf("Var%d", 1:3)],
      Reduce_TiC = Reduce(\(x, y) ifelse(is.na(y), x, y), d[-1], accumulate = TRUE)
    )