rdataframenasubstitution

Replacing NA's within a set of similar values


I have a column in a dataset looking like this:

cluster_id
1
1
1
1
NA
1
NA
NA
2
NA
2
NA
3
NA
NA
3


cluster_id <- c("1","1","1","1","NA","1","NA","NA","2","NA","2","NA","3","NA","NA","3")

The order is already pre-defined before using a time column. What I want is to substitute the NA's that are within each cluster ID, i.e. if there's a row with 2, then an NA, and then a 2 again, I want that NA to become 2. The NA's between numbers stay as NA's. Example:

cluster_id   cluster_id_new
1            1
1            1
1            1
1            1
NA           1
1            1
NA           NA
NA           NA
2            2
NA           2
2            2
NA           NA
3            3
NA           3
NA           3
3            3 

I found the zoo::na.locf function in this post, which seems to be close to what I want, but I also need to take in consideration the value after the NA. Any thoughts?


Solution

  • Update: thanks to @Darren Tsai (who provided a better version):

    library(dplyr)
    library(tidyr)
    
    tibble(cluster_id) %>% 
      mutate(down = cluster_id, up = cluster_id) %>% 
      fill(down, .direction = "down") %>% 
      fill(up, .direction = "up") %>% 
      mutate(cluster_id_new = if_else(down == up, down, NA)) %>% 
      select(-c(down, up))
    

    Original answer: Here is tidyverse way using mainly fill:

    library(dplyr)
    library(tidyr)
    
    tibble(cluster_id) %>%
      mutate(helper = row_number(),
             filldown = fill(., cluster_id, .direction = "down")$cluster_id,
             fillup = fill(., cluster_id, .direction = "up")$cluster_id,
             cluster_id_new = ifelse(filldown == fillup, filldown, NA_real_)) %>%
      select(cluster_id, cluster_id_new)
    
    cluster_id cluster_id_new
            <dbl>          <dbl>
     1          1              1
     2          1              1
     3          1              1
     4          1              1
     5         NA              1
     6          1              1
     7         NA             NA
     8         NA             NA
     9          2              2
    10         NA              2
    11          2              2
    12         NA             NA
    13          3              3
    14         NA              3
    15         NA              3
    16          3              3