r dplyr rowwise

dplyr::rowwise and min outputs a single value


I have an odd situation: when I use dplyr::rowwise() and min() inside mutate(), the result is a single value repeated across all rows rather than one value per row. The same code works with my other data frames in the same session, and I am not sure what the issue is. Restarting RStudio did not help.

df <- indf %>%
  dplyr::rowwise() %>%
  mutate(test = min(as.Date(date1), as.Date(date2), na.rm = TRUE))


structure(list(id = structure(c("5001", "3002", "2001", "1001", 
"6001", "9001"), label = "Subject name or identifier", format.sas = "$"), 
    date1 = structure(c(NA, 18599, NA, NA, NA, NA), class = "Date"), 
    date2 = structure(c(18472, 18597, 18638, 18675, 18678, 18696
    ), class = "Date"), test = structure(c(18472, 18472, 18472, 
    18472, 18472, 18472), class = "Date")), class = c("rowwise_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L), groups = structure(list(
    .rows = structure(list(1L, 2L, 3L, 4L, 5L, 6L), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame")))

Solution

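  • This is probably the result of loading the plyr package after dplyr, which masks dplyr's mutate with plyr::mutate. plyr::mutate ignores the rowwise grouping, so min is computed over the whole columns and every row gets the same value: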

    library(dplyr)
    indf %>% 
       rowwise %>% 
       plyr::mutate(test = min(date1, date2, na.rm = TRUE))
    # A tibble: 6 × 4
    # Rowwise: 
      id    date1      date2      test      
      <chr> <date>     <date>     <date>    
    1 5001  NA         2020-07-29 2020-07-29
    2 3002  2020-12-03 2020-12-01 2020-07-29
    3 2001  NA         2021-01-11 2020-07-29
    4 1001  NA         2021-02-17 2020-07-29
    5 6001  NA         2021-02-20 2020-07-29
    6 9001  NA         2021-03-10 2020-07-29
    

    versus calling mutate from dplyr explicitly with ::, which respects the rowwise grouping:

    indf %>%
       rowwise %>%
       dplyr::mutate(test = min(date1, date2, na.rm = TRUE))
    # A tibble: 6 × 4
    # Rowwise: 
      id    date1      date2      test      
      <chr> <date>     <date>     <date>    
    1 5001  NA         2020-07-29 2020-07-29
    2 3002  2020-12-03 2020-12-01 2020-12-01
    3 2001  NA         2021-01-11 2021-01-11
    4 1001  NA         2021-02-17 2021-02-17
    5 6001  NA         2021-02-20 2021-02-20
    6 9001  NA         2021-03-10 2021-03-10
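
    To check which mutate a bare call resolves to in the current session, something like the following sketch may help. It only uses base R introspection and assumes dplyr is installed; if plyr was attached after dplyr, the first call would be expected to report plyr instead.

    ```r
    library(dplyr)

    # Name of the package whose namespace the bare `mutate` currently comes from
    mutate_source <- environmentName(environment(mutate))
    print(mutate_source)

    # Attached packages in lookup order; a package earlier in this vector
    # masks same-named functions from packages later in it
    print(search())
    ```

    If the result is not dplyr, either qualify the call as dplyr::mutate or load dplyr after plyr.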
    

    Note that rowwise is slow; it may be better to use the vectorized pmin, which does not need rowwise at all:

    indf %>%
       ungroup %>%
       dplyr::mutate(test = pmin(date1, date2, na.rm = TRUE))
    # A tibble: 6 × 4
      id    date1      date2      test      
      <chr> <date>     <date>     <date>    
    1 5001  NA         2020-07-29 2020-07-29
    2 3002  2020-12-03 2020-12-01 2020-12-01
    3 2001  NA         2021-01-11 2021-01-11
    4 1001  NA         2021-02-17 2021-02-17
    5 6001  NA         2021-02-20 2021-02-20
    6 9001  NA         2021-03-10 2021-03-10
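
    The difference between min and pmin can also be seen without any packages. This is a minimal base R sketch using the first three rows of the data above; the variable names overall and per_row are only for illustration.

    ```r
    # First three rows of date1 and date2 from the question's data
    date1 <- as.Date(c(NA, "2020-12-03", NA))
    date2 <- as.Date(c("2020-07-29", "2020-12-01", "2021-01-11"))

    # min() pools all of its arguments and reduces them to a single value
    overall <- min(date1, date2, na.rm = TRUE)
    print(overall)

    # pmin() compares position by position and returns one value per element
    per_row <- pmin(date1, date2, na.rm = TRUE)
    print(per_row)
    ```

    Here overall is one date, which is what every row received when the grouping was ignored, while per_row has one date per row.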