r, dplyr, lapply, sapply

Finding earliest date within a column


I am trying to find the earliest date in each value of a column, where the dates are separated by a delimiter ("\n" in this case), and store it in a new column.

library(dplyr)      # %>% and mutate()
library(lubridate)  # parse_date_time()
data.frame(x = c("2023-1-2\n2034-02-10", "2023-1-2\n2001-10-30")) %>% 
    mutate(earliest_date = sapply(strsplit(x, "\\\n"), 
         function(x) min(parse_date_time(x, orders = c("mdy", "ymd")), na.rm = TRUE)))

When I run this, it seems to produce the correct answer, but as a number of seconds rather than a date:

                     x earliest_date
1 2023-1-2\n2034-02-10    1672617600
2 2023-1-2\n2001-10-30    1004400000

How can I get the result as actual dates, in Date format?

Edit: this seems to do the trick, but how would I add it as a column?

lapply(strsplit(c("2023-1-2\n2034-02-10", "2023-1-2\n2001-10-30"), "\\\n"), 
         function(x) min(parse_date_time(x, orders = c("mdy", "ymd")), na.rm = T))

Solution

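  • Using your existing code, wrap the min(...) in as.character(); if you need a Date column, wrap the whole sapply() call in as.Date(). sapply() simplifies its result to a plain vector and drops the POSIXct class, which is why you see the underlying seconds since 1970-01-01.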

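    library(dplyr)  # provides %>% and mutate()

    data.frame(x = c("2023-1-2\n2034-02-10", "2023-1-2\n2001-10-30")) %>% 
      mutate(earliest_date = as.Date(sapply(
        strsplit(x, "\n", fixed = TRUE),
        # as.character() keeps the date text; sapply() alone would return raw seconds
        function(d) as.character(min(lubridate::parse_date_time(d, orders = c("mdy", "ymd")), na.rm = TRUE))
      )))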
    

    Output

                         x earliest_date
    1 2023-1-2\n2034-02-10    2023-01-02
    2 2023-1-2\n2001-10-30    2001-10-30
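
As for the lapply() version in the edit: one way to add its result as a column is to combine the list with do.call(c, ...), which keeps the date class instead of dropping it. The following is only a rough sketch in base R. It assumes every date is in year-month-day order, so it uses as.Date() with a fixed format rather than parse_date_time(); the names df and earliest are just illustrative.

```r
# Sample input from the question: newline-separated dates, one string per row
df <- data.frame(x = c("2023-1-2\n2034-02-10", "2023-1-2\n2001-10-30"),
                 stringsAsFactors = FALSE)

# lapply() returns a list, so each element keeps its Date class
# (assumption: all dates are year-month-day)
earliest <- lapply(strsplit(df$x, "\n", fixed = TRUE),
                   function(d) min(as.Date(d, format = "%Y-%m-%d"), na.rm = TRUE))

# c() on Date objects returns a Date vector, so the class survives here,
# whereas sapply() would simplify the same list to plain numbers
df$earliest_date <- do.call(c, earliest)

class(df$earliest_date)
```

If the mixed "mdy"/"ymd" parsing is needed, the same pattern should work with parse_date_time() inside the function, followed by as.Date() on the combined result.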