rdatetimetype-conversion

Merging datasets based on date column


I am trying to merge two datasets based on the date column in each. Below are the snapshots of each dataset, in which, you can clearly see that date column only is formatted as year-month-day, with no specification for hour:minute:second.

dataset 1; snapshot of date column

dataset 2; snapshot of date column

I am merging the two datasets using the following code:

final <- merge(dataset1, dataset2, by.x="close_date", by.y = "date",all.x = TRUE, all.y = TRUE)

Below is another snapshot which shows the result of the code above. As you can see, somehow the hour:minute:second appears in date column which dirsputs the merge.

result of the merge

Is there a way to merge the two columns properly? Maybe someone could suggest the function that would set hh:mm:ss to 00:00:00? Btw, beforehand in the script I have both columns "close_date" from dataset1 and "date" from dataset2 formatted in the same way using the as.POSIXct function.

You can check the R output for head() below:

 head(dataset1) %>%
   select(close_date)
 
 A tibble: 6 x 1
  close_date         
  <dttm>             
1 2020-03-01 00:00:00
2 2020-03-02 00:00:00
3 2020-03-03 00:00:00
4 2020-03-04 00:00:00
5 2020-03-05 00:00:00
6 2020-03-06 00:00:00


 head(dataset2) %>%
   select(date)

 A tibble: 6 x 1
  date               
  <dttm>             
 1 2020-03-01 00:00:00
 2 2020-03-02 00:00:00
 3 2020-03-03 00:00:00
 4 2020-03-04 00:00:00
 5 2020-03-05 00:00:00
 6 2020-03-06 00:00:00

Solution

  • With the provided data.

    We could define the date format in each dataset before merging using ymd function of lubridate package:

    library(lubridate)
    
    dataset1$close_date <- ymd(dataset1$close_date)
    dataset2$date <- ymd(dataset2$date)
    
    final <- merge(dataset1, dataset2, by.x="close_date", by.y = "date",all.x = TRUE, all.y = TRUE)
    final
    
      close_date    coin case_count_world_
    1 2020-03-01 BTCUSDT                  
    2 2020-03-02 BTCUSDT                  
    3 2020-03-03 BTCUSDT                  
    4 2020-03-04 BTCUSDT                  
    5 2020-03-05 BTCUSDT                  
    6 2020-03-06    <NA>                  
    7 2020-03-07    <NA>                  
    8 2020-03-08    <NA>  
    

    data:

    dataset1 <- structure(list(close_date = structure(c(18322, 18323, 18324, 
    18325, 18326), class = "Date"), coin = c("BTCUSDT", "BTCUSDT", 
    "BTCUSDT", "BTCUSDT", "BTCUSDT")), row.names = c(NA, -5L), class = c("tbl_df", 
    "tbl", "data.frame"))
    
    dataset2 <- structure(list(date = structure(c(18322, 18323, 18324, 18325, 
    18326, 18327, 18328, 18329), class = "Date"), case_count_world_ = c("", 
    "", "", "", "", "", "", "")), row.names = c(NA, -8L), class = c("tbl_df", 
    "tbl", "data.frame"))