rdateformattinglubridate

Column with number date and formatted dates (2020-10-03)


I have a lot of excel files that I read into R. I get new ones every month and they are similar but the column names or other things will change slightly. I have started just reading all the columns in as text and then formatting the columns later. This makes it easier for me.

Recently some of the date columns were reading in with some formatted character dates "2023-07-10" and 1 numeric string date "45448". I do not know why this happened and that really does not matter. I just want to be able to format the whole column into a date column. I know how to parse different formats of dates using lubridate and other packages, but this is not different formats really. I need to supply the origin date for just that numeric case and the rest I need to provide formats like ymd or mdy.

dat<- c("2024-04-12", "2024-05-08", "2024-05-17", "2024-05-28", "2024-06-05", "2024-07-04",  "45448")

id = seq(1:7)

data_frame <- as.data.frame(cbind(id,dat))

I tried to use an ifelse() statement based on the character length of the line, but this did not work.

I tried this below but it did not work. It did something, but I am not sure what. The formatted dates were a lot longer numbers than the one unformatted.

data_frame <- data_frame %>%
  mutate(
    dat2 = ifelse(nchar(dat) <10,as_date(as.numeric(dat), origin = "1899-12-30"),parse_date_time(dat, c("ymd", "mdy")) ) 
  )

#this is what my solution was but there has to be a better one
data_frame <- data_frame %>%
  mutate(
    dat = ifelse(dat == "45448","2024-06-05",dat)  
  )

Does anyone have a better way that I could use again on another instance instead of having to look at the date each time and transform on its own.

I need them to all be either in the character date format or in the same numeric format with the same origin.


Solution

  • 1) char2Date Convert dat to Date class and then replace the numbers in it which are NA's with converted Excel numbers. No warnings are generated.

    library(dplyr)
    
    char2Date <- function(x) {
      y <- as.Date(x)
      isna <- is.na(y)
      y[isna] <- as.Date(as.numeric(y[isna]), origin = "1899-12-30")
      y
    }
    
    data_frame %>% mutate(dat2 = char2Date(dat))
    

    giving

      id        dat       dat2
    1  1 2024-04-12 2024-04-12
    2  2 2024-05-08 2024-05-08
    3  3 2024-05-17 2024-05-17
    4  4 2024-05-28 2024-05-28
    5  5 2024-06-05 2024-06-05
    6  6 2024-07-04 2024-07-04
    7  7      45448 2024-06-05
    

    2) mutate_cond Another approach is to use mutate_cond from dplyr mutate/replace several columns on a subset of rows .

    library(dplyr)
    
    data_frame %>%
      mutate(dat2 = as.Date(dat)) %>%
      mutate_cond(is.na(dat2), dat2 = as.Date(as.numeric(dat), origin = "1899-12-30"))
    

    3) within

    library(dplyr)
    
    data_frame %>%
      mutate(dat2 = as.Date(dat), isna = is.na(dat2)) %>%
      within(dat2[isna] <- as.Date(as.numeric(dat[isna]), origin = "1899-12-30")) %>%
      select(-isna)
    

    Note

    Use this as input.

    dat <- c("2024-04-12", "2024-05-08", "2024-05-17", "2024-05-28", "2024-06-05",
      "2024-07-04",  "45448")
    id <- 1:7
    data_frame <- data.frame(id, dat)
    

    Update

    Added (2) and (3).