rdateexcel-dates

Convert Excel numeric dates to R dates, but in some rows there is only the year given, e.g. "2018" instead of "43465"


I read in an Excel file with one column containing dates. In some rows there aren't complete dates, but only the year is given, e.g., 2018. What I obtain in R is

> df$date
[1]    NA    NA 43465 43465 43465 43465  2018    NA 43465 43465 43465 43465

I want to convert those rows to date format which are complete dates, i.e. in my example only the rows with 43465 and leave the other rows as they are, i.e. NA should stay NA and 2018 should stay 2018.

I know that I can convert Excel dates as follows as.Date(df$date, origin="1899-12-30") but the following two ideas give me the wrong output

> as.Date(df$date, origin="1899-12-30")
[1] NA NA  "2018-12-31" "2018-12-31" "2018-12-31" "2018-12-31" "1905-07-10" NA "2018-12-31" "2018-12-31" "2018-12-31"
[12] "2018-12-31"

Of course "1905-07-10" is not what I expected.

> ifelse(df$date == 2018, 2018, as.Date(df$date, origin="1899-12-30"))
[1]    NA    NA 17896 17896 17896 17896  2018    NA 17896 17896 17896 17896

Here the wrong output is obvious.


Solution

  • Building off of @Ronak's answer, you can use regex to determine a four digit numerical number, then pad with four trailing zeros.

    x <- c(NA,NA,43465,43465,43465,43465,2018,NA,43465,43465, 43465, 43465)
    ifelse(grepl('^\\d{4}$', x, perl = TRUE), 
       as.integer(paste0(x, '0000')), 
       as.integer(format(as.Date(x, origin='1899-12-30'), '%Y%m%d')))
    [1]  NA       NA 20181231 20181231 20181231 20181231 20180000       NA 20181231 20181231 20181231 20181231
    

    You'll get some warning messages regarding the NAs, and if it bothers you you can add an additional ifelse to control the NAs. Here we use a logical grep test to see if there are only four numbers (a year), then we create an integer of the values. This allows you to still use mathematical operators such as >,<,==, etc and preserve all the information.

    You can change the '0000' during the paste0() call to a more appropriate number based upon the data or use case.