I read in an Excel file with one column containing dates. In some rows there aren't complete dates, but only the year is given, e.g., 2018. What I obtain in R is
> df$date
[1] NA NA 43465 43465 43465 43465 2018 NA 43465 43465 43465 43465
I want to convert those rows to date format which are complete dates, i.e. in my example only the rows with 43465
and leave the other rows as they are, i.e. NA
should stay NA
and 2018
should stay 2018
.
I know that I can convert Excel dates as follows as.Date(df$date, origin="1899-12-30")
but the following two ideas give me the wrong output
> as.Date(df$date, origin="1899-12-30")
[1] NA NA "2018-12-31" "2018-12-31" "2018-12-31" "2018-12-31" "1905-07-10" NA "2018-12-31" "2018-12-31" "2018-12-31"
[12] "2018-12-31"
Of course "1905-07-10"
is not what I expected.
> ifelse(df$date == 2018, 2018, as.Date(df$date, origin="1899-12-30"))
[1] NA NA 17896 17896 17896 17896 2018 NA 17896 17896 17896 17896
Here the wrong output is obvious.
Building off of @Ronak's answer, you can use regex to determine a four digit numerical number, then pad with four trailing zeros.
x <- c(NA,NA,43465,43465,43465,43465,2018,NA,43465,43465, 43465, 43465)
ifelse(grepl('^\\d{4}$', x, perl = TRUE),
as.integer(paste0(x, '0000')),
as.integer(format(as.Date(x, origin='1899-12-30'), '%Y%m%d')))
[1] NA NA 20181231 20181231 20181231 20181231 20180000 NA 20181231 20181231 20181231 20181231
You'll get some warning messages regarding the NA
s, and if it bothers you you can add an additional ifelse
to control the NAs. Here we use a logical grep test to see if there are only four numbers (a year), then we create an integer of the values. This allows you to still use mathematical operators such as >
,<
,==
, etc and preserve all the information.
You can change the '0000'
during the paste0()
call to a more appropriate number based upon the data or use case.