My data is stored in a .txt file, it looks like this:
"A" "M" "D" "Tmax" "Tmin"
"1" 1931 1 1 35.4 19.7
"2" 1931 1 2 35.8 20.7
"3" 1931 1 3 37.7 23.9
"4" 1931 1 4 39.5 25.8
"5" 1931 1 5 29 22.8
"6" 1931 1 6 27.5 18.6
"7" 1931 1 7 20.4 15.6
"8" 1931 1 8 28.2 15.1
"9" 1931 1 9 31.2 14.6
"10" 1931 1 10 31 18.5
(A = year, M = month, D = day)
Load the data with 'datos_diarios <- read.table(...)', then with the data.table syntax I create a new column named 'fecha' (date):
datos_diarios <- data.table(subset(datos_diarios, A >= 1971))
datos_diarios[, fecha := strptime(paste0(A,'-',M,'-',D), format= '%Y-%m-%d', tz = 'America/Argentina/Buenos_Aires')]
Surprisingly, I get some NA
which(is.na(datos_diarios$fecha))
[1] 1119 6545 6863 7234 7598 7962 13513 13807
datos_diarios[1118:1120,]
A M D Tmax Tmin fecha
1: 1974 1 22 25.0 21.9 1974-01-22
2: 1974 1 23 25.4 20.6 <NA>
3: 1974 1 24 29.7 20.5 1974-01-24
Also found this strange behaviour with dplyr package. Nevertheless, this problem doesn´t exist using base R
datos_diarios <- subset(datos_diarios, A >= 1971)
datos_diarios$fecha <- strptime(paste0(datos_diarios$A,'-',datos_diarios$M,'-',datos_diarios$D),format= '%Y-%m-%d', tz = 'America/Argentina/Buenos_Aires')
> which(is.na(datos_horarios$fecha))
integer(0)
> datos_diarios[1118:1120,]
A M D Tmax Tmin fecha
15728 1974 1 22 25.0 21.9 1974-01-22
15729 1974 1 23 25.4 20.6 1974-01-23
15730 1974 1 24 29.7 20.5 1974-01-24
Tried to create a column with date data using data.table & dplyr, expected no NA, got some NA.
As @thelatemail says in comments, this is a daylight saving time issue. https://www.timeanddate.com/time/change/argentina/buenos-aires?year=1974 :
@thelatemail further points out that this is because is data.table (and presumably dplyr?) is automatically converting POSIXlt to POSIXct: we get a warning
Values of type POSIXlt detected and converted to POSIXct
Because the POSIX*t types represent time-date objects (not just date), they are defaulting to midnight, which doesn't exist on that day in that timezone. If you use as.Date() instead, everything seems OK. (Still not exactly sure why POSIXt to POSIXct conversion messes things up, since strptime() does work in base R ...)
dd2[, date := as.Date(paste0(A,'-',M,'-',D),
format= '%Y-%m-%d', tz = 'America/Argentina/Buenos_Aires')]