I'm encountering an error where I try to join two data frames using the dplyr join functions by two Date columns. This is the error I get:
Error: cannot join on columns 'DateInfo' x 'DateInfo': Can't join on 'DateInfo' x 'DateInfo' because of incompatible types (Date / Date)
The base merge
function works fine and I can't seem to find an example of what could be causing this through googling or other stack overflow questions.
The problem is I can't create a reproducible example and the data I am using I can't share. For example this works with no problems:
d1 <- data_frame(Num = 1:5, DateInfo = as.Date(c("2014-01-03", "2014-04-05", "2015-01-03", "2014-04-02", "2011-07-28"), format = "%Y-%m-%d"))
d2 <- data_frame(Name = c("a", "b", "c", "d", "e"), DateInfo = as.Date(c("2014-01-03", "2014-04-05", "2015-01-03", "2014-04-02", "2011-07-28"), format = "%Y-%m-%d"))
d3 <- left_join(d1, d2, by = c("DateInfo" = "DateInfo"))
Has anyone had any experience with not being able to join on two columns that are, as far as the class
function is concerned, are the same type but still getting this error?
EDIT: Just to get this out of the way I can get around this error by using merge or converting the dates to characters and then joining, so I'm really just interested in why dplyr
would tell me I can't merge on two columns with the same type.
The reason I can't merge is how the two Date objects are stored. Thanks to this issue I decided to check the structure of how the two objects are stored and sure enough one is stored as an integer and one is stored as a numeric:
> dput(df1$DateInfo[1])
structure(16373, class = "Date")
> dput(df2$DateInfo[1])
structure(16372L, class = "Date")
It appears that the data that was pulled form a DB through the dplyr sql functions is stored as a numeric while the data from a csv is stored as an integer. I don't know why that won't let dplyr join on them while merge
can or why it happens in the first place but I think this specific question is answered.