I'm seeing some strange behavior when I use merge with ffdf, and I was wondering if someone can explain why this is happening and how I can fix it.
Here is a sample with regular data frames:
> dfx <- data.frame(a=1:3, b=4:6)
> dfy <- data.frame(a=c(1, 1, 1, 3), c=7:10)
> dfm <- merge(dfx, dfy)
> dfm
  a b  c
1 1 4  7
2 1 4  8
3 1 4  9
4 3 6 10
Here is the same code with ffdf:
> library(ffbase)
> ffdfx <- as.ffdf(data.frame(a=1:3, b=4:6))
> ffdfy <- as.ffdf(data.frame(a=c(1, 1, 1, 3), c=7:10))
> ffdfm <- merge(ffdfx, ffdfy)
> ffdfm[1:nrow(ffdfm),]
  a b  c
1 1 4  7
2 3 6 10
I expect the first result, but I get the second. I'd appreciate any help on the matter.
The behaviour you see is exactly what is documented in merge.ffdf from package ffbase. From the help of merge.ffdf (?merge.ffdf):
Merge two ffdf by common columns, or do other versions of database join operations. This method is similar to merge in the base package but only allows inner and left outer joins. Note that joining is done based on ffmatch or ffdfmatch: only the first element in y will be added to x;
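To see why rows disappear, here is a plain base-R sketch of the "only the first element in y" rule. It uses base match(), which returns only the position of the first matching key, as the help text says ffmatch does for ff vectors. The helper first_match_join is hypothetical and only imitates the inner join described in the help; it is not ffbase's actual implementation.

```r
# Hypothetical helper, for illustration only: imitates the inner join the
# merge.ffdf help describes, where each row of x gets at most the FIRST
# matching row of y. Plain base R, not ffbase's implementation.
first_match_join <- function(x, y, by) {
  idx <- match(x[[by]], y[[by]])        # first matching row in y, or NA
  keep <- !is.na(idx)                   # inner join: drop x rows with no match
  out <- x[keep, , drop = FALSE]
  for (col in setdiff(names(y), by)) {  # append y's non-key columns
    out[[col]] <- y[[col]][idx[keep]]
  }
  rownames(out) <- NULL
  out
}

dfx <- data.frame(a = 1:3, b = 4:6)
dfy <- data.frame(a = c(1, 1, 1, 3), c = 7:10)

# x has the unique key: a == 1 picks up only the first a == 1 row of dfy,
# and a == 2 has no partner, so 2 rows remain (the ffdf result)
print(first_match_join(dfx, dfy, "a"))

# Swapped: every row of dfy finds its single partner in dfx, so all 4 rows
# survive (the base merge result, with columns ordered a, c, b)
print(first_match_join(dfy, dfx, "a"))
```

The first call reproduces the two-row ffdf output from the question. Swapping the arguments, so that the table with the repeated key comes first, keeps all four rows because each of its rows has exactly one partner. By the same reasoning, merge(ffdfy, ffdfx) should give the expected result with ffdf objects (columns in the order a, c, b), though I haven't checked this against a specific ffbase version. If the data fit in RAM, as.ffdf(merge(as.data.frame(ffdfx), as.data.frame(ffdfy))) is another option.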