I would like to replicate the following one-to-many join using ffdf
. What would be the best way to do this?
Below I present an example of what I would like to get, using data.tables. I am aware of the following description of the merge.ffdf function:
"This method is similar to merge in the base package but only allows inner and left outer joins. Note that joining is done based on ffmatch or ffdfmatch: only the first element in y will be added to x".
However, I wonder if there is a way around this.
> A <- data.table(col1 = LETTERS[1:3],col2 = c("john",'harry','potter'))
> A
col1 col2
1: A john
2: B harry
3: C potter
> B
col1 col2
1: A 1
2: A 2
3: A 3
4: B 4
5: B 5
6: B 6
7: C 7
8: C 8
9: C 9
> merge(A,B,by = 'col1',all.x = T)
col1 col2.x col2.y
1: A john 1
2: A john 2
3: A john 3
4: B harry 4
5: B harry 5
6: B harry 6
7: C potter 7
8: C potter 8
9: C potter 9
Now using ffdf:
> C <- as.ffdf(as.data.frame(unclass(A)))
> D <- as.ffdf(as.data.frame(unclass(B)))
> merge.ffdf(C,D,by = 'col1', all.x = T)
ffdf (all open) dim=c(3,3), dimorder=c(1,2) row.names=NULL
ffdf virtual mapping
PhysicalName VirtualVmode PhysicalVmode AsIs VirtualIsMatrix PhysicalIsMatrix
col1 col1 integer integer FALSE FALSE FALSE
col2.x col2 integer integer FALSE FALSE FALSE
col2.y col2.y double double FALSE FALSE FALSE
PhysicalElementNo PhysicalFirstCol PhysicalLastCol PhysicalIsOpen
col1 1 1 1 TRUE
col2.x 2 1 1 TRUE
col2.y 3 1 1 TRUE
ffdf data
col1 col2.x col2.y
1 A john 1
2 B harry 4
3 C potter 7
>
Note that the result using ffdf only contains the first element of each group. What would be the way to get the rest?
merge.ffdf(D,C,by = 'col1', all.x=T)
As per the docs, merge.ffdf only allows inner joins (all matching keys) or left joins (all "left" keys).
I suppose it treats each entire row as a key, and discards duplicates.