How to setMethod `[` for an S4 object so that it is applied to a data.table in a slot


I would like to define the "subset" (bracket) [ method for an S4 class, let's call it foo, via setMethod("[", "foo", ...), so that it applies the [ operator to the data.table the object holds in a specific slot.
Example:

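library(data.table)

# the prototype must itself be a data.table (NULL does not match the slot class)
foo <- setClass("foo", slots = c(myDT = "data.table"),
                prototype = prototype(myDT = data.table()))
# quickly make a foo object with a DT in the myDT slot
myfoo <- new("foo", myDT = data.table(x = rep(c("b","a","c"), each = 3), y = c(1,3,6), v = 1:9))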
# sneak peek
myfoo
An object of class "foo"
Slot "myDT":
   x y v
1: b 1 1
2: b 3 2
3: b 6 3
4: a 1 4
5: a 3 5
6: a 6 6
7: c 1 7
8: c 3 8
9: c 6 9

The tricky part

# I want to be able to do, e.g.,
myfoo[1:3, 2:3]
   y v
1: 1 1
2: 3 2
3: 6 3

and have it give me the same result as if doing:

myfoo@myDT[1:3, 2:3]
   y v
1: 1 1
2: 3 2
3: 6 3

So far (I am guessing) it will/should be something along the lines of

setMethod(f = "[", signature = signature(x = "foo"),
          definition = function(x, ...) {
            `[`(x@myDT, ...)
            # OR maybe
            # x <- x@myDT
            # callNextMethod(x, ...)
          })

But whatever i and j I pass to myfoo[i, j], it always just returns the whole data.table.

Any ideas on whether this can be accomplished? So far I usually get stuck on errors about j not fitting the bill. I would like to avoid fully implementing some form of shadow indexing for this slot if I can somehow "recycle" what data.table already provides, ideally with the added benefit that other data.table functions also become applicable this way.
But for a start, "passing on" the indices would be enough.

PS: If you wonder why I don't just use myfoo@myDT: the real-life foo class has multiple slots, of which only one (the data.table one) is "worthy" of being indexed, so I want to "shortcut" that method's application a bit.


Solution

  • Here is a late, not-so-hacky answer:

    library(data.table)
    setClass("Foo", slots = c(dt = "data.table"), prototype = list(dt = data.table()))
    setMethod("[", signature(x = "Foo", i = "ANY", j = "ANY", drop = "ANY"),
              function(x, i, j, ..., drop = TRUE) {
                  if (missing(j))
                      callGeneric(x@dt, i, , ..., with = TRUE)
                  else callGeneric(x@dt, i, j, ..., with = FALSE)
              })
    
    foo <- new("Foo", dt = data.table(x = letters[1:6], y = 1:6, z = rnorm(6L)))
    identical(foo[1:3, 2:3], foo@dt[1:3, 2:3]) # TRUE
    

    This method still does not support the main features of [.data.table, such as referring to columns by name inside the brackets, for the reason outlined in this question, namely that i and j must be evaluated, in addition to x, before multiple dispatch (a feature of S4, not S3) can occur. Hence:

    foo@dt[y >= 3L]
    ##    x y           z
    ## 1: c 3  0.02991911
    ## 2: d 4 -0.36919712
    ## 3: e 5 -0.03291414
    ## 4: f 6 -1.02399695
    
    foo[y >= 3L]
    ## Error in `[.data.table`(x@dt, i, , with = TRUE, ...) :
    ##   i is not found in calling scope and it is not a column name either. When the first argument inside DT[...] is a single symbol (e.g. DT[var]), data.table looks for var in calling scope.
    

    You can still use variables in your environment as index vectors:

    ii <- 3:6
    foo[ii]
    ##    x y           z
    ## 1: c 3  0.02991911
    ## 2: d 4 -0.36919712
    ## 3: e 5 -0.03291414
    ## 4: f 6 -1.02399695
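
    Building on that, a filter that would normally use column names can be computed from the slot first and then passed in as a plain index vector. The following is a minimal sketch under assumptions, not the method above: it re-declares Foo with a simplified `[` method that only forwards already-evaluated i and j, and the names keep and cols are illustrative.

    ```r
    library(data.table)

    setClass("Foo", slots = c(dt = "data.table"), prototype = list(dt = data.table()))

    # Simplified forwarding method (an assumption for this sketch): i and j are
    # treated as ordinary index vectors and handed on to the data.table.
    setMethod("[", "Foo", function(x, i, j, ..., drop = TRUE) {
        dt <- x@dt
        if (missing(i) && missing(j)) return(dt)
        if (missing(j)) return(dt[i])
        if (missing(i)) return(dt[, j, with = FALSE])
        dt[i, j, with = FALSE]
    })

    foo <- new("Foo", dt = data.table(x = letters[1:6], y = 1:6, z = (1:6) / 10))

    keep <- foo@dt$y >= 3L      # logical row index computed outside the brackets
    cols <- c("x", "y")         # columns selected by name
    res <- foo[keep, cols]      # rows with y >= 3, columns x and y
    ```

    This keeps the S4 class but gives up the compact foo[y >= 3L] syntax; expressions such as by or := would still have to go through foo@dt directly.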
    

    Anyway, I agree with the comments suggesting that it is often better to implement classes built around data.table in S3.
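
    As a rough illustration of the S3 route, here is a minimal sketch, assuming the extra fields can live as attributes on the table. The constructor new_foo and the attribute name note are hypothetical and not part of data.table. Because no `[.foo` method is defined, S3 dispatch falls through to `[.data.table`, which receives i and j unevaluated.

    ```r
    library(data.table)

    # Hypothetical constructor: tag a data.table with an extra S3 class and
    # store any additional fields as attributes.
    new_foo <- function(dt, note = "") {
        dt <- copy(dt)                              # leave the caller's table untouched
        setattr(dt, "note", note)                   # illustrative extra field
        setattr(dt, "class", c("foo", class(dt)))   # "foo" first, data.table classes after
        dt
    }

    foo <- new_foo(data.table(x = letters[1:6], y = 1:6), note = "demo")

    # Column names can be used directly inside the brackets, as with any data.table.
    res <- foo[y >= 3L, .(x, y)]
    ```

    Whether attributes are the right place for the other fields depends on the real class; a list holding a data.table plus a custom `[.foo` method would be another option.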