Tags: r, data.table, r-s3

R: how to use and extend data.table in an S3 class


I'd like to create an S3 class that extends data.table by adding attributes that would be used by other methods of that class. In the example below I'm adding an attribute colMeas that holds the name of the column with the measurement:

library(data.table)

myclass <- function(dt, colMeas) {

  stopifnot(data.table::is.data.table(dt))

  data.table::setattr(dt, "colMeas", colMeas)
  data.table::setattr(dt, "class", union("myclass", class(dt)))

}

is.myclass <- function(obj) inherits(obj, "myclass")

I have a method that modifies the existing measurement column:

modCol <- function(obj, arg) {
  UseMethod("modCol")
}

# Modify the existing column
modCol.myclass <- function(obj, arg) {

  stopifnot(is.myclass(obj))
  stopifnot(is.numeric(arg))

  colMeas <- attr(obj, "colMeas")

  obj[,
      (colMeas) := get(colMeas) + arg]
}

And a method that adds a new column:

addCol <- function(obj, arg) {
  UseMethod("addCol")
}

# Add a column
addCol.myclass <- function(obj, arg) {

  stopifnot(is.myclass(obj))
  stopifnot(is.numeric(arg))

  colMeas <- attr(obj, "colMeas")

  obj[,
      colNew := get(colMeas) + arg]

  data.table::setattr(obj, "colNew", "colNew")
}

I'm using everything as follows:

library(data.table)
dt = data.table(x = 1:10,
                y = rep(1, 10))
myclass(dt, colMeas = "y")


modCol(dt, 10)
addCol(dt, 10)

Which gives:

> dt
     x  y colNew
 1:  1 11     21
 2:  2 11     21
 3:  3 11     21
 4:  4 11     21
 5:  5 11     21
 6:  6 11     21
 7:  7 11     21
 8:  8 11     21
 9:  9 11     21
10: 10 11     21

> attributes(dt)
$names
[1] "x"      "y"      "colNew"

$row.names
 [1]  1  2  3  4  5  6  7  8  9 10

$class
[1] "myclass"    "data.table" "data.frame"

$.internal.selfref
<pointer: 0x7f841e016ee0>

$colMeas
[1] "y"

$colNew
[1] "colNew"

The question is more about the R/S3 "doctrine". In the methods above I'm modifying the data.table object "in-place" and I can call these functions without assigning results to new objects. Is this a correct way of handling data.table objects in S3 classes? Or should I add explicit return(obj) to all functions and then assign the results like so:

dt = myclass(dt, colMeas = "y")
    
dt = modCol(dt, 10)
dt = addCol(dt, 10)

Wouldn't that lead to excessive copying of the dt object?


Solution

  • I would vote yes to modifying it in place, that is, do not make it necessary to capture the returned value.

    (I changed my mind twice while thinking about this reply, but now I'm sure.)

    There are several functions in data.table that modify objects in place, setnames(...) for example. There is clear precedent for this.

    There is also a general philosophy in the data.table code base of working by reference; it is an important feature that sets data.table apart from data.frame.

    Playing into this design philosophy sounds like the right thing to do.

    Note: I think it's still nice to invisibly return the data.table object.
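
As a minimal sketch of what that could look like (not the only way to do it), the constructor and the modCol method from the question can end with invisible(...), so that callers may either ignore the return value or assign it. The names addr_before and dt2 are introduced here for illustration only, and data.table::address() is used merely to check that no copy was made:

```r
library(data.table)

# Constructor: tags dt in place and returns it invisibly
myclass <- function(dt, colMeas) {
  stopifnot(data.table::is.data.table(dt))
  data.table::setattr(dt, "colMeas", colMeas)
  data.table::setattr(dt, "class", union("myclass", class(dt)))
  invisible(dt)
}

modCol <- function(obj, arg) UseMethod("modCol")

# Modify the measurement column by reference, then return obj invisibly
modCol.myclass <- function(obj, arg) {
  stopifnot(is.numeric(arg))
  colMeas <- attr(obj, "colMeas")
  obj[, (colMeas) := get(colMeas) + arg]
  invisible(obj)
}

dt <- data.table(x = 1:10, y = rep(1, 10))
addr_before <- data.table::address(dt)  # illustrative: remember where dt lives

myclass(dt, colMeas = "y")
modCol(dt, 10)        # in place, no assignment needed
dt2 <- modCol(dt, 5)  # assigning works too; dt2 is just another name for dt

# Both comparisons should be TRUE if nothing was copied along the way
identical(data.table::address(dt), addr_before)
identical(data.table::address(dt2), addr_before)
```

In this sketch the assignment form does not copy the table: dt2 is another binding to the same object, so later by-reference changes to dt2 also show up in dt. An independent copy would need an explicit data.table::copy(dt).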