I'd like to create an S3 class that extends data.table by adding attributes that would be used by other methods of that class. In the example below I'm adding an attribute colMeas that holds the name of the column with the measurement:
library(data.table)
myclass <- function(dt, colMeas) {
  stopifnot(data.table::is.data.table(dt))
  data.table::setattr(dt, "colMeas", colMeas)
  data.table::setattr(dt, "class", union("myclass", class(dt)))
}
is.myclass <- function(obj) inherits(obj, "myclass")
I have a method that modifies the existing measurement column:
modCol <- function(obj, arg) {
  UseMethod("modCol")
}
# Modify the existing column
modCol.myclass <- function(obj, arg) {
  stopifnot(is.myclass(obj))
  stopifnot(is.numeric(arg))
  colMeas <- attr(obj, "colMeas")
  obj[, (colMeas) := get(colMeas) + arg]
}
And a method that adds a new column:
addCol <- function(obj, arg) {
  UseMethod("addCol")
}
# Add a column
addCol.myclass <- function(obj, arg) {
  stopifnot(is.myclass(obj))
  stopifnot(is.numeric(arg))
  colMeas <- attr(obj, "colMeas")
  obj[, colNew := get(colMeas) + arg]
  data.table::setattr(obj, "colNew", "colNew")
}
I'm using everything as follows:
library(data.table)
dt = data.table(x = 1:10,
                y = rep(1, 10))
myclass(dt, colMeas = "y")
modCol(dt, 10)
addCol(dt, 10)
Which gives:
> dt
     x  y colNew
 1:  1 11     21
 2:  2 11     21
 3:  3 11     21
 4:  4 11     21
 5:  5 11     21
 6:  6 11     21
 7:  7 11     21
 8:  8 11     21
 9:  9 11     21
10: 10 11     21
> attributes(dt)
$names
[1] "x" "y" "colNew"
$row.names
[1] 1 2 3 4 5 6 7 8 9 10
$class
[1] "myclass" "data.table" "data.frame"
$.internal.selfref
<pointer: 0x7f841e016ee0>
$colMeas
[1] "y"
$colNew
[1] "colNew"
The question is more about the R/S3 "doctrine". In the methods above I'm modifying the data.table object "in-place" and I can call these functions without assigning results to new objects. Is this a correct way of handling data.table objects in S3 classes? Or should I add explicit return(obj) to all functions and then assign the results like so:
dt = myclass(dt, colMeas = "y")
dt = modCol(dt, 10)
dt = addCol(dt, 10)
Wouldn't that lead to excessive copying of the dt object?
I would vote yes to modifying it in place, that is, not making it necessary to catch the returned value.
(I changed my mind twice while thinking about this reply, but now I'm sure.)
Several functions in data.table modify objects in place, setnames(...) for example, so there is clear precedent for this.
There is also a general philosophy in the data.table code base of working by reference; it is an important feature that sets it apart from data.frames.
Playing into this design philosophy sounds like the right thing to do.
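To make the by-reference point concrete, here is a small sketch (the column name meas and the variable names before and dt2 are made up for illustration). It uses data.table's address() to check that neither setnames() nor := produces a new object, and that assigning the result of := only binds a second name to the same table, which speaks to the copying concern in the question:

```r
library(data.table)

dt <- data.table(x = 1:3, y = c(1, 1, 1))
before <- address(dt)   # memory address of the table before any changes

# setnames() renames columns by reference; no assignment is needed
setnames(dt, "y", "meas")

# := updates an existing column by reference as well
dt[, meas := meas + 10]

# Assigning the result does not copy: dt2 is another name for the same table
dt2 <- dt[, meas := meas + 1]

# Compare addresses: still the original object, and dt2 is not a copy
same_as_start <- identical(address(dt), before)
dt2_is_alias  <- identical(address(dt2), address(dt))
```

Because dt2 is only an alias here, a real independent copy would need an explicit copy(dt).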
Note: I think it's still nice to invisibly return the data.table object.
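One way this could look, as a minimal sketch rather than the definitive design: the method from the question with an explicit invisible(obj) added at the end. The constructor name new_myclass is hypothetical (chosen here so the example is self-contained), as is the extra check that colMeas names an existing column.

```r
library(data.table)

# Hypothetical constructor: sets the attributes by reference and
# returns the table invisibly
new_myclass <- function(dt, colMeas) {
  stopifnot(is.data.table(dt), colMeas %in% names(dt))
  setattr(dt, "colMeas", colMeas)
  setattr(dt, "class", union("myclass", class(dt)))
  invisible(dt)
}

modCol <- function(obj, arg) UseMethod("modCol")

# Modifies the measurement column in place, then returns the same
# object invisibly so that callers may ignore or capture the result
modCol.myclass <- function(obj, arg) {
  stopifnot(is.numeric(arg))
  colMeas <- attr(obj, "colMeas")
  obj[, (colMeas) := get(colMeas) + arg]
  invisible(obj)
}

dt <- data.table(x = 1:3, y = c(1, 1, 1))
new_myclass(dt, colMeas = "y")

modCol(dt, 10)          # called for its side effect, nothing is printed
res <- modCol(dt, 5)    # capturing the result is also possible
```

With this shape both calling styles from the question work, and res refers to the same table as dt rather than to a copy.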