r data.table colon-equals

R data.table ':=' works in direct call, but same function in a package fails


Using R's data.table package,

This works:

instruction = "a = data.table(name=1:3, value=1:3, blah=1:3); a[,c('value', 'blah'):=NULL]"
eval(parse(text=instruction))
#   name
#1:    1
#2:    2
#3:    3

This works:

myFunc = function(instruction) {
  eval(parse(text=instruction))
}
myFunc(instruction)
#   name
#1:    1
#2:    2
#3:    3

Now, put this function into a package, load it, and try to call it. This doesn't work:

myFuncInPackage(instruction)
#Error in `:=`(c("value", "blah"), NULL) : 
#  Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j, once only and in particular ways. See help(":=").

Why?


EDIT: @Roland points out that adding data.table in the package Depends field makes it work. However, I don't think this is a great solution because the package doesn't really depend on, require, or use data.table. I just want to be able to use data.table with the package.

In addition, everything else with data.table works fine in the function, just not the := operator.

So I guess a followup question could be: should I add data.table to the Depends of every package I write, so that data.tables work as expected within functions of that package? This doesn't seem right... what is the correct way to approach this?


Solution

  • I finally figured out the answer to this question (after several years). All the comments and answers suggested adding data.table to Depends or Imports, but this is incorrect: the package does not depend on data.table. The instruction could use any package, not just data.table, so taken to its logical conclusion the suggestion would require adding every possible package to Depends. The dependency comes from the user who supplies the instruction, not from the function provided by the package.

    Instead, the problem is that the call to eval is evaluated within the namespace of the package, and this does not include the functions provided by other packages. I ultimately solved this by specifying the global environment in the eval call:

    myFunc = function(instruction) {
      eval(parse(text=instruction), envir=globalenv())
    }
    

    Why this works

    This makes eval run the code in the global environment, where the search path includes the requisite packages.
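
    To illustrate how the envir argument changes what a piece of parsed code resolves to, here is a minimal base-R sketch. The environments env_a and env_b and the function greet are hypothetical names made up for the illustration; they stand in for "a package namespace" and "the global environment" only loosely.

    ```r
    # Two environments that bind the same name to different functions
    # (env_a, env_b and greet are illustrative names, not from any package).
    env_a <- new.env()
    env_b <- new.env()
    assign("greet", function() "from env_a", envir = env_a)
    assign("greet", function() "from env_b", envir = env_b)

    # The same instruction string, parsed once.
    instruction <- "greet()"
    expr <- parse(text = instruction)

    # The result depends only on where eval() is told to look up names.
    res_a <- eval(expr, envir = env_a)
    res_b <- eval(expr, envir = env_b)
    ```

    The instruction text is identical in both calls; only the evaluation environment differs, which is the lever that envir=globalenv() pulls in myFunc.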

    In the data.table case it's particularly hard to debug because of the complexity of the function overloading. In this case, the culprit is not actually the := function, but the [ function. The := error is a red herring. At the time of writing, the := function in data.table is defined like this:

    https://github.com/Rdatatable/data.table/blob/348c0c7fdb4987aa6da99fc989431d8837877ce4/R/data.table.R#L2561

    ":=" <- function(...) stop('Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j, once only and in particular ways. See help(":=").')

    That's it. This means that any call to := as a function stops with an error message, because this is not how the authors intend := to be used. Instead, := is really just a keyword that is interpreted by the [ function in data.table.
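
    The "keyword interpreted by another function" idea can be sketched in base R. This is a toy under stated assumptions, not data.table's implementation: toy_bracket is a hypothetical function, and the `:=` defined here is a local stand-in that mimics the always-erroring definition quoted above.

    ```r
    # Stand-in for the quoted definition: calling `:=` directly always fails.
    `:=` <- function(...) stop("':=' is only meaningful inside toy_bracket()")

    # Hypothetical toy "bracket": it inspects the unevaluated j expression,
    # so `:=` is never actually called when it appears at the top of j.
    toy_bracket <- function(x, j) {
      j_expr <- substitute(j)
      if (is.call(j_expr) && identical(j_expr[[1]], as.name(":="))) {
        col <- as.character(j_expr[[2]])                  # left-hand side: column name
        x[[col]] <- eval(j_expr[[3]], x, parent.frame())  # right-hand side, evaluated in x
        return(x)
      }
      eval(j_expr, x, parent.frame())
    }

    df <- data.frame(name = 1:3)

    # Consumed as a keyword by toy_bracket: adds a column.
    handled <- toy_bracket(df, doubled := name * 2)

    # Called as an ordinary function: hits the stop() above.
    direct <- tryCatch(doubled := 1, error = function(e) conditionMessage(e))
    ```

    Any function that evaluates its argument in the ordinary way, instead of inspecting it as toy_bracket does, ends up calling `:=` and getting the error.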

    Here is what goes wrong: if the [ call is not handled by the version that data.table provides, and is instead handled like the base [, then we have a problem. The base [ cannot handle :=, so := gets treated as a function call and triggers the error message. So the culprit is [.data.table, the overloaded bracket operator.

    In my new package (the one that holds myFuncInPackage), evaluating the code resolves [ to the base [ function instead of data.table's [, because data.table is not in the package's namespace (or is lower in the search() hierarchy). Since the base [ does not consume := as a keyword, := is evaluated as an ordinary function call, which triggers the error message in the data.table code above.

    When you specify the eval to happen in the global environment, it correctly resolves the [ function to [.data.table, and the := is interpreted correctly.

    Incidentally, you can also use this if you're passing not a character string but a code block (better) to eval() inside a package:

    eval(substitute(instruction), envir=globalenv())

    Here, substitute captures the instruction unevaluated, so it is not evaluated (incorrectly) within the package namespace at the argument-evaluation stage. It makes it back intact to the globalenv, where it can be evaluated correctly with the required functions in place.
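
    A minimal base-R sketch of the code-block variant, assuming a hypothetical wrapper named run_block (demo_value is likewise just an illustrative variable):

    ```r
    # Hypothetical wrapper: capture the argument unevaluated with substitute(),
    # then evaluate it in the global environment.
    run_block <- function(instruction) {
      eval(substitute(instruction), envir = globalenv())
    }

    # Pass a code block rather than a character string.
    run_block({
      demo_value <- 1:3 * 2
    })

    # The assignment happened in the global environment, not inside run_block.
    created_in_global <- exists("demo_value", envir = globalenv(), inherits = FALSE)
    ```

    Because the block is evaluated in the global environment, any variables it assigns are created there too, which may or may not be what the caller wants.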