Restrict R functions for an exam


An unusual question: what is the simplest way to restrict R to certain functions for an exam? In our use case, we would like students to use the distribution functions but nothing else, so calls to qt, dt, pt, rt (and so on) are allowed, but calls to other functions are not.

We use Safe Exam Browser, so we are able to launch a specific version of R or use a specific startup file (.Rprofile).


Solution

  • The following is a rough outline. I would never recommend this as a security-hardened sandbox: there may be subtle ways to break out of it, and I would at least work on it for a few days and request feedback from multiple sources before having any confidence in it.

    But as a toy sandbox for an exam it may be good enough.

    It essentially enforces a list of allowed functions by masking all other functions that are provided by the core R packages. This is in contrast to using a blocklist, which is all but guaranteed to miss something.

    However, out of laziness the following uses a hybrid approach rather than a pure allowlist. This is because we need to allow most primitive R functions, otherwise the console would be completely broken. We therefore maintain a (rather small) blocklist of primitive functions and allow all other primitives. But we block all the rest (i.e. non-primitive functions).
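
    To get a feel for what "primitive" covers here, the following minimal sketch lists the primitives in the base environment. It uses plain ls() rather than the utils::lsf.str() call from the answer's code, and the variable names are only illustrative.

    ```r
    # Collect the names of all primitive functions defined in the base environment.
    base_names = ls(baseenv())
    prims = Filter(
      function (n) is.primitive(get(n, envir = baseenv())),
      base_names
    )

    # Operators and core builtins are primitives ...
    c('+', 'c', 'sum') %in% prims
    # ... whereas ordinary R-level closures such as lapply() are not.
    'lapply' %in% prims
    ```

    Blocking everything in `prims` would remove arithmetic and `c()`, which is why the answer allows primitives by default and only blocks a handful of them.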

    To block functions globally, we insert a “firewall” environment into the environment search path, which will be hit before any of the masked, attached environments. And since we mask all functions/operators that could be used to circumvent the usual name lookup rules, users (hopefully) aren’t able to circumvent this firewall.
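
    The masking idea can be tried in isolation. This is a toy sketch, not the sandbox itself: it masks a single function (runif, chosen arbitrarily) through an attached environment whose name, 'toy_firewall', is made up for the example. It assumes the stats package is attached, as it is in a default session.

    ```r
    # Attach an empty environment at position 2 of the search path,
    # i.e. in front of all attached packages.
    firewall = attach(NULL, name = 'toy_firewall')
    assign('runif', function (...) stop('Forbidden'), envir = firewall)

    # Name lookup from the global environment now hits the mask first ...
    blocked = tryCatch(runif(1), error = function (e) conditionMessage(e))

    # ... while an unmasked function such as qt() still resolves to package:stats.
    allowed = qt(0.975, df = 10)

    # Clean up so the rest of the session is unaffected.
    detach('toy_firewall')
    ```

    The real code does the same thing for every exported function that is not on the allow list.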

    One last thing is that this code needs to be executed after all core R packages have been attached, which is done by base::.First.sys(). Unfortunately R doesn’t provide a customisation hook that we could execute at this point. But we can override base::.First.sys().
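
    The override follows a general "keep the original, unlock, replace with a wrapper" pattern. Here is a minimal sketch of that pattern on a stand-in: `env` and `hook` are hypothetical names used instead of baseenv() and .First.sys(), so that it can be run safely in an ordinary session.

    ```r
    # A locked binding standing in for base::.First.sys.
    env = new.env()
    assign('hook', function () 'original', envir = env)
    lockBinding('hook', env)

    # Keep a reference to the original, then replace the binding with a
    # wrapper that runs the original first and adds its own step afterwards.
    original = get('hook', envir = env)
    unlockBinding('hook', env)
    assign('hook', function () paste(original(), '+ extra step'), envir = env)

    result = env$hook()
    ```

    In the actual profile code, the "extra step" is the construction of the firewall environment.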

    Put the following into the site or user profile:

    local({
      # We need to allow most primitives; so we take a (dangerous!) shortcut of allowing
      # them all, and then selectively removing (potentially) dangerous functions from them.
      primitives = function () {
        fun_names = utils::lsf.str(baseenv())
        Filter(\(x) is.primitive(get(x, baseenv())), fun_names)
      }
    
      block_list = c(
        '::',
        ':::',
        'as.environment',
        'browser',
        'baseenv',
        'emptyenv',
        'environment<-',
        'Exec',
        'globalenv',
        'lazyLoadDBfetch',
        'on.exit',
        'pos.to.env'
      )
    
      allow_list = c(
        # Put explicitly allowed stuff here!
        'qt', 'dt', 'pt', 'rt',
        'print',
        'q', 'quit',
        setdiff(primitives(), block_list)
      )
    
      stop = stop
      forbidden = function (...) stop('Forbidden')
    
      # The subsequent logic has to happen after the core packages are attached — i.e. after
      # `.First.sys()` is run. Since there is no suitable hook, we override the latter:
    
      .First.sys = .First.sys
    
      first_sys = function () {
        .First.sys()
    
        core_packages = sub('package:', '', search()[startsWith(search(), 'package:')])
        all_exported_functions = unlist(lapply(core_packages, \(pkg) getNamespaceExports(pkg)))
        forbidden_names = setdiff(all_exported_functions, allow_list)
    
        # Eagerly load all allowed names; otherwise they cannot be loaded later on.
        lapply(allow_list, \(name) get(name))
    
        forbidden_list = stats::setNames(
          lapply(forbidden_names, \(.) forbidden),
          forbidden_names
        )
    
        # ‘compiler’ needs special handling.
    
        eapply(loadNamespace('compiler'), force, all.names = TRUE)
    
        `:::` = `:::`
        match_call = match.call
        eval = eval
        getNamespace = getNamespace
        ..getNamespace = ..getNamespace
    
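        # The formals mirror those of the real `:::` (pkg, name), so that the
        # matched call can be forwarded to it unchanged.
        forbidden_list$`:::` = function (pkg, name) {
          if (as.character(substitute(pkg)) == 'compiler') {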
            call = match_call()
            call[[1L]] = `:::`
            eval(call)
          } else {
            stop('Forbidden')
          }
        }
    
        # Allow calling this only once; this is required by the R interpreter, but we can
        # disable subsequent calls by the user.
        first_call = TRUE
        self = environment()
    
        forbidden_list$getNamespace = function (name) {
          if (name == 'compiler' && first_call) {
            self$first_call = FALSE
            getNamespace(name)
          } else {
            stop('Forbidden')
          }
        }
    
        list2env(
          forbidden_list,
          envir = attach(NULL, name = 'blocked')
        )
      }
    
      unlockBinding('.First.sys', baseenv())
      assign('.First.sys', first_sys, envir = baseenv())
    })
    

    Some more caveats:

    1. If the user is able to control the session start, the above won’t work. This includes quitting the R session and starting a new session from the shell, as well as hitting Ctrl-C while R is starting, to abort execution of the profile code.
    2. The above disables debugging/tracing functions and error handling customisation (which can be done via options() and possibly via Sys.setenv()), but it’s possible that I overlooked some clever way of breaking into the scope of a called function. If that ever happens, it’s game over: at that point the user can freely roam around “behind” the firewall environment, and therefore call any functions they desire.
    3. Adding functions to the allow_list requires care! Adding the wrong function will completely circumvent the block. For example, allowing as.environment() may seem innocuous. But as.environment(3) gives the user access to the search path beyond the firewall, and thus breaks the sandbox.
    4. The above causes an (innocuous, as far as I can tell) error message at startup. I’m sure this can be avoided, but it requires digging through the R startup source code, to figure out where this is coming from.
    5. Auto-completion isn’t working, since that is implemented via function calls that are blocked. It could be reenabled with some extra work.
    6. I have only tested this directly in the native R terminal. It’s possible that other terminal shells for R work slightly differently and break the sandbox (or lead to other issues).
    7. The above code contains an exception to allow loading the ‘compiler’ package, which is required by the R interpreter. Unfortunately this punches a gaping hole into our firewall, since that package allows lots of shenanigans. I am almost certain that this could be exploited. A better (i.e. non-proof-of-concept) implementation would tighten the code further to ensure that getNamespace('compiler') can only be called by the R interpreter, not by the user. This is “left as an exercise to the reader”.
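
    To make caveat 3 concrete, here is a toy sketch of how as.environment() reaches past a mask. It uses a one-function stand-in for the firewall (the name 'toy_firewall' and the choice of runif are made up for the example) and looks up the position of package:stats instead of hard-coding 3, since the position depends on what is attached.

    ```r
    # Toy firewall that masks a single function.
    firewall = attach(NULL, name = 'toy_firewall')
    assign('runif', function (...) stop('Forbidden'), envir = firewall)

    # The normal call is blocked by the mask ...
    direct = tryCatch(runif(1), error = function (e) conditionMessage(e))

    # ... but as.environment() hands out an environment behind the mask,
    # from which the real function can be fetched and called.
    behind = as.environment(match('package:stats', search()))
    bypassed = get('runif', envir = behind)(1)

    detach('toy_firewall')
    ```

    This is why as.environment is on the block list, and why every addition to the allow list needs the same scrutiny.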