An unusual question: what is the simplest way to restrict R to certain functions for an exam? For instance, in our use case we would like students to use the functions for distributions, but nothing else. So calls to qt, dt, pt, rt (and so on) are allowed, but not other functions.
We use Safe Exam Browser, so we are able to call a specific version of R or use a specific startup file (.Rprofile).
The following is a rough outline. I would never recommend this as a security-hardened sandbox: there may be subtle ways to break out of it, and I would at least work on it for a few days and request feedback from multiple sources before having any confidence in it.
But as a toy sandbox for an exam it may be good enough.
It essentially enforces a list of allowed functions by masking all other functions that are provided by the core R packages. This is in contrast to using a blocklist, which is practically guaranteed to miss something.
However, out of laziness the following uses a hybrid approach rather than a pure allowlist. This is because we need to allow most primitive R functions, otherwise the console would be completely broken. We therefore maintain a (rather small) blocklist of primitive functions and allow all other primitives. But we block all the rest (i.e. non-primitive functions).
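To see why primitives cannot simply be blocked wholesale, it may help to check which everyday operations are primitives. The following is only a small sketch; the helper name is_base_primitive is made up for illustration and is not part of the profile code.

```r
# Hypothetical helper: is the function of this name in the base environment a primitive?
is_base_primitive = function (name) {
    is.primitive(get(name, envir = baseenv()))
}

# Arithmetic, parentheses, vector construction and control flow are all primitives ...
basics = vapply(c('+', '(', 'c', 'if', 'sum'), is_base_primitive, logical(1L))
print(basics)

# ... whereas ordinary closures such as mean() are not.
mean_is_primitive = is_base_primitive('mean')
print(mean_is_primitive)
```

Masking something like `(` or `+` would make even `1 + (2)` fail at the console, which is why the hybrid approach keeps primitives by default and removes only a few of them.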
To block functions globally, we insert a “firewall” environment into the environment search path, which will be hit before any of the masked, attached environments. And since we mask all functions/operators that could be used to circumvent the usual name lookup rules, users (hopefully) aren’t able to circumvent this firewall.
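The masking idea can be tried out in isolation. This is a minimal sketch, assuming the default search path with the stats package attached; the environment name demo_firewall and the choice of sd() as the masked function are arbitrary and only for illustration.

```r
# Attach an empty environment; by default it goes to position 2 of the search path,
# i.e. directly behind the global environment and in front of all attached packages.
firewall = attach(NULL, name = 'demo_firewall')

# Mask stats::sd with a function that refuses to run.
assign('sd', function (...) stop('Forbidden'), envir = firewall)

# Name lookup now finds the masking function before it reaches 'package:stats'.
blocked_message = tryCatch(
    sd(c(1, 2, 3)),
    error = function (e) conditionMessage(e)
)
print(blocked_message)

# Removing the environment restores the normal lookup.
detach('demo_firewall')
restored = sd(c(1, 2, 3))
print(restored)
```

Note that this toy version masks only a single name; the sandbox relies on also masking every function that could reach around the attached environment.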
One last thing is that this code needs to be executed after all core R packages have been attached, which is done by base::.First.sys(). Unfortunately R doesn’t provide a customisation hook that we could execute at this point. But we can override base::.First.sys().
Put the following into the site or user profile:
local({
    # We need to allow most primitives; so we take a (dangerous!) shortcut of allowing
    # them all, and then selectively removing (potentially) dangerous functions from them.
    primitives = function () {
        fun_names = utils::lsf.str(baseenv())
        Filter(\(x) is.primitive(get(x, baseenv())), fun_names)
    }

    block_list = c(
        '::',
        ':::',
        'as.environment',
        'browser',
        'baseenv',
        'emptyenv',
        'environment<-',
        'Exec',
        'globalenv',
        'lazyLoadDBfetch',
        'on.exit',
        'pos.to.env'
    )

    allow_list = c(
        # Put explicitly allowed stuff here!
        'qt', 'dt', 'pt', 'rt',
        'print',
        'q', 'quit',
        setdiff(primitives(), block_list)
    )

    stop = stop
    forbidden = function (...) stop('Forbidden')

    # The subsequent logic has to happen after the core packages are attached, i.e. after
    # `.First.sys()` is run. Since there is no suitable hook, we override the latter:
    .First.sys = .First.sys

    first_sys = function () {
        .First.sys()

        core_packages = sub('package:', '', search()[startsWith(search(), 'package:')])
        all_exported_functions = unlist(lapply(core_packages, \(pkg) getNamespaceExports(pkg)))
        forbidden_names = setdiff(all_exported_functions, allow_list)

        # Eagerly load all names, otherwise they will not be able to be loaded later on.
        lapply(allow_list, \(name) get(name))

        forbidden_list = stats::setNames(
            lapply(forbidden_names, \(.) forbidden),
            forbidden_names
        )

        # ‘compiler’ needs special handling.
        eapply(loadNamespace('compiler'), force, all.names = TRUE)

        `:::` = `:::`
        match_call = match.call
        eval = eval
        getNamespace = getNamespace
        ..getNamespace = ..getNamespace

        forbidden_list$`:::` = function (name, value) {
            if (as.character(substitute(name)) == 'compiler') {
                call = match_call()
                call[[1L]] = `:::`
                eval(call)
            } else {
                stop('Forbidden')
            }
        }

        # Allow calling this only once; this is required by the R interpreter, but we can
        # disable subsequent calls by the user.
        first_call = TRUE
        self = environment()

        forbidden_list$getNamespace = function (name) {
            if (name == 'compiler' && first_call) {
                self$first_call = FALSE
                getNamespace(name)
            } else {
                stop('Forbidden')
            }
        }

        list2env(
            forbidden_list,
            envir = attach(NULL, name = 'blocked')
        )
    }

    unlockBinding('.First.sys', baseenv())
    assign('.First.sys', first_sys, envir = baseenv())
})
Some more caveats:
- […] options() and possibly via Sys.setenv()), but it’s possible that I overlooked some clever way of breaking into the scope of a called function. If that ever happens, it’s game over: at that point the user can freely roam around “behind” the firewall environment, and therefore call any functions they desire.
- Changing allow_list requires care! Adding the wrong function will completely circumvent the block. For example, allowing as.environment() may seem innocuous. But as.environment(3) gives the user access to the search path beyond the firewall, and thus breaks the sandbox.
- The code above does not fully ensure that getNamespace('compiler') can only be called by the R interpreter, not by the user; it merely allows a single call. Ensuring this properly is “left as an exercise to the reader”.
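The as.environment() caveat can be reproduced with a toy mask. This is a sketch under assumptions: the stats package is attached, demo_firewall2 is an arbitrary name, and the environment is addressed by name rather than by position because the position of a package on the search path depends on what else is attached.

```r
# Set up a toy firewall that masks stats::sd.
firewall = attach(NULL, name = 'demo_firewall2')
assign('sd', function (...) stop('Forbidden'), envir = firewall)

# A direct call hits the mask ...
direct = tryCatch(
    sd(c(1, 2, 3)),
    error = function (e) conditionMessage(e)
)
print(direct)

# ... but as.environment() hands out an attached environment directly, so the
# original function can be picked up from behind the mask with `$`.
real_sd = as.environment('package:stats')$sd
bypassed = real_sd(c(1, 2, 3))
print(bypassed)

detach('demo_firewall2')
```

Because `$` is a primitive and stays allowed, permitting as.environment() alone would be enough to reach every masked function this way.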