pythonrestrictedpython

RestrictedPython: Call other functions within user-specified code?


Using Yuri Nudelman's code with the custom _import definition to specify modules to restrict serves as a good base but when calling functions within said user_code naturally due to having to whitelist everything is there any way to permit other user defined functions to be called? Open to other sandboxing solutions although Jupyter didn't seem straight-forward to embed within a web interface.

from RestrictedPython import safe_builtins, compile_restricted
from RestrictedPython.Eval import default_guarded_getitem

def _import(name, globals=None, locals=None, fromlist=(), level=0):
    safe_modules = ["math"]
    if name in safe_modules:
       globals[name] = __import__(name, globals, locals, fromlist, level)
    else:
        raise Exception("Don't you even think about it {0}".format(name))

safe_builtins['__import__'] = _import # Must be a part of builtins

def execute_user_code(user_code, user_func, *args, **kwargs):
    """ Executed user code in restricted env
        Args:
            user_code(str) - String containing the unsafe code
            user_func(str) - Function inside user_code to execute and return value
            *args, **kwargs - arguments passed to the user function
        Return:
            Return value of the user_func
    """

    def _apply(f, *a, **kw):
        return f(*a, **kw)

    try:
        # This is the variables we allow user code to see. @result will contain return value.
        restricted_locals = { 
            "result": None,
            "args": args,
            "kwargs": kwargs,
        }   

        # If you want the user to be able to use some of your functions inside his code,
        # you should add this function to this dictionary.
        # By default many standard actions are disabled. Here I add _apply_ to be able to access
        # args and kwargs and _getitem_ to be able to use arrays. Just think before you add
        # something else. I am not saying you shouldn't do it. You should understand what you
        # are doing thats all.
        restricted_globals = { 
            "__builtins__": safe_builtins,
            "_getitem_": default_guarded_getitem,
            "_apply_": _apply,
        }   

        # Add another line to user code that executes @user_func
        user_code += "\nresult = {0}(*args, **kwargs)".format(user_func)

        # Compile the user code
        byte_code = compile_restricted(user_code, filename="<user_code>", mode="exec")

        # Run it
        exec(byte_code, restricted_globals, restricted_locals)
       # User code has modified result inside restricted_locals. Return it.
        return restricted_locals["result"]

    except SyntaxError as e:
        # Do whaever you want if the user has code that does not compile
        raise
    except Exception as e:
        # The code did something that is not allowed. Add some nasty punishment to the user here.
        raise

i_example = """
import math

def foo():
    return 7

def myceil(x):
    return math.ceil(x)+foo()
"""
print(execute_user_code(i_example, "myceil", 1.5))

Running this returns 'foo' is not defined


Solution

  • First of all, the replacement for the __import__ built-in is implemented incorrectly. That built-in is supposed to return the imported module, not mutate the globals to include it:

    Python 3.9.12 (main, Mar 24 2022, 13:02:21)
    [GCC 11.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> __import__('math')
    <module 'math' (built-in)>
    >>> math
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'math' is not defined
    

    A better way to reimplement __import__ would be this:

    _SAFE_MODULES = frozenset(("math",))
    
    def _safe_import(name, *args, **kwargs):
        if name not in _SAFE_MODULES:
            raise Exception(f"Don't you even think about {name!r}")
        return __import__(name, *args, **kwargs)
    

    The fact that you mutated globals in your original implementation was partially masking the primary bug. Namely: name assignments within restricted code (function definitions, variable assignments and imports) mutate the locals dict, but name look-ups are by default done as global look-ups, bypassing the locals entirely. You can see this by disassembling the restricted bytecode using __import__('dis').dis(byte_code):

      2           0 LOAD_CONST               0 (0)
                  2 LOAD_CONST               1 (None)
                  4 IMPORT_NAME              0 (math)
                  6 STORE_NAME               0 (math)
    
      4           8 LOAD_CONST               2 (<code object foo at 0x7fbef4eef3a0, file "<user_code>", line 4>)
                 10 LOAD_CONST               3 ('foo')
                 12 MAKE_FUNCTION            0
                 14 STORE_NAME               1 (foo)
    
      7          16 LOAD_CONST               4 (<code object myceil at 0x7fbef4eef660, file "<user_code>", line 7>)
                 18 LOAD_CONST               5 ('myceil')
                 20 MAKE_FUNCTION            0
                 22 STORE_NAME               2 (myceil)
                 24 LOAD_CONST               1 (None)
                 26 RETURN_VALUE
    
    Disassembly of <code object foo at 0x7fbef4eef3a0, file "<user_code>", line 4>:
      5           0 LOAD_CONST               1 (7)
                  2 RETURN_VALUE
    
    Disassembly of <code object myceil at 0x7fbef4eef660, file "<user_code>", line 7>:
      8           0 LOAD_GLOBAL              0 (_getattr_)
                  2 LOAD_GLOBAL              1 (math)
                  4 LOAD_CONST               1 ('ceil')
                  6 CALL_FUNCTION            2
                  8 LOAD_FAST                0 (x)
                 10 CALL_FUNCTION            1
                 12 LOAD_GLOBAL              2 (foo)
                 14 CALL_FUNCTION            0
                 16 BINARY_ADD
                 18 RETURN_VALUE
    

    The documentation for exec explains (emphasis mine):

    If only globals is provided, it must be a dictionary (and not a subclass of dictionary), which will be used for both the global and the local variables. If globals and locals are given, they are used for the global and local variables, respectively. If provided, locals can be any mapping object. Remember that at the module level, globals and locals are the same dictionary. If exec gets two separate objects as globals and locals, the code will be executed as if it were embedded in a class definition.

    This makes separate mappings for locals and globals completely spurious. We can therefore simply get rid of the locals dict, and put everything in globals. The entire code should look something like this:

    from RestrictedPython import safe_builtins, compile_restricted
    
    
    _SAFE_MODULES = frozenset(("math",))
    
    
    def _safe_import(name, *args, **kwargs):
        if name not in _SAFE_MODULES:
            raise Exception(f"Don't you even think about {name!r}")
        return __import__(name, *args, **kwargs)
    
    
    def execute_user_code(user_code, user_func, *args, **kwargs):
        my_globals = {
            "__builtins__": {
                **safe_builtins,
                "__import__": _safe_import,
            },
        }
    
        try:
            byte_code = compile_restricted(
                user_code, filename="<user_code>", mode="exec")
        except SyntaxError:
            # syntax error in the sandboxed code
            raise
    
        try:
            exec(byte_code, my_globals)
            return my_globals[user_func](*args, **kwargs)
        except BaseException:
            # runtime error (probably) in the sandboxed code
            raise
    

    Above I also managed to fix a couple of tangential issues:

    (Note, however, that this still does not protect against DoS via infinite loops within the sandbox.)