I was attempting a Python Sandbox Escape challenge (Python 3.8.10) and it had builtins disabled:
exec(command, {'__builtins__': None, 'print': print})
There was also a filter on the input that blacklisted the characters ', " and `.
Because of these rules, I saw many people using ().__class__.__base__.__subclasses__(), which returned a list of classes. They then picked classes out of that list by index.
For example:
print(().__class__.__base__.__subclasses__()[4])
would return something like
<class 'int'>
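To make the setup concrete, here is a minimal sketch of how the challenge appears to work. The run_sandboxed helper and the BLACKLIST tuple are hypothetical names introduced for illustration, and the real filter may behave differently.

```python
BLACKLIST = ("'", '"', "`")  # assumed: the three filtered characters


def run_sandboxed(command):
    """Hypothetical stand-in for the challenge's filter plus exec call."""
    if any(ch in command for ch in BLACKLIST):
        raise ValueError("blacklisted character in input")
    # No builtins are reachable by name; only print is exposed.
    exec(command, {'__builtins__': None, 'print': print})


# Only a literal and attribute access are needed, so this still runs.
run_sandboxed("print(().__class__.__base__)")  # prints <class 'object'>
```

Anything containing a quote or backtick is rejected before exec is ever reached, which is why the payloads avoid string literals entirely.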
Why does ().__class__.__base__.__subclasses__() work in this case, and why does this command have to be in this specific order?
Piece by piece:
() - The empty tuple
().__class__ - The tuple class
().__class__.__base__ - The object class (the ultimate base class of all classes is object)
().__class__.__base__.__subclasses__() - The __subclasses__ method returns a list of all the immediate subclasses of the class it is called on; for object, that means all classes with no other explicit parent class.
The reason they're using this technique is that, with all builtin names disabled, () is still an object literal they can use to gain access to most of them.
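The chain can be walked one step at a time in a normal interpreter. This is only an illustrative sketch; the variable names are made up, and it looks up int by identity rather than by a hard-coded index, since the position in the list may differ between Python versions and builds.

```python
empty = ()                         # a literal, so no builtin name is needed
tuple_cls = empty.__class__        # the tuple class
object_cls = tuple_cls.__base__    # tuple's base class: object
subclasses = object_cls.__subclasses__()  # immediate subclasses of object

# Locate int in the list instead of trusting a fixed index such as 4.
int_index = subclasses.index(int)
print(tuple_cls, object_cls, subclasses[int_index])
```

Inside the sandbox the name int is not available, which is why people there fall back on numeric indexes found by printing the list first.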
If you did it in any other order, the lookup would simply fail: every attribute here except __class__ exists only on classes, so accessing one on the tuple instance itself (for example ().__base__) raises AttributeError. Failing to go to __base__ before calling __subclasses__() would only get you the immediate subclasses of tuple (possibly useful, as it includes all the named tuple types, both C level and Python level, in a program, but it wouldn't include stuff like int).
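A quick way to see why the order matters, sketched under the assumption of a standard CPython interpreter (the variable names are just for illustration):

```python
# __base__ lives on classes, not on instances, so skipping __class__ fails.
try:
    ().__base__
except AttributeError as exc:
    print("AttributeError:", exc)

# Skipping __base__ only yields subclasses of tuple, which excludes int.
tuple_subclasses = ().__class__.__subclasses__()
print(int in tuple_subclasses)    # False

# Going up to object first yields every direct subclass of object.
object_subclasses = ().__class__.__base__.__subclasses__()
print(int in object_subclasses)   # True
```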
The only part that can be trivially swapped out is the (); any legal Python object that is an instance of an immediate subclass of object would work equivalently here (which I believe includes all types with literal syntax), so "", or 1. (not 1 alone unless you put a space after it or wrap it in parentheses, since the . that follows would be parsed as part of a float literal, and without a second . you'd get a SyntaxError), or [], or {}
or whatever would all work equally well. The only real advantage to () is that it's a short literal that's a singleton (in CPython, at least), so it's slightly more efficient than most of the other options, both in keystrokes and in runtime performance. Given the restrictions on ', " and `, () and various numeric literals would be the only competitive choices (assuming zero values for each numeric type are singletons; not a language guarantee, but a common optimization).
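As a rough check of the interchangeable starting points, the sketch below walks from several quote-free literals up to their base class. The roots list is a made-up name for illustration.

```python
# Each starting literal avoids quotes and backticks.
roots = [
    ().__class__.__base__,    # tuple -> object
    1..__class__.__base__,    # float literal "1." followed by attribute access
    (1).__class__.__base__,   # int, parenthesized so the dot is not a decimal point
    1 .__class__.__base__,    # int, with a space before the dot
    [].__class__.__base__,    # list -> object
    {}.__class__.__base__,    # dict -> object
]

# Every one of them lands on the same object class.
print(all(root is object for root in roots))  # True
```

Writing 1.__class__ without the space, the parentheses, or the second dot is a SyntaxError, so it cannot appear in a runnable example.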