Let's say I have the following code:
def returns_false():
    breakpoint()
    return False
assert(returns_false())
print("Hello world")
Is there a sequence of pdb commands that will print "Hello world" without triggering an AssertionError first?
I can't modify a single character of this source file; I'm only looking for what I can achieve while it's already running.
What I tried:
- return True
- s to get into return mode, and then either retval=True or locals()['__return__']=True
- interact and then return True, but that throws an exception

But none of that changes the actual return value.
This is actually possible in a dynamic language such as Python, where a function's compiled code object stays accessible at runtime and can be modified in place.
You'll have a lot of typing to do, but here's what you can type into the debugger:
import ctypes
tup = returns_false.__code__.co_consts
obj = ctypes.py_object(tup)
pos = ctypes.c_ssize_t(1)
o = ctypes.py_object(True)
ref_count = ctypes.c_long.from_address(id(tup))
original_count = ref_count.value
ref_count.value = 1
ctypes.pythonapi.Py_IncRef(o)
ctypes.pythonapi.PyTuple_SetItem(obj, pos, o)
ref_count.value = original_count
c
This modifies the return value itself in memory, and will cause returns_false to return True instead of False. The subsequent assertion after exiting pdb will pass. There is no sleight-of-hand here, and no modifying of the original source files: we're literally changing the return value at runtime.
The last line, "c", is the shortcut for "continue" in pdb; it exits the debugger and resumes the program.
I'm using Python 3.12.4 here. Some implementation details may be different in other Python versions, but the same basic technique should work.
For the original source code:
def returns_false():
    breakpoint()
    return False
assert(returns_false())
print("Hello world")
Consider the disassembly of your function using stdlib dis module:
>>> import dis
>>> dis.dis(returns_false)
  1           0 RESUME                   0

  2           2 LOAD_GLOBAL              1 (NULL + breakpoint)
             12 CALL                     0
             20 POP_TOP

  3          22 RETURN_CONST             1 (False)
The last line indicates the return value, RETURN_CONST (False). You'll also see three numbers on the last line of the disassembly:
- 3 refers to the line number of the return statement in the source file.
- 22 is the offset of that instruction within the bytecode.
- 1 is the op arg of the RETURN_CONST op.

This last bullet point is interesting. It actually means the function is returning item 1 from the consts table of the function's code object. dis has helpfully indicated this item is "False" in parentheses, but it's just rendering whatever item 1 in the consts table is:
>>> returns_false.__code__.co_consts
(None, False)
The consts table will be longer if you have more consts in your function body. For example, if you added the line x = 1234 inside the function you'd expect to see 1234 in the consts table, and the return value would now be found at index 2 instead (the pos in my example would also have to be changed accordingly).
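If you'd rather not count positions by hand, the index can be looked up at runtime. Here is a minimal sketch; find_const_index, plain and with_extra_const are names I made up for illustration, and the exact contents of the consts tuple may vary between Python versions:

```python
def find_const_index(func, value):
    """Return the position of `value` in func's consts table (hypothetical helper)."""
    for index, const in enumerate(func.__code__.co_consts):
        # Compare by identity: tuple.index() uses ==, and 0 == False.
        if const is value:
            return index
    raise ValueError(f"{value!r} is not in the consts table")


def plain():
    return False


def with_extra_const():
    x = 1234  # an extra constant ahead of the return value
    return False


for func in (plain, with_extra_const):
    consts = func.__code__.co_consts
    print(func.__name__, consts, "-> False is at index", find_const_index(func, False))
```

The identity comparison matters if the function also contains the integer 0, since an equality search would stop at the wrong entry.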
So the consts table is a tuple, which is an immutable type, but what if we could modify that tuple? (Everything in Python is mutable, if you know where to look.) Would changing the item at index 1 change the return value of the function? Indeed, it would.
Part of the C API is PyTuple_SetItem:
int PyTuple_SetItem(PyObject *p, Py_ssize_t pos, PyObject *o)
Insert a reference to object o at position pos of the tuple pointed to by p. Return 0 on success. If pos is out of bounds, return -1 and set an IndexError exception.
And the CPython devs are even so helpful as to provide a Python API in ctypes.pythonapi to use functions such as PyTuple_SetItem directly from within the runtime.
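The rest of the answer is a matter of technique, and some know-how about the implementation, taking care not to cause a segfault or screw up reference counting. PyTuple_SetItem refuses to modify a tuple whose reference count is not exactly 1, which is why the snippet temporarily sets the count to 1 and restores it afterwards. It also steals a reference to the new item, which is why Py_IncRef is called on o first.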
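For use outside the debugger, the same steps can be wrapped in a function. This is only a sketch under some assumptions: patch_const and always_false are hypothetical names, it relies on a regular (GIL) CPython build where the reference count is the first field of the object header, and it uses c_ssize_t rather than c_long to match the Py_ssize_t type of that field.

```python
import ctypes


def patch_const(func, old, new):
    """Swap `old` for `new` in func's consts tuple (CPython-specific sketch)."""
    consts = func.__code__.co_consts
    pos = None
    for index, const in enumerate(consts):
        if const is old:  # identity, so 0 is never mistaken for False
            pos = index
            break
    if pos is None:
        raise ValueError("constant not found in co_consts")

    # Create the ctypes wrappers first: they hold references of their own,
    # which must already be counted before the refcount is saved.
    tup_ref = ctypes.py_object(consts)
    new_ref = ctypes.py_object(new)
    # Assumption: ob_refcnt sits at the start of the object, as in GIL builds.
    ref_count = ctypes.c_ssize_t.from_address(id(consts))

    saved = ref_count.value
    ref_count.value = 1  # PyTuple_SetItem rejects any other count
    try:
        ctypes.pythonapi.Py_IncRef(new_ref)  # SetItem steals this reference
        ctypes.pythonapi.PyTuple_SetItem(tup_ref, ctypes.c_ssize_t(pos), new_ref)
    finally:
        ref_count.value = saved  # always put the real count back


def always_false():  # stand-in for returns_false, minus the breakpoint
    return False


patch_const(always_false, False, True)
print(always_false())
```

The try/finally is there so the real reference count is restored even if the C call raises; leaving it at 1 would let the tuple be freed while the code object still uses it.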
Note that the disassembly of the "hacked" function will respect updates to the consts table, and will now render RETURN_CONST with (True):
>>> returns_false.__code__.co_consts
(None, False)
>>> import ctypes
... tup = returns_false.__code__.co_consts
... obj = ctypes.py_object(tup)
... pos = ctypes.c_ssize_t(1)
... o = ctypes.py_object(True)
... ref_count = ctypes.c_long.from_address(id(tup))
... original_count = ref_count.value
... ref_count.value = 1
... ctypes.pythonapi.Py_IncRef(o)
... ctypes.pythonapi.PyTuple_SetItem(obj, pos, o)
... ref_count.value = original_count
...
>>> returns_false.__code__.co_consts
(None, True)
>>> returns_false()
> /private/tmp/p.py(3)returns_false()
-> return False
(Pdb) c
True
>>> import dis
>>> dis.dis(returns_false)
  1           0 RESUME                   0

  2           2 LOAD_GLOBAL              1 (NULL + breakpoint)
             12 CALL                     0
             20 POP_TOP

  3          22 RETURN_CONST             1 (True)
Finally, I'll mention that RETURN_CONST is not the only return opcode for a function; in fact, it is new in Python 3.12. RETURN_VALUE is likely more common. In other Python 3.x versions (I've checked 3.6-3.11) your code will use the RETURN_VALUE op, but the exact same patch typed verbatim in the debugger will work, because it will just be returning a value which was previously loaded from the consts table onto the stack. The disassembly detail will look different on Python 3.6-3.10, 3.11 and 3.12.
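To see which return opcode your own interpreter uses, you can inspect the instructions programmatically instead of reading the dis output. A small sketch, where sample is a hypothetical function of mine and the printed opcode names depend on the Python version:

```python
import dis


def sample():
    return False


instructions = list(dis.get_instructions(sample))

# Every RETURN_* instruction in the function, whatever this version calls it.
return_ops = [ins.opname for ins in instructions if ins.opname.startswith("RETURN")]
print("return opcodes:", return_ops)

# The instruction that actually carries the constant False: RETURN_CONST on 3.12,
# or a LOAD_CONST placed before RETURN_VALUE on the other versions mentioned above.
const_ops = [ins.opname for ins in instructions if ins.argval is False]
print("False is referenced by:", const_ops)
```

Either way the constant comes out of the consts table, which is why the same patch applies to both forms.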
You may also be interested in the Q&A "it is possible to monkeypatch a local variable introduced in a function body?", where I have demonstrated a similar technique for mutating local variables of a function. In that answer I've also changed the return of a function which was using a RETURN_VALUE opcode.