python introspection cpython trepan

Getting the CPython exec argument string or accessing the evaluation stack


In my Python debugger I have a way of remapping a string to a filename, so that when you are stepping through an exec'd function inside the debugger you can list lines pygmentized, or view them alongside in an editor like Emacs via realgud.
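To illustrate what remapping a string to a filename can look like, here is a minimal sketch using only the standard library's linecache. It is not necessarily how the debugger does it; `register_source` and the pseudo-filename `<exec-demo>` are hypothetical names made up for this example.

```python
import linecache


def register_source(pseudo_filename, source):
    """Make `source` listable by line under `pseudo_filename`.

    Hypothetical helper: it seeds linecache directly, so anything that
    lists source through linecache can show these lines.
    """
    lines = source.splitlines(True)  # keep line endings, as linecache does
    # Entry layout is (size, mtime, lines, fullname); an mtime of None
    # makes linecache.checkcache() leave the entry alone.
    linecache.cache[pseudo_filename] = (len(source), None, lines, pseudo_filename)
    return pseudo_filename


source = "a = 1\nb = a + 1\n"
name = register_source("<exec-demo>", source)

# Compiling with the same pseudo-filename makes frames report it as
# f_code.co_filename, so a line lookup finds the registered text.
namespace = {}
exec(compile(source, name, "exec"), namespace)
second_line = linecache.getline(name, 2)
```

This only works when the debugger controls the exec; the question here is about the case where it does not.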

So I'd like to be able to extract the string in an exec statement when CPython is stopped while evaluating it.

I already have a mechanism that can look back in the call frame to see if the caller was an EXEC_STMT, and I can look back one instruction to see if the previous instruction was, say, DUP_TOP. So I'd be home free if I could just figure out a way to read the stack entry at the time of the call, since that gives the string being evaluated. There is probably a way to drop into C to get this, but my knowledge of CPython internals is lacking, and I would prefer not to do this. If there's a package out there, maybe I could include it optionally.
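For readers who want to see the "look back in the call frame" step, here is a rough sketch for CPython 3. There exec is a built-in function, so the caller's current instruction is a CALL variant rather than Python 2's EXEC_STMT. `current_instruction` and `probe` are hypothetical helper names, and the exact opcode name varies between CPython versions.

```python
import dis
import sys


def current_instruction(frame):
    """Return the dis.Instruction at, or just before, frame.f_lasti."""
    found = None
    for ins in dis.get_instructions(frame.f_code):
        if ins.offset > frame.f_lasti:
            break
        found = ins
    return found


def probe():
    """Hypothetical helper, meant to be called from exec'd code.

    Returns the opcode name the exec caller is stopped at, plus the
    filename recorded for the exec'd code.
    """
    exec_frame = sys._getframe(1)   # frame running the exec'd string
    caller = exec_frame.f_back      # frame that invoked exec()
    return current_instruction(caller).opname, exec_frame.f_code.co_filename


result = {}
exec("result['info'] = probe()", {"probe": probe, "result": result})
opname, filename = result["info"]
```

This identifies the instruction but, as the question says, not the value that was on the evaluation stack when the call was made.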

CPython already provides access to function arguments and local variables, but since exec is a built-in, its argument isn't recorded as a function parameter.

If there are other thoughts on how to do the same thing, that'd be okay too. I feel that somehow overloading or replacing exec would be a worse solution, since debuggers can be brought in late in the game.

I understand that CPython2 and CPython3 may be a little bit different here, but to start off either would do.


Solution

  • I think I've now found a way.

    Inside the debugger I go up the call stack one level to get to the exec statement. Then I can use uncompyle6 to get a syntax tree of the source code. (A change may be needed in uncompyle6 to make this easier.)

    The tree at the point of call will have something like exec_stmt -> expr .... That expression will have the text of the expression which is not necessarily the value of the expression. The expression could be a constant string value, but it could be something complex like "foo" + var1.

    So then the debugger can evaluate that string in the context of the debugger which knows how to evaluate expressions up the call stack.

    This still has the problem that reevaluating the expression may have side effects. But that's bad programming practice, right? ;-)

    So instead what I do is just decompile the code from the bytecode if the source isn't there. This has the disadvantage that the line numbers recorded in the bytecode don't always line up with those in the decompiled source. For that, the method of recreating the string above is better.

    In closing, I hope this gives some idea of why writing a really good debugger is hard, and why the vast majority of debuggers have limitations on even simple things like getting the source text at the point where you are currently stopped.

    A totally different approach would be to stop early and switch to a sub-interpreter like x-python (or some suitably modified Python C module) which would have access to a stack.
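The re-evaluation idea described earlier in this answer can be sketched as follows. This is a simplified stand-in, not the actual implementation: it assumes the caller's source line is available and parses it with the standard ast module (Python 3.8+), rather than recovering a tree from bytecode with uncompyle6. `exec_argument_text` and `reevaluate_exec_argument` are hypothetical names.

```python
import ast
import sys


def exec_argument_text(source_line):
    """Return the source text of the first argument of an exec(...) call."""
    text = source_line.strip()
    tree = ast.parse(text)  # raises SyntaxError on partial statements
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "exec"
                and node.args):
            return ast.get_source_segment(text, node.args[0])
    return None


def reevaluate_exec_argument(source_line, frame):
    """Re-evaluate the exec argument expression in the caller's frame.

    Caveat from the answer: this runs the expression a second time, so
    any side effects happen twice.
    """
    text = exec_argument_text(source_line)
    if text is None:
        return None
    return eval(text, frame.f_globals, frame.f_locals)


def demo():
    var1 = "= 3"
    # Pretend the debugger is stopped in this frame at: exec("x " + var1)
    return reevaluate_exec_argument('exec("x " + var1)', sys._getframe())


recovered = demo()
```

Here the expression text is `"x " + var1`, while the recovered value is the string that exec actually received, which is the distinction the answer draws between the text of the expression and its value.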