
How can I convert Python code into a parse tree and back into the original code?


I would like to convert Python code (a string) into a parse tree, modify it at the tree level, then convert the tree back into code (a string). When converting to a parse tree and back without any tree-level modification, the resulting code should match the original input exactly.

I would like to use Python for this. I found the ast and parser modules; however, ast trees lose information about the original code. As for the parser module, I can't figure out how to manipulate the parse tree or convert it back into code.

Here's what I have so far.

import ast
import astor # pip install astor
import parser # stdlib module; deprecated in Python 3.9 and removed in 3.10

code = 'hi = 0'
ast_tree = ast.parse(code)
code_from_ast = astor.to_source(ast_tree) # 'hi = 0\n'
parser_tree = parser.suite(code)
code_from_parser = ???

Solution

  • As you mentioned, the built-in ast module doesn't preserve formatting details such as whitespace and comments. In this case you need a concrete syntax tree (CST) rather than an abstract syntax tree (AST); LibCST is one library that provides this. (You can install it with pip install libcst.)

    Here is an example that changes the code from hi = 0 to hi = 2 by parsing the code into a tree, mutating the tree, and rendering the tree back to source code. More advanced usage is covered at https://libcst.readthedocs.io/

    In [1]: import libcst as cst
    
    In [2]: code = 'hi = 0'
    
    In [3]: tree = cst.parse_module(code)
    
    In [4]: print(tree)
    Module(
        body=[
            SimpleStatementLine(
                body=[
                    Assign(
                        targets=[
                            AssignTarget(
                                target=Name(
                                    value='hi',
                                    lpar=[],
                                    rpar=[],
                                ),
                                whitespace_before_equal=SimpleWhitespace(
                                    value=' ',
                                ),
                                whitespace_after_equal=SimpleWhitespace(
                                    value=' ',
                                ),
                            ),
                        ],
                        value=Integer(
                            value='0',
                            lpar=[],
                            rpar=[],
                        ),
                        semicolon=MaybeSentinel.DEFAULT,
                    ),
                ],
                leading_lines=[],
                trailing_whitespace=TrailingWhitespace(
                    whitespace=SimpleWhitespace(
                        value='',
                    ),
                    comment=None,
                    newline=Newline(
                        value=None,
                    ),
                ),
            ),
        ],
        header=[],
        footer=[],
        encoding='utf-8',
        default_indent='    ',
        default_newline='\n',
        has_trailing_newline=False,
    )
    
    In [5]: class ModifyValueVisitor(cst.CSTTransformer):
       ...:     def leave_Assign(self, node, updated_node):
       ...:         return updated_node.with_changes(value=cst.Integer(value='2'))
       ...:
    
    In [6]: modified_tree = tree.visit(ModifyValueVisitor())
    
    In [7]: modified_tree.code
    Out[7]: 'hi = 2'