I would like to be able to convert python code (a string) into a parse tree, modify it at the tree level, then convert the tree into code (a string). When converting to a parse tree and back into code without any tree-level modification, the resulting code should match the original input code exactly.
I would like to use python for this. I found the ast
and parser
python modules, however ast trees lose information about the original code. As for the parser
module, I can't seem to figure out how to manipulate the parse tree or convert it into code.
Here's what I have so far.
import ast
import astor # pip install astor
import parser
code = 'hi = 0'
ast_tree = ast.parse(code)
code_from_ast = astor.to_source(tree) # 'hi = 0\n'
parser_tree = parser.suite(code)
code_from_parser = ???
As you mentioned, the built-in ast
module doesn't preserve many formatting information (whitespaces, comments, etc).
You need a Concrete Syntax Tree (e.g. LibCST) instead of Abstract Syntax Tree in this case. (You can install by pip install libcst
)
Here is an example shows how to change the code from hi = 0
to hi = 2
by parse code as tree, mutating the tree and render tree back to source code.
More advanced usage could be found in https://libcst.readthedocs.io/
In [1]: import libcst as cst
In [2]: code = 'hi = 0'
In [3]: tree = cst.parse_module(code)
In [4]: print(tree)
Module(
body=[
SimpleStatementLine(
body=[
Assign(
targets=[
AssignTarget(
target=Name(
value='hi',
lpar=[],
rpar=[],
),
whitespace_before_equal=SimpleWhitespace(
value=' ',
),
whitespace_after_equal=SimpleWhitespace(
value=' ',
),
),
],
value=Integer(
value='0',
lpar=[],
rpar=[],
),
semicolon=MaybeSentinel.DEFAULT,
),
],
leading_lines=[],
trailing_whitespace=TrailingWhitespace(
whitespace=SimpleWhitespace(
value='',
),
comment=None,
newline=Newline(
value=None,
),
),
),
],
header=[],
footer=[],
encoding='utf-8',
default_indent=' ',
default_newline='\n',
has_trailing_newline=False,
)
In [5]: class ModifyValueVisitor(cst.CSTTransformer):
...: def leave_Assign(self, node, updated_node):
...: return updated_node.with_changes(value=cst.Integer(value='2'))
...:
In [6]: modified_tree = tree.visit(ModifyValueVisitor())
In [7]: modified_tree.code
Out[7]: 'hi = 2'