I have a python script that I am trying to "decode" as it were so that I can cast it as an xml, but for this exercise I am just trying to get my head around using ast.walk() and how best to get info out of it. Below is a very simple function that I wrote to simple_if.py and what I am trying to do is extract the function name and the return statements as a list in order.
def if_else(abc):
if abc > 0:
return "Hello"
elif abc < 0:
return "Goodbye"
return "Neither Hello Nor Goodbye"
When I put the above into the file and run the below ast functions on it like so
with open("simple_if.py", "r", encoding = "utf-8") as ast_tree_walker
tree = ast.parse(ast_tree_walker.read())
expList = []
counter = 0
for node in ast.walk(tree):
print(node)
if isInstance(node, ast.FunctionDef) and counter == 0:
expList.append(node.__dict__["name"]
counter = 1 # this adds the functiona name if_else to the list
print(node.__dict__)
The above would give a terminal output of nodes, dicts, and args, but the only thing I have been able to accomplish is extracting the function name since it is the first non-Module node in the tree. I know I am dumping everything that I "want" but its not clear to me how I should keep track of the ordering for instance "Neither Hello Nor Goodbye" would appear before either of "Hello" or "Goodbye" and although I assume that is because in terms of a tree the final return is on the same level as the if statement it is not clear to me how I can maintain order (if at all). Basically does anyone know how I could return a list of ["if_else", "Hello", "Goodbye", "Neither Hello Nor Goodbye"]? On some level I feel this is a fools errand, but in the pursuit of knowledge I am wondering how to go about this.
Nuanced question the above prints out types of:
<ast.Module object at (random token string that I won't write)>, <ast.FunctionDef object at >, <ast.If object at >, <ast.Return object at >...
When I come across a "<ast.If object at >" should I step into the 'body' key directly or somehow just wait till that child node gets visited and then extract the info?
#children of ast.If
{'test': <ast.Compare object at>, 'body" <ast.Return object at >, 'orelse': [<ast.If object at>, 'lineno':2, 'col_offset':7}
Although the documentation claims that ast.walk
generates nodes "in no specified order" and even misleadingly characterizes its behavior as "Recursively yield all descendant nodes...", a quick look at the source code reveals that it really performs a simple breadth-first traversal by iteratively yielding child nodes in a queue:
def walk(node):
"""
Recursively yield all descendant nodes in the tree starting at *node*
(including *node* itself), in no specified order. This is useful if you
only want to modify nodes in place and don't care about the context.
"""
from collections import deque
todo = deque([node])
while todo:
node = todo.popleft()
todo.extend(iter_child_nodes(node))
yield node
So the node for return "Neither Hello Nor Goodbye"
is yielded before the node for return "Hello"
because the former is an immediate child of def if_else(...)
while the latter is a child of its child.
To produce the nodes in your desired depth-first order you can write your own recursive node traversal function instead:
import ast
def dfs_walk(node):
yield node
for child in ast.iter_child_nodes(node):
yield from dfs_walk(child)
so that:
tree = ast.parse('''\
def if_else(abc):
if abc > 0:
return "Hello"
elif abc < 0:
return "Goodbye"
return "Neither Hello Nor Goodbye"
''')
expList = []
for node in dfs_walk(tree):
if isinstance(node, ast.FunctionDef):
expList.append(node.name)
elif isinstance(node, ast.Return):
expList.append(node.value.value)
print(expList)
outputs:
['if_else', 'Hello', 'Goodbye', 'Neither Hello Nor Goodbye']
Demo here