pythonabstract-syntax-tree

Using python AST to traverse code and extract return statements


I have a python script that I am trying to "decode" as it were so that I can cast it as an xml, but for this exercise I am just trying to get my head around using ast.walk() and how best to get info out of it. Below is a very simple function that I wrote to simple_if.py and what I am trying to do is extract the function name and the return statements as a list in order.

def if_else(abc):
    if abc > 0:
        return "Hello"
    elif abc < 0:
        return "Goodbye"
    return "Neither Hello Nor Goodbye"

When I put the above into the file and run the below ast functions on it like so

with open("simple_if.py", "r", encoding = "utf-8") as ast_tree_walker
    tree = ast.parse(ast_tree_walker.read())

expList = []
counter = 0

for node in ast.walk(tree):
    print(node)
    if isInstance(node, ast.FunctionDef) and counter == 0:
        expList.append(node.__dict__["name"]
        counter = 1 # this adds the functiona name if_else to the list 
    print(node.__dict__)

The above would give a terminal output of nodes, dicts, and args, but the only thing I have been able to accomplish is extracting the function name since it is the first non-Module node in the tree. I know I am dumping everything that I "want" but its not clear to me how I should keep track of the ordering for instance "Neither Hello Nor Goodbye" would appear before either of "Hello" or "Goodbye" and although I assume that is because in terms of a tree the final return is on the same level as the if statement it is not clear to me how I can maintain order (if at all). Basically does anyone know how I could return a list of ["if_else", "Hello", "Goodbye", "Neither Hello Nor Goodbye"]? On some level I feel this is a fools errand, but in the pursuit of knowledge I am wondering how to go about this.

Nuanced question the above prints out types of:

<ast.Module object at (random token string that I won't write)>, <ast.FunctionDef object at >, <ast.If object at >, <ast.Return object at >... 

When I come across a "<ast.If object at >" should I step into the 'body' key directly or somehow just wait till that child node gets visited and then extract the info?

#children of ast.If
{'test': <ast.Compare object at>, 'body" <ast.Return object at >, 'orelse': [<ast.If object at>, 'lineno':2, 'col_offset':7}

Solution

  • Although the documentation claims that ast.walk generates nodes "in no specified order" and even misleadingly characterizes its behavior as "Recursively yield all descendant nodes...", a quick look at the source code reveals that it really performs a simple breadth-first traversal by iteratively yielding child nodes in a queue:

    def walk(node):
        """
        Recursively yield all descendant nodes in the tree starting at *node*
        (including *node* itself), in no specified order.  This is useful if you
        only want to modify nodes in place and don't care about the context.
        """
        from collections import deque
        todo = deque([node])
        while todo:
            node = todo.popleft()
            todo.extend(iter_child_nodes(node))
            yield node
    

    So the node for return "Neither Hello Nor Goodbye" is yielded before the node for return "Hello" because the former is an immediate child of def if_else(...) while the latter is a child of its child.

    To produce the nodes in your desired depth-first order you can write your own recursive node traversal function instead:

    import ast
    
    def dfs_walk(node):
        yield node
        for child in ast.iter_child_nodes(node):
            yield from dfs_walk(child)
    

    so that:

    tree = ast.parse('''\
    def if_else(abc):
        if abc > 0:
            return "Hello"
        elif abc < 0:
            return "Goodbye"
        return "Neither Hello Nor Goodbye"
    ''')
    
    expList = []
    for node in dfs_walk(tree):
        if isinstance(node, ast.FunctionDef):
            expList.append(node.name)
        elif isinstance(node, ast.Return):
            expList.append(node.value.value)
    print(expList)
    

    outputs:

    ['if_else', 'Hello', 'Goodbye', 'Neither Hello Nor Goodbye']
    

    Demo here