python, pycparser, anytree

How to convert pycparser ast to python anytree format?


I am trying to convert the AST produced by pycparser into a Python anytree tree for further processing. However, the anytree tree I currently get is missing several useful pieces of information, and I cannot figure out how to include them. My code currently has a parsing function for C code like this:

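from anytree import Node
from pycparser import c_parser, c_ast

def program_parser(func):
    # Parse a C source string into a pycparser AST (root is a c_ast.FileAST).
    parser = c_parser.CParser()
    ast = parser.parse(func, filename='<none>')
    return ast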

After getting this ast I am trying to build the anytree tree like this:

# Initialize head node of the code.
head = Node(["1",get_token(c_code)])
# Recursively construct AST tree.
for child_order in range(len(get_children(c_code))):
    get_trees(get_children(c_code)[child_order], head, "1"+str(int(child_order)+1))

The get_trees, get_children, and get_token functions are given below:

def get_token(node):
    token = ''
    if isinstance(node, c_ast.FileAST):
        token = node.__class__.__name__
    elif isinstance(node[1], str):
        token = node[1]
        #print(token)
    elif isinstance(node[1], set):
        token = 'Modifier'  # node.pop()
        #print(token)
    elif isinstance(node[1], c_ast.Node):
        token = node[1].__class__.__name__
        #print(token)
    #print(token)
    return token

def get_children(root):
    if isinstance(root, c_ast.FileAST):
        children = root.children()
    elif isinstance(root[1], c_ast.Node):
        children = root[1].children()
    elif isinstance(root, set):
        children = list(root)
    else:
        children = []

    def expand(nested_list):
        for item in nested_list:
            if isinstance(item, list):
                for sub_item in expand(item):
                    yield sub_item
            elif item:
                yield item

    return list(expand(children))

def get_trees(current_node, parent_node, order):
    
    token, children = get_token(current_node), get_children(current_node)
    node = Node([order,token], parent=parent_node, order=order)

    for child_order in range(len(children)):
        get_trees(children[child_order], node, order+str(int(child_order)+1))
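For reference, each item returned by pycparser's children() is a (name, node) tuple, which is why the helpers above index with [1]. Values such as "printf" or "Hello world" are stored as node attributes listed in attr_names, not as child nodes. The following is a minimal sketch for inspecting them; the describe helper is a hypothetical name introduced here for illustration and is not part of pycparser or of the code above.

```python
from pycparser import c_parser

source = 'void main() { printf("Hello world"); }'
ast = c_parser.CParser().parse(source, filename='<none>')

def describe(node, depth=0, lines=None):
    """Collect one line per AST node: class name plus its attribute values."""
    lines = [] if lines is None else lines
    # attr_names lists the non-child attributes (name, value, op, ...).
    attrs = {name: getattr(node, name) for name in node.attr_names}
    lines.append('  ' * depth + f'{type(node).__name__} {attrs}')
    # children() yields (child_name, child_node) pairs.
    for _child_name, child in node.children():
        describe(child, depth + 1, lines)
    return lines

lines = describe(ast)
print('\n'.join(lines))
```

The printed lines for the ID and Constant nodes carry the identifier and the string literal in their attribute dictionaries, which is the information that a class-name-only token drops.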

There may be some unneeded variables, since I am adapting code that was originally written for javalang (Java) to work with C. The current code creates an anytree instance with head as the root node, but this tree is missing some details. Here is an example input and the resulting anytree output:

void main()
{
    printf("Hello world");
}

Output: (image: anytree AST)

Here, details such as "Hello world" (which should be a child of the "Constant" node), ID values such as "printf", function names, and type names are all missing. I am also not sure whether the conversion from pycparser to anytree is fully correct, because the tree looks odd to me, with so many nodes for such a simple program. Could someone help me with this?


Solution

  • You can update your get_token() function so that it returns the actual token names, replacing the class name with a value wherever you want. Below I have added some code that replaces the class name with the node's attribute value (names, name, value, declname, or op) wherever one is available.

    def get_token(node):
        token = ''
        if isinstance(node, c_ast.FileAST):
            token = node.__class__.__name__
            return token
        elif isinstance(node[1], str):
            token = node[1]
            return token
            #print(token)
        elif isinstance(node[1], c_ast.Node):
            token = node[1].__class__.__name__
        #print(token)
        
        if len(get_children(node))==0:
            attr_names = node[1].attr_names
            if attr_names:
                if 'names' in attr_names:
                    token = node[1].names[0]
                elif 'name' in attr_names:
                    token = node[1].name
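                else:
                    # Some leaf nodes (e.g. Pragma) have no 'value' attribute;
                    # keep the class name for them instead of raising.
                    token = getattr(node[1], 'value', token)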
        else:
            if token == 'TypeDecl':
                token = node[1].declname
            if node[1].attr_names:
                attr_names = node[1].attr_names
                if 'op' in attr_names:
                    if node[1].op[0] == 'p':
                        token = node[1].op[1:]
                    else:
                        token = node[1].op
     
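        # Fall back to the class name when the token is empty or None
        # (e.g. an unnamed TypeDecl has declname None).
        if not token:
            token = node[1].__class__.__name__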
        return token