I am trying to convert the AST produced by pycparser into a Python anytree tree for further processing. However, the anytree tree I currently get is missing several useful pieces of information, and I cannot figure out how to include them. My code has a parsing function for C code like this:
from pycparser import c_parser

def program_parser(func):
    # Parse a string of C source code into a pycparser AST (a c_ast.FileAST).
    parser = c_parser.CParser()
    ast = parser.parse(func, filename='<none>')
    return ast
After getting this ast I am trying to build the anytree tree like this:
from anytree import Node

# Initialize the head node of the code.
head = Node(["1", get_token(c_code)])
# Recursively construct the AST tree.
for child_order, child in enumerate(get_children(c_code)):
    get_trees(child, head, "1" + str(child_order + 1))
The get_trees, get_children, and get_token are also given below:
def get_token(node):
    token = ''
    if isinstance(node, c_ast.FileAST):
        token = node.__class__.__name__
    elif isinstance(node[1], str):
        token = node[1]
    elif isinstance(node[1], set):
        token = 'Modifier'  # node.pop()
    elif isinstance(node[1], c_ast.Node):
        token = node[1].__class__.__name__
    return token
def get_children(root):
    if isinstance(root, c_ast.FileAST):
        children = root.children()
    elif isinstance(root[1], c_ast.Node):
        children = root[1].children()
    elif isinstance(root, set):
        children = list(root)
    else:
        children = []

    def expand(nested_list):
        for item in nested_list:
            if isinstance(item, list):
                for sub_item in expand(item):
                    yield sub_item
            elif item:
                yield item

    return list(expand(children))
def get_trees(current_node, parent_node, order):
    token, children = get_token(current_node), get_children(current_node)
    node = Node([order, token], parent=parent_node, order=order)
    for child_order, child in enumerate(children):
        get_trees(child, node, order + str(child_order + 1))
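One detail of the order strings built in get_trees may be worth checking. Because child indices are concatenated without a separator, a node with more than nine children can produce the same string as a deeper node. The sketch below reproduces the numbering scheme on a small stand-in tree of nested (name, children) tuples; order_labels, the sep parameter, and the sample tree are hypothetical and only serve to illustrate the point.

```python
def order_labels(tree, order="1", sep=""):
    """Yield (order, name) pairs for a nested (name, children) tuple."""
    name, children = tree
    yield order, name
    for index, child in enumerate(children, start=1):
        # Same scheme as get_trees: append the 1-based child index.
        yield from order_labels(child, order + sep + str(index), sep)

# Hypothetical stand-in tree: a root with 11 children,
# the first of which has one child of its own.
leaves = [("leaf%d" % i, []) for i in range(1, 12)]
leaves[0] = ("leaf1", [("grandchild", [])])
tree = ("root", leaves)

plain = [order for order, _ in order_labels(tree)]
dotted = [order for order, _ in order_labels(tree, sep=".")]

# Without a separator, "111" names both the grandchild and the 11th child.
print(plain.count("111"))
# With a separator, every order string is unique.
print(len(set(dotted)) == len(dotted))
```

If the order values are only used for display this may not matter, but a separator such as "." keeps them unambiguous if they are later used as keys.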
There may be some leftover variables, since I am adapting code that was originally written for javalang (Java) to work with C code. The current code creates an anytree instance with head as the root node, but the tree does not include some details. Here is an example input and the resulting anytree:
void main()
{
    printf("Hello world");
}
Output: [image: anytree AST]
Here, details such as "Hello world" (which should be the child of the "Constant" node), ID values such as "printf", function names, and type names are missing. I am also not sure whether the conversion from pycparser to anytree is fully correct, as the tree looks a bit odd to me, with so many nodes for such a simple program. Could someone help me with this?
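As a quick check that these details are present in the pycparser AST itself (and are only lost when the tree is converted), the AST can be dumped with the show() method that pycparser nodes provide. This is a minimal sketch that parses the same example program; the variable names source, buf, and text are only illustrative.

```python
import io

from pycparser import c_parser

source = 'void main() { printf("Hello world"); }'
ast = c_parser.CParser().parse(source, filename='<none>')

# show() writes an indented dump of the AST; attrnames=True prints each
# node's attributes as name=value pairs next to its class name.
buf = io.StringIO()
ast.show(buf=buf, attrnames=True)
text = buf.getvalue()
print(text)
```

The dump should list "printf" on the ID node and the string literal on the Constant node, which suggests the values live in node attributes rather than in child nodes.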
You can update your get_token() function so that it returns the token values, replacing the class name with an attribute value wherever you want. Here I have added some code that replaces the class name with the node's name or value where one is available.
def get_token(node):
    token = ''
    if isinstance(node, c_ast.FileAST):
        return node.__class__.__name__
    elif isinstance(node[1], str):
        return node[1]
    elif isinstance(node[1], c_ast.Node):
        # Default to the class name, then refine it from the node's attributes.
        token = node[1].__class__.__name__
        attr_names = node[1].attr_names
        if len(get_children(node)) == 0:
            # Leaf node: use its name or value (e.g. ID, IdentifierType, Constant).
            if attr_names:
                if 'names' in attr_names:
                    token = node[1].names[0]
                elif 'name' in attr_names:
                    token = node[1].name
                elif 'value' in attr_names:
                    # Checked explicitly: not every leaf node has a value attribute.
                    token = node[1].value
        else:
            if token == 'TypeDecl':
                token = node[1].declname
            if attr_names and 'op' in attr_names:
                # Postfix operators are stored as 'p++' / 'p--'; drop the prefix.
                if node[1].op[0] == 'p':
                    token = node[1].op[1:]
                else:
                    token = node[1].op
    # Fall back to the class name if no usable value was found
    # (this also covers None, e.g. a TypeDecl without a declname).
    if not token:
        token = node[1].__class__.__name__
    return token
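A more generic alternative, sketched below, builds each label from whatever attributes a node declares in attr_names instead of handling specific node classes. It is not the function above, only an illustration of the same idea; label and dump are hypothetical helper names, and the output is a plain list of indented lines rather than an anytree tree.

```python
from pycparser import c_parser

def label(node):
    """Class name plus any non-empty attributes, e.g. 'ID(name=printf)'."""
    parts = []
    for attr in node.attr_names:
        value = getattr(node, attr)
        if value:  # skip None and empty lists such as quals=[]
            parts.append("%s=%s" % (attr, value))
    name = node.__class__.__name__
    return "%s(%s)" % (name, ", ".join(parts)) if parts else name

def dump(node, depth=0, lines=None):
    """Collect one indented label per AST node, depth first."""
    lines = [] if lines is None else lines
    lines.append("  " * depth + label(node))
    # children() yields (child_name, child_node) pairs.
    for _, child in node.children():
        dump(child, depth + 1, lines)
    return lines

source = 'void main() { printf("Hello world"); }'
ast = c_parser.CParser().parse(source, filename='<none>')
lines = dump(ast)
print("\n".join(lines))
```

Keeping the class name alongside the value makes it possible to tell an ID named "main" from a Decl named "main", at the cost of longer labels. The same label() result could be passed to anytree's Node in place of the token.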