I've implemented the CYK parsing algorithm which uses a bottoms-up approach to build a parse tree. As it traverses the algorithm, the path to the ultimate solution is stored in backpointers. From the backpointers, we construct the tree. This final step is what I am having issues with.
This is the data structure I'm using to store the tree:
class GrammarTree(object):
def __init__(self, data):
self.data = data
self.left = None
self.right = None
def insertLeft(self, new_node):
self.left = GrammarTree(new_node)
def insertRight(self, new_node):
self.right = GrammarTree(new_node)
The following is how I build the tree, where back
stores a tuple where split
is the index used to split the tree and left_rule
and right_rule
are the rules for the respective tree represented by int. If a leaf node is reached there is no tuple, just an int representing the terminal rule.
def build_tree(start,end,idx,back):
tree = GrammarTree(idx)
node = back[start][end][idx]
if isinstance(node,tuple):
split,left_rule,right_rule = node
tree.insertLeft(build_tree(start,split,left_rule,back))
tree.insertRight(build_tree(split,end,right_rule,back))
return tree
else:
tree.insertLeft(GrammarTree(node))
return tree
The problem is that when function is done building the tree, there are extra branches being added, i.e. the nodes aren't being properly glued together.
This is what it looks like:
Lvl0 root
/ \
Lvl1 L1 R1
/ | \ / | \
/ | \ / | \
/ | \ / | \
/ | \ / | \
/ | \ / | \
Lvl2 L1.left=None L1.right=None L1.data R1.left=None R1.right=None R1.data
/ \ / \
Lvl3 L2 R2 L3 R3
There shouldn't be a data
node between the trees.
Edit:
The problem is not that there is an extra data node (above statement is wrong), it's that after Lvl1, instead of new branches being added to L1.left/right
and R1.left/right
on Lvl2, they are added to L1
and R1's
data
fields. So L1/R1.data
ends up being a tree in and of itself and L1.left/right
and R1.left/right
are not used and therefore None
.
It should look like this:
root
/ \
/ \
L1=root.left R1=root.right
/ \ / \
/ \ / \
/ \ / \
L2=L1.left R2=L1.right L3=R1.left R3=R1.right
This is where I call build tree:
back = [[[0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 6, 0, 3, 0, 2], [0, 0, 0, (1, 6, 7), 0, 3, 0, (1, 7, 7)], [0, 0, 0, (1, 6, 7), 0, (1, 7, 3), 0, (1, 7, 7)], [0, 0, 0, (1, 6, 7), 0, (2, 7, 3), 0, (1, 7, 7)]],\
[[0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 6, 0, 3, 0, 2], [0, 0, 0, (2, 6, 7), 0, (2, 7, 3), 0, (2, 7, 7)], [0, 0, 0, (2, 6, 7), 0, (2, 7, 3), 0, (3, 7, 7)]],\
[[0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 6, 0, 3, 0, 2], [0, 0, 0, (3, 6, 7), 0, 3, 0, (3, 7, 7)]],\
[[0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 6, 0, 3, 0, 2]],\
[[0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0]]]
build_tree(0,4,5,back)
The problem was in the insertLeft()
and insertRight()
methods of the GrammarTree
class. Instead of simply connecting the branches, you'll seet that I was calling the GrammarTree
constructor so I was essentially wrapping a tree inside another tree.
I fixed the problem by removing the call to the constructor.
class GrammarTree(object):
def __init__(self, data):
self.data = data
self.left = None
self.right = None
def insertLeft(self, new_node):
self.left = new_node ## <- NOT GrammarTree(new_node)
def insertRight(self, new_node):
self.right = new_node ## <- NOT GrammarTree(new_node)