pythonalgorithmrecursiontreecyk

Extra branches being added to tree when generated


I've implemented the CYK parsing algorithm which uses a bottoms-up approach to build a parse tree. As it traverses the algorithm, the path to the ultimate solution is stored in backpointers. From the backpointers, we construct the tree. This final step is what I am having issues with.

This is the data structure I'm using to store the tree:

class GrammarTree(object):
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None

    def insertLeft(self, new_node):
        self.left = GrammarTree(new_node)

    def insertRight(self, new_node):
        self.right = GrammarTree(new_node)

The following is how I build the tree, where back stores a tuple where split is the index used to split the tree and left_rule and right_rule are the rules for the respective tree represented by int. If a leaf node is reached there is no tuple, just an int representing the terminal rule.

def build_tree(start,end,idx,back):
    tree = GrammarTree(idx)
    node = back[start][end][idx]
    if isinstance(node,tuple):
        split,left_rule,right_rule = node
        tree.insertLeft(build_tree(start,split,left_rule,back))
        tree.insertRight(build_tree(split,end,right_rule,back))
        return tree
    else:
        tree.insertLeft(GrammarTree(node))
        return tree 

The problem is that when function is done building the tree, there are extra branches being added, i.e. the nodes aren't being properly glued together.

This is what it looks like:

Lvl0                                root
                          /                            \
Lvl1                     L1                             R1
                  /       |    \             /           |       \
                 /        |     \           /            |        \
                /         |      \         /             |         \
               /          |       \       /              |          \
              /           |        \     /               |           \
Lvl2  L1.left=None L1.right=None L1.data R1.left=None R1.right=None R1.data
                                  /    \                            /    \
Lvl3                             L2     R2                         L3     R3

There shouldn't be a data node between the trees.

Edit:

The problem is not that there is an extra data node (above statement is wrong), it's that after Lvl1, instead of new branches being added to L1.left/right and R1.left/right on Lvl2, they are added to L1 and R1's data fields. So L1/R1.data ends up being a tree in and of itself and L1.left/right and R1.left/right are not used and therefore None.

It should look like this:

                  root
           /               \
          /                 \
   L1=root.left         R1=root.right
     /    \                 /   \
    /      \               /     \
   /        \             /       \
L2=L1.left  R2=L1.right L3=R1.left R3=R1.right

This is where I call build tree:

back = [[[0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 6, 0, 3, 0, 2], [0, 0, 0, (1, 6, 7), 0, 3, 0, (1, 7, 7)], [0, 0, 0, (1, 6, 7), 0, (1, 7, 3), 0, (1, 7, 7)], [0, 0, 0, (1, 6, 7), 0, (2, 7, 3), 0, (1, 7, 7)]],\ 
        [[0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 6, 0, 3, 0, 2], [0, 0, 0, (2, 6, 7), 0, (2, 7, 3), 0, (2, 7, 7)], [0, 0, 0, (2, 6, 7), 0, (2, 7, 3), 0, (3, 7, 7)]],\
        [[0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 6, 0, 3, 0, 2], [0, 0, 0, (3, 6, 7), 0, 3, 0, (3, 7, 7)]],\
        [[0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 6, 0, 3, 0, 2]],\
        [[0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0]]]
build_tree(0,4,5,back)

Solution

  • The problem was in the insertLeft() and insertRight() methods of the GrammarTree class. Instead of simply connecting the branches, you'll seet that I was calling the GrammarTree constructor so I was essentially wrapping a tree inside another tree.

    I fixed the problem by removing the call to the constructor.

    class GrammarTree(object):
        def __init__(self, data):
            self.data = data
            self.left = None
            self.right = None
    
        def insertLeft(self, new_node):
            self.left = new_node  ## <- NOT GrammarTree(new_node) 
    
        def insertRight(self, new_node):
            self.right = new_node ## <- NOT GrammarTree(new_node)