pythonalgorithmparsingnlpautomata-theory

Python implementation for the CYK Algorithm


EDIT: error is this line if len(rhs) == 2 and rhs[0] in T[i][k] and rhs[1] in T[k + 1][j]:

I was able to implement the cky algorithm based off of the cky parser wiki with a small set of rules, terminals, and non terminals. But I scaled it to have more rules, words, grammar, and now it is giving me IndexError: list index out of range Does anyone have any idea of what I'm doing wrong with a bigger grammar set?

Here is the previous smaller scale of grammar that works if that helps.

non_terminals = ["NP", "Nom", "Det", "AP",  
                  "Adv", "A"] 
terminals = ["book", "orange", "man",  
             "tall", "heavy",  
             "very", "muscular"] 
  
# Rules of the grammar 
R = { 
     "NP": [["Det", "Nom"]], 
     "Nom": [["AP", "Nom"], ["book"],  
             ["orange"], ["man"]], 
     "AP": [["Adv", "A"], ["heavy"],  
            ["orange"], ["tall"]], 
     "Det": [["a"]], 
     "Adv": [["very"], ["extremely"]], 
     "A": [["heavy"], ["orange"], ["tall"],  
           ["muscular"]] 
    } 

Here's my function

def cykParse(w): n = len(w)

# Initialize the table 
T = [[set([]) for j in range(n)] for i in range(n)] 

# Filling in the table 
for j in range(0, n): 

    # Iterate over the rules 
    for lhs, rule in R.items(): 
        for rhs in rule: 
              
            # If a terminal is found 
            if len(rhs) == 1 and rhs[0] == w[j]: 
                T[j][j].add(lhs) 

    for i in range(j, -1, -1):    
           
        # Iterate over the range i to j + 1    
        for k in range(i, j + 1):      

            # Iterate over the rules 
            for lhs, rule in R.items(): 
                for rhs in rule: 
                      
                    # If a terminal is found 
                    if len(rhs) == 2 and rhs[0] in T[i][k] and rhs[1] in T[k + 1][j]: 
                        T[i][j].add(lhs) 

# If word can be formed by rules  
# of given grammar 
if len(T[0][n-1]) != 0: 
    print("True") 
else: 
    print("False") 

Solution

  • I guess (because you didn't show the actual error indicating where the error occurs) that it's in this line:

    if len(rhs) == 2 and rhs[0] in T[i][k] and rhs[1] in T[k + 1][j]:
    

    and that k is n-1. If the first two conditions are true then the third one will execute and blow up.

    I suspect that there is an off-by-one error in the iteration limit for k. Some code comments would have been useful, or at least a reference to the pseudocode you based your implementation on.