EDIT: error is this line if len(rhs) == 2 and rhs[0] in T[i][k] and rhs[1] in T[k + 1][j]:
I was able to implement the cky algorithm based off of the cky parser wiki with a small set of rules, terminals, and non terminals. But I scaled it to have more rules, words, grammar, and now it is giving me
IndexError: list index out of range
Does anyone have any idea of what I'm doing wrong with a bigger grammar set?
Here is the previous smaller scale of grammar that works if that helps.
non_terminals = ["NP", "Nom", "Det", "AP",
"Adv", "A"]
terminals = ["book", "orange", "man",
"tall", "heavy",
"very", "muscular"]
# Rules of the grammar
R = {
"NP": [["Det", "Nom"]],
"Nom": [["AP", "Nom"], ["book"],
["orange"], ["man"]],
"AP": [["Adv", "A"], ["heavy"],
["orange"], ["tall"]],
"Det": [["a"]],
"Adv": [["very"], ["extremely"]],
"A": [["heavy"], ["orange"], ["tall"],
["muscular"]]
}
Here's my function
def cykParse(w): n = len(w)
# Initialize the table
T = [[set([]) for j in range(n)] for i in range(n)]
# Filling in the table
for j in range(0, n):
# Iterate over the rules
for lhs, rule in R.items():
for rhs in rule:
# If a terminal is found
if len(rhs) == 1 and rhs[0] == w[j]:
T[j][j].add(lhs)
for i in range(j, -1, -1):
# Iterate over the range i to j + 1
for k in range(i, j + 1):
# Iterate over the rules
for lhs, rule in R.items():
for rhs in rule:
# If a terminal is found
if len(rhs) == 2 and rhs[0] in T[i][k] and rhs[1] in T[k + 1][j]:
T[i][j].add(lhs)
# If word can be formed by rules
# of given grammar
if len(T[0][n-1]) != 0:
print("True")
else:
print("False")
I guess (because you didn't show the actual error indicating where the error occurs) that it's in this line:
if len(rhs) == 2 and rhs[0] in T[i][k] and rhs[1] in T[k + 1][j]:
and that k
is n-1
. If the first two conditions are true then the third one will execute and blow up.
I suspect that there is an off-by-one error in the iteration limit for k
. Some code comments would have been useful, or at least a reference to the pseudocode you based your implementation on.