In order to understand cyk algorithm I've worked through example on : https://www.youtube.com/watch?v=VTH1k-xiswM&feature=youtu.be .
The result of which is :
How do I extract the probabilities associated with each parse and extract the most likely parse tree ?
These are two distinct problems for PCFG:
The CKY algorithm video linked in the question only deals with the recognition problem. To perform the parsing problem simultaneously, we need to (i) maintain the score of each parsing item and (ii) keep track of the hierarchical relationships (e.g. if we use the rule S -> NP VP: we must keep track of which NP and which VP are used to predict S).
Notations:
[X, i, j]: s
means that there is a node labelled X spanning token i (included) to j (excluded) with score s.
The score is the log probability of the subtree rooted in X
.w_1 w_2 ... w_n
.At an abstract level, PCFG parsing can be formulated as a set of deduction rules:
Scan Rule (read tokens)
____________________________{precondition: X -> w_i is a grammar rule
[X, i, i+1]: log p(X -> w_i)
Gloss: if there is a rule X -> w_i
in the grammar, then we can add the item [X, i, i+1]
in the chart.
Complete Rule (create a new constituent bottom up)
[X, i, k]: s1 [Y, k, j]: s2
_____________________________________{precondition: Z -> X Y is a grammar rule
[Z, i, j]: s1 + s2 + log p(Z -> X, Y)
Gloss: if there are 2 parsing items [X, i, k]
and [Y, k, j]
in the chart, and a rule Z -> X Y
in the grammar, then we can add [Z, i, j]
to the chart.
The goal of weighted parsing is to deduce the parsing item [S, 0, n]:s
(S
: axiom, n
: length of sentence) with the highest score s
.
Pseudo code for whole algorithm
# The chart stores parsing items and their scores
chart[beginning(int)][end(int)][NonTerminal][score(float)]
# the backtrack table is used to recover the parse tree at the end
backtrack[beginning][end][NonTerminal][item_left, item_right]
# insert a new item in the chart
# for a given span (i, j) and nonterminal X, we only need to
# keep the single best scoring item.
def insert(X, i, j, score, Y, Z, k):
if X not in chart[i][j] or chart[i][j][X] < score
chart[i][j][X] <- score
backtrack[i][j][X] <- (Y, i, k), (Z, k, j)
n <- length of sentence
for i in range(0, n):
# apply scan rule
insert(X, i, i+1, log p(X -> w_i)) for each grammar rule X -> w_i
for span_length in range(2, n):
for beginning in range(0, n - span_length):
end <- beginning + span_length
for k in range(beginning+1, end -1):
# apply completion rules
for each grammar rule X -> Y Z such that
* Y is in chart[beginning][k]
* Z is in chart[k][end]
score_left <- chart[beginning][k][Y]
score_right <- chart[k][end][Z]
insert(X, beginning, end, log p(X -> Y Z) + score_left + score_right)
if there is S (axiom) in chart[0][n], then parsing is successful
the score (log probability) of the best tree is chart[0][n][S]
the best tree can be recovered recursively by following pointers from backtrack[0][n][S]
else:
parsing failure, the sentence does not belong to the language
generated by the grammar
Some notes: