I was reading about the CYK algorithm, and there is one part of pseudo-code I cannot understand. The whole pseudo-code is:
let the input be a string S consisting of n characters: a1 ... an.
let the grammar contain r nonterminal symbols R1 ... Rr.
This grammar contains the subset Rs which is the set of start symbols.
let P[n,n,r] be an array of booleans. Initialize all elements of P to false.
for each i = 1 to n
for each unit production Rj -> ai
set P[i,1,j] = true
for each i = 2 to n -- Length of span
for each j = 1 to n-i+1 -- Start of span
for each k = 1 to i-1 -- Partition of span
for each production RA -> RB RC
if P[j,k,B] and P[j+k,i-k,C] then set P[j,i,A] = true
if any of P[1,n,x] is true (x is iterated over the set s, where s are all the indices for Rs) then
S is member of language
else
S is not member of language
These parts are which I am confused:
for each production RA -> RB RC
if P[j,k,B] and P[j+k,i-k,C] then set P[j,i,A] = true
Would someone give some hints about these pseudocode?
The pseudocode
For each production RA → RB RC:
if P[j,k,B] and P[j+k,i-k,C] then set P[j,i,A] = true
Should be interpreted in the following way. Suppose that it's the case that P[j, k, B] is true. That means that the string formed from k characters starting at position j can derived from the nonterminal RB. If it's also the case that P[j + k, i - k, C] is true, then the string formed from the i - k characters starting at position j + k can be derived from nonterminal RC. Therefore, since RA → RB RC is a production, it's the case that the string formed from the i characters starting at position j can be derived from RA.
I think it might help to interpret that pseudocode as
For each production RA → RB RC:
if P[j,k,B] == true and P[j+k,i-k,C] == true, then set P[j,i,A] = true
Hope this helps!