javaalgorithmcontext-free-grammarcyk

CYK algorithm pseudocode confusion


so I've been reading about the CYK algorithm in wikipedia and in many powerpoints/pdfs.

In wikipedia there is a part where I'm not 100% what it's trying to say. Can you guys break it down for me?

let the input be a string S consisting of n characters: a1 ... an.
let the grammar contain r nonterminal symbols R1 ... Rr.
This grammar contains the subset Rs which is the set of start symbols.
let P[n,n,r] be an array of booleans. Initialize all elements of P to false.
for each i = 1 to n
  for each unit production Rj -> ai
     set P[i,1,j] = true

for each i = 2 to n -- Length of span
 for each j = 1 to n-i+1 -- Start of span
  for each k = 1 to i-1 -- Partition of span
   for each production RA -> RB RC
    if P[j,k,B] and P[j+k,i-k,C] then set P[j,i,A] = true

if any of P[1,n,x] is true (x is iterated over the set s, where s are all the indices for Rs) then S is member of language
  else
S is not member of language

The part that really confuses me is "if any of P[1,n,x] is true (x is iterated over the set s, where s are all the indices for Rs) then S is member of language else S is not member of language"

Is it saying for any n and x that exists if it's true, then it is a member? or is it saying for the string length n and x if it's true then it's a member? or something completely different?

Also what exactly is X?

Edit:

Thanks guys, I've definitely learned how to do it. wish I could get both your answers as the selected answer.


Solution

  • When you are performing the CYK algorithm, you are basically filling the bottom triangular matrix from its bottom to the uppermost element. Whenever some element (j,i,x) where j is the column index, i is the row index and x is the non-terminal symbol is true, it means that you are able to generate the subsequence j to j+i-1 of your word from the symbol Rx.

    Your goal is to generate the whole word from one of the starting symbols. The element corresponding to the possibility of generating the whole word is (1,n,x) - the leftmost and uppermost element of the matrix, where the x is the index of your non-terminal symbol. As you have to start in one of the start symbols, you are looking just for a subset of all of your non-terminals - the subset of s. If you manage to generate the whole word from one of the start symbols, you simply state that the word is part of the language. If no such start symbol exists, you aren't able to generate that word and the word isn't part of the language described by the grammar.