nlpgrammarcontext-free-grammarchomsky-normal-formcyk

Does CKY really require CNF?


I've read a number of places that the CYK/CKY algorithms require the grammar to be in Chomsky Normal Form (CNF), e.g.

The standard version of CYK operates only on context-free grammars given in Chomsky normal form (CNF) ~Wikipedia

However, I've also seen a number of examples of CKY algorithms where the grammar was not in CNF. A common example that Christopher Manning uses is "fish people fish tanks" (ref: PPT slide #19) which contains unary rules:

S -> NP VP [0.9]
S -> VP [0.1]
VP -> V NP [0.4]
Vp -> V [0.6]
...

I've also seen other examples demonstrating CKY that use three non-terminals in the RHS of the production (e.g. VP -> Verb NP NP reference). Why the discrepancy?


Solution

  • The runtime of CYK depends on the length of the longest production rule, since the algorithm considers all possible ways of decomposing a string into k parts for a production of length k. This means that the runtime per phase is O(nk), where k is the length of the longest production. Since there are O(n) phases, the runtime of CYK on a grammar with maximum production length k is O(nk+1).

    CYK will work correctly on grammars that aren't in CNF, but the runtime may not end up being cubic in the length of the string. The CNF requirement just forces k = 2 and therefore guarantees an O(n3) overall runtime.