complexity-theorycontext-free-grammarcomputation-theorycontext-free-languagechomsky-normal-form

Chomsky Normal Form Conversion Algorithm


Why do we add a new start state S0 -> S when we want to convert a grammar to Chomsky normal form? What goes wrong if we do not do that?

At first I thought it's because of epsilon rules. But we do not remove an epsilon rule from start variable. So, what is benefit of adding S0 -> S?

Thanks


Solution

  • Depending on whether the empty string is in the language you might have the rule $S --> \epsilon$ (or $S_0 --> \epsilon$). This could delete an arbitrary number of symbols $S$ if these could appear on the right hand sides of rules. Because we do not want the start symbol to appear again, we introduce a new one.

    This way we get exactly one more symbol per application of a rule A -> BC.