theorycontext-free-grammarcomputation-theorychomsky-normal-form

Construct a context free grammar for a language in Chomsky Normal Form


I'm trying to construct a CFG in Chomsky Normal Form with as few productions as possible that accepts the language containing the only string a^21.

I understand that I could just convert

S -> AAAAAAAAAAAAAAAAAAAAA A -> a

but is there any other way to shorten that language then convert it into Chomsky Normal Form?


Solution

  • We can pretty easily show that we need at least six symbols to hope to generate a CFG in CNF for this language by recognizing we can at best double the generated string length with each production and we must start with 2^0:

    A_21 := ...
    A_16 := A_16 A_16
    A_8  := A_4  A_4
    A_4  := A_2  A_2
    A_2  := A_1  A_1
    A_1  := a
    

    We can then show there is no grammar in CNF with six productions that generates our target language. We start the argument by building the grammar from the bottom up.

    1. We must have A_1 := a to get any strings.
    2. We must have A_2 := A_1 A_1 to get any string with length greater than 1.
    3. We can now generate either A_3 or skip that and generate A_4, or both. We consider each of these cases below.

    Case 1: A_3

    1. We add A_3 := A_2 A_1.
    2. We already have 3 productions, and know we need one of the form A_21 := X Y. So we can add up to two more. Even if we add the biggest productions that are now possible - A_6 and A_12 - we can't produce A_21 as required (we can produce at most A_18 := A_6 A_12. So adding A_3 can't get us a grammar that generates our language in six productions.

    Case 2: A_4

    1. We add A_4 := A_2 A_2.
    2. We already have 3 productions, and know we need one of the form A_21 := X Y. So we can add up to two more. Our options currently are A_5, A_6 and A_8. A_5 and A_6 will fail for the same reason we stated for Case 1 above. A_8, however, does not fail by that reasoning, so we add A_8 := A_4 A_4.
    3. We now have only one production and need it to be either A_20, A_19, A_17 or A_13. We cannot generate any of these using our existing productions.

    We have thus ruled out a grammar with 6 productions. If you try to find a grammar with 7 productions using the above reasoning, you will find several. Since we know there are grammars in CNF with 7 productions and none with 6, you're done. Here are a few of the 7-production grammars:

    S    := A_18 A_3
    A_18 := A_9  A_9
    A_9  := A_6  A_3
    A_6  := A_3  A_3
    A_3  := A_2  A_1
    A_2  := A_1  A_1
    A_1  := a
    
    S    := A_17 A_4
    A_17 := A_9  A_8
    A_9  := A_8  A_1
    A_8  := A_4  A_4
    A_4  := A_2  A_2
    A_2  := A_1  A_1
    A_1  := a
    
    S    := A_16 A_5
    A_16 := A_8  A_8
    A_8  := A_4  A_4
    A_5  := A_4  A_1
    A_4  := A_2  A_2
    A_2  := A_1  A_1
    A_1  := a
    
    S    := A_15 A_6
    A_15 := A_9  A_6
    A_9  := A_6  A_3
    A_6  := A_3  A_3
    A_3  := A_2  A_1
    A_2  := A_1  A_1
    A_1  := a
    
    S    := A_14 A_7
    A_14 := A_7  A_7
    A_7  := A_4  A_3
    A_4  := A_3  A_1
    A_3  := A_2  A_1
    A_2  := A_1  A_1
    A_1  := a
    
    S    := A_13 A_8
    A_13 := A_8  A_5
    A_8  := A_5  A_3
    A_5  := A_3  A_2
    A_3  := A_2  A_1
    A_2  := A_1  A_1
    A_1  := a
    
    S    := A_12 A_9
    A_12 := A_9  A_3
    A_9  := A_6  A_3
    A_6  := A_3  A_3
    A_3  := A_2  A_1
    A_2  := A_1  A_1
    A_1  := a
    
    S    := A_11 A_10
    A_11 := A_10 A_1
    A_10 := A_8  A_2
    A_8  := A_4  A_4
    A_4  := A_2  A_2
    A_2  := A_1  A_1
    A_1  := a
    

    And there are lots more. The hard part was showing there weren't any with 6 productions.