I'm trying to construct a CFG in Chomsky Normal Form with as few productions as possible that accepts the language containing the only string a^21.
I understand that I could just convert
S -> AAAAAAAAAAAAAAAAAAAAA A -> a
but is there any other way to shorten that language then convert it into Chomsky Normal Form?
We can pretty easily show that we need at least six symbols to hope to generate a CFG in CNF for this language by recognizing we can at best double the generated string length with each production and we must start with 2^0:
A_21 := ...
A_16 := A_16 A_16
A_8 := A_4 A_4
A_4 := A_2 A_2
A_2 := A_1 A_1
A_1 := a
We can then show there is no grammar in CNF with six productions that generates our target language. We start the argument by building the grammar from the bottom up.
A_1 := a
to get any strings.A_2 := A_1 A_1
to get any string with length greater than 1.A_3
or skip that and generate A_4
, or both. We consider each of these cases below.Case 1: A_3
A_3 := A_2 A_1
.A_21 := X Y
. So we can add up to two more. Even if we add the biggest productions that are now possible - A_6
and A_12
- we can't produce A_21
as required (we can produce at most A_18 := A_6 A_12
. So adding A_3
can't get us a grammar that generates our language in six productions.Case 2: A_4
A_4 := A_2 A_2
.A_21 := X Y
. So we can add up to two more. Our options currently are A_5
, A_6
and A_8
. A_5
and A_6
will fail for the same reason we stated for Case 1 above. A_8
, however, does not fail by that reasoning, so we add A_8 := A_4 A_4
.A_20, A_19, A_17
or A_13
. We cannot generate any of these using our existing productions.We have thus ruled out a grammar with 6 productions. If you try to find a grammar with 7 productions using the above reasoning, you will find several. Since we know there are grammars in CNF with 7 productions and none with 6, you're done. Here are a few of the 7-production grammars:
S := A_18 A_3
A_18 := A_9 A_9
A_9 := A_6 A_3
A_6 := A_3 A_3
A_3 := A_2 A_1
A_2 := A_1 A_1
A_1 := a
S := A_17 A_4
A_17 := A_9 A_8
A_9 := A_8 A_1
A_8 := A_4 A_4
A_4 := A_2 A_2
A_2 := A_1 A_1
A_1 := a
S := A_16 A_5
A_16 := A_8 A_8
A_8 := A_4 A_4
A_5 := A_4 A_1
A_4 := A_2 A_2
A_2 := A_1 A_1
A_1 := a
S := A_15 A_6
A_15 := A_9 A_6
A_9 := A_6 A_3
A_6 := A_3 A_3
A_3 := A_2 A_1
A_2 := A_1 A_1
A_1 := a
S := A_14 A_7
A_14 := A_7 A_7
A_7 := A_4 A_3
A_4 := A_3 A_1
A_3 := A_2 A_1
A_2 := A_1 A_1
A_1 := a
S := A_13 A_8
A_13 := A_8 A_5
A_8 := A_5 A_3
A_5 := A_3 A_2
A_3 := A_2 A_1
A_2 := A_1 A_1
A_1 := a
S := A_12 A_9
A_12 := A_9 A_3
A_9 := A_6 A_3
A_6 := A_3 A_3
A_3 := A_2 A_1
A_2 := A_1 A_1
A_1 := a
S := A_11 A_10
A_11 := A_10 A_1
A_10 := A_8 A_2
A_8 := A_4 A_4
A_4 := A_2 A_2
A_2 := A_1 A_1
A_1 := a
And there are lots more. The hard part was showing there weren't any with 6 productions.