grammarcontext-free-grammarregular-languageformal-languageschomsky-hierarchy

recognise max type of formal language


Currently I´m trying to learn and understand formal languages and grammatics. I understand the Chomsky hierarchy but I found an task where I don´t know how they got the solution. The task is:

G=({S},{a,b},S,P)
P={S->epsilon, S->aS, S->Sb}
  1. What is the maximal type of this grammar?
  2. What is the maximal type of L(G)?

I know that the grammar is type 2, but in the answer was written that L(G) is type 3.

It seems that there is also a type 3 grammar describing this language, but how do you know which is the maximal type of a formal language? Is there some trick?


Solution

  • I don't think there's some easy, algorithmic way to always tell what class of language L(G) is; I mean to say, in general, I think there are cases that simply cannot be proven one way or the other. This from incompleteness/undecidability, given that unrestricted grammars can encode arithmetic. This is very hand-wavy.

    Heuristically, you can try to understand what your language is, you can transform the grammar to try to force the grammar to be simpler, etc., but these do not guarantee success.

    In this particular case, it's not too bad to figure out what the language is, recognize it is regular, and write down a grammar for it.

    S -> e
    S -> aS
    S -> Sb
    

    Any string derived from S can begin with any number of as by S -> aS, and can end with any number of bs by S -> Sb. We might hypothesize that the language is a*b*, and then prove it by showing (1) L(G) is a subset of ab and (2) ab is a subset of L(G).

    (1) The proof is by induction on the number of applications of S -> aS and S -> Sb. Base case: if neither of these is applied, only the string e is derivable. e is in a*b*, so the base case is confirmed. Induction hypothesis: assume any string derived by up to and including k applications of S -> aS and S -> Sb is in a*b*. Induction step: we show that any string derived using k+1 applications of these productions is also in a*b*. Any string derived in k+1 productions has one derived in k productions by skipping the k+1st production. This is because the productions keep the nonterminal sets the same. We already know this shorter string derived in k applications is in a*b*. We have two cases to consider:

    1. S -> aS: this adds one a to the front of the shorter string in a*b*, keeping it in a*b*.
    2. S -> Sb: this adds one b to the end of the shorter string in a*b*, keeping it in a*b*.

    Since both cases keep the string in a*b*, all strings derived in k+1 applications of the productions are in a*b* and, by induction, all strings generated by the grammar are.

    (2) Consider any string a^n b^m in a*b*. It is generated by the grammar as follows: n applications of A -> aS, m applications of B -> Sb, and one application of S -> e. Thus all strings in a*b* are generated by the grammar.