I'm currently studying compiler's and am on the topic of "Chomsky Hierarchy and the 4 languages." But it beats me as to what the practical purpose of all this is?
It'd be great if I could see real-life examples of the 4 grammars: Unrestricted, CSG, CFG, Regular Grammer come to play.
I found online that Chomsky hierarchy along with the 4 grammars is used to evaluate proposals within cognitive science but this goes way over my head. It'd be great if someone could break it down for me, thanks a lot!
There is no practical value. That's the whole point.
Let me try to break that down a bit. It's useful to remember that Chomsky is a linguist --someone who studies human languages-- and that he was writing in the late 1950s when computational theory was not as well-developed as it is today. (To put it mildly.) His goal was to find a mathematical model which could provide some insights into the mechanisms by which human beings generate and understand sentences, and he took as his starting point a particular simple model of sentence generation.
In this model, a grammar is a function F
, which transforms elements from an arbitrary sequence of symbols from some alphabet onto another sequence of symbols from the same alphabet. F
is defined by a finite set of pairs (called productions) α → β
. We then say that F(ω) = F(ζ) if the definition of F
contains some pair α → β
such that α
is a substring of ω
and ζ
is the result of substituting a single instance of α
in ω
with β
.
That's not very interesting in and of itself; we make that into a full language by starting with some designated starting sequence, normally represented as the single symbol S
, and repeatedly apply F
as many times as is necessary. (In all interesting grammars, the set so constructed is infinite, so it cannot actually be constructed. But we can imagine proceeding from the starting point until we find the sentence we wanted to generate.)
The problem with this model is that it can be used to describe an arbitrary Turing Machine. Or, if you like, an arbitrary computer program, although the equivalence is easier to see with a Turing Machine. In other words, it is at least theoretically possible to construct a finite grammar which will recognise strings consisting of the description of a Turing Machine (i.e., a program written in some programming language) followed by an input and an output only if the Turing Machine applied to the input would produce the output. In other words, there exists (in the mathematical sense) a grammar of this form which is computationally equivalent to a general purpose computer.
Unfortunately, that's not actually very useful if our goal is to understand sentences, because there is actually no algorithm for running computers backwards. The best we can do as a general solution is to enumerate all possible inputs and run the program on each of them until we find the output we hoped for. And that doesn't actually work because there is no limit to the amount of time the program might take to produce an output and no way to even know if the program will eventually come to an end. (This is called the "halting problem".) So we might get stuck on some possible input, and we'll never know if some other input might have produced the desired output.
As a result, we cannot tell whether the provided input was "grammatical", that is, whether it conformed to the grammar provided. And that's not just the case with the particular grammar we built to emulate Turing Machines. It means that we have no confidence that we can recognise sentences from any arbitrary grammar, and even if we stumble upon an answer, we have no way to limit how much time it might take to get there.
Clearly, this is not how human beings understand each other. So if it is to serve a practical purpose, we must restrict the possible grammars in some way to make them computationally feasible.
On the other end of the spectrum, a lot was known about finite-state machines. A finite-state machine is a Turing Machine without a tape; that is to say, it is simply a finite collection of states. In each state, the machine reads a single input symbol and uses it to decide what the next state will be. It turns out that finite-state machines can be modelled using a grammar (as above) restricted to very simple productions, each of which is either of the form A → a B
or A → a
, where a
is a symbol from the "terminal alphabet" (that is, a word) and A
and B
are single grammatical symbols. These grammars are called "regular grammars" and they are computationally equivalent to what mathematicians call "regular expressions" (which are a small subset of what is recognised by "regex" libraries, but that's a whole other discussion).
Regular grammars are easy to parse. All that is needed to is trace through the state machine, so it can be done without backtracking in time proportional to the length of the input. But regular grammars are far too weak to be able to represent human language, or even most computer languages. As a simple example, algebraic expressions with parentheses cannot be recognised with a regular grammar (or with a finite-state machine) because there is no way to count the parenthesis depth; the finite-state machine has no memory at all (other than knowing which state it is in, and there are only a finite number of states).
So unrestricted grammars are too powerful to parse and regular grammars are too weak to be useful. (Useful for complex parsing problems, that is. There are certain applications for regular expressions, but parsing complete computer programs is not one of them.)
The next step, then, was to try to find a restriction on grammars which was still powerful enough to represent human language without being so powerful that parsing became impossible.
That, finally, is the origin of Chomsky's hierarchy. Between the two extremes described above (type 0 and type 3 grammars), Chomsky proposed two possible intermediate restrictions -- type 1 and type 2 grammars -- and proved a number of important properties about each of them.
While this work turned out to be fundamental in the development of formal language theory, it cannot really be said to have answered the question Chomsky started with. Type 2 grammars -- context-free grammars -- are indeed computationally tractable; they can be parsed with simple algorithms in polynomial time, and can represent a large number of useful languages. But they are still too weak to represent human language. In particular, context-free grammars cannot represent a language as simple as "all strings which contain two instances of the same substring". Type 1 grammars -- context-sensitive grammars -- can probably represent any useful language, and are not quite as unruly as unrestricted grammars, but they are still too powerful to parse. (Since the derivation steps in a context-sensitive grammar never get shorter, it is possible to enumerate all possible derivations from a starting point in order by length, which means that you can decide whether a sentence is generated by the grammar without running into the halting problem. But that's as good as it gets; that procedure takes exponential time and is not remotely feasible for non-trivial inputs.)
In the six decades since Chomsky published his seminal papers, a lot of work has been done to try to find useful intermediate restrictions between type 1 and type 2 grammars. And there has been a lot of useful study into algorithms for parsing context-free languages, which is of enormous utility in building compilers. All of this builds on the crucial work done by Chomsky and the other computational theorists whose work he built on -- Markov, Turing, Church and Kleene, just to name a few worthy of study. But Chomsky's original project remains unsolved.
So if your goal is to build a simple parser for a programming language, the Chomsky hierarchy is probably just an interesting footnote. But if you are interested in the academic study of formal language theory, there are still lots of interesting unsolved problems to work on.