grammarchomsky-hierarchy

How can one define a language which does not fit in the Chomsky Hierarchy?


I'm asking this question because I've stumbled across the accepted answer of Chomsky Language Types

This quote is referring to Type-0 Grammars:

This means that if you have a language that is more expressive than this type (e.g. English), you cannot write an algorithm that can list each an every (and only these) words of the language

As far as I know:

Hence:

Therefore my problem:


Solution

  • A language is a possibly-infinite set of finite words written with some finite alphabet. Since the alphabet is finite and the length of each word is finite, the words of any language are enumerable, in the sense that there exists an enumeration. In other words, the size of any language is at most countably infinite.

    However, since any subset of the Kleene closure of the alphabet is a language, the number of languages is not countably infinite. Hence, there is no enumeration of languages.

    The Chomsky hierarchy is based on a formalism which can be expressed as a finite sentence with a finite alphabet (the same alphabet as the language being described, plus a couple of extra symbols). [Note 1] So the number of possible Type 0 grammars is countably infinite, and there cannot be a correspondence between the set of grammars and the set of languages.

    However. The existence of languages (i.e. sets) for which no generative grammar exists does not necessarily mean that there is some other way of describing these languages which is "more expressive" than generative grammars. Any description which can be written as a finite string using a finite alphabet can only describe a countable infinity of sets. Whether or not it is the same countable infinity will depend on the formalisms, and in general there will be no algorithm which can demonstrate homomorphism. But some equivalences are known (such as the equivalence with Turing machines, which is a particularly interesting equivalence).

    So, we have an interesting little conundrum, which is (of course) related to Gödel's Incompleteness Theorems. That is, there are more languages than ways of describing a language, no matter what system we use to describe a language. So the question "How do we describe a language for which no description is available?" does not have a good answer (and if we answer it, by calling some set "Sue", then there will still be an uncountable infinitude of possible sets for which no name exists).

    While all this foraging into infinitudes is interesting, it has a few issues:

    1. It has very little (if anything) to do with programming, so it's questionable whether it's on topic for StackOverflow.

    2. Kurt Gödel and Georg Cantor, the two mathematicians responsible for most of the concepts in this answer, both suffered from severe depression. Just saying.


    Notes

    1. Although at first glance it might appear that the alphabet for a Type 0 grammar might be arbitrarily larger than the alphabet of the language being described, that is not actually the case. The grammar's alphabet consists of the target alphabet plus a finite set of non-terminals plus an → symbol; the non-terminals can be written using numbers in any convenient base, say binary. So only three additional symbols are required (and you could reduce that to two by arbitrarily designating one of the non-terminal numbers to be the arrow). (It might seem like you need a third symbol to delimit the names of non-terminals, but you can use a fibonacci encoding to produce codes which always start with a 1 and never include two 1s, so that you can use an extra 1 at the beginning to unambiguously mark the start of the symbol.)