Tags: nlp, grammar, context-free-grammar, automata, chomsky-hierarchy

Chomsky hierarchy - examples with real languages


I'm trying to understand the four levels of the Chomsky hierarchy by using some real languages as models. Chomsky thought that all natural languages can be generated through a context-free grammar, but Shieber contradicted this theory by showing that languages such as Swiss German can only be generated through a context-sensitive grammar. Since Chomsky is from the US, I guess that the American language is an example of a context-free grammar. My questions are:

  1. Are there natural languages which can be generated by regular grammars (Type 3)?
  2. Since recursively enumerable grammars (Type 0) can generate all languages, why not just use those? Are they too complicated to work with?
  3. What is the characteristic of Swiss German that makes it impossible to generate with a context-free grammar?

Solution

  • I don't think this is an appropriate question for StackOverflow, which is a site for programming questions. But I'll try to address it as best I can.

    I don't believe Chomsky was ever under the impression that natural languages could be described with a Type 2 grammar. Noun-verb agreement (singular/plural) can be represented in a Type 2 grammar, because the number of cases is finite, but the resulting grammar is awkward. There are, however, more complicated features of natural language, generally involving specific rules about how word order can be rearranged, which cannot be captured in a simple grammar. It was Chomsky's hope that a second level of analysis -- "transformational grammars" -- could usefully capture these rearrangement rules without making the grammar computationally intractable. That would require finding some systematization which fits between Type 1 and Type 2, because Type 1 grammars are not computationally tractable.
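
    To make "awkward but possible" concrete, here is a minimal sketch (my own, purely illustrative) of a Type 2 grammar that encodes singular/plural agreement by duplicating nonterminals:

        import itertools

        # Hypothetical toy grammar: agreement is encoded by splitting each
        # nonterminal into _sg and _pl variants.  Finite, but every new
        # agreement feature multiplies the number of rules.
        GRAMMAR = {
            "S":     [["NP_sg", "VP_sg"], ["NP_pl", "VP_pl"]],
            "NP_sg": [["the dog"]],
            "NP_pl": [["the dogs"]],
            "VP_sg": [["runs"]],
            "VP_pl": [["run"]],
        }

        def expand(symbol):
            """Yield every sentence derivable from `symbol`."""
            if symbol not in GRAMMAR:        # terminal
                yield symbol
                return
            for production in GRAMMAR[symbol]:
                alternatives = [list(expand(s)) for s in production]
                for combo in itertools.product(*alternatives):
                    yield " ".join(combo)

        print(list(expand("S")))
        # ['the dog runs', 'the dogs run'] -- never "the dog run"

    Adding gender or case agreement would split the nonterminals again, which is exactly why this approach scales so badly.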

    Since we do, in fact, correctly parse our own languages, it stands to reason that there is some computational algorithm. But that line of reasoning might not actually be correct, because there is a limit to the complexity of the sentences we can parse. Any finite language is regular (Type 3); only languages with an unlimited number of potential sentences require more sophisticated grammars. So a large collection of finite patterns could suffice to understand natural language. These patterns might be a lot more sophisticated than regular expressions, but as long as each pattern only applies to sentences of limited length, it could be expressed mathematically as a regular expression. (The most obvious such pattern just lists all possible sentences as alternatives, which is a regular expression as long as the number of possible sentences is finite. In many cases, though, that can be simplified into something more useful.)
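
    As a toy illustration of the finite-implies-regular point (the sentences are invented, and the set is deliberately tiny):

        import re

        # Any finite list of sentences is a regular language: just write
        # one big alternation over the whole list.
        sentences = ["the dog runs", "the dogs run", "the cat sleeps"]
        pattern = re.compile("|".join(re.escape(s) for s in sentences))

        print(bool(pattern.fullmatch("the dogs run")))   # True
        print(bool(pattern.fullmatch("the dogs runs")))  # False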

    As I understand it, modern attempts to deal with natural language using so-called "deep learning" are essentially based on pattern recognition through neural networks, although I haven't studied the field deeply and I'm sure that there are many complications I'm skipping over in that simple description.

    Noam Chomsky is an American, but "American" is not a language (and if it were, it would arguably be Spanish, which is spoken by the majority of residents of the Americas). As far as I know, his first language is English, but he is by no means unilingual, although I don't know how much Swiss German he speaks. Certainly, there have been criticisms over the years that his theories have an Indo-European bias. I don't claim competence in Swiss German myself, despite having lived several years in Switzerland, but I did read Shieber's paper and some of the follow-ups and discussed them with colleagues who were native Swiss German speakers. (Opinions were divided.)

    The basic issue has to do with morphological agreement in lists. As I mentioned earlier, many languages (all Indo-European languages, as far as I know) insist that the form of the verb agrees with the form of the subject, so that a singular subject requires a singular verb and a plural subject requires a plural verb. [Note 1]

    In many languages, agreement is also required between adjectives and nouns, and this is not just agreement in number but also agreement in grammatical gender (if applicable). Also, many languages require agreement between the specific verb and the article or adjective of the object of the verb. [Note 2]

    Simple agreement can be handled by a context-free (Type 2) grammar, but there is a huge restriction. To put it simply, a context-free grammar can only deal with parenthetic constructions. This can work even if there is more than one type of parenthesis, so a context-free grammar can insist that an [ be matched with a ] and not a ). But the grammar must have this "inside-out" form: the matching symbols must be in the reverse order to the symbols being matched.
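
    A minimal sketch of that "inside-out" property, using a single stack -- the defining memory of a pushdown automaton, the machine equivalent of a Type 2 grammar:

        # One stack suffices because each closer must match the MOST
        # RECENTLY opened, still-unmatched opener: reverse order.
        PAIRS = {")": "(", "]": "["}

        def balanced(s: str) -> bool:
            stack = []
            for ch in s:
                if ch in "([":
                    stack.append(ch)
                elif ch in PAIRS:
                    if not stack or stack.pop() != PAIRS[ch]:
                        return False
            return not stack

        print(balanced("([()])"))  # True
        print(balanced("([)]"))    # False: ] would have to match ( out of order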

    One consequence of this is that there is a context-free grammar for palindromes -- sentences which read the same in both directions, which effectively means that they consist of a phrase followed by its reverse. But there is no context-free grammar for duplications: a language consisting of repeated phrases. In the palindrome, the matching words are in the reverse order to the matched words; in the duplicate, they are in the same order. Hence the difference.
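
    Here are two toy recognizers (mine, just to make the contrast visible; the duplication check is deliberately not a stack computation, since no stack computation exists for that language):

        def is_even_palindrome(s: str) -> bool:
            # Matching in REVERSE order: last symbol pairs with first.
            # This is the context-free pattern (S -> aSa | bSb | ...).
            return len(s) % 2 == 0 and s == s[::-1]

        def is_duplication(s: str) -> bool:
            # Matching in the SAME order: second half repeats the first.
            # No context-free grammar generates exactly this language.
            half = len(s) // 2
            return len(s) % 2 == 0 and s[:half] == s[half:]

        print(is_even_palindrome("abba"), is_duplication("abba"))  # True False
        print(is_even_palindrome("abab"), is_duplication("abab"))  # False True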

    Agreement in natural languages mostly follows this pattern, and some of the exceptions can be dealt with by positing simple rules for reordering finite numbers of phrases -- Chomsky's transformational grammar. But Swiss German features at least one case where agreement is not parenthetic, but rather in the same order. [Note 3] This involves the feature of German in which many sentences are in the order Subject-Object-Verb, which can be extended to Subject Object Object Object... Verb Verb Verb... when the verbs have indirect objects. Shieber showed some examples in which object-verb agreement is ordered, even when there are intervening phrases.
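
    Shieber's argument is often abstracted to the language a^m b^n c^m d^n, where the objects a...b must be matched by the verbs c...d in the same order. A rough sketch of that abstraction (the symbols are invented placeholders):

        import re

        # a/b stand for two kinds of objects, c/d for the verbs that must
        # agree with them, in the SAME order.  The language is not
        # context-free, but two counters recognize it easily.
        def cross_serial_ok(s: str) -> bool:
            m = re.fullmatch(r"(a+)(b+)(c+)(d+)", s)
            if not m:
                return False
            return (len(m.group(1)) == len(m.group(3))
                    and len(m.group(2)) == len(m.group(4)))

        print(cross_serial_ok("aabccd"))   # True:  aa..cc and b..d line up
        print(cross_serial_ok("aabccdd"))  # False: one b but two d's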

    In the general case, such "cross-serial agreement" cannot be expressed in a context-free grammar. But there is a huge underlying assumption: that the length of the agreeing series be effectively unlimited. If, on the other hand, there are a finite number of patterns actually in common use, the "deep learning" model referred to above would certainly be able to handle it.

    (I want to say that I'm not endorsing deep learning here. In fact, the way "artificial intelligence" is "trained" involves the use of trainers whose cultural biases may well not be sufficiently understood. This could easily lead to the same unfortunate consequences alluded to in my first footnote.)

    Notes

    1. This is not the case in many Native American languages, as Whorf pointed out. In those languages, using a singular verb with a plural noun implies that the action was taken collectively, while using a plural verb would imply that the action was taken separately. Roughly transcribed to English, "The dogs run" would be about a bunch of dogs independently running in different directions, whereas "The dogs runs" would be about a single pack of dogs all running together. Some European "teachers" who imposed their own linguistic prejudices on native languages failed to understand this distinction, and concluded that the Native Americans must be too primitive to even speak their own language "correctly"; to "correct" this "deficiency", they attempted to eliminate the distinction from the language, in some cases with success.

    2. These rules, not present in English, are one of the reasons some English speakers are tortured by learning German. I speak from personal experience.

    3. Ordered agreement, as opposed to parenthetic agreement, is known as cross-serial dependency.