mathcompiler-constructioncomputer-science

In computer science, what is NOT a formal language?


On the wikipedia https://www.wikiwand.com/en/Formal_language, I found the definition of a formal language:

In mathematics, computer science, and linguistics, a formal language is a set of strings of symbols that may be constrained by rules that are specific to it.

This looks quite abstract to me. And I can't image any language which doesn't fit to this definition. Does anyone have ideas about what a informal language looks like and how it doesn't fit the definition?


Solution

  • Let me get to your question first. A good non-example of a formal language are the natural languages. English and Slovene are examples. So are Tagalog and Tarifit Berber. Unfortunately linguists don't seem to have a definition of natural language that all would agree upon.

    Noam Chomsky famously tried to model natural language using context-free gammars in his 1956 paper Three Models for the Description of Language. He invented (or discovered, if you prefer) them in that paper; although he didn't call them that; while they weren't useful to model the English language, they revolutionized computer science.

    Formally, a formal language is just a set of strings over a finite alphabet. That's it.

    Examples include all valid C programs, all valid HTML files, all valid XML files, all strings of "balanced" parentheses (e.g. (), ()(), ((()))()(()), ...), the set (codes under some encoding) of all deterministic Turing machines that always halt, the set of all simple graphs that can be colored with k-colors (actually their codes under some encoding), the set of all binary strings that end and begin with a 1, etc.

    Some are easy to recognize using regex (or, equivalently, DFA); some are impossible to recognize using DFA's, but can be recognized using PDA (or, equivalently, can be described with a context-free grammar); other's don't admit such a description, but can be recognized by a Turing machine; some aren't recognizable even by a Turing machine (called uncomputable).

    This is why the definition is so useful. Many things we encounter in CS evey day can be cast in terms of formal languages.

    For a good introduction to the subject, I highly recommend the superb book Introduction to Automata Theory, Languages, and Computation by Hopcroft et al.