language-designcontext-free-grammarmodularityparse-treecontext-sensitive-grammar

How to separate out the context-free part of a language from the context-sensitive part?


I read this fantastic post on the comp.theory list:

http://coding.derkeiler.com/Archive/General/comp.theory/2004-03/0189.html

The poster makes the point that most programming languages define a context-free core, and then have additional algorithms which run on the parse tree to filter out constructs that are illegal in the language:

This separates out the context-free part of the language from the context-sensitive part -- which is generally regarded as good practice (a kind of modular "programming" discipline for language design).

Can you provide a "Hello World" example to illustrate this technique? That is, provde a simple context-sensitive language, identify the context-free core, and then sketch out how to parse an input using the context-free core followed by filtering out illegal constructs in the parse tree.

Can you refer me to any articles or books that discuss this technique?


Solution

  • One of the simplest languages that are not context-free is one where the words are of the type anbncn (a, b, and c repeated the same number of times, that is, abc, aabbcc, aaabbbccc, ...).

    You can parse it using a grammar for the context-free language {anbncm}, where the number of c's is not restricted. Once you have the parse tree, you check using a separate algorithm that the number of repetitions of c is equal to the number of repetitions of a and b.