pythonsyntaxcompiler-constructiondefinitionasdl

Zephyr ASDL (Abstract Syntax Description Language)


Question:

What is the Zephyr ASDL and how does it relate to other compiler technologies like lexers and parser generators?

(I would appreciate it if you were reasonably complete, but point to other references online when it gets rather technical, because most of what I know about compilers come from playing with yacc and flex, writing a simple maximal munch lexer in C, and looking up and reading stuff online)

Question Background:

I've been reading http://docs.python.org/devguide/compiler.html and I came across the following line:

The specification of the AST nodes is specified using the Zephyr Abstract Syntax Definition Language (ASDL).

I followed the citation at the bottom to find: http://www.cs.princeton.edu/research/techreps/TR-554-97.

My first reading through the article has been rather tumultuous, and I was hoping I could first get a better understanding of what the purpose of ASDL was (in context of the compilation process), before trying again.


Solution

  • Lexer and Parser generators accept descriptions of lexemes and grammars, and generate code that implement the corresponding artifact. Lex requires a regular expression to describe tokens. Parser generators take various kinds of extended BNF notations.

    The paper you reference is pretty clear IMHO: ASDL is a little language for describing abstractly a set of tree node (their types and signatures). Using this language, one can write (and the paper's authors did so) a tool that converts these descriptions into the set of record types that you'd need to implement trees to be used with a parser. So ADSL is kind of like Regexes and BNF, in that its purpose is to be fed to a code generator that produces a part of a compiler.

    An expansive view is that compilers are a pretty well-understood technology, and that one ought to be able to generate them from descriptions of various pieces. Regex/BNF/ADSL are the essential keys to the parsing phase.

    You'd ideally like description languages for target instruction sets, flow analyses, translations (you mentioned maximal munch) from the abstract trees to the target instruction set, and a way to describe optimizations. Then with corresponding tools for each piece, you could build the entire compiler from "specifications". There's actually been a lot of work in this area; people have done all of these separately and together. Unsurprisingly some of it comes from the "Zephyr" project formerly out of Princeton (seems the Zephyr web site there is now dead), whose goal was to do just this kind of thing.

    Anyway try looking under Google Scholar for "compiler generator".