algorithmparsingnlpearley-parser

Practical Earley Parsing (Aycock & Horspool 2002): How to add back pointers?


I've already coded an Earley parser with back pointers but it doesn't handle nullable grammars very well. I've also implemented Aycock & Horspool 2002's solution which is to make PREDICT skip over the nonterminal token if it is nullable. Unfortunately this does not tell you exactly which particular path the token needs to take to get to epsilon.

My idea (pretty stupid) is:

For each nullable nonterminal, create a list of paths for that nonterminal to get to epsilon.

Every time you skip over a nullable nonterminal, add a back pointer called NULL.

When you're expanding the tree, every time you encounter NULL, you create a list of trees, one for each path in the list for that nullable nonterminal.

Finally, you go through the list of trees and get rid of duplicates.

I think this would significantly increase the time complexity of my algorithm but I can't think of a more efficient method to generate all the possible parse trees.

Can anyone suggest a more efficient method of implementing Aycock & Horspool 2002 to create parse trees?


Solution

  • Have you heard about Marpa?

    It's basically Earley + Aycock&Horspool + Leo + author's (Jeffrey Kegler's) improvements.

    You might be interested in Theory section of the author's blog and the author's paper about Marpa.

    Hope this helps.