I'm currently trying to familiarize myself with packrat parsing. So I've read the PDF paper from 2002 linked here and in section 2.3 it describes packrat caching as a preliminary process (which occurs before the actual parsing) in which a full caching table is pre-constructed by reading the input from right to left. Only then, the actual linear parsing from left to right can start.
But in every PEG parser implementation I found, the "cache" option is usually a caching process that occurs during the actual left to right parsing. For example here.
Is there any difference between both approaches? Thank you.
I recently worked on similar research, met the exact same confusion, and resolved it. Regardless if you are still working on this topic, here's my answer.
Your understanding is correct:
But there's just one approach, not two. Let's use one simple example Parsing Expression Grammar (PEG) without left-recursion: E -> Num + E | Num
(Note that, a left-recursion example requires another long explanation, you can refer CPython's implementation for details)
The Syntax Directed Translation (SDT) will be something like:
E -> a=Num + b=E { a + b }
E -> Num { Num }
And we can write a parse_E
function in below:
def parse_E(idx):
if idx in cache['parse_E']:
return cache['parse_E'][idx]
lval, nidx = parse_Char(idx)
if nidx < len(self.tokens):
operator, nnidx = parse_Char(nidx)
if operator == '+':
# E -> Num + E
rval, nnnidx = parse_E(nnidx)
cache['parse_E'][idx] = lval + rval, nnnidx
return cache['parse_E'][idx]
# E -> Num
cache['parse_E'][idx] = lval, nidx
return cache['parse_E'][idx]
According to Byran Ford's paper, the parser needs to scan the input string from left to right and construct the cache in any position:
for idx in len(input_string):
parse_E(idx)
parse_Char(idx)
So, let's check the cache construction under the hood, initially, we have an empty cache and input string:
cache: {'parse_E': {}, 'parse_Char': {}}
input string: `2 + 3 + 4`
The function call happens in the following order when idx=0
. Clearly, we construct the cache from right to left at position 0 (not even to mention idx=1
or above).
parse_Char(Y)
happens earlier than parse_Char(X)
(Y > X)parse_Char(X)
must happens earlier than parse_E(X)
parse_E(0) --- (E -> Num + E) (pending)
-> parse_Char(0) --- 2 (pending)
-> parse_Char(1) --- + (pending)
-> parse_E(2) --- E (E -> Num + E) (pending)
-> parse_Char(2) --- 3 (pending)
-> parse_Char(3) --- + (pending)
-> parse_E(4) --- E (E -> Num) (pending)
-> parse_Char(4) --- 4 (acc)
# Only after parse_Char(4) succeed and fill into cache, parse_E(4) can be successful...and so on.
If you want to read the full Python example of Packrat parser implementation, you can check my repository. It contains a handmade Packrat parser and a CPython PEG generated Packrat parser based on a simple meta grammar.