lua lpeg

How to do lookahead properly with LPeg


To match a string that starts with dog and is followed by cat (without consuming cat), this works:

local lpeg = require 'lpeg'
local str1 = 'dogcat'
local patt1 = lpeg.C(lpeg.P('dog')) * #lpeg.P('cat')
print(lpeg.match(patt1, str1))

Output: dog

To match a string that starts with dog, continues with any sequence of characters, and is then followed by cat (without consuming it), like the regex lookahead (dog.+?)(?=cat), I tried this:

local str2 = 'dog and cat'
local patt2 = lpeg.C(lpeg.P("dog") * lpeg.P(1) ^ 1) * #lpeg.P("cat")
print(lpeg.match(patt2, str2))

I expected the result dog and, but it returns nil.

If I throw away the lookahead part (i.e., use the pattern lpeg.C(lpeg.P("dog") * lpeg.P(1) ^ 1)), it matches the whole string successfully. This means the * lpeg.P(1) ^ 1 part matches any character sequence correctly, doesn't it?

How can I fix it?


Solution

  • You need to exclude "cat" at every position the repetition can match, so that it stops right before the text the lookahead checks:

    local patt2 = lpeg.C(lpeg.P"dog" * (lpeg.P(1)-lpeg.P"cat") ^ 1) * #lpeg.P"cat"
    

    I think it's appropriate to plug the debugger I've been working on (pegdebug), as it helps in cases like this. Here is the output it generates for the original LPeg expression:

    +   Exp 1   "d"
     +  Dog 1   "d"
     =  Dog 1-3 "dog"
     +  Separator   4   " "
     =  Separator   4-11    " and cat"
     +  Cat 12  ""
     -  Cat 12
    -   Exp 1
    

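    You can see that the Separator expression "eats" all the characters, including "cat", and there is nothing left to match against P"cat". Unlike a regex quantifier, an LPeg repetition is greedy and never backtracks to give characters back, so the lookahead is only ever tried at the end of the string.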
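
    The same effect can be checked without pegdebug by comparing where each pattern stops. This is only a minimal sketch: the variable names are made up for illustration, and it relies on lpeg.match returning the index of the first character after the match when a pattern has no captures.

    ```lua
    local lpeg = require 'lpeg'
    local P = lpeg.P

    local subject = 'dog and cat'  -- 11 characters

    -- Greedy repetition: P(1)^1 runs to the end of the subject.
    local greedy = P'dog' * P(1)^1
    local greedy_end = lpeg.match(greedy, subject)       -- 12, one past the last character

    -- The lookahead is tried at position 12 only; the repetition does not
    -- give characters back, so the whole match fails.
    local greedy_then_cat = greedy * #P'cat'
    local failed = lpeg.match(greedy_then_cat, subject)  -- nil

    -- Excluding "cat" from each repeated step stops the loop right before it.
    local bounded = P'dog' * (P(1) - P'cat')^1 * #P'cat'
    local bounded_end = lpeg.match(bounded, subject)     -- 9, the position of the "c" in "cat"

    print(greedy_end, failed, bounded_end)
    ```

    The positions line up with the traces: Separator ends at 11 in the original expression and at 8 in the modified one.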

    The output for the modified expression looks like this:

    +   Exp 1   "d"
     +  Dog 1   "d"
     =  Dog 1-3 "dog"
     +  Separator   4   " "
     =  Separator   4-8 " and "
     +  Cat 9   "c"
     =  Cat 9-11    "cat"
    =   Exp 1-8 "dog and "
    /   Dog 1   0   
    /   Separator   4   0   
    /   Exp 1   1   "dog and "
    

    Here is the full script:

    local lpeg = require 'lpeg'  -- bind the module to a local instead of relying on a global lpeg
    local peg = require 'pegdebug'
    local str2 = 'dog and cat'
    local patt2 = lpeg.P(peg.trace { "Exp";
      Exp = lpeg.C(lpeg.V"Dog" * lpeg.V"Separator") * #lpeg.V"Cat";
      Cat = lpeg.P("cat");
      Dog = lpeg.P("dog");
      Separator = (lpeg.P(1) - lpeg.P("cat"))^1;
    })
    print(lpeg.match(patt2, str2))
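
    If this comes up more than once, the fix can be wrapped in a small function. The sketch below is one possible shape: capture_before is a hypothetical helper name, not part of LPeg, and it assumes the prefix and stop arguments are plain strings or LPeg patterns.

    ```lua
    local lpeg = require 'lpeg'
    local P, C = lpeg.P, lpeg.C

    -- capture_before is a hypothetical helper, not part of LPeg.
    -- It captures `prefix` plus at least one following character, stopping
    -- right before the first occurrence of `stop`, which is left unconsumed.
    local function capture_before(prefix, stop)
      prefix, stop = P(prefix), P(stop)
      return C(prefix * (P(1) - stop)^1) * #stop
    end

    local patt = capture_before('dog', 'cat')
    local with_gap = lpeg.match(patt, 'dog and cat')   -- "dog and " (trailing space included)
    local no_gap   = lpeg.match(patt, 'dogcat')        -- nil: at least one character must separate them
    local no_stop  = lpeg.match(patt, 'dog and bird')  -- nil: "cat" never appears
    print(with_gap, no_gap, no_stop)
    ```

    This is close to, but not exactly, the regex (dog.+?)(?=cat). For the input dogcatcat the regex captures dogcat, while this pattern returns nil because the separator may not start with "cat".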