
Non-greedy search in lpeg without consuming the end match


This was spun off from the comments on this question.

As I understand it, a PEG grammar can implement a non-greedy search with the rule S <- E2 / E1 S (that is, S matches E2 if possible; otherwise it matches E1 and then tries S again).

However, I don't want to capture E2 in the final pattern - I want to capture up to E2. When trying to implement this in LPEG I've run into several issues, including 'Empty loop in rule' errors when building this into a grammar.

How would we implement the following search in an LPEG grammar: [tag] foo [/tag], where we want to capture the contents of the tag in a capture table (' foo ' in the example) but stop before the ending tag? As I understand from the comments on the other question, this should be possible, but I can't find an example in LPEG.

Here's a snippet from the test grammar:

local tag_start = P"[tag]"
local tag_end = P"[/tag]"

G = P{'Pandoc', 
  ...
  NotTag = #tag_end + P"1" * V"NotTag"^0;
  ...
  tag = tag_start * Ct(V"NotTag"^0) * tag_end;
}

Solution

  • It's me again. I think you need a better understanding of LPeg captures. A table capture (lpeg.Ct) gathers the captures made inside it into a table. As there are no simple captures (lpeg.C) in the NotTag rule, the final capture is an empty table {}.
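
    A minimal sketch of that difference, using the plain lpeg module (the variable names here are only illustrative):

    ```lua
    local lpeg = require('lpeg')
    local P, C, Ct = lpeg.P, lpeg.C, lpeg.Ct

    -- Ct collects only the captures produced inside it.
    local without_c = Ct(P'foo')     -- matches 'foo' but captures nothing
    local with_c    = Ct(C(P'foo'))  -- C captures the matched text

    local empty  = without_c:match('foo')
    local filled = with_c:match('foo')

    print(#empty)     --> 0
    print(filled[1])  --> foo
    ```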

    Once more, I recommend you start from lpeg.re because it's more intuitive.

    local re = require('lpeg.re')
    local inspect = require('inspect')
    
    local g = re.compile[=[--lpeg
      tag       <- tag_start {| {NotTag} |} tag_end
      NotTag    <- &tag_end / . NotTag
      
      tag_start <- '[tag]'
      tag_end   <- '[/tag]'
    ]=]
    
    print(inspect(g:match('[tag] foo [/tag]')))
    -- output: { " foo " }
    

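    Additionally, S <- E2 / E1 S is not the same as S <- E2 / E1 S*; the two are not equivalent. Your NotTag rule uses the second form (V"NotTag"^0), and because NotTag can succeed without consuming anything (through #tag_end), repeating it is what triggers the 'Empty loop in rule' error.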
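
    For reference, the same grammar can also be sketched with the plain lpeg API. Two things differ from the snippet in the question: NotTag recurses without ^0, and "any character" is written P(1), since P"1" matches only the literal character 1. The names g, found and missing are illustrative.

    ```lua
    local lpeg = require('lpeg')
    local P, V, C, Ct = lpeg.P, lpeg.V, lpeg.C, lpeg.Ct

    local tag_start = P'[tag]'
    local tag_end   = P'[/tag]'

    local g = P{
      'tag',
      -- C(...) captures the text NotTag consumed; Ct puts it in a table.
      tag    = tag_start * Ct(C(V'NotTag')) * tag_end,
      -- Stop (consuming nothing) when the end tag is next;
      -- otherwise consume one character and try again.
      NotTag = #tag_end + P(1) * V'NotTag',
    }

    local found   = g:match('[tag] foo [/tag]')
    local missing = g:match('[tag] foo')  -- no closing tag, so no match

    print(found[1])  --> foo (with the surrounding spaces)
    print(missing)   --> nil
    ```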


    However, if I were to do the same task, I wouldn't use a non-greedy match, as the recursive, character-by-character rule is generally slower than a greedy repetition.

    tag <- tag_start {| {( !tag_end . (!'[' .)* )*} |} tag_end
    

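    Combining the not-predicate with greedy repetition is enough. The inner loop (!'[' .)* runs greedily up to the next '[', so !tag_end only has to be checked at the positions where a closing tag could start.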
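
    The greedy rule translates to the plain lpeg API roughly as follows. This is a sketch rather than a tuned implementation, and the names content, simple and brackets are illustrative:

    ```lua
    local lpeg = require('lpeg')
    local P, C, Ct = lpeg.P, lpeg.C, lpeg.Ct

    local tag_start = P'[tag]'
    local tag_end   = P'[/tag]'

    -- ( !tag_end . (!'[' .)* )*
    -- Take one character that does not start the end tag,
    -- then run greedily up to the next '['; repeat.
    local content = (-tag_end * P(1) * (P(1) - P'[')^0)^0
    local tag     = tag_start * Ct(C(content)) * tag_end

    local simple   = tag:match('[tag] foo [/tag]')
    local brackets = tag:match('[tag] a[b]c [/tag]')

    print(simple[1])    --> foo (with the surrounding spaces)
    print(brackets[1])  --> a[b]c (with the surrounding spaces)
    ```

    A '[' inside the content is still accepted, because !tag_end rejects only the full closing tag.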