
LPeg: parse a first-order logic term

As the title says, I'm trying to parse, for example,

term(A, b, c(d, "e", 7))

into a Lua table like

{term, {A, b, {c, {d, "e", 7}}}}

This is the grammar I built:

local pattern = re.compile[=[
  term      <- variable / function
  argument  <- variable / lowercase /number / string
  function  <- {|lowercase {|(open argument (separator (argument / function))* close)?|}|}
  variable  <- uppercase
  lowercase <- {[a-z][A-Za-z0-9]*}
  uppercase <- {[A-Z][A-Za-z0-9]*}
  string    <- '"' {~ [^"]* ~} '"'
  number    <- {[0-9]+}
  close     <- blank ")"
  open      <- "(" blank
  separator <- blank "," blank
  blank     <- " "*
]=]

I'm having the following problems:


  • In your grammar you have:

    argument  <- variable / lowercase /number / string
    function  <- {|lowercase {|(open argument (separator (argument / function))* close)?|}|}

    Keep in mind that LPeg tries the alternatives of a choice in the order you wrote them. Once one alternative matches, LPeg commits to it and never tries the remaining alternatives in that rule, even if one of them would give a "better" (longer) match.

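    Here it fails to match nested function calls because the bare name c already matches through

    `argument  <- variable / lowercase / ...`

    In `(argument / function)`, argument is listed before function, and its lowercase alternative accepts c on its own, so LPeg never tries function at that position. The "(" that follows then matches neither separator nor close, so the optional argument list fails as a whole and only the leading name is captured.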
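
    The effect can be reproduced in isolation. The following is a minimal sketch, not taken from the question: the rule names item, name and call are made up for illustration, and the two grammars differ only in the order of the alternatives.

    ```lua
    local re = require "re"

    -- Both grammars accept a bare name or a one-argument call.
    -- "!." demands end of input, so a partial match counts as a failure.
    local name_first = re.compile[[
      item <- ({name} / {call}) !.
      name <- [a-z]+
      call <- name "(" name ")"
    ]]

    local call_first = re.compile[[
      item <- ({call} / {name}) !.
      name <- [a-z]+
      call <- name "(" name ")"
    ]]

    print(name_first:match("c(d)"))  -- nil: name matched "c", call was never tried
    print(call_first:match("c(d)"))  -- c(d)
    print(call_first:match("c"))     -- c: call fails, so name is tried next
    ```

    Putting the more specific alternative first is the usual fix: a failed alternative still falls through to the next one, but a successful one is final.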

    As an experiment, I've modified your grammar slightly and added table and named captures for most of the non-terminals you're interested in.

    local re = require "re"

    local pattern = re.compile[[
      term      <- {| {:type: '' -> "term" :} term_t |}
      term_t    <- func / var
      func      <- {| {:type: '' -> "func" :} {:name: func_id :} "(" arg (separator arg)* ")" |}
      func_id   <- lower / upper
      arg       <- number / string / term_t
      var       <- {| {:type: '' -> "var" :} {:name: lower / upper :} |}
      string    <- '"' {~ [^"]* ~} '"'
      lower     <- {%l%w*}
      upper     <- {%u%w*}
      number    <- {%d+}
      separator <- blank "," blank
      blank     <- " "*
    ]]

    With a quick pattern test:

    local test = [[fun(A, b, c(d(42), "e", f, 7))]]
    dump( pattern:match(test) )

    Which gives the following output on my machine:

    {
      {
        {
          type = "var",
          name = "A"
        },
        {
          type = "var",
          name = "b"
        },
        {
          {
            "42",
            type = "func",
            name = "d"
          },
          "e",
          {
            type = "var",
            name = "f"
          },
          "7",
          type = "func",
          name = "c"
        },
        type = "func",
        name = "fun"
      },
      type = "term"
    }

    Looking carefully at the above, you'll notice that the function arguments appear in the array part of the table, in the order they were passed in. The type and name, on the other hand, can appear in any order since they live in the hash part of the table. If that bothers you, you can wrap those "attributes" in another table and put that inner attribute table in the array part of the outer table.
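
    A rough sketch of that wrapping idea, under a few assumptions of mine: the name attr_pattern and the single id rule are made up, and the grammar is trimmed down for brevity. Each node keeps its attribute table at index 1 and its arguments at indices 2 and up.

    ```lua
    local re = require "re"

    -- Sketch only: the inner {| ... |} collects type/name, the outer {| ... |}
    -- holds that attribute table first and the arguments after it.
    local attr_pattern = re.compile[[
      term      <- func / var
      func      <- {| {| {:type: '' -> "func" :} {:name: id :} |} "(" arg (separator arg)* ")" |}
      var       <- {| {| {:type: '' -> "var" :} {:name: id :} |} |}
      arg       <- number / string / term
      id        <- {%a%w*}
      string    <- '"' {[^"]*} '"'
      number    <- {%d+}
      separator <- " "* "," " "*
    ]]

    local t = attr_pattern:match('c(d, "e", 7)')
    print(t[1].type, t[1].name)      -- attributes of the outer call
    print(t[2][1].name, t[3], t[4])  -- first argument's name, then the raw captures
    ```

    With this layout ipairs visits the attribute table first, so code that walks the arguments has to start at index 2.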

    Edit: Here's a revised grammar to make the parse a bit more uniform. I've removed the term capture to help prune some unnecessary branches.

    local pattern2 = re.compile[[
      term      <- term_t
      term_t    <- func / var
      func      <- {| {:type: '' -> "func" :} {:name: func_id :} "(" args? ")" |}
      func_id   <- lower / upper
      arg       <- number / string / term_t
      args      <- arg (separator args)?
      var       <- {| {:type: '' -> "var" :} {:name: lower / upper :} |}
      string    <- {| {:type: '' -> "string" :} '"' {:value: [^"]* :} '"' |}
      lower     <- {%l%w*}
      upper     <- {%u%w*}
      number    <- {| {:type: '' -> "number" :} {:value: %d+ :} |}
      separator <- blank "," blank
      blank     <- " "*
    ]]

    Which yields the following:

    {
      {
        type = "var",
        name = "A"
      },
      {
        type = "var",
        name = "b"
      },
      {
        {
          {
            type = "number",
            value = "42"
          },
          type = "func",
          name = "d"
        },
        {
          type = "string",
          value = "e"
        },
        {
          type = "var",
          name = "f"
        },
        {
          type = "number",
          value = "7"
        },
        type = "func",
        name = "c"
      },
      type = "func",
      name = "fun"
    }
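
    If you still want the nested-list shape from your question, a tree like this can be folded into it afterwards. This is only a sketch: to_list is a hypothetical helper, and the tree below is built by hand in the shape pattern2 produces for c(d, "e", 7) so the snippet runs without LPeg. Note that symbols and string literals both end up as plain Lua strings here.

    ```lua
    -- Hypothetical helper: turn a typed node into {name, {args...}} or a scalar.
    local function to_list(node)
      if node.type == "func" then
        local args = {}
        for i, child in ipairs(node) do  -- arguments live in the array part
          args[i] = to_list(child)
        end
        return { node.name, args }
      elseif node.type == "number" then
        return tonumber(node.value)
      elseif node.type == "string" then
        return node.value
      else                               -- "var"
        return node.name
      end
    end

    -- Hand-built node, same shape as the parse of c(d, "e", 7)
    local tree = {
      { type = "var", name = "d" },
      { type = "string", value = "e" },
      { type = "number", value = "7" },
      type = "func", name = "c",
    }

    local list = to_list(tree)
    print(list[1], list[2][1], list[2][2], list[2][3])
    ```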