Tags: xml, lua, pattern-matching, lpeg

Parsing XML-type file with LPeg re module


I'm trying to learn LPeg's re module, and it has been quite an interesting experience, especially since the official documentation is so nice.

However, some topics seem to be poorly explained there, for example the named group capture construct: {:name: p :}.

Consider the following example. I don't understand why it does not match:

local re = require "re"

print(re.compile
  [[item <- ('<' {:tag: %w+!%w :} '>' item+ '</' =tag '>') / %w+!%w]]
  :match[[<person><name>James</name><address>Earth</address></person>]])

-- outputs nil

Can anyone help me understand what is going wrong here? I have thought about this quite a bit, and it really seems like I'm missing something important.


Solution

  • This is a late answer, but you can try the following pattern:

    local re = require "re"

    result = re.compile[[
      item <- ({| %s* '<' {:tag: %w+ :} %s* '>' (item / %s* { (!(%s* '<') .)+ }) %s* '</' =tag '>' |})+
    ]]:match[[
    <person>
        <name>
        James
        </name>
        <address>Earth</address>
    </person>
    ]]
    

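    which uses table captures to parse the XML, with whitespace around element text stripped. Wrapping each element in {| ... |} also keeps a nested element's tag group inside its own table, so =tag in the closing tag refers to the enclosing element's tag rather than to the most recently matched child's. The result is: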

    tag = "person"
    [1] = {
      tag = "name"
      [1] = "James"
    }
    [2] = {
      tag = "address"
      [1] = "Earth"
    }
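
To make the difference concrete, here is a minimal sketch that runs the pattern from the question next to a version changed only by wrapping each element in a table capture. It relies on LPeg's documented rule that a back reference resolves to the most recent complete, outermost group capture with the given name. The names `leaky` and `fixed` are made up for illustration, and the text capture `{ %w+!%w }` is an addition so that the result table has something to hold.

```lua
-- Assumes LPeg and its bundled re module are installed.
local re = require "re"

local xml = "<person><name>James</name><address>Earth</address></person>"

-- Pattern from the question. After <name> and <address> have matched,
-- their tag groups are still visible, so the final =tag is compared
-- against "address" instead of "person" and the match fails.
local leaky = re.compile [[
  item <- ('<' {:tag: %w+!%w :} '>' item+ '</' =tag '>') / %w+!%w
]]
print(leaky:match(xml))  -- nil

-- Same grammar, but each element is enclosed in a table capture.
-- Once a child's table capture is complete, its tag group is no longer
-- outermost, so =tag finds the tag of the element still being parsed.
local fixed = re.compile [[
  item <- {| '<' {:tag: %w+!%w :} '>' item+ '</' =tag '>' |} / { %w+!%w }
]]
local t = fixed:match(xml)
print(t.tag, t[1].tag, t[1][1], t[2].tag, t[2][1])

-- Mismatched closing tags are still rejected.
print(fixed:match("<a><b>x</a></b>"))  -- nil
```

Unlike the pattern in the answer, this sketch does not skip whitespace, so it only handles input without spaces or newlines between tags.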