
Parsing single line and multi line comments with Arpeggio


I'm trying to use Arpeggio to parse a file containing single-line and multi-line comments.

Arpeggio's documentation suggests looking at its "simple" example to see how to handle them (see documentation and linked code). The example does include the following definition:

def comment():          return [_(r"//.*"), _(r"/\*.*\*/")]

which is used by the parser as follows:

parser = ParserPython(simpleLanguage, comment, debug=debug)

Unfortunately, their example doesn't contain any comments, so it's not possible to see how this works. If I add the following dummy comments to the example:

/*
This is a multi-line comment.
*/
// This is a single-line comment.
function fak(n) {
...

then the following exception is raised:

arpeggio.NoMatch: Expected '//.*' or keyword at position (1, 1) => '*/* This is'.

which suggests that the start of the file matches neither the comment rule nor the keyword function, the first token that the simpleLanguage production allows.

Does anyone know how we are supposed to deal with comments?

Here is an MRE in case it helps with debugging the problem:

from __future__ import unicode_literals

import os

from arpeggio import *
from arpeggio import RegExMatch as _


def comment():  return [_(r"//.*"), _(r"/\*.*\*/")]
def document(): return Kwd("hello"), _(r"[a-z]+"), '!', EOF


def main(filename, debug=False):
    current_dir = os.path.dirname(__file__)
    # Read the whole input file; the with-block closes the handle afterwards.
    with open(os.path.join(current_dir, filename), "r") as f:
        content = f.read()
    parser = ParserPython(document, comment, debug=debug)
    parse_tree = parser.parse(content)


if __name__ == "__main__":
    main('simple.ex', debug=True)

and the content of the file to parse:

/*
This is a multi-line comment.
*/
// This is a single line comment.
hello world!

Solution

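  • Your comment() rule does not handle line breaks: in a Python regular expression, . does not match a newline by default, so /\*.*\*/ can only match a block comment that opens and closes on the same line. Replacing . with [\s\S] (any whitespace or non-whitespace character, so newlines are included) and making the repetition non-greedy, so that the match stops at the first */, should fix it: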

    def comment():  return [_(r"//.*"), _(r"/\*[\s\S]*?\*/")]
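
To see why the change matters, here is a minimal sketch that uses only Python's standard re module rather than Arpeggio itself. It assumes Arpeggio's RegExMatch treats . and newlines the same way plain re does, which the error message in the question suggests. The names SOURCE, ORIGINAL, ADJUSTED and TWO_COMMENTS are made up for illustration.

```python
import re

# Input from the question: a multi-line comment, a single-line comment, then code.
SOURCE = (
    "/*\n"
    "This is a multi-line comment.\n"
    "*/\n"
    "// This is a single line comment.\n"
    "hello world!\n"
)

# Original block-comment pattern: '.' does not match a newline by default,
# so the match cannot get past the end of the first line.
ORIGINAL = re.compile(r"/\*.*\*/")

# Adjusted pattern: [\s\S] matches any character, newlines included,
# and the non-greedy *? stops at the first closing '*/'.
ADJUSTED = re.compile(r"/\*[\s\S]*?\*/")

original_match = ORIGINAL.match(SOURCE)
adjusted_match = ADJUSTED.match(SOURCE)

print(original_match)                  # no match object for the original pattern
print(repr(adjusted_match.group(0)))   # the whole block comment

# Why the '?' matters: a greedy [\s\S]* runs on to the LAST '*/' in the input
# and would swallow the code between two block comments.
TWO_COMMENTS = "/* a */ hello /* b */"
greedy = re.match(r"/\*[\s\S]*\*/", TWO_COMMENTS).group(0)
lazy = ADJUSTED.match(TWO_COMMENTS).group(0)

print(repr(greedy))
print(repr(lazy))
```

With the original pattern nothing matches at position (1, 1), which is consistent with the NoMatch error above. With the adjusted pattern the whole /* ... */ block is consumed as a single comment.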