python, python-3.x, textx

textx Grammar, Use Whitespace as Repetition Modifier


I'm trying to write a parser for input files used by Fire Dynamics Simulator using textx. For the most part, ignoring whitespace is perfect, as most parameters and values should be separated by commas. However, it is acceptable to omit the comma at the end of a parameter, and it appears that the program will also tolerate a missing comma within the value list.

I've trimmed down the grammar definition for this question. The code below shows two records in data, each starting with & and ending with /. Each record has two parameters: the first record has string values, while the second has numeric values. This code works well.

import textx

def test_parse_mesh(model):
    expected = {
        'XB': [0.0, 5.0, 0.0, 4.0, 0.0, 3.0],
        'IJK': [50, 40, 30],
    }
    parsed = {
        param.name: param.value
        for param in model.records[1].parameters
    }
    assert expected == parsed

grammar = """
Model: records*=Record;
Record: /^&/ namelist=ID parameters*=Parameter[','] '/';
Parameter: name=ID '=' value+=Value[','];
Value: (NUMBER | STRING);
"""

data = """
&HEAD CHID='Some Value', TITLE='another string'/
&MESH IJK = 50,40,30,
      XB  = 0.0,5.0,0.0,4.0,0.0,3.0/
"""

mm = textx.metamodel_from_str(grammar)
model = mm.model_from_str(data)
test_parse_mesh(model)

However, I want to modify the grammar so that it can also parse the following version of data:

data = """
&HEAD CHID='Some Value', TITLE='another string'/
&MESH IJK = 50 40 30
      XB  = 0.0 5.0 0.0 4.0 0.0 3.0/
"""

I've looked at repetition modifiers and got the following to work on its own, but could not get it to work within the larger grammar definition. I also tried altering the parser configuration, without success.

grammar = r"""
Record[noskipws]: parameters*=Parameter[/,|\s+/];
Parameter[noskipws]: /\s*/ name=ID /\s*=\s*/ value+=NUMBER[/,|\s+/];
"""

data = """
IJK = 50 40 30 
XB = 0 5 0 4 0 3
"""

mm = textx.metamodel_from_str(grammar)
model = mm.model_from_str(data)

expected = {'XB': [0.0, 5.0, 0.0, 4.0, 0.0, 3.0], 'IJK': [50, 40, 30]}
assert {param.name: param.value for param in model.parameters} == expected

How can I modify the grammar at the top to correctly parse the second version of data, where there are no commas? I'll accept an answer that only works with one parameter per line, but ideally it would handle the missing comma in both cases.


Solution

  • When I saw your post I thought the solution would be a simple optional match for the separator, like:

    Model: records*=Record;
    Record: /^&/ namelist=ID parameters*=Parameter[/,?/] '/';
    Parameter: name=ID '=' value+=Value[/,?/];
    Value: NUMBER | STRING;
    

    But it didn't work. Looking at the debug output, I noticed that the optional separator match ,? succeeded in places where the comma was absent, yet still terminated the repetition match. The bug was in Arpeggio, and I have just fixed it. The fix is available on the master branch, so please install from GitHub for now; I will release a new version of Arpeggio soon. With the bugfix, the above grammar works correctly. You can even mix and match, using the separator in only some places, e.g.:

    &HEAD CHID='Some Value' TITLE='another string'/
    &MESH IJK = 50 40, 30,
          XB  = 0.0 5.0, 0.0 4.0 0.0 3.0/
    
    

    Edit [2021-04-25]: Arpeggio 1.10.2 is released with the fix mentioned in this answer.