textx

How to make a textX grammar recognize normal strings and special keywords


I feel like I am missing something very simple. I am trying to create a textX grammar that lets my parser distinguish normal text tokens from special keywords. In the following grammar, I cannot get textX to recognize my [LINK ...] keyword, represented by the SpecialKeyword rule, because it gets absorbed by the more general NormalString rule.

The output I am getting is as follows:

['\n', 'Text part before [LINK: REQ-001] Text part after.', '\n', 'Text part before [LINK: REQ-002] Text part after.', '\n']

While I would like it to be:

['\n', 'Text part before ', My Link object with 'REQ-001', 'Text part after.', '\n', 'Text part before ', My Link object with 'REQ-002', 'Text part after.', '\n']

A related question: how can I make the NormalString rule support multiline strings?

from textx import metamodel_from_str

mm = metamodel_from_str('''
Text:
    parts+=TextPart;

TextPart[noskipws]:
  (NormalString | SpecialKeyword | '\n')
;

NormalString[noskipws]:
  !SpecialKeyword /(.*)?/ // this is too greedy
;

SpecialKeyword[noskipws]:
  Link // more keywords are coming later
;

Link[noskipws]:
  '[LINK: ' value = /[^\\]]*/ ']'
;
''')


textx_input = '''
Text part before [LINK: REQ-001] Text part after.
Text part before [LINK: REQ-002] Text part after.
'''


model = mm.model_from_str(textx_input, debug=False)

print(model.parts)

Solution

  • You were close. The solution is to match a single character after each negative assertion in NormalString and then repeat. Matching across multiple lines is achieved with the (?ms) regex flags; the s flag is the one that lets . match newline characters.

    More can be read in the textX docs.

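    The Link rule is a common rule, which results in a Python object, so you need to extract the actual keyword into a separate match rule (LinkKW in the grammar that follows), which results in a Python string. SpecialKeyword then refers only to the keywords, and that is what the negative assertion checks.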

    Here is the full solution:

    from textx import metamodel_from_str
    
    mm = metamodel_from_str('''
    Text:
        parts+=TextPart;
    
    TextPart[noskipws]:
      Link | NormalString
    ;
    
    NormalString[noskipws]:
      (!SpecialKeyword /(?ms)./)*
    ;
    
    SpecialKeyword:
      LinkKW // more keywords are coming later
    ;
    
    LinkKW: '[LINK: ';
    
    Link[noskipws]:
       LinkKW value = /[^\\]]*/ ']'
    ;
    ''')
    
    
    textx_input = '''
    Text part before [LINK: REQ-001] Text part after.
    Text part before [LINK: REQ-002] Text part after.
    '''
    
    
    model = mm.model_from_str(textx_input, debug=True)
    
    print(model.parts)
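
As a rough illustration of the same idea outside textX, the "negative assertion, then one character, then repeat" pattern can be sketched with Python's standard re module. This is only an analogy, not how textX works internally; the names LINK, NORMAL and split_parts are hypothetical and chosen just for this example.

```python
import re

# Hypothetical names for illustration; none of this is textX API.
LINK = re.compile(r"\[LINK: ([^\]]*)\]")
# Consume one character at a time, but only where no keyword starts.
# The inline (?s) flag lets '.' match newlines, so text may span lines.
NORMAL = re.compile(r"(?s)(?:(?!\[LINK: ).)+")


def split_parts(text):
    """Split text into ('text', str) and ('link', value) tuples."""
    parts = []
    pos = 0
    while pos < len(text):
        # Try the specific rule first, then fall back to normal text.
        match = LINK.match(text, pos)
        kind = "link"
        if match is None:
            match = NORMAL.match(text, pos)
            kind = "text"
        if match is None:
            # A keyword starts here but has no closing bracket.
            raise ValueError(f"unterminated keyword at offset {pos}")
        parts.append((kind, match.group(1) if kind == "link" else match.group(0)))
        pos = match.end()
    return parts


sample = "Text part before [LINK: REQ-001] Text part after.\nNext line."
print(split_parts(sample))
```

Trying LINK before NORMAL mirrors the ordered choice Link | NormalString in the grammar: the specific alternative goes first, and the general one stops as soon as a keyword begins.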