How to highlight QScintilla using ANTLR4?


I'm trying to learn ANTLR4 and I'm already having some issues with my first experiment.

The goal here is to learn how to use ANTLR to syntax highlight a QScintilla component. To practice a little bit I've decided I'd like to learn how to properly highlight *.ini files.

First things first, in order to run the mcve you'll need:

by running antlr ini.g4 -Dlanguage=Python3 -o ini

and run it, if everything went well you should get this outcome:

[screenshot: showcase]

Here's my question:

[screenshot]

In that screenshot you can see that variable assignments are highlighted in two colors (variables in deep pink, values in a yellowish color), but I don't know how to achieve that. I've tried this slightly modified grammar:

grammar ini;

start : section (option)*;
section : '[' STRING ']';
option : VARIABLE '=' VALUE;

COMMENT : ';'  ~[\r\n]*;
VARIABLE  : [a-zA-Z0-9]+;
VALUE  : [a-zA-Z0-9]+;
WS      : [ \t\n\r]+;

and then changing the styles to:

style = {
    "T__0": lst[3],
    "T__1": lst[3],
    "T__2": lst[3],
    "COMMENT": lst[2],
    "VARIABLE": lst[0],
    "VALUE": lst[1],
    "WS": lst[3],
}

But if you look at the lexing output, you'll see there is no distinction between VARIABLE and VALUE. Both rules match exactly the same input, and because of rule precedence in the ANTLR grammar the lexer always picks the one defined first. So my question is: how would you modify the grammar/snippet to achieve that visual appearance?
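
To make the precedence issue concrete outside of ANTLR, here is a rough sketch in plain Python. It only mimics the tie-breaking as I understand it (longest match wins, equal-length matches go to the rule defined first); `RULES` and `pick_rule` are made-up names for illustration, not part of ANTLR.

```python
import re

# Hypothetical stand-ins for the two lexer rules from the grammar above,
# listed in the order in which they are defined.
RULES = [
    ("VARIABLE", re.compile(r"[a-zA-Z0-9]+")),
    ("VALUE", re.compile(r"[a-zA-Z0-9]+")),
]


def pick_rule(text, pos=0):
    """Pick the longest match; on a tie, keep the rule that was defined first."""
    best = None
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        # Strict '>' means a later rule never replaces an equally long match.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


print(pick_rule("foo"))  # ('VARIABLE', 'foo')
print(pick_rule("42"))   # ('VARIABLE', '42')
```

Since both patterns are identical, VALUE can never win, which matches what the lexing output shows.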


Solution

  • The problem is that the lexer needs to be context-sensitive: everything on the left-hand side of the = needs to be a variable, and everything to the right of it a value. You can do this with ANTLR's lexical modes. You start off by classifying successive non-space characters as a variable, and when you encounter an =, you push into a value mode. Inside the value mode, you pop back out whenever you encounter a line break.

    Note that lexical modes only work in a lexer grammar, not the combined grammar you now have. Also, for syntax highlighting, you probably only need the lexer.
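
If the mode mechanism is hard to picture, the following toy model in plain Python may help. It is only a sketch of the idea (a stack of modes, each with its own ordered rule list), not how ANTLR is implemented; `MODES`, `tokenize`, and the action strings are made up for this illustration.

```python
import re

# Each mode has its own ordered rules: (token name, regex, action).
# Actions in this toy model: None, "skip", "push:<MODE>", "skip+pop".
MODES = {
    "DEFAULT": [
        ("COMMENT", r";[^\r\n]*", None),
        ("ASSIGN", r"=", "push:VALUE_MODE"),
        ("KEY", r"[^ \t\r\n]+", None),
        ("SPACES", r"[ \t\r\n]+", "skip"),
    ],
    "VALUE_MODE": [
        ("SPACES", r"[ \t]+", "skip"),
        ("VALUE", r"[^ \t\r\n]+", None),
        ("COMMENT", r";[^\r\n]*", None),
        ("NL", r"[\r\n]+", "skip+pop"),
    ],
}


def tokenize(text):
    stack, pos, tokens = ["DEFAULT"], 0, []
    while pos < len(text):
        best = None
        # Only the rules of the mode on top of the stack are candidates.
        for name, pattern, action in MODES[stack[-1]]:
            m = re.compile(pattern).match(text, pos)
            # Longest match wins; on a tie the earlier rule is kept.
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m, action)
        if best is None:
            raise ValueError("no rule matches at position %d" % pos)
        name, m, action = best
        pos = m.end()
        if action is None or action.startswith("push:"):
            tokens.append((name, m.group()))  # emitted, not skipped
        if action and action.startswith("push:"):
            stack.append(action.split(":", 1)[1])  # enter the value mode
        elif action == "skip+pop":
            stack.pop()  # a line break ends the value mode
    return tokens


for tok in tokenize("a = 1 ; note\nb = 2\n"):
    print(tok)
```

The same word-like pattern yields KEY in the default mode and VALUE in the value mode, which is exactly the distinction a single-mode lexer cannot make.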

    Here's a quick demo of how this could work (stick it in a file called IniLexer.g4):

    lexer grammar IniLexer;
    
    SECTION
     : '[' ~[\]]+ ']'
     ;
    
    COMMENT
     : ';' ~[\r\n]*
     ;
    
    ASSIGN
     : '=' -> pushMode(VALUE_MODE)
     ;
    
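    // '=' is excluded so that input without surrounding spaces (a=1)
    // is still split into KEY ASSIGN VALUE
    KEY
     : ~[ \t\r\n=]+
     ;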
    
    SPACES
     : [ \t\r\n]+ -> skip
     ;
    
    UNRECOGNIZED
     : .
     ;
    
    mode VALUE_MODE;
    
      VALUE_MODE_SPACES
       : [ \t]+ -> skip
       ;
    
      VALUE
       : ~[ \t\r\n]+
       ;
    
      VALUE_MODE_COMMENT
       : ';' ~[\r\n]* -> type(COMMENT)
       ;
    
      VALUE_MODE_NL
       : [\r\n]+ -> skip, popMode
       ;
    

    If you now run the following script:

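    from antlr4 import InputStream, CommonTokenStream
    # IniLexer.py is generated by ANTLR from IniLexer.g4 (-Dlanguage=Python3)
    from IniLexer import IniLexer
    
    source = """
    ; Comment outside
    
    [section s1]
    ; Comment inside
    a = 1
    b = 2
    
    [section s2]
    c = 3 ; Comment right side
    d = e
    """
    
    lexer = IniLexer(InputStream(source))
    stream = CommonTokenStream(lexer)
    stream.fill()  # run the lexer over the whole input
    
    # The last token is EOF, so leave it out
    for token in stream.tokens[:-1]:
        print("{0:<25} '{1}'".format(IniLexer.symbolicNames[token.type], token.text))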
    

    you will see the following output:

    COMMENT                   '; Comment outside'
    SECTION                   '[section s1]'
    COMMENT                   '; Comment inside'
    KEY                       'a'
    ASSIGN                    '='
    VALUE                     '1'
    KEY                       'b'
    ASSIGN                    '='
    VALUE                     '2'
    SECTION                   '[section s2]'
    KEY                       'c'
    ASSIGN                    '='
    VALUE                     '3'
    COMMENT                   '; Comment right side'
    KEY                       'd'
    ASSIGN                    '='
    VALUE                     'e'
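
To get from tokens like these to actual highlighting, each token has to become a styled range. The sketch below shows one possible way to do that in plain Python, under a few assumptions that are worth checking against your setup: that tokens carry inclusive start/stop character indices, that the editor counts styling lengths in UTF-8 bytes rather than characters, and that whitespace skipped by the lexer still needs a default style. `Tok`, `STYLES`, and `style_runs` are hypothetical names, and the style numbers are arbitrary.

```python
from collections import namedtuple

# Hypothetical stand-in for a lexer token: symbolic type name plus
# inclusive start/stop character indices into the source text.
Tok = namedtuple("Tok", "name start stop")

# Arbitrary style numbers, chosen for this example only.
STYLES = {"SECTION": 1, "COMMENT": 2, "KEY": 3, "ASSIGN": 0, "VALUE": 4}
DEFAULT_STYLE = 0


def style_runs(text, tokens):
    """Return (byte_length, style) runs that cover the whole text."""
    runs, cursor = [], 0
    for tok in tokens:
        if tok.start > cursor:
            # Text the lexer skipped (whitespace) gets the default style.
            gap = text[cursor:tok.start]
            runs.append((len(gap.encode("utf-8")), DEFAULT_STYLE))
        chunk = text[tok.start:tok.stop + 1]
        runs.append((len(chunk.encode("utf-8")), STYLES.get(tok.name, DEFAULT_STYLE)))
        cursor = tok.stop + 1
    if cursor < len(text):
        runs.append((len(text[cursor:].encode("utf-8")), DEFAULT_STYLE))
    return runs


text = "k = é\n"
tokens = [Tok("KEY", 0, 0), Tok("ASSIGN", 2, 2), Tok("VALUE", 4, 4)]
runs = style_runs(text, tokens)
print(runs)
```

Inside a QsciLexerCustom subclass, runs like these could then be applied from styleText by calling startStyling once and setStyling(length, style) for each run. I haven't tested that part here.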
    

    And an accompanying parser grammar could look like this:

    parser grammar IniParser;
    
    options {
      tokenVocab=IniLexer;
    }
    
    sections
     : section* EOF
     ;
    
    section
     : COMMENT
     | SECTION section_atom*
     ;
    
    section_atom
     : COMMENT
     | KEY ASSIGN VALUE
     ;
    

    which would parse your example input into the following parse tree:

    [image: parse tree]