I want to generate a nested parse tree for the YAML sample file below using ANTLR. I tried the grammar below, but for some reason it does not display the nesting of nodes according to the YAML file.
The YAML file is:
kind: Test
metadata:
name: target
labels:
runnable: target
annotations:
message_value: Hi
id: 1
node_id: 2
hex_id: 3
The ANTLR grammar I tried is:
grammar Sample;
yaml: (entry NEWLINE)* EOF;
entry: (keyValue | mapping);
keyValue: SPACE* KEY COLON SPACE* value SPACE* NEWLINE;
mapping: SPACE* KEY COLON SPACE* NEWLINE (nestedEntry)+;
nestedEntry: SPACE* keyValue | mapping;
value: STRING | NUMBER | (NEWLINE SPACE* mapping);
KEY: [a-zA-Z_]+[0-9]*[a-zA-Z_]*;
STRING: [a-zA-Z._]+;
NUMBER: [0-9]+;
NEWLINE: [\r\n]+;
SPACE: [ ] -> skip;
COLON: ':' -> skip;
The expected parse tree output is like below:
How can I achieve this proper parse tree?
Any idea what the issue in the above grammar could be and how to resolve it?
Your KEY and STRING rules overlap too much, causing STRING to almost never get matched. With ANTLR, when two (or more) lexer rules match the same characters, the one defined first "wins". So Test, target and Hi will not get matched as a STRING, but as a KEY.
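The tie-breaking behaviour described above can be illustrated outside ANTLR. The following is a minimal Python sketch, not ANTLR itself: it assumes the three lexer rules from the question translate directly to Python regular expressions, and the helper name `pick_rule` is made up for this illustration. It picks the rule with the longest match and, on a tie, keeps the rule defined first.

```python
import re

# Lexer rules from the question, in definition order
# (assumption: the character classes translate one-to-one to Python regexes).
RULES = [
    ("KEY", re.compile(r"[a-zA-Z_]+[0-9]*[a-zA-Z_]*")),
    ("STRING", re.compile(r"[a-zA-Z._]+")),
    ("NUMBER", re.compile(r"[0-9]+")),
]

def pick_rule(text):
    """Return (rule_name, lexeme) for the token starting at position 0.

    Hypothetical helper: the longest match wins; on a tie in length,
    the rule that appears first in RULES is kept.
    """
    best_name, best_len = None, 0
    for name, pattern in RULES:
        m = pattern.match(text)
        # Only a strictly longer match replaces the current best,
        # so an earlier rule survives a tie.
        if m and len(m.group()) > best_len:
            best_name, best_len = name, len(m.group())
    return best_name, text[:best_len]

for sample in ["Test", "target", "Hi", "a.b", "1"]:
    print(sample, "->", pick_rule(sample))
```

Under these assumptions, only input containing a dot (such as `a.b`) ends up as a STRING, because that is the only case where STRING matches more characters than KEY.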
Also, you're skipping SPACE and COLON in your lexer, making them unavailable in parser rules. COLON shouldn't be skipped in the first place.
Try something like this instead:
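grammar Sample;

// Parser rules
yaml : entry (NEWLINE+ entry)* NEWLINE* EOF;
entry : (keyValue | mapping);
keyValue : KEY_OR_VALUE COLON value;
mapping : KEY_OR_VALUE COLON NEWLINE nestedEntry+;
nestedEntry : keyValue | mapping;
value : KEY_OR_VALUE+ | NUMBER | NEWLINE mapping;

// Lexer rules: a single token type covers both keys and string values
KEY_OR_VALUE : [a-zA-Z_]+ [a-zA-Z_0-9.]*;
NUMBER : [0-9]+;
COLON : ':'; // kept as a token so parser rules can reference it
NEWLINE : [\r\n]+;
SPACE : [ \t] -> skip; // whitespace is discarded, so indentation never reaches the parser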
which will parse your example input like this:
I am sure you're aware of it, but writing an ANTLR grammar for YAML is rather tricky because of indentation. You could have a look here: https://github.com/umaranis/FastYaml
Without being able to recognize indentations, you cannot make a distinction between:
property:
  key:
    value: 1
    value: 2
and
property:
  key:
    value: 1
  value: 2
Both value: 1 and value: 2 start with an indentation, but you have no way to recognize how many of them there are. I only showed how your grammar would be a valid ANTLR grammar. Your current grammar cannot easily be changed to support indentation recognition. You should study the FastYaml grammar I linked to.
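To illustrate what "recognizing indentation" could look like, here is a minimal Python sketch of one common technique: turning changes in leading whitespace into explicit INDENT and DEDENT markers before parsing, similar to what Python's own tokenizer does. This is not taken from the FastYaml grammar; the function name `indent_tokens` and the two-space indentation of the samples are assumptions made for the example, and inconsistent dedents are not handled.

```python
def indent_tokens(text):
    """Turn leading spaces into INDENT/DEDENT markers (hypothetical helper)."""
    tokens = []
    stack = [0]  # indentation widths of the currently open blocks
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no structure
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            # Deeper than the current block: open a new one.
            stack.append(width)
            tokens.append("INDENT")
        while width < stack[-1]:
            # Shallower: close blocks until the widths line up.
            stack.pop()
            tokens.append("DEDENT")
        tokens.append(line.strip())
    # Close any blocks still open at end of input.
    tokens.extend("DEDENT" for _ in stack[1:])
    return tokens

# Two samples that differ only in the indentation of the last line
# (assumed two-space indentation).
sample_a = "property:\n  key:\n    value: 1\n    value: 2\n"
sample_b = "property:\n  key:\n    value: 1\n  value: 2\n"

print(indent_tokens(sample_a))
print(indent_tokens(sample_b))
```

With the markers in place, the two samples produce different token sequences: in the second one a DEDENT appears before `value: 2`, so a parser rule such as `mapping` could use INDENT and DEDENT to delimit its nested entries. In an ANTLR setup, such tokens would typically have to be produced by custom lexer code or a preprocessing step, since the grammar notation alone cannot count spaces.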