I am building a programming language with ANTLR4 as a research project, and I have been struggling all day with one problem: two words end up being treated as one token after whitespace removal.
This is my ANTLR grammar:
grammar Grammar;
start: (statement ';')*;
//needs expressions extension
statement
: variable
| //class
| if
| function
| loop
| functionCall
| show
;
variable
: TYPE ID ('=' VAR_TYPE)?
| ...
;
array
: TYPE ID '[]' ('=' '[' VAR_TYPE (',' VAR_TYPE)* ']')?
;
//needs expressions extension
function
: (ACCESS TYPE ID '(' ID* ')' '{'
(
variable
| if
| loop
| functionCall
) 'return' VAR_TYPE
'}')
| (ACCESS 'void' ID '(' ID* ')' '{'
(
variable
| if
| loop
| functionCall
)
'}')
;
//needs expressions extension
if: 'if' (ID | VAR_TYPE) COMPARISON (ID | VAR_TYPE) ':'
(
'\t' variable
| '\t' if
| '\t' loop
| '\t' functionCall
| '\t' show
)*
('else if' (ID | VAR_TYPE) COMPARISON (ID | VAR_TYPE) ':'
(
'\t' variable
| '\t' if
| '\t' loop
| '\t' functionCall
| '\t' show
)*
)*
('else' ':'
(
'\t' variable
| '\t' if
| '\t' loop
| '\t' functionCall
| '\t' show
)*
)?
;
loop: 'foreach' ID 'in' ID ':'
(
'\t' variable
| '\t' if
| '\t' loop
| '\t' functionCall
| '\t' show
)*
;
functionCall: (ID '.')? ID '()';
//needs expressions extension
show: 'show' '(' (ID | VAR_TYPE)? ('+' (ID | VAR_TYPE))* ')';
ACCESS: 'private' | 'public';
COMPARISON: '>' | '<' | '>=' | '<=' | '==';
TYPE: 'int' | 'float' | 'string';
VAR_TYPE: STRING | INT | BOOL | FLOAT | ID;
ID: [a-zA-Z_][a-zA-Z0-9_]* ;
STRING : '"' .*? '"' ;
INT : [0-9]+ ;
BOOL : 'true' | 'false' ;
FLOAT : [0-9]+ '.' [0-9]+ ;
WS : [ \t\r\n]+ -> skip;
This is what the console prints when the tree is built:
line 1:7 no viable alternative at input 'stringname'
line 2:4 no viable alternative at input 'intage'
And here is the input.txt file I am parsing:
string name;
int age;
bool sex;
string children[];
public string returnPerson() {
return "Name " + name + "\nAge " + age + "\nSex " + sex + "\n";
}
public bool isMinor() {
if age > 17:
return false;
else:
return true;
}
public void showChildren() {
int i = 0;
foreach child in children:
show("Children №" + (i + 1) + ": " + child + "\n");
}
I just don't know what to do here. I have whitespace handled, but the parser still treats the two words as one token. Also, from the output tree I can see that it doesn't get past the first two lines of input.txt.
Please help me fix this problem.
Your lexer will never produce an ID token because of this:
VAR_TYPE: STRING | INT | BOOL | FLOAT | ID;
ID: [a-zA-Z_][a-zA-Z0-9_]* ;
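Because VAR_TYPE also matches an ID. ANTLR's lexer works like this:
1. try to match as many characters as possible
2. when two or more lexer rules match the same characters, the rule defined first wins
Because of rule 2, it is clear that ID will never get matched.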
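The two matching rules (longest match first, and on a tie the rule defined first wins) can be imitated with a toy tokenizer. This is only a rough sketch in Python, not how ANTLR is implemented: the tokenize helper and both rule lists are made up for illustration, the regexes only approximate the grammar's lexer rules, and only a subset of the rules is included.

```python
import re

def tokenize(text, rules, skip=("WS",)):
    """Toy lexer: the longest match wins; on a tie the earliest rule wins."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, alternatives in rules:
            for alt in alternatives:
                m = re.compile(alt).match(text, pos)
                # Strict '>' means a later rule never replaces an equally long earlier match
                if m and m.end() - pos > best_len:
                    best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name not in skip:
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

ID = r"[a-zA-Z_][a-zA-Z0-9_]*"

# Order as in the question's grammar: VAR_TYPE is declared before ID and BOOL
question_rules = [
    ("TYPE", ["int", "float", "string"]),
    ("VAR_TYPE", [r'"[^"]*"', r"[0-9]+\.[0-9]+", r"[0-9]+", "true", "false", ID]),
    ("ID", [ID]),
    ("BOOL", ["true", "false"]),
    ("WS", [r"[ \t\r\n]+"]),
]

# Order as in the corrected grammar: no VAR_TYPE lexer rule, BOOL before ID
fixed_rules = [
    ("TYPE", ["int", "float", "string", "bool"]),
    ("BOOL", ["true", "false"]),
    ("ID", [ID]),
    ("WS", [r"[ \t\r\n]+"]),
]

print(tokenize("string name", question_rules))  # [('TYPE', 'string'), ('VAR_TYPE', 'name')]
print(tokenize("string name", fixed_rules))     # [('TYPE', 'string'), ('ID', 'name')]
print(tokenize("true", question_rules))         # [('VAR_TYPE', 'true')]
```

With the question's rule order, name becomes a VAR_TYPE token, so a parser rule that expects TYPE ID has nothing to match.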
VAR_TYPE seems a better candidate for a parser rule:
var_type : STRING | INT | BOOL | FLOAT | ID;
But there are quite a few other things wrong with the grammar you posted. If you use the literal '()' in your grammar, then the input () always becomes that single token and is never matched as a '(' token followed by a ')' token. When you use literal tokens inside parser rules, ANTLR implicitly creates lexer rules for them, like this:
functionCall: (ID '.')? ID '()';
show: 'show' '(' (ID | VAR_TYPE)? ('+' (ID | VAR_TYPE))* ')';
T__0 : '.';
T__1 : '()';
T__2 : 'show';
T__3 : '(';
T__4 : ')';
...
If you now try to parse the input:
public string returnPerson() {
return "Name " + name + "\nAge " + age + "\nSex " + sex + "\n";
}
using the parser rule:
function
: ACCESS TYPE ID '(' ...
;
it will fail, because () is tokenized as a single T__1 token, not as a T__3 token followed by a T__4 token.
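The effect of the '()' literal can be reproduced with a small longest-literal matcher. This is a hedged sketch in Python rather than ANTLR itself: tokenize_literals is a hypothetical helper, and the T__N names simply follow the illustration above (the numbers ANTLR actually assigns may differ).

```python
def tokenize_literals(text, literals):
    """At each position, emit the name of the longest literal that matches."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace is skipped, as with the WS rule
            continue
        candidates = [(name, lit) for name, lit in literals
                      if text.startswith(lit, pos)]
        if not candidates:
            raise ValueError(f"no literal matches at position {pos}")
        name, lit = max(candidates, key=lambda c: len(c[1]))
        tokens.append(name)
        pos += len(lit)
    return tokens

# Literal tokens as in the illustration, including the combined '()'
with_pair = [("T__0", "."), ("T__1", "()"), ("T__3", "("), ("T__4", ")")]
# The same literals once '()' is replaced by '(' ')' in the parser rule
without_pair = [("T__0", "."), ("T__3", "("), ("T__4", ")")]

print(tokenize_literals("()", with_pair))     # ['T__1']
print(tokenize_literals("()", without_pair))  # ['T__3', 'T__4']
print(tokenize_literals("( )", with_pair))    # ['T__3', 'T__4']
```

As long as the '()' literal exists, an empty parameter list only produces separate parenthesis tokens when there is whitespace between them, which is why writing '(' ')' in the parser rule is the safer choice.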
Also, the rule BOOL : 'true' | 'false' ; will never get matched because of the two matching rules I mentioned earlier: true and false are also matched by VAR_TYPE, which is defined first, so they become VAR_TYPE tokens.
Here's a quick edit of your grammar so that it will correctly parse your input:
grammar Grammar;
start : statement* EOF;
statement
: variable ';'
| array ';'
| if
| function
| loop
| functionCall ';'
| show ';'
| 'return' expression ';'
;
function
: ACCESS TYPE ID '(' ID* ')' '{' statement* '}'
| ACCESS 'void' ID '(' ID* ')' '{' statement* '}'
;
variable : TYPE ID ('=' expression)?;
array : TYPE ID '[' ']' ('=' '[' expression (',' expression)* ']')?;
if : 'if' expression ':' statement* ('else if' expression ':' statement*)* ('else' ':' statement*)?;
loop : 'foreach' ID 'in' expression ':' statement*;
functionCall : (ID '.')? ID '(' ')';
show : 'show' '(' expression ')';
expression
: '(' expression ')'
| expression '+' expression
| expression COMPARISON expression
| STRING
| ID
| INT
| BOOL
| FLOAT
;
ACCESS : 'private' | 'public';
COMPARISON : '>' | '<' | '>=' | '<=' | '==';
TYPE : 'int' | 'float' | 'string' | 'bool';
BOOL : 'true' | 'false' ;
ID : [a-zA-Z_][a-zA-Z0-9_]* ;
STRING : '"' (~[\\"] | '\\' .)* '"';
INT : [0-9]+;
FLOAT : [0-9]+ '.' [0-9]+;
WS : [ \t\r\n]+ -> channel(HIDDEN);