ANTLR4: no viable alternative at input 'stringname'

As a research project I am building a programming language with ANTLR4, and I have spent a whole day trying to fix a problem where two words are treated as one token once whitespace is skipped.

This is my ANTLR grammar:

grammar Grammar;

start: (statement ';')*;

//needs expressions extension
statement
    : variable
    | //class
    | if
    | function
    | loop
    | functionCall
    | show
    ;

variable
    : TYPE ID ('=' VAR_TYPE)?
    | ...
    ;

array 
    : TYPE ID '[]' ('=' '[' VAR_TYPE (',' VAR_TYPE)* ']')?
    ;

//needs expressions extension
function
    : (ACCESS TYPE ID '(' ID* ')' '{' 
        (
            variable
            | if
            | loop
            | functionCall
        ) 'return' VAR_TYPE
      '}')
    | (ACCESS 'void' ID '(' ID* ')' '{' 
        (
            variable
            | if
            | loop
            | functionCall
        )
      '}')
    ;

//needs expressions extension
if: 'if' (ID | VAR_TYPE) COMPARISON (ID | VAR_TYPE) ':'
        (
            '\t' variable
            | '\t' if
            | '\t' loop
            | '\t' functionCall
            | '\t' show
        )*
    ('else if' (ID | VAR_TYPE) COMPARISON (ID | VAR_TYPE) ':'
        (
            '\t' variable
            | '\t' if
            | '\t' loop
            | '\t' functionCall
            | '\t' show
        )*
    )*
    ('else' ':'
        (
            '\t' variable
            | '\t' if
            | '\t' loop
            | '\t' functionCall
            | '\t' show
        )*
    )?
    ;

loop: 'foreach' ID 'in' ID ':'
    (
        '\t' variable
        | '\t' if
        | '\t' loop
        | '\t' functionCall
        | '\t' show
    )*
    ;

functionCall: (ID '.')? ID '()';

//needs expressions extension
show: 'show' '(' (ID | VAR_TYPE)? ('+' (ID | VAR_TYPE))* ')';

ACCESS: 'private' | 'public';
COMPARISON: '>' | '<' | '>=' | '<=' | '==';
TYPE: 'int' | 'float' | 'string';
VAR_TYPE: STRING | INT | BOOL | FLOAT | ID;
ID: [a-zA-Z_][a-zA-Z0-9_]* ;
STRING : '"' .*? '"' ;
INT : [0-9]+ ;
BOOL : 'true' | 'false' ;
FLOAT : [0-9]+ '.' [0-9]+ ;
WS : [ \t\r\n]+ -> skip;

This is what the console prints when the parse tree is built:

line 1:7 no viable alternative at input 'stringname'
line 2:4 no viable alternative at input 'intage'

And here is the input.txt file I am parsing:

string name;
int age;
bool sex;
string children[];

public string returnPerson() {
    return "Name " + name + "\nAge " + age + "\nSex " + sex + "\n";
}

public bool isMinor() {
    if age > 17:
        return false;
    else:
        return true;
}

public void showChildren() {
    int i = 0;
    foreach child in children:
        show("Children №" + (i + 1) + ": " + child + "\n");
}

I just don't know what to do with this. I have whitespace sorted out, but the parser still treats the two words as one token. Also, the output tree shows that it doesn't get any further than the first two lines of input.txt.

Please help me fix this problem.

Solution

Your lexer will never produce an ID token because of this:

VAR_TYPE: STRING | INT | BOOL | FLOAT | ID;
ID: [a-zA-Z_][a-zA-Z0-9_]* ;

This is because VAR_TYPE matches everything that ID matches and is defined before it. ANTLR's lexer works like this:

1. try to match a rule with as many characters as possible
2. if 2 (or more) rules match the same number of characters, let the one defined first "win"

Because of rule 2, VAR_TYPE always wins the tie against ID, so ID will never get matched.
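
To make the two rules concrete, here is a minimal sketch in Python. It is not ANTLR's actual implementation: the tokenize helper and the rule tables are made up for illustration, and the regexes only approximate the lexer rules from the question.

```python
import re

def tokenize(rules, text):
    """Tokenize text with ordered (name, regex) rules: longest match, then first defined."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on a tie the rule defined earlier is kept.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"cannot tokenize at position {pos}")
        if best_name != "WS":  # whitespace is dropped, like '-> skip'
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

ID = r"[a-zA-Z_][a-zA-Z0-9_]*"

# Same order as in the question: VAR_TYPE (which includes ID) comes before ID.
# 'true' and 'false' are already covered by the ID alternative.
question_rules = [
    ("TYPE", r"int|float|string"),
    ("VAR_TYPE", r'"[^"]*"|[0-9]+\.[0-9]+|[0-9]+|' + ID),
    ("ID", ID),
    ("WS", r"[ \t\r\n]+"),
]

# Assumption: VAR_TYPE is taken out of the lexer and handled by the parser instead.
fixed_rules = [rule for rule in question_rules if rule[0] != "VAR_TYPE"]

print(tokenize(question_rules, "string name"))  # [('TYPE', 'string'), ('VAR_TYPE', 'name')]
print(tokenize(fixed_rules, "string name"))     # [('TYPE', 'string'), ('ID', 'name')]
```

With the question's rule order, name comes out as VAR_TYPE, so a parser rule expecting TYPE ID has nothing to match. That is what the "no viable alternative" message is reporting.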

VAR_TYPE seems a better candidate for a parser rule:

var_type : STRING | INT | BOOL | FLOAT | ID;

But quite a few other things are wrong with the grammar you posted. One of them: if you define the literal '()' in your grammar, the input () will be tokenized as that single token, never as a '(' token followed by a ')' token. For every literal used inside a parser rule, ANTLR implicitly creates a lexer rule. So for these parser rules:

functionCall: (ID '.')? ID '()';
show: 'show' '(' (ID | VAR_TYPE)? ('+' (ID | VAR_TYPE))* ')';

ANTLR creates tokens like this:

T__0 : '.';
T__1 : '()';
T__2 : 'show';
T__3 : '(';
T__4 : ')';
...

If you now try to parse the input:

public string returnPerson() {
    return "Name " + name + "\nAge " + age + "\nSex " + sex + "\n";
}

using the parser rule:

function
 : ACCESS TYPE ID '(' ...
 ;

it will fail, because () is tokenized as a T__1 token, not as T__3 and T__4 tokens.
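
The effect of the '()' literal can be sketched in a few lines of Python. Again, this only mimics the longest-match behaviour and is not how ANTLR is implemented; the tokenize function and the literal lists are hypothetical.

```python
def tokenize(literals, text):
    """Split text into literal tokens, always taking the longest literal that fits."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():  # whitespace between tokens is ignored
            pos += 1
            continue
        candidates = [lit for lit in literals if text.startswith(lit, pos)]
        if not candidates:
            raise ValueError(f"cannot tokenize at position {pos}")
        longest = max(candidates, key=len)  # longest match wins
        tokens.append(longest)
        pos += len(longest)
    return tokens

with_pair = ["(", ")", "()", "{"]   # grammar that also uses the '()' literal
without_pair = ["(", ")", "{"]      # grammar that only uses '(' and ')'

print(tokenize(with_pair, "() {"))     # ['()', '{']
print(tokenize(without_pair, "() {"))  # ['(', ')', '{']
```

As soon as '()' exists as a literal, the lexer prefers it wherever the two characters are adjacent, so a parser rule written as '(' ... ')' never sees its two separate tokens for an empty parameter list.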

EDIT

Also, BOOL : 'true' | 'false' ; will never get matched because of the two matching rules I mentioned earlier: true and false are also matched by VAR_TYPE, which is defined before BOOL.

Here's a quick edit of your grammar so that it will correctly parse your input:

grammar Grammar;

start : statement* EOF;

statement
 : variable ';'
 | array ';'
 | if
 | function
 | loop
 | functionCall ';'
 | show ';'
 | 'return' expression ';'
 ;

function
 : ACCESS TYPE ID '(' ID* ')' '{' statement* '}'
 | ACCESS 'void' ID '(' ID* ')' '{' statement* '}'
 ;

variable     : TYPE ID ('=' expression)?;
array        : TYPE ID '[' ']' ('=' '[' expression (',' expression)* ']')?;
if           : 'if' expression ':' statement* ('else if' expression ':' statement*)* ('else' ':' statement*)?;
loop         : 'foreach' ID 'in' expression ':' statement*;
functionCall : (ID '.')? ID '(' ')';
show         : 'show' '(' expression ')';

expression
 : '(' expression ')'
 | expression '+' expression
 | expression COMPARISON expression
 | STRING
 | ID
 | INT
 | BOOL
 | FLOAT
 ;

ACCESS     : 'private' | 'public';
COMPARISON : '>' | '<' | '>=' | '<=' | '==';
TYPE       : 'int' | 'float' | 'string' | 'bool';
BOOL       : 'true' | 'false' ;
ID         : [a-zA-Z_][a-zA-Z0-9_]* ;
STRING     : '"' (~[\\"] | '\\' .)* '"';
INT        : [0-9]+;
FLOAT      : [0-9]+ '.' [0-9]+;
WS         : [ \t\r\n]+ -> channel(HIDDEN);
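
Note that in this grammar BOOL is defined before ID, and that order matters: both rules match true and false with the same length, so the one defined first wins. Here is a small Python sketch of that tie-break (the first_token helper is made up for illustration and is not part of ANTLR):

```python
import re

def first_token(rules, text):
    """Return (rule name, matched text) for the first token of text, or None."""
    best = None
    for name, pattern in rules:
        m = re.match(pattern, text)
        # A later rule only takes over when its match is strictly longer.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

BOOL = ("BOOL", r"true|false")
ID = ("ID", r"[a-zA-Z_][a-zA-Z0-9_]*")

print(first_token([BOOL, ID], "true"))     # ('BOOL', 'true')
print(first_token([ID, BOOL], "true"))     # ('ID', 'true')
print(first_token([BOOL, ID], "trueish"))  # ('ID', 'trueish')
```

The last line shows that putting keyword-like rules first does not break longer identifiers: trueish is still an ID because the longest match is considered before the definition order.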