Question: I'm working on a custom parser using ANTLR to define a small programming language. One of the requirements is that return statements can only appear inside the body of a function. If a return statement appears outside a function, the parser should throw an error.
Here's the simplified grammar I'm working with (in ANTLR):
grammar Grammar;
options {
    language=Python3;
}
// Parser Rules
program: (var_decl | fun_decl)*;
fun_decl: type_spec ID '(' param_decl* (';' param_decl)* ')' body; // Function declarations
param_decl: type_spec ID (',' ID)* ; // Parameters for functions
type_spec: 'int' | 'float' ; // Valid types
body: '{' stmt* '}';
expr: 'expr';
stmt: assignment | call | r_return | var_decl;
var_decl: param_decl ';'; // Variable declarations
assignment: ID '=' expr ';';
call: ID '(' expr* (',' expr)* ')' ';';
r_return: 'return' expr ';';
// Lexer Rules
WS: [ \t\r\n] -> skip ; // Skip whitespace
ID: [a-zA-Z]+ ; // Identifiers (variable and function names)
ERROR_CHAR: . {raise ErrorToken(self.text)} ; // Error handling
The issue is that this grammar allows return statements (r_return) to appear anywhere a stmt is allowed, including in the global scope. For example:
int x;
return x; // This should throw an error.
But inside a function, it should work:
int myFunction() {
    return 42; // Valid
}
I have not been able to come up with a solution. How can I make the parser reject a return statement that appears outside a function?
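Answer: Your grammar already restricts r_return to function bodies, because stmt is reachable only through body. The problem is that program is not required to consume the whole input, so the parser typically stops at the first token it cannot match and reports nothing. Add EOF to the end of your program parser rule...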
program: (var_decl | fun_decl)* EOF;
...to cause the parser to indicate an error in your first test case.
Not directly related to your question, I suggest defining lexer rules such as...
OPEN_PAREN: '(';
CLOSE_PAREN: ')';
SEMICOLON: ';';
COMMA: ',';
OPEN_CURLY: '{';
CLOSE_CURLY: '}';
EQ: '=';
INT: 'int';
FLOAT: 'float';
RETURN: 'return';
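...to use in your parser rules instead of character literals. Place the keyword rules (INT, FLOAT, RETURN) before ID: when two lexer rules match the same text, ANTLR picks the one defined first.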
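Putting both suggestions together, the grammar might look like the sketch below. The rule structure is copied from the question with only the literals swapped for named tokens and EOF added. The placeholder expr rule and the ErrorToken exception are left exactly as the question had them, so treat this as an illustration rather than a finished grammar.

```antlr
grammar Grammar;

options {
    language=Python3;
}

// Parser rules (structure unchanged from the question)
program: (var_decl | fun_decl)* EOF;  // EOF: the whole input must be matched
fun_decl: type_spec ID OPEN_PAREN param_decl* (SEMICOLON param_decl)* CLOSE_PAREN body;
param_decl: type_spec ID (COMMA ID)*;
type_spec: INT | FLOAT;
body: OPEN_CURLY stmt* CLOSE_CURLY;
expr: 'expr';  // placeholder from the question, left as is
stmt: assignment | call | r_return | var_decl;
var_decl: param_decl SEMICOLON;
assignment: ID EQ expr SEMICOLON;
call: ID OPEN_PAREN expr* (COMMA expr)* CLOSE_PAREN SEMICOLON;
r_return: RETURN expr SEMICOLON;

// Lexer rules
OPEN_PAREN: '(';
CLOSE_PAREN: ')';
SEMICOLON: ';';
COMMA: ',';
OPEN_CURLY: '{';
CLOSE_CURLY: '}';
EQ: '=';
INT: 'int';        // keywords come before ID so they win the tie
FLOAT: 'float';
RETURN: 'return';
ID: [a-zA-Z]+;
WS: [ \t\r\n] -> skip;
ERROR_CHAR: . {raise ErrorToken(self.text)};  // ErrorToken is the asker's own exception class
```

With this version, the input `int x; return x;` should be reported as a syntax error at `return`, since neither var_decl nor fun_decl can start with RETURN and EOF has not been reached. Note that by default ANTLR reports syntax errors through its error listener and tries to recover rather than raising an exception, so if you need a hard failure you would have to install your own error listener or error strategy.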