I'm currently working on a project and I have a task to validate an identifier using an ANTLR4 grammar. This part of the project is the frontend, which uses Angular 6, and the grammar will also be compiled for a backend microservice.
The validation consists of checking that a string starts with a letter|digit character, can then contain letter|digit|underscore characters, and finishes with a letter|digit character.
I'm currently having problems with the grammar implementation (since I have no experience in Lex) and with handling the errors. Here is my grammar, followed by my implementation of the error handling.
grammar test;
goal: identifier;
identifier: Alphanum+ Alphanumsymb* Alphanum+;
Alphanum: [a-zA-Z0-9];
Alphanumsymb: [a-zA-Z0-9_];
And here is my implementation for detecting whether the string is valid according to the grammar.
const teststring = "2019_Test_Identifier";
const inputStream = new ANTLRInputStream(teststring);
const lex = new lexer.TestGrammarLexer(inputStream);
const tokenStream = new CommonTokenStream(lex);
const pars = new parser.TestGrammarParser(tokenStream);
pars.goal();
console.log(pars.numberOfSyntaxErrors);
if ( pars.numberOfSyntaxErrors > 0 ) {
return false;
}
return true;
My problem is that even if I get the grammar right, my implementation of the error handling isn't correct and I haven't found material to study the error handling with antlr4ts.
So, if you can help me, I would appreciate feedback about the grammar (how it should be, or the problems it has at the moment), and about the implementation of the error handling (some info about this, because when testing, I see the ConsoleErrorListener printing syntax errors to the console but my function shows 0 syntax errors).
Thank you for reading and hope you can help me.
I think using ANTLR is a bit of overkill for your task. ANTLR, or any other parsing tool, is good for recovering the structure of a string, but here you just want to know whether a string is an identifier or not. If you really need ANTLR, please elaborate why and then I can help you with the error handling.
For this task, I'd suggest you just use a regular expression like the following for testing an identifier:
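// The alternatives are grouped so that ^ and $ apply to both of them;
// without the group, a string such as "a_" would be accepted.
const regex = /^(?:[a-zA-Z0-9]+|[a-zA-Z0-9][a-zA-Z0-9_]*[a-zA-Z0-9]+)$/;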
And then use it as regex.test(str). It will return false if the string is not accepted as an identifier.
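As a rough sketch of how that check could be wrapped up, assuming a helper named isValidIdentifier (a hypothetical name, not something from your project), it might look like the following. The pattern is written in its grouped form so that ^ and $ apply to both alternatives, and the sample strings are only illustrations.

```typescript
// Hypothetical helper for illustration; the names are assumptions.
// Grouped alternation: either a run of letters/digits, or a string that
// starts and ends with a letter/digit and may contain underscores between.
const IDENTIFIER_PATTERN = /^(?:[a-zA-Z0-9]+|[a-zA-Z0-9][a-zA-Z0-9_]*[a-zA-Z0-9]+)$/;

function isValidIdentifier(candidate: string): boolean {
  // No "g" flag on the pattern, so test() keeps no state between calls.
  return IDENTIFIER_PATTERN.test(candidate);
}

console.log(isValidIdentifier("2019_Test_Identifier")); // true
console.log(isValidIdentifier("a"));                    // true
console.log(isValidIdentifier("_abc"));                 // false
console.log(isValidIdentifier("abc_"));                 // false
console.log(isValidIdentifier(""));                     // false
```

Since this is plain TypeScript with no dependencies, the same function could be shared between the Angular frontend and the backend, if the backend also runs JavaScript.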
Please note that your definition of the identifier in the ANTLR grammar is not correct. It requires at least two characters, because of the two + quantifiers, and it fails on strings of length 1 such as a. The regex version also fixes that.