I'm implementing a lexer and parser for a simple language, and I got stuck defining arrays for it. The requirement is to write a grammar for multi-dimensional arrays so that:
All elements of an array must have the same type, which can be number, string, or boolean.
In an array declaration, number literals must be used to give the size (length) of each dimension of the array. If the array is multi-dimensional, there is more than one number literal. These literals are separated by commas and enclosed in square brackets ([ ]).
For example:
number a[5] <- [1, 2, 3, 4, 5]
or
number b[2, 3] <- [[1, 2, 3], [4, 5, 6]]
An array value is a comma-separated list of literals enclosed in '[' and ']'. The literal elements all have the same type. For example, [1, 5, 7, 12] or [[1, 2], [4, 5], [3, 5]].
grammar ABC;
@lexer::header {
from lexererr import *
}
options {
language=Python3;
}
program : arraydecl+ EOF;
INT: [0-9]+;
arraydecl : prim_type ID (ARRAY_START arrayliteral_list ARRAY_END) (ASSIGN arrayliteral_list)?;
arrayliteral : ARRAY_START INT (COMMA INT)* ARRAY_END ; // ex: [1,2,3,4]
arrayliteral_list: ARRAY_START (arrayliteral (COMMA arrayliteral)*) ARRAY_END;
prim_type: NUMBER | BOOL | STRING;
NUMBER: 'number';
BOOL: 'bool';
STRING: 'string';
ASSIGN : '<-';
EQ : '=';
ARRAY_START : '[';
ARRAY_END : ']';
LP : '(';
RP : ')';
COMMA : ',';
SEMI : ';';
TYPES: ('number' | 'string' | 'boolean');
prim_types: TYPES;
ID: [A-Za-z_][A-Za-z0-9_]*;
// newline character
NEWLINE: '\n' | '\r\n';
/* COMMENT */
LINECMT : '##' ~[\n\r]*;
WS : [ \t\r\n\f\b]+ -> skip ; // skip spaces, tabs, newlines
ERROR_CHAR: . {raise ErrorToken(self.text)};
UNCLOSE_STRING: . {raise UncloseString(self.text)};
This is my code, and it does not work as I expected, even for a simple test case like this:
def test_simple_program(self):
"""Test array declaration """
input = """number a[5]
"""
expect = "successful"
self.assertTrue(TestParser.test(input,expect,204))
It returns: "Error on line 1 col 9: 5"
Any help will be greatly appreciated.
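Your arraydecl uses arrayliteral_list for the dimensions, and that rule requires a nested [[...]], so the parser fails on the 5 in a[5]. For the value, you need to use arrayliteral recursively: such a literal contains zero or more expressions, and an expression can itself be an arrayliteral.
Something like this: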
program
: arraydecl+ EOF
;
arraydecl
: prim_type ID ARRAY_START INT (COMMA INT)* ARRAY_END (ASSIGN arrayliteral)?
;
prim_type
: NUMBER | BOOL | STRING
;
prim_types
: TYPES
;
expression
: arrayliteral
| ID
| INT
;
arrayliteral
: ARRAY_START expressions? ARRAY_END
;
expressions
: expression (',' expression)*
;