I am trying to write a parser for strings such as "Union[Dict[str,str],Dict[str,str]]" with ANTLR 3. Below is the grammar that I use to generate the parser.
grammar PyType;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
OPEN_SQ_BR = '[';
CLOSE_SQ_BR = ']';
LIST = 'List';
SET = 'Set';
UNION = 'Union';
DICT = 'Dict';
TUPLE = 'Tuple';
COMMA = ',';
/* Nothing = 'nothing'; */
OPTIONAL = 'Optional';
HYPHEN = '-' ;
UNDERSCORE = '_' ;
DOT = '\.';
}
/*------------------------------------------------------------------
* PARSER RULES
*------------------------------------------------------------------*/
parse
: expr
;
list_element
: OPEN_SQ_BR expr CLOSE_SQ_BR -> expr
;
union_element
: OPEN_SQ_BR (expr COMMA)+ CLOSE_SQ_BR -> expr+;
list_expr
: LIST^ list_element*;
set_expr
: SET^ list_element*;
union_expr
: UNION^ union_element;
dict_expr
: DICT^ union_element;
tuple_expr
: TUPLE^ union_element;
optional_expr
: OPTIONAL^ union_element;
DIGIT : '0'..'9' ;
LETTER : 'a'..'z' |'A'..'Z'|'0'..'9'|'_' ;
NUMBER : DIGIT+ ;
SimpleType : ('a'..'z'|'A'..'Z'|'_')('a'..'z'|'A'..'Z'|'0'..'9'|'_'|'.')*('a'..'z'|'A'..'Z'|'_'|'0'..'9')
;
expr : list_expr
| set_expr
| SimpleType
| union_expr
| dict_expr
| tuple_expr
| optional_expr;
/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; } ;
The following strings are parsed correctly with the above grammar.
However, when more than one Union, Dict, or Tuple is nested inside a Union, Dict, or Tuple, the input does not parse correctly. For example, Union[Dict[str,str],Dict[str,str]] does not parse correctly.
Could someone please help me spot the error in the grammar?
Your rule:
union_element
: OPEN_SQ_BR (expr COMMA)+ CLOSE_SQ_BR -> expr+
;
can't be right: it says every expr must be followed by a comma, so it does not match Union[Dict[str,str]] (or any of the other input examples you mentioned, as far as I can see), but it does match things like Union[Dict[str,str,],] instead.
You should do:
union_element
: OPEN_SQ_BR expr (COMMA expr)* CLOSE_SQ_BR -> expr+
;
With that change, I think input like Union[Dict[str,str],Dict[str,str]] will also be matched properly.
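If you want to see the difference between the two rules without regenerating the parser, here is a minimal hand-written recursive-descent sketch in Python. It is not ANTLR output and not how ANTLR works internally. The names (Parser, tokenize, trailing_comma_rule) are made up for illustration, and the identifier pattern is a simplified stand-in for your SimpleType rule. The flag switches union_element between your original (expr COMMA)+ form and the corrected expr (COMMA expr)* form.

```python
import re

# Hypothetical tokenizer: brackets, commas, and identifiers (dots allowed).
# The identifier pattern is a simplification of the SimpleType lexer rule.
TOKEN_RE = re.compile(r"\s*([\[\],]|[A-Za-z_][A-Za-z0-9_.]*)")
PUNCT = {"[", "]", ","}
# Keywords followed by a union_element in the grammar.
MULTI_ARG = {"Union", "Dict", "Tuple", "Optional"}
# Keywords followed by zero or more list_element in the grammar.
SINGLE_ARG = {"List", "Set"}


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, text, trailing_comma_rule=False):
        self.tokens = tokenize(text)
        self.pos = 0
        # True mimics the original rule, False mimics the corrected one.
        self.trailing_comma_rule = trailing_comma_rule

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"trailing input: {self.peek()!r}")
        return tree

    def expr(self):
        tok = self.peek()
        if tok is None or tok in PUNCT:
            raise SyntaxError(f"expected a type, got {tok!r}")
        self.pos += 1
        if tok in MULTI_ARG:
            return (tok, self.union_element())
        if tok in SINGLE_ARG:
            children = []
            while self.peek() == "[":  # list_element*
                self.expect("[")
                children.append(self.expr())
                self.expect("]")
            return (tok, children)
        return tok  # plain SimpleType

    def union_element(self):
        self.expect("[")
        items = [self.expr()]
        if self.trailing_comma_rule:
            # Original: OPEN_SQ_BR (expr COMMA)+ CLOSE_SQ_BR
            self.expect(",")
            while self.peek() != "]":
                items.append(self.expr())
                self.expect(",")
        else:
            # Corrected: OPEN_SQ_BR expr (COMMA expr)* CLOSE_SQ_BR
            while self.peek() == ",":
                self.expect(",")
                items.append(self.expr())
        self.expect("]")
        return items
```

Calling Parser("Union[Dict[str,str],Dict[str,str]]").parse() goes through with the corrected rule, while passing trailing_comma_rule=True raises a SyntaxError on the same input and only accepts the trailing-comma form such as Union[Dict[str,str,],].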