tatsu

How do I get Tatsu to not consume the right bracket in the identifier name?


I have identifier defined as:

identifier = /[A-zA-Z][A-zA-Z0-9_]*/ ;

and arrayType as:

arrayType = ARRAY LBRACK ~ typeList RBRACK OF componentType;

So why does Tatsu decide that 'ASCIIcode]' is an identifier, rather than an identifier followed by a right bracket, in the logs below?

≡'[' 
ASCIIcode] Of ASCIIcode;
≡LBRACK↙arrayType↙unpackedStructuredType↙structuredType↙type↙ ~99:15
ASCIIcode] Of ASCIIcode;
↙typeList↙arrayType↙unpackedStructuredType↙structuredType↙type↙ ~99:15
ASCIIcode] Of ASCIIcode;
↙indexType↙typeList↙arrayType↙unpackedStructuredType↙structuredType↙type↙ ~99:15
ASCIIcode] Of ASCIIcode;
↙simpleType↙indexType↙typeList↙arrayType↙unpackedStructuredType↙ ~99:15
ASCIIcode] Of ASCIIcode;
↙scalarType↙simpleType↙indexType↙typeList↙arrayType↙ ~99:15
ASCIIcode] Of ASCIIcode;
↙LPAREN↙scalarType↙simpleType↙indexType↙typeList↙arrayType↙ ~99:15
ASCIIcode] Of ASCIIcode;
≢'(' 
ASCIIcode] Of ASCIIcode;
≢LPAREN↙scalarType↙simpleType↙indexType↙typeList↙arrayType↙ ~99:15
ASCIIcode] Of ASCIIcode;
≢scalarType↙simpleType↙indexType↙typeList↙arrayType↙ ~99:15
ASCIIcode] Of ASCIIcode;
↙subrangeType↙simpleType↙indexType↙typeList↙arrayType↙ ~99:15
ASCIIcode] Of ASCIIcode;
↙constant↙subrangeType↙simpleType↙indexType↙typeList↙arrayType↙ ~99:15
ASCIIcode] Of ASCIIcode;
↙unsignedNumber↙constant↙subrangeType↙simpleType↙indexType↙typeList↙ ~99:15
ASCIIcode] Of ASCIIcode;
↙unsignedInteger↙unsignedNumber↙constant↙subrangeType↙simpleType↙ ~99:15
ASCIIcode] Of ASCIIcode;
≢'' /\d+/
ASCIIcode] Of ASCIIcode;
↙unsignedReal↙unsignedNumber↙constant↙subrangeType↙simpleType↙indexType↙ ~99:15
ASCIIcode] Of ASCIIcode;
≢'' /\d+/
ASCIIcode] Of ASCIIcode;
≢unsignedNumber↙constant↙subrangeType↙simpleType↙indexType↙typeList↙ ~99:15
ASCIIcode] Of ASCIIcode;
↙sign↙constant↙subrangeType↙simpleType↙indexType↙typeList↙arrayType↙ ~99:15
ASCIIcode] Of ASCIIcode;
↙PLUS↙sign↙constant↙subrangeType↙simpleType↙indexType↙typeList↙arrayType↙ ~99:15
ASCIIcode] Of ASCIIcode;
≢'+' 
ASCIIcode] Of ASCIIcode;
≢PLUS↙sign↙constant↙subrangeType↙simpleType↙indexType↙typeList↙arrayType↙ ~99:15
ASCIIcode] Of ASCIIcode;
↙MINUS↙sign↙constant↙subrangeType↙simpleType↙indexType↙typeList↙ ~99:15
ASCIIcode] Of ASCIIcode;
≢'-' 
ASCIIcode] Of ASCIIcode;
≢MINUS↙sign↙constant↙subrangeType↙simpleType↙indexType↙typeList↙ ~99:15
ASCIIcode] Of ASCIIcode;
≢sign↙constant↙subrangeType↙simpleType↙indexType↙typeList↙arrayType↙ ~99:15
ASCIIcode] Of ASCIIcode;
↙identifier↙constant↙subrangeType↙simpleType↙indexType↙typeList↙ ~99:15
ASCIIcode] Of ASCIIcode;
≡'ASCIIcode]' /[A-zA-Z][A-zA-Z0-9_]*/
 Of ASCIIcode;

Solution

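  • The regular expression does not match what you intend (it's a good idea to test regular expressions on sites like https://pythex.org).

    The regex uses an upper-case "A" where the lower-case letter range was meant. The range "A-z" covers every ASCII character from "A" (65) to "z" (122), and that span includes the six characters between "Z" and "a": [ \ ] ^ _ and the backtick. That is why "]" is accepted as part of the identifier.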

    You can try using:

    identifier = /[a-zA-Z][a-zA-Z0-9_]*/ ;
    

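    or, more compactly:

    identifier = /[a-zA-Z]\w*/ ;

    Here \w already matches letters, digits and the underscore, so the first character stays an explicit letter class to keep identifiers from starting with a digit. Note that in Python 3 \w also matches non-ASCII word characters by default.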
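
    The difference between the two character ranges can be checked outside Tatsu with Python's re module. This is only a minimal sketch: the variable names are made up for illustration, and the input string is the one shown in the trace above.

```python
import re

# Pattern from the question: the range A-z spans code points 65..122,
# which includes the characters between 'Z' (90) and 'a' (97).
BUGGY = re.compile(r"[A-zA-Z][A-zA-Z0-9_]*")
# Corrected pattern from the answer.
FIXED = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")

text = "ASCIIcode] Of ASCIIcode;"

buggy_match = BUGGY.match(text).group()
fixed_match = FIXED.match(text).group()

# Non-letter characters that the A-z range lets through.
extras = "".join(
    chr(c) for c in range(ord("A"), ord("z") + 1) if not chr(c).isalpha()
)

print(repr(buggy_match))  # 'ASCIIcode]'
print(repr(fixed_match))  # 'ASCIIcode'
print(repr(extras))       # '[\\]^_`'
```

    With the corrected range the match stops before "]", which leaves the bracket available for RBRACK in arrayType.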