I simplified the grammar down to this:
class_var = { kind ~ type ~ name ~ ";" }
kind = { "static" | "field" }
type = { "int" | "char" | "bool" | class_name }
class_name = {id}
name = { id }
id = { ASCII_ALPHA ~ ASCII_ALPHA* }
WHITESPACE = _{ " " | "\t" | "\n" }
Trying to parse this (it's a field declaration inside a class; the type can be either a built-in type or a user-defined class type):
field x f;
produces:
 --> 1:10
  |
1 | field x f;
  |          ^---
  |
  = expected id
It works fine with:
field int f;
ASCII_ALPHA matches only 'a'..'z' | 'A'..'Z', so identifiers that contain digits are rejected: in a name like f1, the first character is valid for id but the second is not. You likely want to use ASCII_ALPHANUMERIC instead for the remaining characters:
id = { ASCII_ALPHA ~ ASCII_ALPHANUMERIC* }
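Additionally, you should make this rule atomic. In a non-atomic rule, pest implicitly allows WHITESPACE between the elements of a sequence or repetition, so id can match "x f" as a single identifier. That leaves nothing for name to match, which is exactly the "expected id" error reported at column 10 (the ";"). It also explains why "field int f;" works: there, type matches the literal "int" and never reaches id. Prefixing the rule body with @ makes it atomic and turns the implicit whitespace off:
id = @{ ASCII_ALPHA ~ ASCII_ALPHANUMERIC* }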
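Putting the pieces together, the grammar from the question with an atomic id might look like the sketch below. It reuses the question's rule names unchanged; only the id rule differs, and the comments are my own explanation rather than anything from the original post.

```pest
// Sketch: the question's grammar with only `id` changed.
class_var  = { kind ~ type ~ name ~ ";" }
kind       = { "static" | "field" }
type       = { "int" | "char" | "bool" | class_name }
class_name = { id }
name       = { id }

// `@` makes the rule atomic: no implicit WHITESPACE is allowed
// between the characters of an identifier, so "x f" is two ids.
id         = @{ ASCII_ALPHA ~ ASCII_ALPHANUMERIC* }

// `_` makes the rule silent; pest skips it between tokens of non-atomic rules.
WHITESPACE = _{ " " | "\t" | "\n" }
```

With this version, "field x f;" should parse with class_name matching "x" and name matching "f", because the implicit whitespace is now only skipped between the tokens of class_var, not inside id.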
If you want to be extra thorough, consider using the XID_START and XID_CONTINUE Unicode character classes instead. They were created for exactly this purpose: they specify which characters, including non-ASCII ones, may start and continue an identifier.
id = @{ ( XID_START ~ XID_CONTINUE* ) | ( "_" ~ XID_CONTINUE+ ) }