My language has single-quoted Unicode character literals like:
'h'
'🙂'
etc.
I'm using the following rule to parse this:
CHAR = "'" (!"'" c:.) "'" { return c; }
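This works for ASCII characters, but unfortunately not for Unicode characters like the emoji, presumably because it takes two UTF-16 code units in a JavaScript string and . matches only one.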
How can I modify this to match a single Unicode character like the emoji above?
I solved this by parsing character literals as strings.
Then, in JS, I spread the string into individual Unicode code points.
If there is more than one code point, I throw a parse error.
Otherwise, I pick the first code point.
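A minimal sketch of that check, in case it helps someone. The function name charFromLiteralBody is made up for illustration, and it assumes the grammar hands over the raw text between the quotes as a plain JS string. It also rejects an empty literal, which goes slightly beyond what I described above.

```javascript
// Hypothetical helper: `body` is the text between the single quotes,
// as captured by a string-style grammar rule.
function charFromLiteralBody(body) {
  // Spreading a string iterates by Unicode code point, so a surrogate
  // pair such as an emoji stays together as a single element.
  const codePoints = [...body];

  // Assumption: an empty literal is rejected too, not only one that
  // contains several code points.
  if (codePoints.length !== 1) {
    throw new Error(
      `Character literal must contain exactly one code point, got ${codePoints.length}`
    );
  }

  return codePoints[0];
}
```

Inside a grammar action, the thrown Error would probably be replaced by whatever error-reporting function the parser generator provides. Note that this counts code points, not what a user sees as one character, so an emoji built from several code points (for example with a skin-tone modifier) would still be rejected.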