Tags: javascript, unicode, grammar, peg, pegjs

How to match a single Unicode character in single quotes


My language has single-quoted Unicode character literals like:

'h'
'🙂'

etc.

I'm using the following rule to parse this:

CHAR = "'" (!"'" c:.) "'" { return c; }

This works for ASCII characters, but unfortunately not for Unicode.

How can I modify this to match a single Unicode character like the emoji above?


Solution

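  • I solved this by parsing character literals as strings.

    The `.` expression in PEG.js matches a single UTF-16 code unit, while an emoji like 🙂 is encoded as two (a surrogate pair), so the original rule cannot match it.

    Then, in JS, I spread the string into its individual Unicode code points.

    If there is more than one code point, I throw a parse error.

    Otherwise, I take the first (and only) code point.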
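
The approach above could look roughly like the following. This is only a minimal sketch: the helper name `charFromLiteral` and the grammar rule shown in the comment are assumptions for illustration, not the exact code from the answer.

```javascript
// Hypothetical helper; the name charFromLiteral is an assumption, not from the answer.
// An assumed PEG.js rule that could call it (untested sketch):
//   CHAR = "'" chars:$((!"'" .)*) "'" { return charFromLiteral(chars); }
function charFromLiteral(text) {
  // Spreading a string iterates by code point, so a surrogate pair stays together.
  const codePoints = [...text];
  if (codePoints.length !== 1) {
    throw new Error(
      `Character literal must contain exactly one code point, got ${codePoints.length}`
    );
  }
  return codePoints[0];
}

console.log("🙂".length);           // 2, because length counts UTF-16 code units
console.log(charFromLiteral("h"));  // h
console.log(charFromLiteral("🙂")); // 🙂
```

Inside a PEG.js action you would probably want to report the failure through the parser's own error mechanism rather than a plain `Error`, so that the message carries a source location. Also note that some user-perceived characters (flags, emoji with skin-tone modifiers, ZWJ sequences) consist of several code points and would be rejected by this check.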