parsing rust pest

Why does Pest parser fail when requiring whitespace between tokens?


I am trying to build a simple parser using Rust and pest. My pest rules are:

WHITESPACE = { " " | "\t" | "\n" | "\r"  }
STRING = @{"\"" ~ (!("\"" | "\n") ~ ANY)* ~ "\"" }
Name = { ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }

This is the LET rule:

LET = { "let" ~ WHITESPACE+ ~ Name ~ WHITESPACE* ~ "=" ~ WHITESPACE* ~ STRING ~ WHITESPACE* ~ ";" }

The problem is that the parser fails on this line:

let var_name = "string";

If I replace WHITESPACE+ in the LET rule with WHITESPACE*, it works. However, I want to require at least one space between the let keyword and var_name in my input.

The questions are:


Solution

  • WHITESPACE is a special rule: in a normal (non-atomic) rule, pest implicitly inserts WHITESPACE* at every ~ (more precisely, (WHITESPACE | COMMENT)*). So in your case "let" ~ WHITESPACE+ is actually equivalent to "let" ~ WHITESPACE* ~ WHITESPACE+. The implicit WHITESPACE* greedily consumes the space, and because PEG repetition never gives characters back, the explicit WHITESPACE+ finds nothing left to match and the rule fails. If you want to control where whitespace is required, you need an atomic rule (@), inside which no implicit whitespace is inserted:

    LET_KW = @{ "let" ~ WHITESPACE }
    LET = { LET_KW ~ Name ~ "=" ~ STRING ~ ";" }
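To make the matching order concrete, here is a minimal hand-rolled sketch in plain Rust that mimics the two variants of the rule prefix. It does not use pest at all: the helper names (skip_ws, ws_plus, ws_one, name, let_prefix_original, let_prefix_atomic) are hypothetical and exist only for illustration, and Name is treated as atomic here to keep the sketch short.

```rust
// Illustration only: these helpers imitate pest's matching order and are
// NOT part of the pest API.

/// Implicit WHITESPACE*: what pest inserts at each `~` in a non-atomic rule.
fn skip_ws(s: &str) -> &str {
    s.trim_start_matches(|c: char| matches!(c, ' ' | '\t' | '\n' | '\r'))
}

/// Explicit WHITESPACE+: succeeds only if at least one character is consumed.
fn ws_plus(s: &str) -> Option<&str> {
    let rest = skip_ws(s);
    if rest.len() < s.len() { Some(rest) } else { None }
}

/// Explicit WHITESPACE: exactly one whitespace character.
fn ws_one(s: &str) -> Option<&str> {
    let mut chars = s.chars();
    match chars.next() {
        Some(' ' | '\t' | '\n' | '\r') => Some(chars.as_str()),
        _ => None,
    }
}

/// Name: an ASCII letter followed by ASCII alphanumerics or '_'
/// (assumed atomic in this sketch).
fn name(s: &str) -> Option<&str> {
    let first = s.chars().next()?;
    if !first.is_ascii_alphabetic() {
        return None;
    }
    // `first` is ASCII, so slicing at byte 1 is safe.
    Some(s[1..].trim_start_matches(|c: char| c.is_ascii_alphanumeric() || c == '_'))
}

/// Mimics `"let" ~ WHITESPACE+ ~ Name` in a non-atomic rule.
/// Returns the unconsumed input on success.
fn let_prefix_original(input: &str) -> Option<&str> {
    let rest = input.strip_prefix("let")?;
    let rest = skip_ws(rest);  // implicit WHITESPACE* for the first `~`
    let rest = ws_plus(rest)?; // explicit WHITESPACE+: the space is already gone
    let rest = skip_ws(rest);  // implicit WHITESPACE* for the second `~`
    name(rest)
}

/// Mimics `LET_KW ~ Name` with `LET_KW = @{ "let" ~ WHITESPACE }`.
/// Returns the unconsumed input on success.
fn let_prefix_atomic(input: &str) -> Option<&str> {
    let rest = input.strip_prefix("let")?;
    let rest = ws_one(rest)?;  // atomic rule: no implicit skip before this
    let rest = skip_ws(rest);  // back in LET: implicit WHITESPACE* before Name
    name(rest)
}
```

With these definitions, let_prefix_original returns None for the input from the question, because the implicit skip has already eaten the only space. let_prefix_atomic accepts the same input, still allows extra spaces after the keyword, and rejects input such as letvar_name, where the keyword is not followed by whitespace.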