ruby parslet

Parslet Alternatives Not parsing whole string


I have the following specs:

  it "parses a document with only an expression" do
    puts parser.document.should parse("[b]Hello World[/b]")
  end
  it "parses a document with only text" do
    puts parser.document.should parse(" Hello World")
  end
  it "parses a document with both an expression and text" do
    puts parser.document.should parse("[b]Hello World[/b] Yes hello")
  end

For the following Parslet parser:

class Parser < Parslet::Parser

rule(:open_tag) do
  parslet = str('[')
  parslet = parslet >> (str(']').absent? >> match("[a-zA-Z]")).repeat(1).as(:open_tag_name)
  parslet = parslet >> str(']')
  parslet
end

rule(:close_tag) do
  parslet = str('[/')
  parslet = parslet >> (str(']').absent? >> match("[a-zA-Z]")).repeat(1).as(:close_tag_name)
  parslet = parslet >> str(']')
  parslet
end

rule(:text) { any.repeat(1).as(:text) }

rule(:expression) do
  # [b]Hello World[/b]
  # open tag, any text up until closing tag, closing tag
  open_tag.present?
  close_tag.present?
  parslet = open_tag >> match("[a-zA-Z\s?]").repeat(1).as(:enclosed_text) >> close_tag
  parslet
end

rule(:document) do
  expression | text
end

end

The first two tests pass just fine, and printing the results to the command line shows that the atoms are of the correct type. However, when I try to parse a document with both an expression and plain text, the plain text is not parsed and I get the following error:

Parslet::UnconsumedInput: Don't know what to do with " Yes hello" at line 1 char 19.

I think I'm missing something in how I define the :document rule. What I want is something that will consume any number of expressions and plain-text runs in sequence. The rule I have will consume each atom individually, but using both in the same string causes a failure.


Solution

  • For your document rule you want to use repeat:

    rule(:document) do
      (expression | text).repeat
    end
    

    You’ll also need to change your text rule; currently if it starts matching it will consume everything including any [ that should start a new expression. Something like this should work:

    rule(:text) { match['^\['].repeat(1).as(:text) }