Tags: ruby, parsing, indentation, parslet

Indentation sensitive parser using Parslet in Ruby?


I am attempting to parse a simple indentation sensitive syntax using the Parslet library within Ruby.

The following is an example of the syntax I am attempting to parse:

level0child0
level0child1
  level1child0
  level1child1
    level2child0
  level1child2

The resulting tree should look like this:

[
  {
    :identifier => "level0child0",
    :children => []
  },
  {
    :identifier => "level0child1",
    :children => [
      {
        :identifier => "level1child0",
        :children => []
      },
      {
        :identifier => "level1child1",
        :children => [
          {
            :identifier => "level2child0",
            :children => []
          }
        ]
      },
      {
        :identifier => "level1child2",
        :children => []
      },
    ]
  }
]

The parser that I have now can parse nesting level 0 and 1 nodes, but cannot parse past that:

require 'parslet'

class IndentationSensitiveParser < Parslet::Parser

  rule(:indent) { str('  ') }
  rule(:newline) { str("\n") }
  rule(:identifier) { match['A-Za-z0-9'].repeat.as(:identifier) }

  rule(:node) { identifier >> newline >> (indent >> identifier >> newline.maybe).repeat.as(:children) }

  rule(:document) { node.repeat }

  root :document

end

require 'ap'
require 'pp'

begin
  input = DATA.read

  puts '', '----- input ----------------------------------------------------------------------', ''
  ap input

  tree = IndentationSensitiveParser.new.parse(input)

  puts '', '----- tree -----------------------------------------------------------------------', ''
  ap tree

rescue IndentationSensitiveParser::ParseFailed => failure
  puts '', '----- error ----------------------------------------------------------------------', ''
  puts failure.cause.ascii_tree
end

__END__
user
  name
  age
recipe
  name
foo
bar

It's clear that I need a dynamic counter, so that an identifier on nesting level 3 is only matched after exactly 3 indents.

How can I implement an indentation sensitive syntax parser using Parslet in this way? Is it possible?


Solution

  • There are a few approaches.

    1. Parse the document by recognising each line as a collection of indents and an identifier, then apply a transformation afterwards to reconstruct the hierarchy based on the number of indents.

    2. Use captures to store the current indent, and require the next node to carry that indent plus one more level in order to match as a child. (I didn't dig into this approach much, as the next one occurred to me.)

    3. Rules are just methods. So you can define 'node' as a method, which means you can pass parameters! (as follows)

    This lets you define node(depth) in terms of node(depth+1). The problem with this approach is that the node method doesn't match a string; it builds a parser. A direct recursive call would therefore never finish, because building node(depth) would immediately build node(depth+1), and so on.

    This is why dynamic exists. It takes a block and returns a parser that isn't resolved until the point where it tries to match, so the block only runs once per level that is actually reached, and you can recurse without problems.
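The difference between building eagerly and building on demand can be illustrated in plain Ruby, without Parslet. This is only a sketch of the idea: eager_node and lazy_node are hypothetical names, and the hashes merely stand in for parser objects; it is not how Parslet implements dynamic.

```ruby
# Eager: describing depth d immediately describes d+1, d+2, ... without end.
# Calling eager_node(0) would recurse until Ruby raises SystemStackError.
def eager_node(depth)
  { indent: '  ' * depth, children: eager_node(depth + 1) }
end

# Deferred: the child description sits inside a lambda and is built only
# when something asks for it, which is roughly the role dynamic plays.
def lazy_node(depth)
  { indent: '  ' * depth, children: -> { lazy_node(depth + 1) } }
end

top   = lazy_node(0)         # returns immediately, nothing deeper is built
child = top[:children].call  # depth 1 is built only at this point
```

In the parser, the equivalent of calling the lambda happens when matching reaches the children of a node, so the depth only grows as far as the input actually nests.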

    See the following code:

    require 'parslet'
    
    class IndentationSensitiveParser < Parslet::Parser
    
      def indent(depth)
        str('  '*depth)
      end
    
      rule(:newline) { str("\n") }
    
      rule(:identifier) { match['A-Za-z0-9'].repeat(1).as(:identifier) }
    
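      # A node is its indent, an identifier and an optional newline, followed
      # by any number of nodes one level deeper. dynamic defers building
      # node(depth+1) until match time, so the recursion terminates.
      def node(depth)
        indent(depth) >>
        identifier >>
        newline.maybe >>
        (dynamic{|s,c| node(depth+1).repeat(0)}).as(:children)
      end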
    
      rule(:document) { node(0).repeat }
    
      root :document
    end
    

    This is my favoured solution.
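For comparison, the post-processing step of approach 1 might look roughly like the following. It is a minimal sketch in plain Ruby, under some assumptions: the flat line records are hashes with :indent (number of two-space indents) and :identifier keys, scan_lines is a hypothetical stand-in for the line-recognising Parslet grammar, and build_tree is a hypothetical name for the step that rebuilds the hierarchy.

```ruby
# Stand-in for the line-level parser: one record per non-blank line.
# Assumes indentation is always a multiple of two spaces.
def scan_lines(text)
  text.each_line.map(&:chomp).reject { |raw| raw.strip.empty? }.map do |raw|
    { indent: raw[/\A */].length / 2, identifier: raw.strip }
  end
end

# Rebuilds the hierarchy from the flat records using a stack in which
# stack[d] is the most recent node seen at depth d.
def build_tree(lines)
  roots = []
  stack = []

  lines.each do |line|
    depth = line[:indent]
    node  = { identifier: line[:identifier], children: [] }

    # A line may go at most one level deeper than the line before it.
    if depth > stack.length
      raise ArgumentError, "#{line[:identifier].inspect} is indented too deeply"
    end

    siblings = depth.zero? ? roots : stack[depth - 1][:children]
    siblings << node

    # Forget deeper nodes; they can no longer receive children.
    stack = stack.first(depth) << node
  end

  roots
end

text = "level0child0\nlevel0child1\n  level1child0\n  level1child1\n" \
       "    level2child0\n  level1child2\n"
tree = build_tree(scan_lines(text))
```

The trade-off is that the grammar stays trivial, while nesting errors such as a line that jumps two levels are detected in the second pass rather than by the parser.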