rubyparslet

in Parslet, how to reconstruct substrings from parse subtrees?


I'm writing a parser for strings with interpolated name-value arguments, e.g.: 'This sentence #{x: 2, y: (2 + 5) + 3} has stuff in it.' The argument values are code, which has its own set of parse rules.

Here's a version of my parser, simplified to only allow basic arithmetic as code:

require 'parslet'
require 'ap'
class TestParser < Parslet::Parser
  rule :integer do match('[0-9]').repeat(1).as :integer end
  rule :space do match('[\s\\n]').repeat(1) end
  rule :parens do str('(') >> code >> str(')') end
  rule :operand do integer | parens end
  rule :addition do (operand.as(:left) >> space >> str('+') >> space >> operand.as(:right)).as :addition end
  rule :code do addition | operand end
  rule :name do match('[a-z]').repeat 1 end
  rule :argument do name.as(:name) >> str(':') >> space >> code.as(:value) end
  rule :arguments do argument >> (str(',') >> space >> argument).repeat end
  rule :interpolation do str('#{') >> arguments.as(:arguments) >> str('}') end
  rule :text do (interpolation.absent? >> any).repeat(1).as(:text) end
  rule :segments do (interpolation | text).repeat end
  root :segments
end
string = 'This sentence #{x: 2, y: (2 + 5) + 3} has stuff in it.'
ap TestParser.new.parse(string), index: false

Since the code has its own parse rules (to ensure valid syntax), the argument values are parsed into a subtree (with parentheses etc. replaced by nesting within the subtree):

[
    {
        :text => "This sentence "@0
    },
    {
        :arguments => [
            {
                 :name => "x"@16,
                :value => {
                    :integer => "2"@19
                }
            },
            {
                 :name => "y"@22,
                :value => {
                    :addition => {
                         :left => {
                            :addition => {
                                 :left => {
                                    :integer => "2"@26
                                },
                                :right => {
                                    :integer => "5"@30
                                }
                            }
                        },
                        :right => {
                            :integer => "3"@35
                        }
                    }
                }
            }
        ]
    },
    {
        :text => " has stuff in it."@37
    }
]

However, I want to store the argument values as strings, so this would be the ideal result:

[
    {
        :text => "This sentence "@0
    },
    {
        :arguments => [
            {
                 :name => "x"@16,
                :value => "2"
            },
            {
                 :name => "y"@22,
                :value => "(2 + 5) + 3"
            }
        ]
    },
    {
        :text => " has stuff in it."@37
    }
]

How can I use the Parslet subtrees to reconstruct the argument-value substrings? I could write a code generator, but that seems overkill -- Parslet clearly has access to the substring position information at some point (although it might discard it).

Is it possible to leverage or hack Parslet to return the substring?


Solution

  • Here's the hack I ended up with. There are better ways to accomplish this, but they'd require more extensive changes. Parser#parse now returns a Result. Result#tree gives the normal parse result, and Result#strings is a hash that maps subtree structures to source strings.

    module Parslet
    
      class Parser
        class Result < Struct.new(:tree, :strings); end
        def parse(source, *args)
          source = Source.new(source) unless source.is_a? Source
          value = super source, *args 
          Result.new value, source.value_strings
        end
      end
    
      class Source
        prepend Module.new{
          attr_reader :value_strings
          def initialize(*args)
            super *args
            @value_strings = {}
          end
        }
      end
    
      class Atoms::Base
        prepend Module.new{
          def apply(source, *args)
            old_pos = source.bytepos
            super.tap do |success, value|
              next unless success
              string = source.instance_variable_get(:@str).string.slice(old_pos ... source.bytepos)
              source.value_strings[flatten(value)] = string
            end
          end    
        }
      end
    
    end