I'm writing a parser for strings with interpolated name-value arguments, e.g.: 'This sentence #{x: 2, y: (2 + 5) + 3} has stuff in it.'
The argument values are code, which has its own set of parse rules.
Here's a version of my parser, simplified to only allow basic arithmetic as code:
require 'parslet'
require 'ap'
class TestParser < Parslet::Parser
  rule :integer do match('[0-9]').repeat(1).as :integer end
  rule :space do match('[\s\\n]').repeat(1) end
  rule :parens do str('(') >> code >> str(')') end
  rule :operand do integer | parens end
  rule :addition do (operand.as(:left) >> space >> str('+') >> space >> operand.as(:right)).as :addition end
  rule :code do addition | operand end
  rule :name do match('[a-z]').repeat 1 end
  rule :argument do name.as(:name) >> str(':') >> space >> code.as(:value) end
  rule :arguments do argument >> (str(',') >> space >> argument).repeat end
  rule :interpolation do str('#{') >> arguments.as(:arguments) >> str('}') end
  rule :text do (interpolation.absent? >> any).repeat(1).as(:text) end
  rule :segments do (interpolation | text).repeat end
  root :segments
end
string = 'This sentence #{x: 2, y: (2 + 5) + 3} has stuff in it.'
ap TestParser.new.parse(string), index: false
Since the code has its own parse rules (to ensure valid syntax), the argument values are parsed into a subtree (with parentheses etc. replaced by nesting within the subtree):
[
    {
        :text => "This sentence "@0
    },
    {
        :arguments => [
            {
                :name => "x"@16,
                :value => {
                    :integer => "2"@19
                }
            },
            {
                :name => "y"@22,
                :value => {
                    :addition => {
                        :left => {
                            :addition => {
                                :left => {
                                    :integer => "2"@26
                                },
                                :right => {
                                    :integer => "5"@30
                                }
                            }
                        },
                        :right => {
                            :integer => "3"@35
                        }
                    }
                }
            }
        ]
    },
    {
        :text => " has stuff in it."@37
    }
]
However, I want to store the argument values as strings, so this would be the ideal result:
[
    {
        :text => "This sentence "@0
    },
    {
        :arguments => [
            {
                :name => "x"@16,
                :value => "2"
            },
            {
                :name => "y"@22,
                :value => "(2 + 5) + 3"
            }
        ]
    },
    {
        :text => " has stuff in it."@37
    }
]
How can I use the Parslet subtrees to reconstruct the argument-value substrings? I could write a code generator, but that seems overkill -- Parslet clearly has access to the substring position information at some point (although it might discard it).
Is it possible to leverage or hack Parslet to return the substring?
Here's the hack I ended up with. There are better ways to accomplish this, but they'd require more extensive changes. Parser#parse now returns a Result. Result#tree gives the normal parse result, and Result#strings is a hash that maps subtree structures to source strings.
module Parslet
  class Parser
    class Result < Struct.new(:tree, :strings); end
    # Wrap the normal parse so the recorded source strings are returned too.
    def parse(source, *args)
      source = Source.new(source) unless source.is_a? Source
      value = super source, *args
      Result.new value, source.value_strings
    end
  end
  class Source
    prepend Module.new{
      attr_reader :value_strings
      def initialize(*args)
        super(*args)
        @value_strings = {}
      end
    }
  end
  class Atoms::Base
    prepend Module.new{
      # Record the source text consumed by every successful atom application.
      def apply(source, *args)
        old_pos = source.bytepos
        super.tap do |success, value|
          next unless success
          # bytepos is a byte offset, so slice by bytes rather than characters.
          string = source.instance_variable_get(:@str).string.byteslice(old_pos ... source.bytepos)
          source.value_strings[flatten(value)] = string
        end
      end
    }
  end
end
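The core idea of the patch above (note the position before an atom is applied, then slice the source up to the position afterwards) can be tried out without Parslet. The following is only a minimal sketch using the standard library's StringScanner. The helper name consumed_text is hypothetical and not part of Parslet, and the regular expressions are stand-ins for the real name and code rules.

```ruby
require 'strscan'

# Hypothetical helper (not part of Parslet): runs the block against the
# scanner and, if the block matched, returns the exact source text consumed.
def consumed_text(scanner)
  start = scanner.pos                 # byte offset before the match attempt
  matched = yield scanner
  return nil unless matched
  # StringScanner#pos is a byte offset, so slice by bytes, not characters.
  scanner.string.byteslice(start...scanner.pos)
end

# The multibyte name makes byte offsets and character offsets differ.
scanner = StringScanner.new('é: (2 + 5) + 3}')
scanner.scan(/\p{L}+: /)              # skip the argument name and separator

value_text = consumed_text(scanner) do |s|
  s.scan(/\(\d+ \+ \d+\) \+ \d+/)     # stand-in for the real `code` rule
end

puts value_text
```

Because the text is taken from the source rather than rebuilt from the parse tree, parentheses and whitespace that the tree drops are preserved. Slicing by bytes matters as soon as the input contains multibyte characters before the value.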