Is there a way to backreference a previously matched string in Parslet, similar to the \1
functionality in typical regular expressions?
I want to extract the characters within a block such as:
Marker SomeName
some random text, numbers123
and symbols !#%
SomeName
in which "Marker" is a known string but "SomeName" is not known a priori, so I believe I need something like:
rule(:name) { ( match('\w') >> match('\w\d') ).repeat(1) }
rule(:text_within_the_block) {
str('Marker') >> name >> any.repeat.as(:text_block) >> backreference_to_name
}
What I don't know is how to write the backreference_to_name rule using Parslet and/or Ruby language.
From http://kschiess.github.io/parslet/parser.html
Capturing input
Sometimes a parser needs to match against something that was already matched against. Think about Ruby heredocs for example:
str = <<-HERE
  This is part of the heredoc.
HERE
The key to matching this kind of document is to capture part of the input first and then construct the rest of the parser based on the captured part. This is what it looks like in its simplest form:
match['ab'].capture(:capt) >>               # create the capture
  dynamic { |s,c| str(c.captures[:capt]) }  # and match using the capture
The key here is that the dynamic block returns a lazy parser. It's only evaluated at the point where it's used, and it gets passed its current context to reference at the point of execution.
-- Updated : To add a worked example --
So for your example:
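require 'parslet'
require 'parslet/convenience'   # provides parse_with_debug

class Mini < Parslet::Parser
  # A name is a single word: one letter followed by any number of \w characters.
  rule(:name) { match("[a-zA-Z]") >> match('\\w').repeat }

  rule(:text_within_the_block) {
    str('Marker ') >>
    name.capture(:namez).as(:name) >>   # remember the name for later
    str(" ") >>
    dynamic { |_,scope|
      # consume characters only while the captured name does not come next
      (str(scope.captures[:namez]).absent? >> any).repeat
    }.as(:text_block) >>
    dynamic { |src,scope| str(scope.captures[:namez]) }   # the closing name
  }

  root(:text_within_the_block)
end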
puts Mini.new.parse_with_debug("Marker BOB some text BOB").inspect
#=> {:name=>"BOB"@7, :text_block=>"some text "@11}
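This required a couple of changes.
First, rule(:name) now matches a single word only, and it is followed by str(" ") to detect that the word has ended. (Note: \w is short for [A-Za-z0-9_], so it includes digits.)
Second, the any match is now conditional on the upcoming text not being the captured :name text. (Otherwise it consumes the 'BOB' and then fails to match, i.e. it's greedy!)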
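
For comparison, here is roughly what the same extraction looks like with the \1 backreference the question mentions, using only Ruby's built-in Regexp. This is a minimal sketch, not part of Parslet: BLOCK_PATTERN and extract_block are hypothetical names made up for this illustration, and the sketch assumes the name is separated from the text by whitespace. The lazy (.*?) plays the same role as the absent? guard above, since it stops at the first place where the captured name appears again.

```ruby
# Plain Ruby, no Parslet needed.
# BLOCK_PATTERN and extract_block are hypothetical names for this sketch.
#
#   (\w+)   captures the name that follows "Marker "
#   \s+     whitespace (space or newline) that ends the name
#   (.*?)   lazily captures the text block; /m lets "." match newlines
#   \1      backreference: the same name must appear again to close the block
BLOCK_PATTERN = /Marker (\w+)\s+(.*?)\1/m

# Returns a hash with the name and the enclosed text, or nil if nothing matches.
def extract_block(input)
  m = BLOCK_PATTERN.match(input)
  m && { name: m[1], text_block: m[2] }
end

result = extract_block("Marker BOB some text BOB")
puts result[:name]                 # prints BOB
puts result[:text_block].inspect   # prints "some text "
```

One difference worth keeping in mind: a regex engine backtracks, so even a greedy (.*) would eventually give characters back until \1 can match. Parslet's any.repeat never gives input back once it has consumed it, which is why the parser version needs the explicit absent? check.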