How to parse single quoted string using Marpa:r2? In my below code, the single quoted strings appends '\' on parsing.
Code:
use strict;
use Marpa::R2;
use Data::Dumper;
my $grammar = Marpa::R2::Scanless::G->new(
{ default_action => '[values]',
source => \(<<'END_OF_SOURCE'),
lexeme default = latm => 1
:start ::= Expression
# include begin
Expression ::= Param
Param ::= Unquoted
| ('"') Quoted ('"')
| (') Quoted (')
:discard ~ whitespace
whitespace ~ [\s]+
Unquoted ~ [^\s\/\(\),&:\"~]+
Quoted ~ [^\s&:\"~]+
END_OF_SOURCE
});
my $input1 = 'foo';
#my $input2 = '"foo"';
#my $input3 = '\'foo\'';
my $recce = Marpa::R2::Scanless::R->new({ grammar => $grammar });
print "Trying to parse:\n$input1\n\n";
$recce->read(\$input1);
my $value_ref = ${$recce->value};
print "Output:\n".Dumper($value_ref);
Output's:
Trying to parse:
foo
Output:
$VAR1 = [
[
'foo'
]
];
Trying to parse:
"foo"
Output:
$VAR1 = [
[
'foo'
]
];
Trying to parse:
'foo'
Output:
$VAR1 = [
[
'\'foo\''
]
]; (don't want it to be parsed like this)
Above are the outputs of all the inputs, i don't want 3rd one to get appended with the '\' and single quotes.. I want it to be parsed like OUTPUT2. Please advise.
Ideally, it should just pick the content between single quotes according to Param ::= (') Quoted (')
The other answer regarding Data::Dumper output is correct. However, your grammar does not work the way you expect it to.
When you parse the input 'foo'
, Marpa will consider the three Param
alternatives. The predicted lexemes at that position are:
Unquoted ~ [^\s\/\(\),&:\"~]+
'"'
') Quoted ('
Yes, the last is literally ) Quoted (
, not anything containing a single quote.
Even if it were ([']) Quoted (['])
: Due to longest token matching, the Unquoted lexeme will match the entire input, including the single quote.
What would happen for an input like " foo "
(with double quotes)? Now, only the '"'
lexeme would match, then any whitespace would be discarded, then the Quoted lexeme matches, then any whitespace is discarded, then closing "
is matched.
To prevent this whitespace-skipping behaviour and to prevent the Unquoted rule from being preferred due to LATM, it makes sense to describe quoted strings as lexemes. For example:
Param ::= Unquoted | Quoted
Unquoted ~ [^'"]+
Quoted ~ DQ | SQ
DQ ~ '"' DQ_Body '"' DQ_Body ~ [^"]*
SQ ~ ['] SQ_Body ['] SQ_Body ~ [^']*
These lexemes will then include any quotes and escapes, so you need to post-process the lexeme contents. You can either do this using the event system (which is conceptually clean, but a bit cumbersome to implement), or adding an action that performs this processing during parse evaluation.
Since lexemes cannot have actions, it is usually best to add a proxy production:
Param ::= Unquoted | Quoted
Unquoted ~ [^'"]+
Quoted ::= Quoted_Lexeme action => process_quoted
Quoted_Lexeme ~ DQ | SQ
DQ ~ '"' DQ_Body '"' DQ_Body ~ [^"]*
SQ ~ ['] SQ_Body ['] SQ_Body ~ [^']*
The action could then do something like:
sub process_quoted {
my (undef, $s) = @_;
# remove delimiters from double-quoted string
return $1 if $s =~ /^"(.*)"$/s;
# remove delimiters from single-quoted string
return $1 if $s =~ /^'(.*)'$/s;
die "String was not delimited with single or double quotes";
}