I'm implementing a new DSL in Marpa and (coming from Regexp::Grammars) I'm more than satisfied. My language supports a bunch of unary and binary operators, objects with C-style identifiers and method calls using the familiar dot notation. For example:
foo.has(bar == 42 AND baz == 23)
I found the prioritized rules feature offered by Marpa's grammar description language and have come to rely on that a lot, so I have nearly only one G1 rule Expression
. Excerpt (many alternatives, and semantic actions omitted for brevity):
Expression ::=
NumLiteral
| '(' Expression ')' assoc => group
|| Expression ('.') Identifier
|| Expression ('.') Identifier Args
| Expression ('==') Expression
|| Expression ('AND') Expression
Args ::= ('(') ArgsList (')')
ArgsList ::= Expression+ separator => [,]
Identifier ~ IdentifierHeadChar IdentifierBody
IdentifierBody ~ IdentifierBodyChar*
IdentifierHeadChar ~ [a-zA-Z_]
IdentifierBodyChar ~ [a-zA-Z0-9_]
NumLiteral ~ [0-9]+
As you can see, I'm using the Scanless interface (SLIF). My problem is that this also parses, for example:
foo.AND(5)
Marpa knows that there can only be an identifier after a dot, so it doesn't even consider the fact that AND
might be a keyword. I know that I can avoid that problem by doing a separate lexing stage that identifies AND
as a keyword explicitly, but that tiny papercut is not quite worth the effort.
Is there a way in SLIF to restrict the Identifier
rule to non-keyword identifiers only?
I don't know how to express such a thing in the grammar. You can introduce an intermediate non-terminal for Identifier which would check the condition, though:
#!/usr/bin/perl
use warnings;
use strict;
use Syntax::Construct qw{ // };
use Marpa::R2;
my %reserved = map { $_ => 1 } qw( AND );
my $grammar = 'Marpa::R2::Scanless::G'->new(
{ bless_package => 'main',
source => \( << '__GRAMMAR__'),
:default ::= action => store
:start ::= S
S ::= Id
| Id NumLiteral
Id ::= Identifier action => allowed
Identifier ~ IdentifierHeadChar IdentifierBody
IdentifierBody ~ IdentifierBodyChar*
IdentifierHeadChar ~ [a-zA-Z_]
IdentifierBodyChar ~ [a-zA-Z0-9_]
NumLiteral ~ [0-9]+
:discard ~ whitespace
whitespace ~ [\s]+
__GRAMMAR__
});
for my $value ('ABC', 'ABC 42', 'AND 1') {
my $value = $grammar->parse(\$value, 'main');
print $$value, "\n";
}
sub store {
my (undef, $id, $arg) = @_;
$arg //= 'null';
return "$id $arg";
}
sub allowed {
my (undef, $id) = @_;
die "Reserved keyword $id" if $reserved{$id};
return $id
}