I'm trying to represent in LBNF that C/C++ function declarations have the following (approximate) form (<sym>
denotes optionallity, and [rule] is a zero-or-more list):
type ident ( [type <id>] );
While function definitions have the form:
type ident ( [type id] ) { [stmt] }
Currently, I have the following LBNF:
entrypoints Program ;
Program. Program ::= [TopDef] ;
FnDef. TopDef ::= Type Ident "(" [Arg] ")" Block ;
FnDecl. TopDef ::= Type Ident "(" [Par] ")" ";" ;
separator nonempty TopDef "" ;
Param. Par ::= Type ;
NParam. Par ::= Type Ident ;
separator Par "," ;
Arg. Arg ::= Type Ident;
separator Arg "," ;
-- Types ---------------------------------------------------
Int. Type ::= "int" ;
--- more types...
separator Type "," ;
Which causes reduce/reduce conflicts as expected. Is there a way to solve this in the parser / lexer?
I could solve it with the following grammar:
FnDef. TopDef ::= Type Ident "(" [Arg] ")" Block ;
FnDecl. TopDef ::= Type Ident "(" [Arg] ")" ";" ;
separator nonempty TopDef "" ;
Arg. Arg ::= Type Ident;
Arg. Arg ::= Type;
separator Arg "," ;
And then check in the type checker that function definitions have identifiers for each argument, but this feels less satisfying...
How is this typically handled in a language like C ?
The way you mention which you find unsatisfying is, in fact, the way it's usually accomplished.
But you can produce an LALR(1) grammar which is precise. Here's a complete bison specification without conflicts:
%token TYPE ID
%%
prog :
| prog decl ';'
decl : TYPE ID def_list block
| TYPE ID def_list ';'
| TYPE ID dec_list ';'
block : '{' prog '}'
def_list : '(' ')'
| '(' type_ids ')'
dec_list : '(' type_opt_ids ')'
type_opt_id: type_only
| type_id
type_ids : type_id
| type_ids ',' type_id
type_opt_ids
: type_only
| type_ids ',' type_only /* SEE BELOW */
| type_opt_ids ',' type_opt_id
type_id : TYPE ID
type_only : TYPE
The key is to avoid forcing the parser to make a decision. As it is passing over the list of parameters, it can keep reducing a type_opt_ids
as long as it doesn't hit an anonymous parameter. If it hits one, it reduces a type_ids
and continues for the rest of the parameters, whether they are anonymous or not. At the end, the definition only allows type_ids
but the declaration (explicitly) accepts either.
To make that work, the semantic type for type_id
and type_only
would need to be the same, since both type_ids
and type_opt_ids
must be lists of that type. Otherwise, you would need to convert type_ids
to type_opt_ids
in the production marked with /* SEE BELOW */
(I apologize for not converting that to your formalism, but I wanted to test it with bison to verify that it is, in fact, conflict-free. It should be easy enough to convert.)
Note that C++ is happy to allow function definitions without parameter names, but C is not. On the other hand, C allows function definitions without types, or with the type declarations between the parameter-name list and the body. But that's a side issue, really.