c++ccontext-free-grammarbnfbnfc

LBNF, C function declaration / definition, reduce reduce conflict


I'm trying to represent in LBNF that C/C++ function declarations have the following (approximate) form (<sym> denotes optionallity, and [rule] is a zero-or-more list):

type ident ( [type <id>] );

While function definitions have the form:

type ident ( [type id] ) { [stmt] }

Currently, I have the following LBNF:

entrypoints Program ;
Program.   Program ::= [TopDef] ;

FnDef.     TopDef ::= Type Ident "(" [Arg] ")" Block ;
FnDecl.    TopDef ::= Type Ident "(" [Par] ")" ";" ;
separator nonempty TopDef "" ;

Param.     Par ::= Type ;
NParam.    Par ::= Type Ident ;
separator Par "," ;

Arg.       Arg ::= Type Ident;
separator  Arg "," ;

-- Types ---------------------------------------------------

Int.       Type ::= "int" ;
--- more types...
separator  Type "," ;

Which causes reduce/reduce conflicts as expected. Is there a way to solve this in the parser / lexer?

I could solve it with the following grammar:

FnDef.     TopDef ::= Type Ident "(" [Arg] ")" Block ;
FnDecl.    TopDef ::= Type Ident "(" [Arg] ")" ";" ;
separator nonempty TopDef "" ;

Arg.       Arg ::= Type Ident;
Arg.       Arg ::= Type;
separator  Arg "," ;

And then check in the type checker that function definitions have identifiers for each argument, but this feels less satisfying...

How is this typically handled in a language like C ?


Solution

  • The way you mention which you find unsatisfying is, in fact, the way it's usually accomplished.

    But you can produce an LALR(1) grammar which is precise. Here's a complete bison specification without conflicts:

    %token TYPE ID
    %%
    prog       : 
               | prog decl ';'
    decl       : TYPE ID def_list block
               | TYPE ID def_list ';'
               | TYPE ID dec_list ';'
    block      : '{' prog '}'
    def_list   : '(' ')'
               | '(' type_ids ')'
    dec_list   : '(' type_opt_ids ')'
    type_opt_id: type_only
               | type_id
    type_ids   : type_id
               | type_ids ',' type_id
    type_opt_ids
               : type_only
               | type_ids ',' type_only  /* SEE BELOW */
               | type_opt_ids ',' type_opt_id
    type_id    : TYPE ID
    type_only  : TYPE        
    

    The key is to avoid forcing the parser to make a decision. As it is passing over the list of parameters, it can keep reducing a type_opt_ids as long as it doesn't hit an anonymous parameter. If it hits one, it reduces a type_ids and continues for the rest of the parameters, whether they are anonymous or not. At the end, the definition only allows type_ids but the declaration (explicitly) accepts either.

    To make that work, the semantic type for type_id and type_only would need to be the same, since both type_ids and type_opt_ids must be lists of that type. Otherwise, you would need to convert type_ids to type_opt_ids in the production marked with /* SEE BELOW */

    (I apologize for not converting that to your formalism, but I wanted to test it with bison to verify that it is, in fact, conflict-free. It should be easy enough to convert.)


    Note that C++ is happy to allow function definitions without parameter names, but C is not. On the other hand, C allows function definitions without types, or with the type declarations between the parameter-name list and the body. But that's a side issue, really.