pythonparsingbnffirst-order-logiclark-parser

How do I write the lark grammar for First Order Logic with Equality?


According to AIMA (Russell & Norvig, 2010) this is the BNF grammar for FOL with Equality:

bnf

How do I convert this to a lark grammar? Specifically, how do I represent n-ary predicates using lark grammar?


Solution

  • I'm going to take this question as asking how to specify the syntax of an application of an identifier to a parenthesised, comma-separated list of terms.

    In syntactic terms, that's similar enough to JSON list syntax to make it worthwhile looking at the first sample grammar (for JSON) in the Lark documentation site. Functions and predicates in your FOL grammar differ from JSON lists: they use round parentheses (()) instead of square brackets ([]) and they need to additionally specify the name of a function or predicate, but the JSON grammar shows how to write a comma-separated list of things, and we can easily apply that exact same grammar syntax:

    AtomicSentence
        : Predicate ['(' Term (',' Term)* ')']
        | Term '=' Term
    
    Term: Function '(' Term (',' Term)* ')'
        | Constant
        | Variable
    

    (I left out the rest of the supplied grammar, since it isn't relevant to the question about Predicates.)

    In that grammar syntax, parentheses are used for grouping and * is a postfix operator indicating the Kleene star; that is, "any number of repetitions, including zero, of the operand". Square brackets ([]) are used to enclose an optional syntactic sequence.

    That's not BNF, since BNF doesn't have syntactic operators like optionality or Kleene Star. It's an example of what's often called "Extended BNF" (EBNF), which comes in a huge number of varieties with subtly different syntaxes. But it can be mechanically desugared into BNF; one BNF equivalent for the above would be:

    AtomicSentence
        : Predicate '(' TermList ')'
        | Predicate
        | Term '=' Term
    
    Term: Function '(' TermList ')'
        | Constant
        | Variable
    
    TermList
        : Term
        | TermList ',' Term
    

    That grammar does not provide a way to specify the arity (that is, the number of arguments) of each function and predicate, and consequently will generate invalid sentences. The abstract FOL grammar has no evident way of defining new predicates or functions, presumably because functions and predicates are not first-order objects. So every function, predicate and constant must be individually defined in some extra-grammatic way. Thus, the grammar might be considered short-hand for a class of more specific grammars, each with an associated finite set of defined symbols. Those definitions presumably specify the arity of each symbol, as well as indicating which are predicates and which are functions.

    In order for the concrete grammar to restrict itself to predicates and functions written with correct arity (that is, with the correct number of arguments), it would need to be modified (for each specific collection of predicates and functions), according to a model like this:

    AtomicSentence
        : Predicate/0
        | Predicate/1 '(' TermList/1 ')'
        | Predicate/2 '(' TermList/2 ')'
        | Predicate/3 '(' TermList/3 ')'
        | Predicate/4 '(' TermList/4 ')'
        ...
        | Term '=' Term
    
    Term: Constant
        | Function/1 '(' TermList/1 ')'
        | Function/2 '(' TermList/2 ')'
        | Function/3 '(' TermList/3 ')'
        | Function/4 '(' TermList/4 ')'
        ...
        | Variable
    
    TermList/1 : Term
    TermList/2 : TermList/1 ',' Term
    TermList/3 : TermList/2 ',' Term
    TermList/4 : TermList/3 ',' Term
    ...
    Predicate/0 : "True" | "False"
    Predicate/2 : "After" | "Loves"
    ...
    Function/1  : "Mother" | "LeftLeg"
    Function/2  : "Sum" | "Product"
    ...
    

    (The /n suffixes are part of the individual names; their semantic significance is external to the grammar. I take that particular sylistic convention from Prolog; it's also used by Erlang and some ML derivatives.) The ellipses represent other concrete definitions, which I didn't happen to write out; they are not intended to be thought of as lists of arbitrary length. The number of aritys actually defined by the grammar will be limited to the aritys actually used by the specific functions and predicates for which the concrete grammar is being defined. So each concrete grammar produced according to that model will have a finite number of productions.

    Note that a context-free grammar is not able to accurately represent a language in which functions are defined with a specific arity and elsewhere used only with exactly the same arity, unless there is a prespecified maximum arity. Grammatical concordance of that form (as with mandatory declaration of used symbols) requires a context-sensitive grammar formalism.

    This answer deliberately does not discuss operator precedence (for the operators defined in ComplexSentence), because it's not part of the original question. Without that specification, the grammar is ambiguous, but there is certainly an operator binding precedence hierarchy, presumably defined in the narrative surrounding the FOL grammar.