parsingprologdcg

Prolog - Parsing


I'm new to the language prolog and have been given an assignment regarding parsing in prolog. I need some help in solving the problem.

In the assingment we have the grammar:

Expr ::= + Expr Expr | * Expr Expr | Num | Xer  
Xer  ::= x | ^ x Num  
Num  ::= 2 | 3 | .... a Integer (bigger than 1) ...

The token ^ is the same as in math. 5^5 equals 25.

Parse needs to work both ways: a call with an instantiated list to generate an Ast, while a call with an instantiated Ast should generate similar prefix list.

My assingment says that I need to make a prefix parse that does this:
Example(with the value of Ast removed):

?- parse([+, *, 2, x, ^, x, 5 ], Ast), parse(L, Ast).  
X = ...,  
L = [+, *, 2, x, ^, x, 5]  

I would also like to know how the parse tree will look like.


Solution

  • Prolog has a particular formalism to handle context-free grammars directly: DCGs (Definite Clause Grammars). Your example translates almost immediately into a DCG:

    expr --> [+], expr, expr | [*], expr, expr | num | xer.
    
    xer --> [x] | [^], [x], num.
    
    num --> [2] | [3] | [4] | [5].
    

    Now, you already can test sentences:

    ?- phrase(expr, [+,*,2,x,^,x,5]).
       true
    ;  false.
    ?- phrase(expr, [+,*,*,2,x,^,x,5]).
       false.
    

    You can even generate all possible sentences like so:

    ?- length(L, N), phrase(expr, L).
       L = [2], N = 1
    ;  L = [3], N = 1
    ;  ... .
    

    And, finally, you can add the abstract syntax tree to your definition.

    expr(plus(A,B)) --> [+], expr(A), expr(B).
    expr(mul(A,B)) --> [*], expr(A), expr(B).
    expr(Num) --> num(Num).
    expr(Xer) --> xer(Xer).
    
    xer(var(x)) --> [x].
    xer(pow(var(x),N)) --> [^], [x], num(N).
    
    num(num(2)) --> [2].
    num(num(3)) --> [3].
    num(num(4)) --> [4].
    num(num(5)) --> [5].
    

    So now you can use it as desired:

    ?- phrase(expr(AST), [+,*,2,x,^,x,5]), phrase(expr(AST),L).
       AST = plus(mul(num(2),var(x)),pow(var(x),num(5))),
       L = [+,*,2,x,^,x,5]
    ;  false.
    

    Just a nitpick: The interface predicate to DCGs is phrase/2 not parse/2.