prologdcg

How to express "not" in Prolog DCG without using a cut


I'm working on a Prolog DCG parser to tokenize a string into specific patterns. My goal is to parse tokens like mul(Number,Number), do(), and don't(), while ignoring every other patterns.

Here's my current implementation:

parser([Token|Rest]) --> not_token, token(Token), not_token, parser(Rest).
parser([]) --> [].

token(mul(N1, N2)) -->
  "mul(", number(N1), ",", number(N2), ")".
token(do) --> "do()".
token(dont) --> "don't()".

not_token -->
  string(Codes),
  { \+ phrase(token(_), Codes) }.

When I run the following query:

?- phrase(parser(Tokens), `mul(mul(123,456)dommmmulmul(1,2))`).

it does find a correct solution but also a wrong solutions because of how not_token is defined.

Tokens = [mul(123, 456), mul(1, 2)] ;
Tokens = [mul(123, 456), mul(1, 2)] ;
Tokens = [mul(123, 456), mul(1, 2)] ;
Tokens = [mul(123, 456), mul(1, 2)] ;
Tokens = [mul(123, 456), mul(1, 2)] ;
Tokens = [mul(123, 456), mul(1, 2)] ;
Tokens = [mul(123, 456), mul(1, 2)] ;
Tokens = [mul(123, 456), mul(1, 2)] ;
Tokens = [mul(123, 456), mul(1, 2)] ;
Tokens = [mul(123, 456)] ; -- Incorrect
Tokens = [mul(1, 2)] ; -- Incorrect
false.

I can use the cut to stop at the first solution, but is there a way to express not_token in DCG so that it only returns a correct result.

For more complicated test case, it's actually from https://adventofcode.com/2024/day/3. The current solution can solve the exact problem with a cut, but I want to be able to have my DCG written more strict so that it only finds a correct solution.


Solution

  • You could instead find every token on backtracking and use findall/3 to collect the tokens. This would be all the code:

    :- use_module(library(dcg/basics)).
    
    find_token(T) --> string(_), token(T), string(_).
    
    token(mul(N1, N2)) -->
      "mul(", number(N1), ",", number(N2), ")".
    token(do) --> "do()".
    token(dont) --> "don't()".
    

    and this is how you query it with findall:

    ?- findall(T, phrase(find_token(T), `mul(mul(123,456)dommmmulmul(1,2))`), Tokens).
    Tokens = [mul(123, 456), mul(1, 2)].