Tags: prolog, dcg, iso-prolog

DCG for idiomatic phrase preference


I have a manually made DCG rule to select idiomatic phrases over single words. The DCG rule reads as follows:

seq(cons(X,Y), I, O) :- noun(X, I, H), seq(Y, H, O), \+ noun(_, I, O).
seq(X) --> noun(X).

The first clause is manually made, since (:-)/2 is used instead of (-->)/2. Can I replace this manually made clause by some clause that uses standard DCG?

Best Regards

P.S.: Here is some test data:

noun(n1) --> ['trojan'].
noun(n2) --> ['horse'].
noun(n3) --> ['trojan', 'horse'].
noun(n4) --> ['war'].

And here are some test cases. The important one is the first, since it delivers only n3 and not cons(n1,n2). This behaviour is what is especially desired:

?- phrase(seq(X),['trojan','horse']).
X = n3 ;
No
?- phrase(seq(X),['war','horse']).
X = cons(n4,n2) ;
No
?- phrase(seq(X),['trojan','war']).
X = cons(n1,n4) ;
No

Solution

  • (To avoid collisions with other non-terminals I renamed your seq//1 to nounseq//1)

    Can I replace this manually made clause by some clause that uses standard DCG?

    No, because it is not steadfast and it is STO (details below).

    Intended meaning

    But let me start with the intended meaning of your program. You say you want to select idiomatic phrases over single words. Is your program really doing this? Or, to put it differently, is your definition really unique? I could now construct a counterexample, but let Prolog do the thinking:

    nouns --> [] | noun(_), nouns.
    
    ?- length(Ph, N), phrase(nouns,Ph),
          dif(X,Y), phrase(nounseq(X),Ph), phrase(nounseq(Y),Ph).
       Ph = [trojan,horse,trojan], N = 3, X = cons(n1,cons(n2,n1)), Y = cons(n3,n1)
    ;  ...
    ;  Ph = [trojan,horse,war], N = 3, X = cons(n3,n4), Y = cons(n1,cons(n2,n4))
    ;  ... .
    

    So your definition is ambiguous. What you (probably) want is some kind of rewrite system, but those are rarely defined in a determinate manner. What if two phrases overlap, say after adding noun(n5) --> [horse, war]. to the test data?
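
    The overlap case can be tried out directly. The following sketch repeats the program under discussion (with the renaming to nounseq//1) and adds the extra noun(n5) clause as an assumption for illustration. It is not meant as a fix:

    ```prolog
    % Test data from the question, plus the overlapping phrase n5
    % (n5 is only an illustration and is not part of the original data).
    noun(n1) --> [trojan].
    noun(n2) --> [horse].
    noun(n3) --> [trojan, horse].
    noun(n4) --> [war].
    noun(n5) --> [horse, war].

    % The definition from the question, renamed to nounseq//1.
    nounseq(cons(X,Y), I, O) :- noun(X, I, H), nounseq(Y, H, O), \+ noun(_, I, O).
    nounseq(X) --> noun(X).

    % Collect all readings of a ground phrase.
    readings(Ph, Xs) :- findall(X, phrase(nounseq(X), Ph), Xs).

    % ?- readings([trojan,horse], Xs).
    %    Xs = [n3].
    % ?- readings([trojan,horse,war], Xs).
    %    Xs = [cons(n1,n5), cons(n3,n4)].
    ```

    So with overlapping phrases the sentence [trojan,horse,war] keeps two readings, and nothing in the definition says which idiom should win.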


    Conformance

    A disclaimer up front: the DCG document is still being developed, and comments are very welcome! You can find all material in this place. So, strictly speaking, there is currently no notion of conformance for DCG.

    Steadfastness

    One central property a conforming definition must maintain is steadfastness. So before looking into your definition, I will compare a few goals that use phrase/3 (running SWI in default mode).

    ?- Ph = [], phrase(nounseq(cons(n4,n4)),Ph0,Ph).
       Ph = [], Ph0 = [war,war]
    ;  false.
    ?- phrase(nounseq(cons(n4,n4)),Ph0,Ph), Ph = [].
       false.
    ?- phrase(nounseq(cons(n4,n4)),Ph0,Ph).
       false.
    

    Moving the goal Ph = [] to the end removes the only solution. Therefore, your definition is not steadfast. This is due to the way you handle (\+)/1: the variable O must not occur within the (\+)/1. On the other hand, if O does not occur within (\+)/1, you can inspect only the beginning of a sentence, not the entire sentence.
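
    To illustrate the last point, here is a minimal sketch of a negation that does not mention the remainder. It assumes a system that accepts \+ in DCG bodies (SWI-Prolog does), and the name no_idiom_ahead//0 is hypothetical:

    ```prolog
    noun(n1) --> [trojan].
    noun(n2) --> [horse].
    noun(n3) --> [trojan, horse].
    noun(n4) --> [war].

    % Hypothetical helper: succeeds without consuming anything if the
    % input does not start with the idiom n3. The remainder left over
    % by noun(n3) is never compared with anything.
    no_idiom_ahead --> \+ noun(n3).

    % ?- phrase(no_idiom_ahead, [trojan,war], Rest).
    %    Rest = [trojan,war].
    % ?- phrase(no_idiom_ahead, [trojan,horse,war], Rest).
    %    false.
    ```

    Such a test gives the same answers whether the remainder is bound before or after the call, but it sees only a prefix of the input. It cannot express "no noun covers the whole of I up to O", which is what the clause in the question checks.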

    Subject to occurs-check property

    But the situation is worse:

    ?- set_prolog_flag(occurs_check,error).
       true.
    ?- phrase(nounseq(cons(n4,n4)),Ph0,Ph).
    ERROR: noun/3: Cannot unify _G968 with [war|_G968]: would create an infinite tree
    

    So your program relies on STO-unifications (subject-to-occurs-check unifications) whose outcome is explicitly undefined in

    ISO/IEC 13211-1 Subclause 7.3.3 Subject to occurs-check (STO) and not subject to occurs-check (NSTO)
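
    The unification reported in the error message has a simple shape: a variable is unified with a list that contains that same variable. A small sketch using the standard built-in unify_with_occurs_check/2 (the predicate names cyclic_attempt/1 and harmless_attempt/2 are made up for this illustration):

    ```prolog
    % STO case: Xs occurs inside the term it is unified with.
    % With the occurs check enforced, this unification simply fails.
    cyclic_attempt(Xs) :-
       unify_with_occurs_check(Xs, [war|Xs]).

    % NSTO case: the tail is a different variable, so the unification
    % is unproblematic and succeeds.
    harmless_attempt(Xs, Ys) :-
       unify_with_occurs_check(Xs, [war|Ys]).

    % ?- cyclic_attempt(Xs).
    %    false.
    % ?- harmless_attempt(Xs, Ys).
    %    Xs = [war|Ys].
    ```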

    The STO problem is rather due to your intention to define the intersection of two non-terminals. Consider the following way to express it:

    :- op(  950,  xfx, //\\).  % ASCII approximation for ∩ - 2229;INTERSECTION
    
    (NT1 //\\ NT2) -->
       call(Xs0^Xs^(phrase(NT1,Xs0,Xs),phrase(NT2,Xs0,Xs))).
    

    % The following is predefined in library(lambda):

    ^(V0, Goal, V0, V) :-
       call(Goal,V).
    
    ^(V, Goal, V) :-
       call(Goal).
    

    Already with this definition we can get into STO situations:

    ?- phrase(([a]//\\[a,b]), Ph0,Ph).
    ERROR: =/2: Cannot unify _G3449 with [b|_G3449]: would create an infinite tree
    

    In fact, when using rational trees we get:

    ?- set_prolog_flag(occurs_check,false).
       true.
    ?- phrase(([a]//\\[a,b]), Ph0,Ph).
       Ph0 = [a|_S1], % where
          _S1 = [b|_S1],
       Ph = [b|_S1].
    

    So there is an infinite list, which certainly does not have much meaning for natural language sentences (except for persons of infinite resource and capacity...).
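
    For comparison, here is a minimal sketch of the same intersection idea written without library(lambda). The names both//2 and both_/4 are hypothetical. In the queries shown, the input list is fully instantiated, so no cyclic terms can arise:

    ```prolog
    noun(n1) --> [trojan].
    noun(n2) --> [horse].
    noun(n3) --> [trojan, horse].
    noun(n4) --> [war].

    % Hypothetical intersection: both non-terminals must describe the
    % same stretch of input, from Xs0 up to Xs.
    both(NT1, NT2) --> call(both_(NT1, NT2)).

    both_(NT1, NT2, Xs0, Xs) :-
       phrase(NT1, Xs0, Xs),
       phrase(NT2, Xs0, Xs).

    % ?- phrase(both(noun(X), [trojan,horse]), [trojan,horse,war], Rest).
    %    X = n3, Rest = [war].
    % ?- phrase(both([a], [a,b]), [a,b], Rest).
    %    false.
    ```

    With unbound lists, as in the queries further up, the same definition runs into the STO situation described above.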