listprologprolog-toplevel

Implementing "last" in Prolog


I am trying to get a feel for Prolog programming by going through Ulle Endriss' lecture notes. When my solution to an exercise does not behave as expected, I find it difficult to give a good explanation. I think this has to do with my shaky understanding of the way Prolog evaluates expressions.

Exercise 2.6 on page 20 calls for a recursive implementation of a predicate last1 which behaves like the built-in predicate last. My attempt is as follows:

last1([_ | Rest], Last) :- last1(Rest, Last).
last1([Last], Last).

It gives the correct answer, but for lists with more than one element, I have to key in the semicolon to terminate the query. This makes last1 different from the built-in last.

?- last1([1], Last).
Last = 1.

?- last1([1, 2], Last).
Last = 2 ;
false.

If I switch the order in which I declared the rule and fact, then I need to key in the semicolon in both cases.

I think I know why Prolog thinks that last1 may have one more solution (thus the semicolon). I imagine it follows the evaluation sequence

last1([1, 2], Last).
==>  last1([2], Last).
==>  last1([], Last).    OR    Last = 2.
==>  false    OR    Last = 2.

That seems to suggest that I should look for a way to avoid matching Rest with []. Regardless, I have no explanation why switching the order of declaration ought to have any effect at all.

Question 1: What is the correct explanation for the behavior of last1?

Question 2: How can I implement a predicate last1 which is indistinguishable from the built-in last?


Solution

  • Question 1:

    Prolog systems are not always able to decide whether or not a clause will apply prior to executing it. The precise circumstances are implementation dependent. That is, you cannot rely on that decision in general. Systems do improve here from release to release. Consider as the simplest case:

    ?- X = 1 ; 1 = 2.
       X = 1
    ;  false.
    

    A very clever Prolog could detect that 1 = 2 always fails, and thus simply answer X = 1. instead. On the other hand, such "cleverness" is very costly to implement and time is better spent for optimizing more frequent cases.

    So why do Prologs show this at all? The primary reason is to avoid asking meekly for another answer, if Prolog already knows that there is no further answer. So prior to this improvement, you were prompted for another answer for all queries containing variables and got the false or "no" on each and every query with exactly one answer. This used to be so cumbersome that many programmers never asked for the next answer and thus were not alerted about unintended answers.

    And the secondary reason is to keep you aware of the limitations of the implementation: If Prolog asks for another answer on this general query, this means that it still uses some space which might accumulate and eat up all your computing resources.

    In your example with last1/2 you encounter such a case. And you already did something very smart, BTW: You tried to minimize the query to see the first occurrence of the unexpected behavior.

    In your example query last1([1,2],X) the Prolog system does not look at the entire list [1,2] but only looks at the principal functor. So for the Prolog system the query looks the same as last1([_|_],X) when it decides which clauses to apply. This goal now fits to both clauses, and this is the reason why Prolog will remember the second clause as an alternative to try out.

    But, think of it: This choice is now possible for all elements but the last! Which means that you pay some memory for each element! You can actually observe this by using a very long list. This I get on my tiny 32-bit laptop — you might need to add another zero or two on a larger system:

    ?- length(L,10000000), last1(L,E).
       resource_error(_). % ERROR: Out of local stack
    

    On the other hand, the predefined last/2 works smoothly:

    ?- length(L,10000000), last(L,E).
       L = [_A,_B,_C,_D,_E,_F,_G,_H,_I|...].
    

    In fact, it uses constant space!

    There are now two ways out of this:

    1. Try to optimize your definition. Yes, you can do this, but you need to be very smart! The definition by @back_dragon for example is incorrect. It often happens that beginners try to optimize a program when in fact they are destroying its semantics.

    2. Ask yourself if you are actually defining the same predicate as last/2. In fact, you're not.

    Question 2:

    Consider:

    ?- last(Xs, X).
       Xs = [X]
    ;  Xs = [_A,X]
    ;  Xs = [_A,_B,X]
    ;  Xs = [_A,_B,_C,X]
    ;  Xs = [_A,_B,_C,_D,X] 
    ;  ... .
    

    and

    ?- last1(Xs, X).
       loops.
    

    So your definition differs in this case with SWI's definition. Exchange the order of the clauses.

    ?- length(L,10000000), last2(L,E).
       L = [_A,_B,_C,_D,_E,_F,_G,_H,_I|...]
    ;  false.
    

    Again, this false! But this time, the big list works. And this time, the minimal query is:

    ?- last2([1],E).
       E = 1
    ;  false.
    

    And the situation is quite similar: Again, Prolog will look at the query in the same way as last2([_|_],E) and will conclude that both clauses apply. At least, we now have constant overhead instead of linear overhead.

    There are several ways to overcome this overhead in a clean fashion - but they all very much depend on the innards of an implementation.