[SOLVED] Efficient solution for the same-fringe problem for binary trees

Efficient solution for the same-fringe problem for binary trees

The fringe of a binary tree is the sequence composed by its leaves, from left to right. The same-fringe problem [Hewitt & Patterson, 1970] consists of determining whether two binary trees have the same fringe. For example, the first two trees below have the same fringe, but the last two do not:

%         .         .         .
%        / \       / \       / \
%       .   3     1   .     1   .
%      / \           / \       / \
%     1   2         2   3    -2   3

example(1, fork(fork(leaf(1), leaf(2)), leaf(3))).
example(2, fork(leaf(1), fork(leaf(2), leaf(3)))).
example(3, fork(leaf(1), fork(leaf(-2), leaf(3)))).

A simple solution is to collect the leaves of one tree into a list and then compare them with the leaves of the other tree.

/*
 * SIMPLE SOLUTION
 */

sf_1(T1, T2) :-
    walk(T1, [], Xs),
    walk(T2, [], Xs).

walk(leaf(X), A, [X|A]).
walk(fork(L, R), A0, Xs) :-
    walk(R, A0, A1),
    walk(L, A1, Xs).

Although simple, this solution is considered inelegant: first, because it can be impractical when the first tree is very large; and, second, because it is very inefficient when the trees differ in the first few leaves. Thus, a better solution would be to stop the comparison as soon as the first difference is found, without completely generating the list containing the fringe of the first tree.

/*
 * SUPPOSEDLY BETTER SOLUTION
 */

sf_2(T1, T2) :-
    step([T1], [T2]).

step([], []).
step([T1|S1], [T2|S2]) :-
    next(T1, S1, X, R1),
    next(T2, S2, X, R2),
    step(R1, R2).

next(leaf(X), S, X, S).
next(fork(L, R), S0, X, S) :-
    next(L, [R|S0], X, S).

To compare the performance of these two solutions, I implemented some predicates to run automated experiments (by using SWI-prolog, version 9.0.4):

/*
 * EMPIRICAL COMPARISON
 */

comp(Case) :-
    format('fsize sf-1 sf-2\n'),
    forall( between(1, 10, I),
            (   N is 100000 * I,
                tree(1, N, A),
                (   Case = true         % trees with same fringes
                ->  tree(1, N, B)
                ;   M is random(N//10), % trees with different fringes
                    flip(A, M, B) ),
                time(10, sf_1(A, B), T1),
                time(10, sf_2(A, B), T2),
                format('~0e ~2f ~2f\n', [N, T1, T2]) ) ).

time(N, G, T) :-
    garbage_collect,
    S is cputime,
    forall(between(1, N, _), ignore(call(G))),
    T is (cputime - S) / N.

/*
 * RANDOM TREE GENERATION AND MODIFICATION
 */

tree(X1, Xn, leaf(X1)) :-
    X1 = Xn,
    !.
tree(X1, Xn, fork(L, R)) :-
    X1 < Xn,
    random(X1, Xn, Xi),
    Xj is Xi + 1,
    tree(X1, Xi, L),
    tree(Xj, Xn, R).


flip(leaf(X), Y, leaf(Z)) :-
    (  X = Y
    -> Z is -X
    ;  Z is  X ).
flip(fork(L0, R0), X, fork(L, R)) :-
    flip(L0, X, L),
    flip(R0, X, R).

The empirical results show that the second solution is, in fact, faster than the first when the trees do not have the same fringes:

?- comp(false).
fsize sf-1 sf-2
1e+05 0.01 0.00
2e+05 0.03 0.00
3e+05 0.05 0.00
4e+05 0.07 0.01
5e+05 0.09 0.01
6e+05 0.11 0.00
7e+05 0.12 0.01
8e+05 0.14 0.01
9e+05 0.17 0.00
1e+06 0.18 0.00
true.

However, when the trees do have the same fringe, the second solution is a little slower than the first:

?- comp(true).
fsize sf-1 sf-2
1e+05 0.02 0.03
2e+05 0.04 0.05
3e+05 0.06 0.08
4e+05 0.08 0.11
5e+05 0.10 0.12
6e+05 0.12 0.14
7e+05 0.12 0.16
8e+05 0.14 0.18
9e+05 0.17 0.19
1e+06 0.18 0.22
true.

QUESTION: Is it possible to implement a solution (in Prolog) that is faster than the simple solution (by a constant factor, not necessarily asymptotically faster) when the fringes are distinct, yet is not slower when the fringes are the same? In other words, can we achieve the efficient comparison without the overhead? If so, how?

Solution

Merge the two approaches into one. Always better than sf_2. Spacewise should be better than or equal to sf_1, because the first list is not generated. In SWI the next/3 goal needs to be unfolded to get always better or equal runtime.

sf_3(T1, T2) :-
   stepping(T1, [T2],[]).

next(X, [T|Ts0],Ts) :-
   t_next(T, X, Ts0,Ts).

t_next(leaf(X), X, Ts,Ts).
t_next(fork(L, R), X, Ts0,Ts) :-
   t_next(L, X, [R|Ts0],Ts).

stepping(leaf(X), T2s0,T2s):-
   next(X, T2s0,T2s).
stepping(fork(L, R), T2s0,T2s) :-
   stepping(L, T2s0,T2s1),
   stepping(R, T2s1,T2s).