prologrun-length-encoding

Prolog program to create a list with first duplicate elements


I need to implement this function:

cod_first(X, L, Lrem, Lfront). 

Lfront contains all the copies of X that are at the beginning of L, including X; Lrem is the list of the rest of the elements.

I've tried to implement it using append but I'm quite new in Prolog and I'm a bit lost.

The expected output for the program is something like this:

?- cod_first(1, [1, 1, 2, 3], Lrem, Lfront) 
Lrem = [2, 3],
Lfront = [1, 1, 1];
false.

?- cod_first(1, [2, 3, 4], Lrem, Lfront)
Lrem = [2, 3, 4],
Lfront = [1];
false.

Update: I've found this function that packs the same elements into a list:

pack([], []).
pack([X], [[X]]).
pack([X, X| L], [[X| Xs]| R]) :-
    pack([X| L], [Xs| R]).
pack([X, Y| L], [[X]| R]) :-
    X \= Y,
    pack([Y| L], R).

I think this function could be adaptable to the one I'm looking for, any help?


Solution

  • First let's check the code you found! I will test it by considering all lists, starting with the shortest one:

    ?- N=N, length(Xs,N), pack(Xs, Xss).
       N = 0, Xs = [], Xss = []
    ;  N = 1, Xs = [_A], Xss = [[_A]]
    ;  N = 2, Xs = [_A,_A], Xss = [[_A,_A]]
    ;  N = 3, Xs = [_A,_A,_A], Xss = [[_A,_A,_A]]
    ;  N = 4, Xs = [_A,_A,_A,_A], Xss = [[_A,_A,_A,_A]]
    ;  ... .
    

    So, according to this query, your code only works for lists where all elements are the same. In fact, the goal X \= Y is responsible for this. Better express inequality with dif(X, Y). With this little change we get:

     ?- N=N, length(Xs,N), pack(Xs, Xss).
        N = 0, Xs = [], Xss = []
     ;  N = 1, Xs = [_A], Xss = [[_A]]
     ;  N = 2, Xs = [_A,_A], Xss = [[_A,_A]]
     ;  N = 2, Xs = [_A,_B], Xss = [[_A],[_B]], dif(_A,_B)
     ;  N = 3, Xs = [_A,_A,_A], Xss = [[_A,_A,_A]]
     ;  N = 3, Xs = [_A,_A,_B], Xss = [[_A,_A],[_B]], dif(_A,_B)
     ;  N = 3, Xs = [_A,_B,_B], Xss = [[_A],[_B,_B]],  dif(_A,_B)
     ;  N = 3, Xs = [_A,_B,_C], Xss = [[_A],[_B],[_C]], dif(_A,_B), dif(_B,_C)
     ;  N = 4, Xs = [_A,_A,_A,_A], Xss = [[_A,_A,_A,_A]]
     ;  ... .
    

    Now we get really all solutions. Let's consider the two answers for N = 2. The first says that for Xs's elements being all equal, Xss contains just one element. The second says that when Xs's elements are different, they show in separate elements of Xss. Note the dif(_A,_B) which ensures that only terms that are different are chosen.

    However, you are only interested in a single such split:

    cod_first(X, [], [], [X]).
    cod_first(X, [X|Es], Lrem, [X|Xs]) :-
       cod_first(X, Es, Lrem, Xs).
    cod_first(X, [E|Es], [E|Es], [X]) :-
       dif(X,E).
    
    ?- N=N, length(Xs, N), cod_first(X, Xs, Lrem, Lfront).
       N = 0, Xs = [], Lrem = [], Lfront = [X]
    ;  N = 1, Xs = [X], Lrem = [], Lfront = [X,X]
    ;  N = 1, Xs = [_A], Lrem = [_A], Lfront = [X], dif(_A,X)
    ;  N = 2, Xs = [X,X], Lrem = [], Lfront = [X,X,X]
    ;  N = 2, Xs = [X,_A], Lrem = [_A], Lfront = [X,X], dif(_A,X)
    ;  N = 2, Xs = [_A,_B], Lrem = [_A,_B], Lfront = [X], dif(_A,X)
    ;  N = 3, Xs = [X,X,X], Lrem = [], Lfront = [X,X,X,X]
    ;  N = 3, Xs = [X,X,_A], Lrem = [_A], Lfront = [X,X,X], dif(_A,X)
    ;  N = 3, Xs = [X,_A,_B], Lrem = [_A,_B], Lfront = [X,X], dif(_A,X)
    ;  N = 3, Xs = [_A,_B,_C], Lrem = [_A,_B,_C], Lfront = [X], dif(_A,X)
    ;  N = 4, Xs = [X,X,X,X], Lrem = [], Lfront = [X,X,X,X,X]
    ;  ... .
    

    Here is another version which I prefer using library(reif) available for SICStus and SWI.

    cod_first2(X, Es, Lrem, [X|Xs]) :-
       cod_first2i(Es, X, Xs, Lrem).
    
    cod_first2i([], _, [], []).
    cod_first2i([E|Es], X, Xs0, Ys) :-
       if_( E = X
          , ( Xs0 = [X|Xs], cod_first2i(Es, X, Xs, Ys) )
          , ( Xs0 = [], Ys = [E|Es] )
          ).
    

    This is much more efficient, but gives exactly the same answers.