prologpattern-matchingsliding-windowdcggomoku

Sliding window method for pattern recognition in SWI-Prolog


I would like to find the most eloquent and efficient method, algorithmically speaking, to count occurrences of some patterns in SWI-Prolog.

For now, my solution uses DCG and looks like this:

count_occurrences(Pattern, List, Count) :-
    phrase(Pattern, PatternList),
    length(PatternList, PatternLength),
    length(List, ListLength),
    MaxIndex is ListLength - PatternLength,
    findall(1, (
        between(0, MaxIndex, Index),
        sublist(List, Index, PatternLength, PatternList)
    ), Counts),
    sum_list(Counts, Count).

sublist(List, Index, Length, Sublist) :-
    length(Sublist, Length),
    append(Prefix, Rest, List),
    length(Prefix, Index),
    append(Sublist, _, Rest).

rep(S, L) --> [], {L is 0} | [S], {L is 1} | [S], { L_1 is L - 1 }, rep(S, L_1), !.
open_rep(S, L) --> [v], rep(S, L), [v], !.

The result is this:

1 ?- count_occurrences(open_rep(n, 1), [v,n,v,n,v,v,v,v,v,v,n,v,v], Count).
Count = 3.

2 ?- count_occurrences(open_rep(n, 3), [v,n,v,n,v,v,v,v,v,v,n,n,n,v,v], Count).
Count = 1.

3 ?- count_occurrences(open_rep(n, 1), [v,n,v,n,v,v,v,v,v,v,n,n,n,v,v], Count).
Count = 2.

I am satisfied with this result, but I don't think this is the most efficient method to achieve it and would like some help making it faster to compute. I think that using append and findall is computationally expensive, but can't figure how to do the same without them...

The DCG shown here is just an example, but I have to count the occurrences of patterns in a list and give them a score. This is in the context of implementing an AI for Gomoku using Alpha-Beta Pruning. Since the board is evaluated frequently, algorithmic complexity matters in order to reduce the time it takes for the AI to take an action.

I have tried multiple variations of the code shown above, but they all use the findall predicate and the best solution I have found to reduce the compute time is to implement early fails.


Solution

  • IMO your approach is too much specific, and it will be sub optimal from the (re)usability viewpoint. SWI-Prolog offers a library predicate that performs RLE (run length encoding), as I discovered from this interesting topic, and is worth to try: here I'll post a module, where I copied your code, and an alternative that uses clumped/2:

    /*  File:    x_so_sliding_window.pl
        Author:  Carlo
        Created: Mar  4 2023
        Purpose: https://stackoverflow.com/q/75630809/874024
    */
    
    :- module(x_so_sliding_window,
              [count_occurrences/3
              ,open_rep//2
              ,count_by_clumped/3
              ]).
    
    
    count_occurrences(Pattern, List, Count) :-
        phrase(Pattern, PatternList),
        length(PatternList, PatternLength),
        length(List, ListLength),
        MaxIndex is ListLength - PatternLength,
        findall(1, (
            between(0, MaxIndex, Index),
            sublist(List, Index, PatternLength, PatternList)
        ), Counts),
        sum_list(Counts, Count).
    
    sublist(List, Index, Length, Sublist) :-
        length(Sublist, Length),
        append(Prefix, Rest, List),
        length(Prefix, Index),
        append(Sublist, _, Rest).
    
    rep(S, L) --> [], {L is 0} | [S], {L is 1} | [S], { L_1 is L - 1 }, rep(S, L_1), !.
    open_rep(S, L) --> [v], rep(S, L), [v], !.
    
    
    count_by_clumped(Pattern,List,Count) :-
        clumped(List, R),
        aggregate_all(count, member(Pattern,R), Count).
    
    

    then I had this code in x_so_sliding_window.plt

    :- [x_so_sliding_window].
    
    t(Count) :- count_occurrences(open_rep(n, 1), [v,n,v,n,v,v,v,v,v,v,n,v,v], Count).
    t(Count) :- count_occurrences(open_rep(n, 3), [v,n,v,n,v,v,v,v,v,v,n,n,n,v,v], Count).
    t(Count) :- count_occurrences(open_rep(n, 1), [v,n,v,n,v,v,v,v,v,v,n,n,n,v,v], Count).
    
    c(Count) :- count_by_clumped(n-1, [v,n,v,n,v,v,v,v,v,v,n,v,v], Count).
    c(Count) :- count_by_clumped(n-3, [v,n,v,n,v,v,v,v,v,v,n,n,n,v,v], Count).
    c(Count) :- count_by_clumped(n-1, [v,n,v,n,v,v,v,v,v,v,n,n,n,v,v], Count).
    
    

    that shows at first glance the equivalence between t/1 and c/1 calls:

    ?- t(Count).
    Count = 3 ;
    Count = 1 ;
    Count = 2.
    
    ?- c(Count).
    Count = 3 ;
    Count = 1 ;
    Count = 2.
    
    

    and, at last, the 'benchmark' directly coded in the REPL:

    ?- time((between(1,1000,_),(t(_),fail))).
    % 1,953,001 inferences, 0.234 CPU in 0.234 seconds (100% CPU, 8332804 Lips)
    false.
    ?- time((between(1,1000,_),(c(_),fail))).
    % 123,001 inferences, 0.000 CPU in 0.014 seconds (0% CPU, Infinite Lips)
    false.