Tags: prolog, artificial-intelligence, datalog, knowledge-management

What is the difference between naive and semi-naive evaluation?


I have been trying to implement an algorithm for the semi-naive evaluation of a Datalog program, but couldn't find a straightforward answer anywhere that explains the difference in simple words.

As I understand it, naive evaluation is a bottom-up technique, and so is semi-naive evaluation.

In the first iteration both evaluation techniques start with an empty set.

As the iterations proceed, both keep producing tuples until no new tuple can be derived.

So does semi-naive evaluation start from the head or the body of a rule?

path(X,Y) :- edge(X,Y).

path(X,Y) :- edge(X,Z), path(Z,Y).

Can someone please explain how the EDB and IDB get updated at the end of each iteration for the above program? Are the tuples stored under each predicate, i.e. a separate relation for edge and a separate relation for path, or are they stored as a single collection?

Also, what is the difference between global and local unification?


Solution

  • The difference between naïve and semi-naïve evaluation in Datalog is that with the naïve implementation, each iteration takes as input the whole initial dataset (the EDB facts) plus all facts inferred so far (the IDB facts). For example, suppose you have IDB rules like this:

    reachable(X,Y) :- link(X,Y).
    reachable(X,Y) :- link(X,Z), reachable(Z,Y).
    

    And a set of EDB facts like this: link = {(a,b), (b,c), (c,c), (c,d)}. The procedure to execute the evaluation is:

    1. Begin by assuming all IDB relations are empty.
    2. Repeatedly evaluate the rules using the EDB and the previous IDB to get a new IDB.
    3. End when there is no change to IDB.
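
    For concreteness, here is a minimal Python sketch of that loop for the reachable/link program above (the function name naive_eval and the set-of-tuples representation are illustrative assumptions, not something given in the question):

        def naive_eval(link):
            """Naive bottom-up evaluation of:
                 reachable(X,Y) :- link(X,Y).
                 reachable(X,Y) :- link(X,Z), reachable(Z,Y).
            """
            reachable = set()                        # 1. IDB starts empty
            while True:
                # 2. re-evaluate both rules against the EDB plus the
                #    *entire* IDB computed so far
                derived = set(link)                  # rule 1
                derived |= {(x, y)                   # rule 2
                            for (x, z) in link
                            for (z2, y) in reachable if z2 == z}
                if derived <= reachable:             # 3. fixpoint: nothing new
                    return reachable
                reachable |= derived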

    When you're using the naïve approach, every step has the following data as input and output:

     | Iteration | Input for the current iteration I_i             | Facts derived this iteration |
     |-----------|-------------------------------------------------|------------------------------|
     |  1        | {}                                              | {(a,b), (b,c), (c,c), (c,d)} |
     |  2        | {(a,b), (b,c), (c,c), (c,d)}                    | {(a,c),(b,c),(b,d),(c,d)}    |
     |  3        | {(a,b), (b,c), (c,c), (c,d), (a,c), (b,d)}      | {(a,d)}                      |
     |  4        | {(a,b), (b,c), (c,c), (c,d), (a,c), (b,d),(a,d)}| {}                           |
    

    At the 4th iteration you stop, because the fixpoint has been reached and no new facts can be inferred. The semi-naïve approach applies an optimization: instead of feeding all derived facts back into the rules at each iteration, you feed each iteration only the tuples that were newly derived in the previous iteration (the "delta"), which avoids re-deriving duplicate tuples.

     | Iteration | Input for the current iteration I_i (the delta) | Facts derived this iteration |
     |-----------|---------------------------------------|------------------------------|
     |  1        | {}                                    | {(a,b), (b,c), (c,c), (c,d)} |
     |  2        | {(a,b), (b,c), (c,c), (c,d)}          | {(a,c),(b,c),(b,d),(c,d)}    |
     |  3        | {(a,c), (b,d)}                        | {(a,d)}                      |
     |  4        | {(a,d)}                               | {}                           |
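
    The semi-naïve version of the same sketch differs only in the recursive rule, which joins against delta (the tuples first derived in the previous round) instead of the whole reachable relation; again, the names are illustrative:

        def semi_naive_eval(link):
            """Semi-naive evaluation of the same reachable/link program."""
            reachable = set(link)              # non-recursive rule fires once
            delta = set(link)                  # delta of the first iteration
            while delta:                       # stop when nothing new was added
                # recursive rule, joined only with last round's new tuples
                derived = {(x, y)
                           for (x, z) in link
                           for (z2, y) in delta if z2 == z}
                delta = derived - reachable    # keep only genuinely new facts
                reachable |= delta
            return reachable

    Running both on link = {('a','b'), ('b','c'), ('c','c'), ('c','d')} yields the same seven reachable tuples; the semi-naïve version simply performs fewer redundant joins.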
    

    Source: Datalog and Recursive Query Processing