pythonalgorithmlogic-programmingdatalogpydatalog

Dependency Graph Resolving with pyDatalog


I am striving to write more readable, declarative programs. So I decided to implement a simple algorithm we currently use. The procedural implementation is the following:

  1. There are commands and resources
  2. Each command can provide and require multiple resources
  3. The algorithm will loop over all commands and schedule the commands that all their requirements are provided for.
  4. All resources the command provides are now provided for
  5. We are finished if all commands are scheduled
  6. We can't satisfy the dependencies if there are commands left and we can't schedule new commands for an iteration of the algorithm

So the datalog variant I came up with looks nice but has two problems:

  1. It is WRONG
  2. I need a loop to read the result

You can find the complete source here.

It depends on hypothesis and you can easily run it with pytest.

Below the test that fails: If we require resources provided by a previous "rank" or order. It can't find it. I tried to make follows recursive, but then it fails even in the simple examples.

def test_graph_multirequire():
    """Test if the resolver can handle a graph with multiple requires"""
    tree = [
        [('A'), ('B'), ('C'), ('D'), ('E'), ('F'), ('G'), ()],
        [(), ('A'), ('B'), ('C', 'A'), ('D'), ('E'), ('F'), ('G')]
    ]
    run_graph(tree)


def run_graph(tree):
    """Run an example"""
    try:
        tree_len = len(tree[0])
        index = list(range(tree_len))
        random.shuffle(index)
        for i in index:
            + is_command(i)
            for provide in tree[0][i]:
                + provides(i, provide)
            for require in tree[1][i]:
                + requires(i, require)
        ##############################
        is_root(X) <= is_command(X) & ~requires(X, Y)
        follows(X, Z) <= (
            provides(X, Y) & requires(Z, Y) & (X != Z)
        )
        order(0, X) <= is_root(X)
        order(N, X) <= (N > 0) & order(N - 1, Y) & follows(Y, X)
        ##############################
        ordered = []
        try:
            for x in range(tree_len):
                nodes = order(x, N)
                if not nodes:
                    break
                ordered.extend([x[0] for x in nodes])
        except AttributeError:
            ordered = index
        assert len(ordered) >= tree_len
        print(ordered)
        provided = set()
        for command in ordered:
            assert set(tree[1][command]).issubset(provided)
            provided.update(tree[0][command])
    finally:
        pd.clear()

My questions:

  1. Am I using the wrong tool?
  2. Is anybody aware of comprehensive datalog how-tos?
  3. How do you actually solve the problem above?

Edit:

  1. I am missing quantifiers like all() and exists(), how can I express this in pyDatalog?

Solution

  • pyDatalog (and prolog) is well adapted for such problem. The challenge is to depart from the traditional procedural programming mindset. You may want to search the web for a course on prolog : many principles will apply to pyDatalog too.

    Writing a loop in a declarative language involves 3 steps :

    1) define a predicate with all the variables that change while looping.

    In this case, you want to keep track of a partial plan, what has been produced already, and what is left to be planned :

    partial_plan(Planned, Produced, Todo)
    

    For example, partial_plan([0,], ['A',], [1,2,3,4,5,6,7]) is true. To find a plan, you would query :

    partial_plan([C0,C1,C2,C3,C4,C5,C6,C7], L, [])
    

    2) describe the base (= simplest) case. Here, the starting point is when all commands still have to be planned.

    + partial_plan([], [], [0,1,2,3,4,5,6,7])
    

    3) describe the iteration rule. Here, you want to pick a command to be done, and whose requirements have already been produced, and add it to Planned :

    partial_plan(Planned1, Produced1, Todo1) <= (
        # find a smaller plan first
        (Planned0 == Planned1[:-1]) &
        partial_plan(Planned0, Produced0, Todo0) &
        # pick a command to be done, reducing the list of todos, 
        pick(Command, Todo0, Todo1) &
        # verify that it can be done with what has been produced already
        subset(requirement[Command], Produced0) &
        # add it to the plan 
        (Planned1 == Planned0 + [Command, ]) &
        (Produced1 == Produced0 + result[Command])
        )
    

    We now have to define the predicates introduced in the 3rd step.

    # pick
    pick(Command, Todo0, Todo1) <= (
        (X._in(enumerate(Todo0))) & # X has the form [index, value]
        (Command == X[1]) &
        (Todo1 == Todo0[:X[0]] + Todo0[X[0]+1:]) # remove the command from Todo0
        )
    
    # result and requirement are defined as logic functions, 
    # so that they can be used in expressions
    for i in range(len(tree[0])):
        result[i] = tree[0][i]
        requirement[i] = tree[1][i]
    

    subset checks that all elements of L1 are in L2 (notice the all quantifier). It is also defined as a loop:

    subset([], List2) <= (List2 == List2) # base case
    subset(L1, L2) <= (
        (0 < len(L1)) &
        (L1[0]._in(L2)) & # first element in L2
        subset(L1[1:], L2) # remainder is a subset of L2
        )
    

    You would then have to declare all pyDatalog variables and predicates above, and the 'enumerate', 'result' and 'requirement' functions.

    Note: this has not been tested; some minor changes may be needed.