algorithmdoubly-linked-listknuth

Knuth Dancing Links with Secondary Columns


I have implemented Knuth's "Dancing Links" algorithm to explore the generalized exact cover problem (i.e., with secondary columns). The code works as expected for an exact cover (i.e., all columns are primary columns) and so for a simple sparse matrix:

[0, 1, 0]
[0, 0, 1]
[1, 0, 0]
[1, 1, 0]
[0, 1, 1]
[1, 1, 1]

My code returns the following set of rows as solutions:

[0, 1, 2]
[1, 3]
[2, 4]
[5]

And I have also tested this with many other exact cover examples and it checks out. However, the details regarding the secondary columns is a bit vague. From what I could gather from various Knuth/non-Knuth resources, it says that what you need to do is:

The only difference is that we initialize the data structure by making a circular list of the column headers for the primary columns only. The header for each secondary column should have L and R fields that simply point to itself. The remainder of the algorithm proceeds exactly as before, so we will still call it algorithm DLX.

After making these changes to how the matrix/nodes/headers is represented and then setting the first column as a secondary column (i.e., only columns 2 and 3 are primary), I end up with the following sets of rows as solutions:

[0, 1]
[1, 3]
[4]
[5]

While all of these are valid solutions and some overlap with exact the exact cover solutions, it appears that other solutions are missing (i.e., some of those from the exact cover solution set):

[0, 1, 2]
[2, 4]

Perhaps this is a misunderstanding on my part but am I missing something conceptually or is Knuth’s explanation incomplete?

It would even be helpful if you could show that your algorithm produces the full set of solutions and this will help me confirm that my algorithm is incomplete!

Unfortunately, even Knuth's "Art of Computer Programming" regarding dancing links doesn't seem to offer too much help.


Solution

  • Definition

    This is how the exact cover problem is extended to non-primary items on page 7 (currently) of Pre-Fascicle 5C, Dancing Links:

    ...an exact cover problem involves N distinct items, of which N1 are primary and N2 = N - N1 are secondary. It is defined by a family of options, each of which is a subset of the items. Every option must include at least one primary item. The task is to find all subsets of the options that (i) contain every primary item exactly one, and (ii) contain every secondary item at most once.

    (Options that are purely secondary are excluded from this new definition, because they will never be chosen by Algorithm X as we've refined it. If for some reason you don't like that rule, you can always go back to the idea of slack options. Exercise 19 discusses another interesting alternative.)

    Your question is answered by the emphasized (by Knuth) sentence in the first paragraph, and by the second paragraph. The exact cover problem being solved by Knuth disallows (or ignores) options that don't help cover the primary items, i.e. are made up purely of secondary items.

    Example

    In your example in the question, let's call the columns A, B, C, where A is secondary, and B and C are primary. The options are:

    [0, 1, 0] -- option 0: [B]
    [0, 0, 1] -- option 1: [C]
    [1, 0, 0] -- option 2: [A]
    [1, 1, 0] -- option 3: [A, B]
    [0, 1, 1] -- option 4: [B, C]
    [1, 1, 1] -- option 5: [A, B, C]
    

    So here the third row [1 0 0], i.e. option 2, contains no primary items.

    You can run either of Knuth's programs DANCE or DLX1 with the following input in a file called (say) foo.dlx:

    B C | A
    B
    C
    A
    A B
    B C
    A B C
    

    The programs find the same four solutions:

    $ ./dance 1 < foo.dlx
    1:
     C (1 of 3)
     B (1 of 2)
    2:
     C (1 of 3)
     B A (2 of 2)
    3:
     C B (2 of 3)
    4:
     C A B (3 of 3)
    Altogether 4 solutions, after 12 updates.
    

    or

    % ./dlx1 m1 < foo.dlx
    Option ignored (no primary items): A
    (5 options, 2+1 items, 14 entries successfully read)
    1:
     C (1 of 3)
     B (1 of 2)
    2:
     C (1 of 3)
     B A (2 of 2)
    3:
     C B (2 of 3)
    4:
     C A B (3 of 3)
    Altogether 4 solutions, 261+231 mems, 12 updates, 360 bytes, 6 nodes.
    

    (Note the explicit warning in the second program, that Option 2, which contains only the secondary item A, is ignored.)

    Solution 1: Change the problem

    If you remove options (rows) that contain no primary items (columns), then the program already works: the solutions you get are indeed exhaustive for the new problem.

    Solution 2: Slack options

    As Knuth says in the second quoted paragraph (ignore the Exercise 19 alternative; it's for solving a different problem), if you really want to include options that contain only secondary items, you can go back to the idea of slack options. In the 2000 paper, this idea is the very next sentence after the paragraph you quoted:

    A generalized cover problem can be converted to an equivalent exact cover problem if we simply append one row for each secondary column, containing a single 1 in that column.

    (That is, for every secondary item, we add an option that contains only that item, and now treat it as an exact cover problem with only primary items.)

    In more detail, I assume you want to solve the following problem:

    To solve this problem, we can do the following:

    In your example in the question: X is the following set:

    [0, 1, 0] -- option 0: [B]
    [0, 0, 1] -- option 1: [C]
    [1, 1, 0] -- option 3: [A, B]
    [0, 1, 1] -- option 4: [B, C]
    [1, 1, 1] -- option 5: [A, B, C]
    

    and Y is the following set:

    [1, 0, 0] -- option 2: [A]
    

    and Z is same as Y, so you don't need to add anything in this case.

    You solve the original exact cover problem (everything is primary), and get the following solutions:

    This gives all six solutions.