pythonmatrixrecursive-backtrackingbranching-strategyknuth

Define contraints for specific exact cover problem


I have a specific problem in mind, that I want to solve using Knuth's Algorithm X. However, I struggle to translate my problem into suitable contraints, that make up the incidence matrix for Algorithm X to operate on.

For my sports club summer tournament, I want to come up with a schedule that will group four players together, without regrouping any pair of players in subsequent playing rounds.

I figured that this translates nicely into an exact cover problem like this:

After setting this up for 20 players, I've got an incidence matrix with 20 columns and 4845 rows (969 rows per player/column).

Algorithm X will find a solution just nicely, but this will cover just one (the first) round. Letting the algorithm continue will spit out more alternative solutions for the same round, which is not of interest to me. So I build an iterator around the algorithm, that will take the solution and remove rows from the incidence matrix based on player overlap: Whenever a group from the solution has an intersection of at least 2 with any row of the matrix, that row is removed. After the first run of the algorithm, the matrix is culled down to 1280 rows. Running Algorithm X will find the next solution, etc. until it doesn't anymore.

Cutting this long story short, this approach isn't a proper application of the exact cover problem - I had to find part solutions iteratively. The correct exact cover problem should include the sequence of playing rounds somehow. Why? Because now I do not explore the full range of possible solutions! The player count of 20 is the best example for that. Algorithm X will find solutions for just 3 successive rounds. Yet, I do know that there are at least 5, when different intermediate solutions are chosen. This is precisely the job that I had hoped Algorithm X could address for me. With the above approach, there is no backtracking between playing rounds.

Even though the question is abstract enough that code shouldn't be necessary, here is my implementation of Knuth's DLX (Algorithm X with Dancing Links) in Python:

from itertools import combinations
def dancing_links (players):
    """
    Implement the dancing links algorithm as described by Donald Knuth, which
    attemts to solve an exact cover problem exhaustively in an efficient way.
    Adapted for my use case, I define the incidence matrix as follows:
        * Columns are players.
        * Rows are groups of players.
        * The intersection of groups and players mean that that player is a
          member of that group.
        * One group contains exactly four players, i.e. each row has
          exactly four 1s.
        * Repeatedly solve the exact cover problem for a reduced set of groups,
          where each round the total set of groups is filtered for illegal
          groups. An illegal group features at least two players that
          have already played together in a round.
    """

    class FoundSolution (Exception):
        "Use the exception to abort recursive stacks"
        pass

    # Dancing links is based on "doubly linked lists" intersecting
    # with each other. Python doesn't have this kind of data structure
    # and implementing it is quite expensive. E.g. each field of the incidence
    # matrix could be a Node which has pointers in all four directions,
    # The Node class with 6 attributes (four pointers, a name and arbitrary
    # data) needs to undergo countless name lookups, which is a slow process
    # in Python. So instead, I represent each node without a class definition
    # as a dict.
    # 
    # Since we're walking over so many doubly linked lists, starting from
    # any of its nodes, we need to remember where we started and iterate
    # through all of them. That clutters our code later on a lot without
    # this iterator function.
    def iter_dll (start, direction='right'):
        next = start[direction]
        # Need to explicitly compare object ids. Otherwise Python
        # would try to do a deep comparison of two dicts. which is impossible
        # due to the circular referencing.
        while id(start) != id(next):
            yield next
            next = next[direction]

    def cover (column):
        """
        Cover a column by removing its head node from the control row and
        removing each of its rows from other columns that intersect.
        """
        column['left']['right'] = column['right']
        column['right']['left'] = column['left']
        for r in iter_dll(column, 'down'):
            for c in iter_dll(r):
                c['up']['down'] = c['down']
                c['down']['up'] = c['up']

    def uncover (column):
        # Undo the changes caused by a call to cover(dll) by injecting the
        # linked nodes with the remembered predecessor and successor.
        for r in iter_dll(column, 'up'):
            for c in iter_dll(r, 'left'):
                c['up']['down'] = c['down']['up'] = c
        else:
            column['left']['right'] = column['right']['left'] = column

    def search (i, root):
        if id(root['right']) == id(root):
            # The only way to exit the complete recursion stack is an exception.
            raise FoundSolution
        for c in iter_dll(root):
            cover(c)
            for r in iter_dll(c, 'down'):
                lineup.append(r)
                for j in iter_dll(r):
                    cover(j['data'])
                search(i+1, root)
                lineup.pop()
                for j in iter_dll(r, 'left'):
                    uncover(j['data'])
            else:
                uncover(c)

    def generate_incidence_matrix (groups):
        # The gateway to our web of doubly linked lists.
        root = {'name': 'root', 'data': None}
        # Close the circle in left and right dirctions, so we can keep the
        # circle closed while injecting new nodes.
        root['right'] = root['left'] = root
        # The control row of column headers is created by attaching each new
        # Header with the previous one.
        for player in players:
            n = {
                'name': 'Headnode {}'.format(player),
                'data': player,
                'right': root,
                'left': root['left'],
            }
            n['up'] = n['down'] = n
            root['left']['right'] = root['left'] = n

        # Now append nodes to each column header node in our control row -
        # one for each player of a distinct group of four players.
        rnmbr = 0
        # Seed for new row nodes
        seed = {'name': 'seed', 'data': None}
        for g in groups:
            rnmbr += 1
            seed['right'] = seed['left'] = seed
            # Iterate through the control nodes for each of the four players.
            for header in (m for m in iter_dll(root) for p in g if m['data'] == p):
                n = {
                    # Chose a name that identifies the row and colum for this
                    # new node properly.
                    'name': 'R-{},C-{}'.format(rnmbr, header['data']),
                    'data': header,
                    'up': header['up'],
                    'down': header,
                    'left': seed,
                    'right': seed['right']
                }
                header['up']['down'] = header['up'] = n
                seed['right']['left'] = seed['right'] = n
            else:
                # Extract the seed from this row
                seed['right']['left'] = seed['left']
                seed['left']['right'] = seed['right']

        return root

    groups = tuple(combinations(players, 4))    
    groups_per_round = len(players)/4
    lineups = []

    while len(groups) >= groups_per_round:
        root = generate_incidence_matrix(groups)
        lineup = []
        try:
            search(0, root)
        except FoundSolution:
            lineup = reduce(list.__add__, ([r['data']['data']] + [n['data']['data'] for n in iter_dll(r)] for r in lineup))
            lineup = tuple(tuple(sorted(lineup[i:i + 4])) for i in xrange(0, len(lineup), 4))
            lineups.append(lineup)
            groups = tuple(group for group in groups if all(len(g.intersection(set(group))) < 2 for g in (set(s) for s in lineup))) 
        else:
            break

    return lineups

Given a list of players, this function will print the intermediate solutions to screen until the options are exhausted. Sadly, it isn't as fast as I'd hoped for, but it was a nice programming exercise for me. :-)

Calling the dancing_links() function as defined above will yield the following output...

>>> pprint.pprint(dancing_links(range(1,21)))
[((1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12), (13, 14, 15, 16), (17, 18, 19, 20)),
 ((1, 5, 9, 13), (2, 6, 10, 17), (3, 7, 14, 18), (4, 11, 15, 19), (8, 12, 16, 20)),
 ((1, 6, 11, 14), (2, 5, 12, 18), (3, 8, 13, 19), (4, 9, 16, 17), (7, 10, 15, 20))]

What I had expected is more like...

[((1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12), (13, 14, 15, 16), (17, 18, 19, 20)),
 ((1, 5, 9, 13), (2, 6, 10, 17), (3, 7, 14, 18), (4, 11, 15, 19), (8, 12, 16, 20)),
 ((1, 12, 15, 18), (2, 5, 16, 19), (3, 6, 9, 20), (4, 7, 10, 13), (8, 11, 14, 17)),
 ((1, 7, 11, 20), (2, 8, 13, 18), (3, 5, 10, 15), (4, 9, 16, 17), (6, 12, 14, 19)),
 ((1, 8, 10, 19), (2, 7, 9, 15), (3, 12, 13, 17), (4, 5, 14, 20), (6, 11, 16, 18))]

Note that it doesn't have to be this exact solution. It is just an example solution that I've found during my attempts to eventually generate a schedule for an arbitrary number of players.


Solution

  • (Completely rewrote answer.)

    Although it might be possible to cast this problem as an exact cover problem and (in principle) solve it with Algorithm X, in practice that turns out to be impractical and there is a better way.


    First to answer your question as asked: Recall the exact cover problem: given a bunch of “items” (to be covered) and a bunch of “options” (each of which covers a set of items), the problem is to find a set of options such that each item is covered exactly once.

    In your case, if you pick the (20) players as items and groups-of-four-players as options, then, as you found, any algorithm for the exact cover problem will find ways of scheduling one round of the tournament.

    But in fact you don't need that at all, as (in the absence of further constraints) you can write down all solutions explicitly: there are (20 choose 4)=4845 ways of choosing the first group-of-4, then (16 choose 4)=1820 ways of choosing another group-of-4 from the remaining, and so on, and finally you don't care about the ordering among the five groups-of-4 you find, so the number of ways you can partition a set of 20 people into five groups-of-4 is

    (choose(20, 4) * choose(16, 4) * choose(12, 4) * choose(8, 4) * choose(4, 4)) / (5 * 4 * 3 * 2 * 1) = 2546168625.

    (Equivalently: (20!)/((4!)^5 * 5!) = 2546168625, as we can write down a list of 20 players, then reorder within each group of 4 and also reorder the groups.)

    If you'd like to generate all of them with a program, you can write each of them in canonical order: suppose you call the 20 players 'A' to 'T', then you can write each group-of-4 in lexicographic order (thus, the group {F, J, B, Q} would be written “BFJQ”), and finally you can write down the five groups themselves in lexicographic order (thus the first group will start with "A", the second group with the earliest letter not in the first group, etc).

    Next, if you want to cover multiple rounds, to frame it as something like an exact-cover problem again, you'd need to have the above number (≈2.5 billion) of “options” (rows). It's not clear what the items would be, but this is clearly going to be impractical, so it's not worth pursuing this line of thought.


    Instead, it turns out that your problem is well-studied, originally under the name of Kirkman's schoolgirl problem (scheduling, from 15 people, groups of 3 as many times as possible — turns out to be 7) (see Ray Toal's page here), and more recently as the “social golfer problem” (see Markus Triska's page here):

    The Social Golfer Problem (SGP) is a combinatorial optimisation problem. The task is to schedule g × p golfers in g groups of p players for w weeks such that no two golfers play in the same group more than once. An instance of the SGP is denoted by the triple g-p-w. The original problem asks for the maximal w such that the instance 8-4-w can be solved.

    And your question here asks for the maximal w such that the instance 5-4-w can be solved. The answer turns out to be w=5, and the MathWorld page on the problem gives exactly this 5-4-5 instance:

    Mon ABCD    EFGH    IJKL    MNOP    QRST
    Tue AEIM    BJOQ    CHNT    DGLS    FKPR
    Wed AGKO    BIPT    CFMS    DHJR    ELNQ
    Thu AHLP    BKNS    CEOR    DFIQ    GJMT
    Fri AFJN    BLMR    CGPQ    DEKT    HIOS
    

    See also Warwick Harvey's page which has (had) the best known solutions for many different parameters. It records that 5 is both the upper- and lower-bound for this problem, i.e. it is known how to schedule 5 rounds (you found one yourself!) and known that it is not possible to schedule more than 5 rounds.

    You can search the literature for “social golfer problem” to find more approaches to solving it by computer. More generally, problems like this are studied in the vast area of combinatorics known as Combinatorial Designs. One great reference I found while writing an earlier answer is the book Handbook of Combinatorial Designs, which has chapters like VI.3 Balanced Tournament Designs, and VI.51 Scheduling a Tournament.