rubiks-cube

How to create a pattern database for solving Rubik's Cube?


I'm trying to implement Korf's algorithm for solving 3x3x3 Rubik's Cube. Part of the solution is to create a pattern database.

This is quote from the paper that contains, literally, the whole information on how to do it:

Using a breadth-first search from the goal state, we can enumerate these states, and record in a table the number of moves required to solve each combination of corner cubies.

How do you transform this in code? Since on each step, we have multiple goal states, it's not clear to me how we can just "enumerate" all states that are reachable from it.


Solution

  • I've implemented Korf's algorithm, and you can use my code as a reference: https://github.com/benbotto/rubiks-cube-cracker/ It's a lot of code, too much to include in this post, but I can give some general algorithmic tips.

    First off, Korf's paper recommends using three pattern databases, not just one. One of the databases stores the number of moves required to solve the corner pieces of any cube. There are 8 corner cubies, and each can occupy any of the 8 positions, so there are 8! possible permutations. Each of the corner pieces can be oriented in 3 different ways--any of the three stickers can face up, for example--but the orientations of 7 of the cubies dictate the orientation of the 8th (by the laws of the cube). Therefore, there are 3^7 possible ways the corners can be orientated. Altogether then, there are 8! * 3^7 possible ways for the corners of the cube to be scrambled, and these 88,179,840 states can be iterated in a reasonable amount of time (30 minutes or so). All corner states can be reached in 11 moves or fewer, so each entry in the corner pattern database can be stored in a nibble (4 bits). On disk, the corner pattern database occupies about 42MB.

    A breadth-first search can be used to populate this database. Apply a move and use the state of the corners to create an index into the database. If the state has been seen before with fewer moves, then the search tree can be pruned: there's no reason to continue down the branch; otherwise, add the state to the database and continue the search. As mentioned above, iterating over all possible corner states doesn't take long on a modern computer due to the amount of pruning that can be done during the search.

    Korf suggests two additional databases: one for 6 of the 12 edges, and another for the other 6 edges. Korf was on limited hardware (a Sun SPARC Ultra!), but since I'm on a more modern machine I chose to use 7 edges in each of the edge databases. This drastically speeds up the solver. Anyway, 7 edges can occupy 12 positions, so there are 12P7 (12! / (12-7)!) permutations. Each corner can be oriented in 2 ways, so there are 2^7 possible orientations of 7 edges. Again, this is a small enough number of cube states to iterate over, and all states can be reached in 10 moves or fewer. Storing each entry in a nibble, each of the 7-edge databases occupies about 244MB (12P7 * 2^7 / 2 bytes).

    I implemented breadth-first search using a non-recursive algorithm for efficiency reasons (and efficiency matters in this code). While this type of search worked fine for building the corner database, the memory costs were too high for indexing the edge databases. As such I used a custom iterative-deepening depth-first search for indexing the edges. The "custom" part is that it early-outs when it reaches a state that's already been encountered.

    The on-disk database sizes above of course assume that the database contains nothing but the number of moves to get to each state, each stored in a nibble. That is, the database is a hash table, and each state is an index into the table. As such, you need a "perfect hash" algorithm that takes a permutation of cubes and returns an index. In his papers, and he has multiple papers on combination puzzles, Korf is rather terse about how to create a such a hash. It boils down to calculating Lehmer codes. Korf gives a short and sweet linear algorithm in his paper, Large-Scale Parallel Breadth-First Search.

    We scan the permutation from left to right, constructing a bit string of length n, indicating which elements of the permutation we’ve seen so far. Initially the string is all zeros. As each element of the permutation is encountered, we use it as an index into the bit string and set the corresponding bit to one. When we encounter element k in the permutation, to determine the number of elements less than k to its left, we need to know the number of ones in the first k bits of our bit string. We extract the first k bits by right shifting the string by n − k. This reduces the problem to: given a bit string, count the number of one bits in it.

    We solve this problem in constant time by using the bit string as an index into a precomputed table, containing the number of ones in the binary representation of each index.

    It took me a long time to digest that and transform it to code, especially because he does not talk about indexing partial permutations. You'll need to index partial permutations when generating the pattern databases for the edge pieces because creating a database of all 12 edges would be incredibly large. As such, I wrote an article about it on Medium: https://medium.com/@benjamin.botto/sequentially-indexing-permutations-a-linear-algorithm-for-computing-lexicographic-rank-a22220ffd6e3

    Lastly, I tested a number of different data structures for storing the cube. In my code I have multiple solvers (Korf and Thisthlewaite), plus a graphical representation. I actually store the cube in 4 different structures. With an algorithm like Korf's, the structure that you use to represent the Rubik's Cube has an enormous (extra emphatic!) impact on the speed of the solver. I wrote about different structures in another post, and option (4) is by far the fastest with the Korf algorithm in my testing. To create a single database entry, you need the indexes and orientations of each cube (e.g. {0-7, 0-2} for the corners). So, when creating pattern databases, it's efficient to represent the cube as indexes and orientations so that no additional processing is required to compute them.