I have developed a chess engine which is playing at roughly 3300 Elo. I need to process and tune a set of positions I have generated with my engine. For this, it would be ideal to store roughly more than 1B positions in my RAM. I am trying to figure out the most efficient way to store pure positional chess data.
Information like castling, en passant etc. can be ignored.
12*8=96 byte
.=64 byte
.= 56 byte
48 byte
yet I cannot get below that with bitboards.So what would be the fastest/best compression technique to store a chess position?
Greetings Finn
Huffman codes work well here, and they are likely faster than you think.
Each square is represented by a variable number of bits. The first bit is empty (0), or occupied (1). If 0, then the next bit is for the next square. If 1, then the next bit is white (0) or black (1). The next one to four bits is the type of piece: pawn (0), knight (100), bishop (101), rook (110), queen (1110), king (1111).
This codes a board in 16.9 bytes on average. (Run over one million games.) A full board codes to the maximum of 20.5 bytes.
Decoding can be done very fast with a table lookup. The longest code is six bits. Grab the next six bits and look up in a 64-entry table. Each entry in the table is the contents of the square (e.g. black bishop, white king, empty), and the number of bits in the code (1..6). Drop that many bits, get more bits to fill out what's left to six bits, and repeat.
You can use that to convert to a simple 64-byte representation for fast manipulation, and if needed, the result can be quickly converted back to the compressed form.
More can be shaved off if needed. Represent empty or occupied as eight columns (files), each as eight bits. Huffman code those eight-bit symbols. They can be coded in 6.6 bits on average, dropping the 16.9 bytes to 15.5 bytes.
Instead of one code for the files, you can shave off another 2.2 bytes with four codes for the files: one for a/h, one for b/g, one for c/f, and one for d/e. That gets the average down to 13.3 bytes.