Compression of chess position

I have developed a chess engine which is playing at roughly 3300 Elo. I need to process and tune a set of positions I have generated with my engine. For this, it would be ideal to store roughly more than 1B positions in my RAM. I am trying to figure out the most efficient way to store pure positional chess data.

Information like castling, en passant etc. can be ignored.

Ideas

The naive aproach would be using bitboards which would require exactly 12 bitboards (for each piece) which would end up with 12*8=96 byte.
Alternatively, I could only use 8 bitboards where I use 6 for the different pieces and 2 more for the different colors =64 byte.
Theoretically, I could get away with 7 bitboards using the aproach above. I would not need 2 bitboards for the colors. It would be enough to just have those for white and I could easily conclude if a piece is white or black. = 56 byte
I could come up with further compression using bitboards which would resolve in 48 byte yet I cannot get below that with bitboards.
One could also store the pure positional information for each piece. e.g. use the first 6 bits for the white king, the next 6 bits for the black king, the next 6 bits for the white queen etc. Note that one would require 8*(6+3) bits for pawns here cause pawns could promote to other pieces which would need to be encoded within 3 bits. Problematic would be encoding the absence of a piece.
There are some compression techniques like Huffman trees, yet I would also require some fast computation with the pieces on the board.

So what would be the fastest/best compression technique to store a chess position?

Greetings Finn

Solution

Huffman codes work well here, and they are likely faster than you think.

Each square is represented by a variable number of bits. The first bit is empty (0), or occupied (1). If 0, then the next bit is for the next square. If 1, then the next bit is white (0) or black (1). The next one to four bits is the type of piece: pawn (0), knight (100), bishop (101), rook (110), queen (1110), king (1111).

This codes a board in 16.9 bytes on average. (Run over one million games.) A full board codes to the maximum of 20.5 bytes.

Decoding can be done very fast with a table lookup. The longest code is six bits. Grab the next six bits and look up in a 64-entry table. Each entry in the table is the contents of the square (e.g. black bishop, white king, empty), and the number of bits in the code (1..6). Drop that many bits, get more bits to fill out what's left to six bits, and repeat.

You can use that to convert to a simple 64-byte representation for fast manipulation, and if needed, the result can be quickly converted back to the compressed form.

More can be shaved off if needed. Represent empty or occupied as eight columns (files), each as eight bits. Huffman code those eight-bit symbols. They can be coded in 6.6 bits on average, dropping the 16.9 bytes to 15.5 bytes.

Instead of one code for the files, you can shave off another 2.2 bytes with four codes for the files: one for a/h, one for b/g, one for c/f, and one for d/e. That gets the average down to 13.3 bytes.