Tags: python, numpy, compression, bitstring

Pack data: extreme bitpacking in Python


I need to pack information as closely as possible into a bitstream.

I have variables, each with a different number of distinct states:

Number_of_states = [3, 5, 129, 15, 6, 2]  # a bit longer in reality

The best option I have at the moment is a bitfield with ceil(log2(n)) bits per variable:

2 + 3 + 8 + 4 + 3 + 1 bits = 21 bits

However, it should be possible to pack these states into np.log2(3*5*129*15*6*2) ≈ 18.4 bits, i.e. 19 whole bits, saving two bits. (In reality I have 298 bits and need to save a few.)

In my case this would save more than 5% of the data stream, which would help a lot.

Is there a viable solution in Python to pack the data this way? I tried packing algorithms, but they add too much overhead for just a few bytes of data. The format string is not a problem: it is constant and will be transmitted beforehand.
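The bit counts above can be checked with a short sketch. It assumes Python 3.8+ (for `math.prod`) and uses `(n - 1).bit_length()`, which equals ceil(log2(n)) for n >= 2, to avoid floating-point rounding from `np.log2`. The helper names `bitfield_width` and `mixed_radix_width` are illustrative only.

```python
import math

# Number of distinct states per variable (the short example list from the question)
number_of_states = [3, 5, 129, 15, 6, 2]

def bitfield_width(states):
    """Bits used when every variable gets its own fixed-width field."""
    # (n - 1).bit_length() == ceil(log2(n)) for n >= 2, and 0 for n == 1
    return sum((n - 1).bit_length() for n in states)

def mixed_radix_width(states):
    """Bits needed for one integer that encodes all variables jointly."""
    total_combinations = math.prod(states)
    # The largest value that must be representable is total_combinations - 1
    return (total_combinations - 1).bit_length()

print(bitfield_width(number_of_states))     # 21
print(mixed_radix_width(number_of_states))  # 19
print(math.log2(math.prod(number_of_states)))  # about 18.4
```

The difference between the two widths is exactly the two bits the question hopes to save.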

This is the code I am using at the moment:

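from bitstring import pack
import numpy as np

Number_of_states = [3, 5, 129, 15, 6, 2]  # much longer in reality

# One random value per variable, each in range(n) for its number of states n
DATA_TO_BE_PACKED = np.random.randint(Number_of_states)

# One fixed-width unsigned field per variable, ceil(log2(n)) bits each,
# joined without a trailing separator
fmt = ', '.join('uint:{}'.format(int(np.ceil(np.log2(n)))) for n in Number_of_states)

# Convert NumPy integers to plain Python ints before handing them to bitstring
PACKED_DATA = pack(fmt, *[int(v) for v in DATA_TO_BE_PACKED])

print(len(PACKED_DATA))  # total number of bits (21 for this example)
print(PACKED_DATA.unpack(fmt))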
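For comparison, the fixed-width bitfield described by the `bitstring` format string can be sketched with plain integer shifts, without `bitstring` or NumPy. This is a stdlib-only illustration; `pack_bitfield`, `unpack_bitfield` and `field_width` are hypothetical helper names, and placing the first variable in the most significant bits is a choice made to mirror the left-to-right field order of `pack`.

```python
def field_width(n):
    # Bits needed for a field holding values 0..n-1 (ceil(log2(n)) for n >= 2)
    return (n - 1).bit_length()

def pack_bitfield(values, states):
    """Concatenate fixed-width fields; the first value ends up in the most significant bits."""
    packed = 0
    for value, n in zip(values, states):
        if not 0 <= value < n:
            raise ValueError(f"value {value} out of range for {n} states")
        # Shifting left by the field width is multiplying by a power of two
        packed = (packed << field_width(n)) | value
    return packed

def unpack_bitfield(packed, states):
    """Split the integer back into fields, starting from the least significant one."""
    values = []
    for n in reversed(states):
        width = field_width(n)
        values.append(packed & ((1 << width) - 1))  # mask out the lowest field
        packed >>= width
    return values[::-1]

number_of_states = [3, 5, 129, 15, 6, 2]
data = [2, 3, 78, 9, 0, 1]
packed = pack_bitfield(data, number_of_states)
print(sum(field_width(n) for n in number_of_states))  # 21 bits in total
print(unpack_bitfield(packed, number_of_states))       # [2, 3, 78, 9, 0, 1]
```

Each field is padded up to a power of two; the waste comes from that padding, which is what a joint encoding avoids.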

Solution

  • This looks like a use case for a mixed-radix numeral system.

    A quick proof of concept:

    num_states = [3, 5, 129, 15, 6, 2]
    input_data = [2, 3, 78, 9, 0, 1]
    print("Input data: %s" % input_data)
    

    To encode, start with 0 and, for each variable, first multiply by its number of states and then add its current value:

    encoded = 0
    for i in range(len(num_states)):
        encoded *= num_states[i]
        encoded += input_data[i]
    
    print("Encoded: %d" % encoded)
    

    To decode, go in reverse: for each variable from last to first, take the remainder of division by its number of states (that is the variable's value), then integer-divide by the number of states:

    decoded_data = []
    for n in reversed(num_states):
        v = encoded % n
        encoded = encoded // n
        decoded_data.insert(0, v)
    
    print("Decoded data: %s" % decoded_data)
    

    Example output:

    Input data: [2, 3, 78, 9, 0, 1]
    Encoded: 316009
    Decoded data: [2, 3, 78, 9, 0, 1]
    

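    Another example with more values (using a longer num_states list that is not shown; the trailing L in the output is Python 2's notation for long integers):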

    Input data: [2, 3, 78, 9, 0, 1, 84, 17, 4, 5, 30, 1]
    Encoded: 14092575747751
    Decoded data: [2L, 3L, 78L, 9L, 0L, 1L, 84L, 17L, 4L, 5L, 30L, 1L]
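The encode and decode loops from the answer can be wrapped into reusable functions and combined with serialization to a fixed number of bytes. This sketch assumes Python 3.8+ and uses only the standard library (`divmod`, `int.to_bytes`, `int.from_bytes`); the function names and the whole-byte framing are illustrative choices, not part of the original answer. Since Python integers have arbitrary precision, the same code also works for the roughly 298-bit case from the question. If a true bitstream is needed instead of whole bytes, the encoded integer could instead be written as a single `uint:<total bits>` field with `bitstring.pack`, as in the question's own code.

```python
import math
import random

def encode_mixed_radix(values, states):
    """Combine values (value i in range(states[i])) into one integer."""
    encoded = 0
    for value, n in zip(values, states):
        if not 0 <= value < n:
            raise ValueError(f"value {value} out of range for {n} states")
        encoded = encoded * n + value  # make room for one digit of radix n, then add it
    return encoded

def decode_mixed_radix(encoded, states):
    """Recover the values by peeling off digits from the last variable backwards."""
    values = []
    for n in reversed(states):
        encoded, value = divmod(encoded, n)
        values.append(value)
    return values[::-1]  # restore original order (avoids repeated insert(0, ...))

def total_bits(states):
    # Minimal number of bits able to hold every encoded value 0 .. prod(states) - 1
    return (math.prod(states) - 1).bit_length()

def pack_to_bytes(values, states):
    nbytes = (total_bits(states) + 7) // 8  # round up to whole bytes
    return encode_mixed_radix(values, states).to_bytes(nbytes, "big")

def unpack_from_bytes(blob, states):
    return decode_mixed_radix(int.from_bytes(blob, "big"), states)

num_states = [3, 5, 129, 15, 6, 2]
input_data = [2, 3, 78, 9, 0, 1]
print(encode_mixed_radix(input_data, num_states))  # 316009, as in the example output
print(total_bits(num_states))                      # 19
print(unpack_from_bytes(pack_to_bytes(input_data, num_states), num_states))  # [2, 3, 78, 9, 0, 1]

# Round-trip check on random data for a longer, made-up list of state counts
long_states = [random.randint(2, 200) for _ in range(40)]
sample = [random.randrange(n) for n in long_states]
assert decode_mixed_radix(encode_mixed_radix(sample, long_states), long_states) == sample
```

Using `divmod` yields quotient and remainder in one step, and building the list backwards then reversing it keeps decoding linear in the number of variables.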