python, python-3.x, statistical-test

What statistical tests can I run to test the randomness of binary strings using Python?


I'm having trouble implementing the block frequency test in Python to assess the randomness of a binary string. Could anyone help me understand why the code won't run?

Also, are there any other statistical tests for the randomness of a binary string that I could run in Python, or possibly MATLAB?

from importlib import import_module
import_module
from tokenize import Special
import math
def block_frequency(self, bin_data: str, block_size=4):
    """
     Note that this description is taken from the NIST documentation [1]
    [1] http://csrc.nist.gov/publications/nistpubs/800-22-rev1a/SP800-22rev1a.pdf
    The focus of this tests is the proportion of ones within M-bit blocks. The purpose of this tests is to determine
    whether the frequency of ones in an M-bit block is approximately M/2, as would be expected under an assumption
    of randomness. For block size M=1, this test degenerates to the monobit frequency test.
    :param bin_data: a binary string
    :return: the p-value from the test
    :param block_size: the size of the blocks that the binary sequence is partitioned into
    """
# Work out the number of blocks, discard the remainder
(num_blocks)= math.floor((1010110001001011011010111110010000000011010110111000001101) /4)
block_start, block_end = 0, 4
# Keep track of the proportion of ones per block 
proportion_sum = 0.0
for i in range(num_blocks):
    # Slice the binary string into a block 
    block_data = (101010001001011011010111110010000000011010110111000001101)[block_start:block_end]
    # Keep track of the number of ones 
    ones_count = 0
    for char in block_data:
        if char == '1':
           ones_count += 1
    pi = ones_count / 4
    proportion_sum += pow(pi - 0.5, 2.0) 
    # Update the slice locations 
    block_start += 4
    block_end += 4 
    # Calculate the p-value
    chi_squared = 4.0 * 4 * proportion_sum
    p_val = Special.gammaincc(num_blocks / 2, chi_squared / 2)
    print(p_val)

Solution

  • There are three issues that I see with your code.

    1. Using a hardcoded value in two different places. This is bad practice and error prone. I know this probably isn't what the OP was referring to, but it's worth fixing while we're at it.
    2. A string of binary digits (especially one whose characters are compared to "1" further down) should be enclosed in quotation marks, not parentheses. That's one of the errors being thrown: the way it's written now, you've got a large integer which you're trying to slice, and integers can't be indexed. (This goes along with using len where necessary and some other minor changes.)
    3. You're using the wrong module. You probably mean scipy.special.gammaincc (the regularized upper incomplete gamma function, which the NIST document calls igamc), not tokenize.Special.gammaincc, which doesn't exist anyhow. Note that it's gammaincc with two c's: scipy.special.gammainc is the lower function and would return 1 minus the p-value you want.

    Putting it all together, with the logic moved inside the function and the p-value computed once after the loop rather than on every iteration, try something like:

    import math

    from scipy.special import gammaincc


    def block_frequency(bin_data: str, block_size=4):
        """
        Note that this description is taken from the NIST documentation [1]
        [1] http://csrc.nist.gov/publications/nistpubs/800-22-rev1a/SP800-22rev1a.pdf
        The focus of this test is the proportion of ones within M-bit blocks. The purpose of this test is to determine
        whether the frequency of ones in an M-bit block is approximately M/2, as would be expected under an assumption
        of randomness. For block size M=1, this test degenerates to the monobit frequency test.
        :param bin_data: a binary string
        :param block_size: the size of the blocks that the binary sequence is partitioned into
        :return: the p-value from the test
        """
        # Work out the number of blocks, discard the remainder
        num_blocks = math.floor(len(bin_data) / block_size)
        block_start, block_end = 0, block_size
        # Keep track of the proportion of ones per block
        proportion_sum = 0.0
        for i in range(num_blocks):
            # Slice the binary string into a block
            block_data = bin_data[block_start:block_end]
            # Count the ones in this block
            ones_count = block_data.count('1')
            pi = ones_count / block_size
            proportion_sum += pow(pi - 0.5, 2.0)
            # Update the slice locations
            block_start += block_size
            block_end += block_size
        # Calculate the p-value once, after every block has been processed
        chi_squared = 4.0 * block_size * proportion_sum
        return gammaincc(num_blocks / 2, chi_squared / 2)


    my_binary_string = '101010001001011011010111110010000000011010110111000001101'
    print(block_frequency(my_binary_string))
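
As for other tests: the docstring above mentions the monobit frequency test, which is the simplest one in the same NIST document. Here is a minimal sketch of it, assuming the input contains only '0' and '1' characters. The function name monobit_frequency is my own, not something from a library, and it needs only the standard library (math.erfc). Each bit is mapped to +1 or -1, the values are summed, and the p-value is erfc(|S_n| / sqrt(2n)).

```python
import math


def monobit_frequency(bin_data: str) -> float:
    """Return the p-value of the monobit frequency test for a '0'/'1' string."""
    n = len(bin_data)
    if n == 0 or set(bin_data) - {'0', '1'}:
        raise ValueError("bin_data must be a non-empty string of '0' and '1'")
    ones = bin_data.count('1')
    # Sum of the sequence after mapping '1' -> +1 and '0' -> -1
    s_n = ones - (n - ones)
    # Normalised test statistic
    s_obs = abs(s_n) / math.sqrt(n)
    # Complementary error function gives the p-value
    return math.erfc(s_obs / math.sqrt(2))


# Short 10-bit input: six ones and four zeros, so S_n = 2
p_val = monobit_frequency('1011010101')
print(round(p_val, 6))
```

For that 10-bit input the result is about 0.527. A p-value below your chosen significance level (NIST uses 0.01) is taken as evidence that the sequence is not random. Bear in mind that such a short string says very little either way; NIST recommends at least 100 bits for both this test and the block frequency test.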