pythonnumpybinaryxorhamming-distance

Fastest way to get hamming distance for integer array


Let a and b be vectors of the same size with 8-bit integers (0-255). I want to compute the number of bits where those vectors differs i.e. a Hamming distance between vectors formed by concatenation of binary representations of those numbers. For example:

a = [127,255]
b= [127,240]

Using numpy library

np.bitwise_xor(a,b)
# Output: array([ 0, 15])

What I need is now to binary represent each element of the above array and count number of 1s in all the elements of the array. The above example will give hamming distance of 0+4 = 4. Any fast and elegant solution for this in Python?


Solution

  • Approach #1 : We could broadcast them into binary bits & count number of different bits, like so -

    def hamming_distance(a, b):
        r = (1 << np.arange(8))[:,None]
        return np.count_nonzero( (a & r) != (b & r) )
    

    Sample run -

    In [144]: a = [127,255]
         ...: b = [127,240]
         ...: 
    
    In [145]: hamming_distance(a, b)
    Out[145]: 4
    

    Approach #2 : Using bitwise-xor operation, we can find out the number of different binary bits between a and b -

    def hamming_distance_v2(a, b):
        r = (1 << np.arange(8))[:,None]
        return np.count_nonzero((np.bitwise_xor(a,b) & r) != 0)