python numpy binary bit uint16

Fast way to convert an array of 16-bit unsigned integers to bits


I have a large dataset containing a 3D array of 16-bit unsigned integers. I want to convert each integer to its bits and retain only the values whose bits 8:12 (slicing the 16-character binary string) are "0000". So far I am using a very slow method with three nested loops:

import numpy as np
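# Generate sample data (all ones here); unsigned 16-bit to match the dataset
a = np.ones([4,1200,1200], dtype="uint16")
# Generate an array which serves later as mask
b = np.zeros(a.shape, dtype=int)
for i in range(4):
    for j in range(1200):
        for k in range(1200):
            # Format as a 16-character binary string and keep characters 8:12
            b[i,j,k] = int('{:016b}'.format(a[i,j,k])[8:12])
# Mask every element whose selected bits are not all zero
a = np.ma.masked_where(b!=0, a)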

I would be thankful if you could suggest a clean and fast alternative.


Solution

  • Your question and example are a bit confusing, but generally, if you want to focus on certain bits, you can apply the bitwise AND operator & with the right mask. Characters 8:12 of the 16-character binary string are bits 4 to 7 counted from the least significant bit, so to select the "8:12 bits" of a 16-bit unsigned integer the mask is 0b0000000011110000, which is 240.

    For example, with arr = np.random.randint(0, 2 ** 16 - 1, (6, 6)), I've got

    array([[28111, 29985,  2056, 24534,  2837, 49004],
           [ 7584,  8798, 38715, 40600, 26665, 51545],
           [34279,  8134, 16112, 59336, 15373, 46839],
           [  131, 12500, 11779, 44852, 57627, 50253],
           [63222, 60588,  9191,  3033, 18643,  8975],
           [17299, 62925, 31776, 10933, 59953, 28443]])
    

    and then np.ma.masked_where(arr & 240, arr) yields

    masked_array(
      data=[[--, --, 2056, --, --, --],
            [--, --, --, --, --, --],
            [--, --, --, --, 15373, --],
            [--, --, 11779, --, --, --],
            [--, --, --, --, --, 8975],
            [--, --, --, --, --, --]],
      mask=[[ True,  True, False,  True,  True,  True],
            [ True,  True,  True,  True,  True,  True],
            [ True,  True,  True,  True, False,  True],
            [ True,  True, False,  True,  True,  True],
            [ True,  True,  True,  True,  True, False],
            [ True,  True,  True,  True,  True,  True]],
      fill_value=999999)
    

    which is consistent with what you'd get using your for loop.
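
To check that claim directly, here is a minimal sketch that compares the string-based loop with the bitwise mask on a small random array. The names MASK, loop_mask and fast_mask are made up for this illustration, and the array shape and seed are arbitrary (kept small so the loop finishes quickly).

```python
import numpy as np

# Characters [8:12] of the 16-character binary string are bits 4..7
# counted from the least significant bit.
MASK = 0b0000000011110000  # == 240 == 0x00F0


def loop_mask(a):
    """Slow reference: the string-slicing approach from the question."""
    b = np.zeros(a.shape, dtype=bool)
    for idx in np.ndindex(a.shape):
        # True where the selected bits are NOT all zero
        b[idx] = '{:016b}'.format(int(a[idx]))[8:12] != '0000'
    return b


def fast_mask(a):
    """Vectorized version: a single bitwise AND over the whole array."""
    return (a & MASK) != 0


rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
a = rng.integers(0, 2**16, size=(4, 30, 30), dtype=np.uint16)

slow = loop_mask(a)
fast = fast_mask(a)
assert np.array_equal(slow, fast)

# Same masking step as in the question, without any Python-level loop
masked = np.ma.masked_where(fast, a)
```

On the full 4 x 1200 x 1200 array only the fast_mask and np.ma.masked_where lines are needed; the loop is there purely as a cross-check.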