python, numpy

NumPythonic way to fill values based on range indices (label encoding from given index ranges)


I have a tensor with this shape:

(batch_size, class_id, range_indices) -> (4, 3, 2)
dtype: int64
[[[1250 1302]
  [1324 1374]
  [1458 1572]]

 [[1911 1955]
  [1979 2028]
  [2120 2224]]

 [[2546 2599]
  [2624 2668]
  [2765 2871]]

 [[3223 3270]
  [3286 3347]
  [3434 3539]]]

How do I construct a dense representation filled with values according to this rule:

Since there are 3 class IDs:

  1. Class ID 0: filled with 1
  2. Class ID 1: filled with 2
  3. Class ID 2: filled with 3
  4. Default: filled with 0

So the output should be a vector like this:

[0 0 0 ...(until 1250)... 1 1 1 ...(until 1302)... 0 0 0 ...(until 1324)... 2 2 2 ...(until 1374)... and so on]
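To pin the rule down, here is a plain-loop reference sketch. The function name `fill_ranges_loop` and the toy input are hypothetical; it assumes each `[start, end]` pair is half-open (`end` excluded), which matches the ChatGPT output shown further down, and that the output has `data.max()` slots.

```python
import numpy as np

def fill_ranges_loop(data):
    """Hypothetical reference: label class c with c + 1 on [start, end), 0 elsewhere."""
    out = np.zeros(data.max(), dtype=np.int64)
    for batch in data:                      # data has shape (batch_size, n_classes, 2)
        for class_id, (start, end) in enumerate(batch):
            out[start:end] = class_id + 1   # end is exclusive (assumption)
    return out

# Toy input: 1 batch, 3 classes
toy = np.array([[[1, 3], [4, 6], [7, 9]]])
print(fill_ranges_loop(toy))                # [0 1 1 0 2 2 0 3 3]
```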

Here is copy-pasteable code:

import numpy as np

data = np.array([[[1250, 1302],
                  [1324, 1374],
                  [1458, 1572]],

                 [[1911, 1955],
                  [1979, 2028],
                  [2120, 2224]],

                 [[2546, 2599],
                  [2624, 2668],
                  [2765, 2871]],

                 [[3223, 3270],
                  [3286, 3347],
                  [3434, 3539]]])

Here is code generated by ChatGPT, but I'm not sure it's the NumPythonic way since it uses list comprehensions:

import numpy as np

# Given tensor
tensor = np.array([[[1250, 1302],
                    [1324, 1374],
                    [1458, 1572]],

                   [[1911, 1955],
                    [1979, 2028],
                    [2120, 2224]],

                   [[2546, 2599],
                    [2624, 2668],
                    [2765, 2871]],

                   [[3223, 3270],
                    [3286, 3347],
                    [3434, 3539]]])

# Determine the maximum value in the tensor to define the size of the output array
max_value = tensor.max()

# Create a zero-filled array of size max_value + 1
dense_representation = np.zeros(max_value + 1, dtype=int)

# Generate the class_ids array, replicated for each batch
class_ids = np.tile(np.arange(1, tensor.shape[1] + 1), tensor.shape[0])

# Generate start and end indices
start_indices = tensor[:, :, 0].ravel()
end_indices = tensor[:, :, 1].ravel()

# Create an array of indices to fill
indices = np.hstack([np.arange(start, end) for start, end in zip(start_indices, end_indices)])

# Create an array of values to fill
values = np.hstack([np.full(end - start, class_id) for start, end, class_id in zip(start_indices, end_indices, class_ids)])

# Fill the dense representation array
dense_representation[indices] = values

# The resulting dense representation
print(dense_representation)
print(dense_representation[1249:1303])
print(dense_representation[1323:1375])
print(dense_representation[1457:1573])
print(dense_representation[1910:1956])

Output:

[0 0 0 ... 3 3 0]
[0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0]
[0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0]
[0 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 3 3 3 3 0]
[0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 0]
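Of the two comprehensions in that code, the `values` one can be dropped: `np.repeat` repeats each class label by the length of its range, which gives the same array. A minimal sketch on the question's data, reusing ChatGPT's variable names:

```python
import numpy as np

data = np.array([[[1250, 1302], [1324, 1374], [1458, 1572]],
                 [[1911, 1955], [1979, 2028], [2120, 2224]],
                 [[2546, 2599], [2624, 2668], [2765, 2871]],
                 [[3223, 3270], [3286, 3347], [3434, 3539]]])

class_ids = np.tile(np.arange(1, data.shape[1] + 1), data.shape[0])
start_indices = data[:, :, 0].ravel()
end_indices = data[:, :, 1].ravel()

# Comprehension version from the question
values_loop = np.hstack([np.full(end - start, class_id)
                         for start, end, class_id in zip(start_indices, end_indices, class_ids)])

# Loop-free version: repeat each label by the length of its range
values_repeat = np.repeat(class_ids, end_indices - start_indices)

print(np.array_equal(values_loop, values_repeat))  # True
```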

Solution

  • IIUC (if I understand correctly), you could build the output array with np.zeros, np.repeat and np.tile:

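    start = data[..., 0].ravel()   # range starts, one per (batch, class)
    end = data[..., 1].ravel()     # range ends (exclusive)
    slices = [slice(a, b) for a, b in zip(start, end)]
    n = end - start                # length of each range
    out = np.zeros(data.max(), dtype='int')
    # tuple(slices) instead of *slices: star-unpacking inside [] needs Python 3.11+
    out[np.r_[tuple(slices)]] = np.repeat(np.tile(np.arange(data.shape[1])+1, data.shape[0]), n)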
    

    Variant with boolean indexing:

    start = data[..., 0].ravel()
    end = data[..., 1].ravel()
    out = np.zeros(data.max(), dtype='int')
    idx = np.arange(len(out))
    # m[i] is True if position i falls inside any [start, end) range
    m = ((idx >= start[:, None]) & (idx < end[:, None])).any(axis=0)
    n = end-start
    out[m] = np.repeat(np.tile(np.arange(data.shape[1])+1, data.shape[0]), n)
    

    Or, using argmax to look up the range that covers each position:

    start = data[..., 0].ravel()
    end = data[..., 1].ravel()
    out = np.zeros(data.max(), dtype='int')
    idx = np.arange(len(out))
    
    m1 = ((idx >= start[:, None]) & (idx < end[:, None]))  # (n_ranges, len(out)) membership
    m2 = m1.any(axis=0)                                     # positions covered by some range
    nums = np.tile(np.arange(data.shape[1])+1, data.shape[0])
    
    out[m2] = nums[m1[:, m2].argmax(0)]  # argmax picks the range covering each position
    

    Output:

    [0 0 0 ... 3 3 3]
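The answer's output ends in `3 3 3` while ChatGPT's ends in `3 3 0` only because of the array length: ChatGPT allocates `max_value + 1` slots, and index 3539 is the exclusive end of the last range, so it stays 0. A sketch to check that everything else matches (ChatGPT's two comprehensions are condensed here, with `np.repeat` for the values):

```python
import numpy as np

data = np.array([[[1250, 1302], [1324, 1374], [1458, 1572]],
                 [[1911, 1955], [1979, 2028], [2120, 2224]],
                 [[2546, 2599], [2624, 2668], [2765, 2871]],
                 [[3223, 3270], [3286, 3347], [3434, 3539]]])
start = data[..., 0].ravel()
end = data[..., 1].ravel()
labels = np.tile(np.arange(data.shape[1]) + 1, data.shape[0])

# Answer's first approach: data.max() slots
out = np.zeros(data.max(), dtype='int')
out[np.r_[tuple(slice(a, b) for a, b in zip(start, end))]] = np.repeat(labels, end - start)

# ChatGPT's approach from the question: max_value + 1 slots
dense = np.zeros(data.max() + 1, dtype=int)
dense[np.hstack([np.arange(a, b) for a, b in zip(start, end)])] = np.repeat(labels, end - start)

print(out.shape, dense.shape)            # (3539,) (3540,)
print(np.array_equal(out, dense[:-1]))   # True
print(dense[-1])                         # 0
```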
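In the first approach, `np.r_` turns a sequence of slice objects into one concatenated index array, as if each slice were an `np.arange`. On a small hypothetical input:

```python
import numpy as np

# Toy input: 2 batches x 2 classes of half-open [start, end) ranges
toy = np.array([[[1, 3], [4, 5]],
                [[6, 8], [9, 10]]])
start = toy[..., 0].ravel()   # 1, 4, 6, 9
end = toy[..., 1].ravel()     # 3, 5, 8, 10

slices = [slice(a, b) for a, b in zip(start, end)]
idx = np.r_[tuple(slices)]    # same as np.r_[1:3, 4:5, 6:8, 9:10]
print(idx)                    # [1 2 4 6 7 9]

# Equivalent explicit concatenation of aranges
print(np.array_equal(idx, np.concatenate([np.arange(a, b) for a, b in zip(start, end)])))  # True
```

Writing `tuple(slices)` rather than `*slices` keeps this working on Python versions before 3.11, which do not accept star-unpacking inside square brackets.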
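The argmax variant relies on `argmax` of a boolean array returning the index of the first `True`: for each covered position, `m1[:, m2].argmax(0)` is the range (row) containing it, which `nums` then maps to a label. A toy illustration with a hand-written mask:

```python
import numpy as np

# Rows = ranges, columns = covered positions (toy mask)
m1 = np.array([[True,  True,  False, False],
               [False, False, True,  True ]])
nums = np.array([1, 2])   # label of each range

rows = m1.argmax(0)       # first True in each column -> range index
print(rows)               # [0 0 1 1]
print(nums[rows])         # [1 1 2 2]
```

If ranges overlapped, `argmax` would silently pick the first matching one.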
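Both mask-based variants build an `(n_ranges, len(out))` boolean array, 12 × 3539 here, so memory grows with the number of ranges times the output length. A different technique, not taken from the answer, avoids that 2-D intermediate: a difference array that adds the label at each start, subtracts it at each end, and takes a cumulative sum. It assumes the ranges are disjoint and half-open; `fill_ranges_cumsum` is a hypothetical name.

```python
import numpy as np

def fill_ranges_cumsum(data):
    """Hypothetical helper: label range c of every batch with c + 1, 0 elsewhere."""
    start = data[..., 0].ravel()
    end = data[..., 1].ravel()
    labels = np.tile(np.arange(1, data.shape[1] + 1), data.shape[0])
    # One spare slot so that end == data.max() is still a valid index
    delta = np.zeros(data.max() + 1, dtype=np.int64)
    np.add.at(delta, start, labels)    # step up to the label where a range starts
    np.add.at(delta, end, -labels)     # step back down where it ends (exclusive)
    return np.cumsum(delta)[:-1]       # drop the spare slot -> length data.max()

toy = np.array([[[1, 3], [4, 6], [7, 9]]])
print(fill_ranges_cumsum(toy))         # [0 1 1 0 2 2 0 3 3]
```

Overlapping ranges would have their labels summed, so this is only a drop-in replacement when ranges never overlap, as in the question's data.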