Let's say we have an ordered 1D tensor of ints/longs t
.
I want to produce a new tensor a
with size max(t)
where each term a[i]
contains the first occurrence of the value i
in the tensor t
.
We could easily compute the tensor a
using standard Python code, but it would be too slow for large input as the one I'm using.
I'm looking for a fast solution that can run on GPU using the CUDA backend of PyTorch or simply a fast solution.
You can do this:
def first_occurrence_indices(t):
max_val = torch.max(t) + 1
a = torch.full((max_val,), -1, dtype=torch.long, device=t.device)
# Get the first occurrence index using `scatter_reduce`
indices = torch.arange(len(t), device=t.device)
a.scatter_reduce_(0, t, indices, reduce="amin", include_self=False)
return a
Now you can use it like so:
>>> t = torch.tensor([2, 1, 3, 2, 3, 0, 1, 5], device="cuda")
>>> first_occurrence_indices(t)
tensor([ 5, 1, 0, 2, -1, 7])
Notice that "4" is not in our tensor, so the index is still the default fill value of -1.
That said, I would test to see how efficient scatter_reduce
is with your full tensors.