pythonpython-3.xpytorchsparse-matrixtensor

How to produce tensor of first occurrencies of another tensor using PyTorch


Let's say we have an ordered 1D tensor of ints/longs t.

I want to produce a new tensor a with size max(t) where each term a[i] contains the first occurrence of the value i in the tensor t.

We could easily compute the tensor a using standard Python code, but it would be too slow for large input as the one I'm using.

I'm looking for a fast solution that can run on GPU using the CUDA backend of PyTorch or simply a fast solution.


Solution

  • You can do this:

    def first_occurrence_indices(t):
        max_val = torch.max(t) + 1
        a = torch.full((max_val,), -1, dtype=torch.long, device=t.device)
    
        # Get the first occurrence index using `scatter_reduce`
        indices = torch.arange(len(t), device=t.device)
        a.scatter_reduce_(0, t, indices, reduce="amin", include_self=False)
    
        return a
    

    Now you can use it like so:

    >>> t = torch.tensor([2, 1, 3, 2, 3, 0, 1, 5], device="cuda")
    >>> first_occurrence_indices(t)
    tensor([ 5,  1,  0,  2, -1,  7])
    

    Notice that "4" is not in our tensor, so the index is still the default fill value of -1.

    That said, I would test to see how efficient scatter_reduce is with your full tensors.