python pytorch tensor

groupby aggregate product in PyTorch


I have the same problem as in "groupby aggregate mean in pytorch", except that I want the product of the rows within each group (label) rather than the mean. Unfortunately, I couldn't find a native PyTorch function for this, such as a hypothetical scatter_prod_ for products (the analogue of scatter_add_ for sums, which one of the answers to that question used).

Recycling the example code from @elyase's question, consider the 2D tensor:

samples = torch.Tensor([
    [0.1, 0.1],    #-> group / class 1
    [0.2, 0.2],    #-> group / class 2
    [0.4, 0.4],    #-> group / class 2
    [0.0, 0.0]     #-> group / class 0
])

and a labels tensor with one label per row, so that len(samples) == len(labels):

labels = torch.LongTensor([1, 2, 2, 0])

So my expected output is:

res == torch.Tensor([
    [0.0, 0.0],
    [0.1, 0.1], 
    [0.08, 0.08] # -> PRODUCT of [0.2, 0.2] and [0.4, 0.4]
])

As in @elyase's question: how can this be done in pure PyTorch (i.e. without numpy, so that autograd still works) and ideally without for loops?
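
For reference, the requested result can be written as a plain loop over the distinct labels. This is only a baseline sketch for checking a vectorized solution against, not the loop-free answer the question asks for. The helper name groupby_prod_loop is hypothetical, and the sketch assumes the output rows should follow the sorted label order.

```python
import torch

samples = torch.tensor([
    [0.1, 0.1],    # group 1
    [0.2, 0.2],    # group 2
    [0.4, 0.4],    # group 2
    [0.0, 0.0],    # group 0
])
labels = torch.tensor([1, 2, 2, 0])


def groupby_prod_loop(samples, labels):
    """Hypothetical reference: one prod() call per distinct label."""
    groups = torch.unique(labels)  # distinct labels, sorted ascending
    # Select the rows of each group with a boolean mask and multiply them.
    return torch.stack([samples[labels == g].prod(dim=0) for g in groups])


res_loop = groupby_prod_loop(samples, labels)
```

The loop stays differentiable because it only uses masking, prod and stack, but it makes one pass over the data per group.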

Crossposted in PyTorch forums.


Solution

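  • You can use scatter_ with reduce='multiply' on a tensor initialized to ones: each sample row is multiplied into the output row selected by its label, which gives the product of the samples in each group.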

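    import torch

    samples = torch.Tensor([
        [0.1, 0.1],    #-> group / class 1
        [0.2, 0.2],    #-> group / class 2
        [0.4, 0.4],    #-> group / class 2
        [0.0, 0.0]     #-> group / class 0
    ])
    
    labels = torch.LongTensor([1,2,2,0])
    
    label_size = 3                  # number of distinct groups (labels 0..2)
    sample_dim = samples.size(1)
    
    # One label per element of samples, shape (len(samples), sample_dim)
    index = labels.unsqueeze(1).repeat((1, sample_dim))
    
    # Start from ones (the multiplicative identity), then multiply each
    # sample row into the output row given by its label
    res = torch.ones(label_size, sample_dim, dtype=samples.dtype)
    res.scatter_(0, index, samples, reduce='multiply')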
    

    res:

    tensor([[0.0000, 0.0000],
            [0.1000, 0.1000],
            [0.0800, 0.0800]])
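
Since the question asks for autograd support, here is a hedged variant of the same idea using scatter_reduce with reduce="prod". It assumes PyTorch 1.12 or later, where torch.Tensor.scatter_reduce is available. Recent PyTorch documentation marks the reduce argument of scatter_ with a tensor src as deprecated in favour of scatter_reduce. The name num_groups is introduced here for illustration.

```python
import torch

samples = torch.tensor([
    [0.1, 0.1],    # group 1
    [0.2, 0.2],    # group 2
    [0.4, 0.4],    # group 2
    [0.0, 0.0],    # group 0
], requires_grad=True)
labels = torch.tensor([1, 2, 2, 0])

num_groups = 3                      # assumed known: labels are 0..2
sample_dim = samples.size(1)

# One label per element of samples, same shape as samples
index = labels.unsqueeze(1).repeat(1, sample_dim)

# Start from ones so that include_self=True leaves the products unchanged
init = torch.ones(num_groups, sample_dim, dtype=samples.dtype)
res = init.scatter_reduce(0, index, samples, reduce="prod", include_self=True)

# Gradients flow back to samples through the out-of-place reduction
res.sum().backward()
grad = samples.grad
```

The out-of-place call leaves init untouched and returns a new tensor, which avoids mixing in-place modification with autograd. If the number of groups is not known in advance, int(labels.max()) + 1 is one way to derive it, assuming the labels are non-negative integers starting at 0.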