Tags: python, pytorch, pytorch-geometric

How to get values from a list of tensors by matching indices in PyTorch?


I have a question about selecting values from a list of tensors using multiple indices.
There are similar questions, such as the one here, but I couldn't fully apply their answers to my case.

I have a dataset comprising 4-dimensional features for about 108,000 nodes, together with the links between them.

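import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Four (107940, 4) feature tensors; tmp[c] is later indexed by edge class c
tmp = []
for _ in range(4):
    tmp.append(torch.rand((107940, 4), dtype=torch.float).to(device))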

tmp
# [tensor([[0.9249, 0.5367, 0.5161, 0.6898],
#         [0.2189, 0.5593, 0.8087, 0.9893],
#         [0.4344, 0.1507, 0.4631, 0.7680],
#         ...,
#         [0.7262, 0.0339, 0.9483, 0.2802],
#         [0.8652, 0.3117, 0.8613, 0.6062],
#         [0.5434, 0.9583, 0.3032, 0.3919]], device='cuda:0'),
# tensor([...], device='cuda:0'),
# tensor([...], device='cuda:0'),
# tensor([...], device='cuda:0')]
# batch.xxx: attributes of the mini-batch sampled from the graph
# Note that batch.edge_index[0] holds the target nodes and batch.edge_index[1] holds the source nodes.
# If you need more information, please see the PyTorch Geometric data format.

print(batch.n_id[batch.edge_index])
print(batch.edge_index_class)

#tensor([[10231,  3059, 32075, 10184,  1187,  6029, 10134, 10173,  6521,  9400,
#         14942, 31065, 10087, 10156, 10158, 26377, 85009,   918,  4542, 10176,
#         10180,  6334, 10245, 10228,  2339,  7891, 10214, 10240, 10041, 10020,
#          7610, 10324,  4320,  5951,  9078,  9709],
#        [ 1624,  1624,  6466,  6466,  6779,  6779,  7691,  7691,  8655,  8655,
#         30347, 30347, 32962, 32962, 34435, 34435,  3059,  3059, 32075, 32075,
#          1187,  1187,  6029,  6029, 10173, 10173,  6521,  6521,  9400,  9400,
#         31065, 31065, 10087, 10087, 10158, 10158]], device='cuda:0')
#tensor([3., 3., 2., 2., 0., 0., 3., 3., 2., 2., 0., 0., 2., 2., 2., 2., 3., 3.,
#        2., 2., 0., 0., 0., 0., 3., 3., 2., 2., 2., 2., 0., 0., 2., 2., 2., 2.],
#       device='cuda:0')

In this case, I want a new tensor, tmp_filled, whose rows are taken from the tensor in tmp selected by each edge's edge_index_class.
For example, rows 1624, 10231, and 3059 of tmp_filled should be copied from the same rows of the fourth tensor in tmp (tmp[3]), because the edges connecting those nodes have edge_index_class 3. Similarly, rows 6466, 32075, and 10184 should be copied from the same rows of the third tensor (tmp[2]).
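
To make the intended mapping concrete, here is a minimal CPU-only sketch on toy data. The sizes (6 nodes, 3 classes) and the names `node_ids` and `edge_class` are assumptions for illustration; they stand in for `batch.n_id[batch.edge_index]` and `batch.edge_index_class`. Each toy tensor is filled with its own class id so that the origin of every row is visible.

```python
import torch

num_nodes, num_feats, num_classes = 6, 4, 3

# One feature table per class; tmp[c] is filled with the value c.
tmp = [torch.full((num_nodes, num_feats), float(c)) for c in range(num_classes)]

# Hypothetical stand-ins for batch.n_id[batch.edge_index] (2 x E)
# and batch.edge_index_class (one class per edge).
node_ids = torch.tensor([[4, 5, 2],
                         [0, 0, 1]])
edge_class = torch.tensor([2., 2., 1.])

# Reference behaviour, written as an explicit (slow) double loop:
# both endpoints of edge e take their row from tmp[class of e].
tmp_filled = torch.zeros(num_nodes, num_feats)
for e in range(node_ids.shape[1]):
    c = int(edge_class[e].item())
    for n in node_ids[:, e].tolist():
        tmp_filled[n] = tmp[c][n]

print(tmp_filled[:, 0])  # first feature of each row shows which class it came from
```

Nodes 0, 4, and 5 end up with rows from `tmp[2]`, nodes 1 and 2 with rows from `tmp[1]`, and node 3, which appears in no edge, stays zero.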

To do this, I tried the following code:

for k in range(len(batch.edge_index_class)):
    tmp_filled[batch.n_id[torch.unique(batch.edge_index)]] = tmp[int(batch.edge_index_class[k].item())][batch.n_id[torch.unique(batch.edge_index)]]

tmp_filled
# tensor([[0., 0., 0., 0.],
#        [0., 0., 0., 0.],
#        [0., 0., 0., 0.],
#        ...,
#        [0., 0., 0., 0.],
#        [0., 0., 0., 0.],
#        [0., 0., 0., 0.]], device='cuda:0')

But it returned the wrong result: the row for node 1624 does not match tmp[3][1624].

tmp_filled[1624]
# tensor([0.3438, 0.5555, 0.6229, 0.7983], device='cuda:0')

tmp[3][1624]
# tensor([0.6895, 0.3241, 0.1909, 0.1635], device='cuda:0')
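
One likely explanation for the mismatch: in the loop above, the index `batch.n_id[torch.unique(batch.edge_index)]` does not depend on `k`, so each iteration overwrites every touched row with values from the tensor of the current edge's class, and only the class of the last edge survives. In the printed batch the last class is 2, which would be consistent with `tmp_filled[1624]` not matching `tmp[3][1624]`. The sketch below reproduces this on toy data; the sizes and the identity `n_id` mapping are assumptions made only for illustration.

```python
import torch

num_nodes = 6
# tmp[c] is filled with the value c so the source of each row is visible.
tmp = [torch.full((num_nodes, 4), float(c)) for c in range(3)]

n_id = torch.arange(num_nodes)              # identity mapping, for simplicity
edge_index = torch.tensor([[4, 5, 2],
                           [0, 0, 1]])
edge_index_class = torch.tensor([2., 2., 1.])

tmp_filled = torch.zeros(num_nodes, 4)
touched = n_id[torch.unique(edge_index)]    # identical in every iteration
for k in range(len(edge_index_class)):
    # Overwrites ALL touched rows with the class of edge k.
    tmp_filled[touched] = tmp[int(edge_index_class[k].item())][touched]

# Node 0 belongs to class-2 edges, but it ends up with the last edge's class (1).
print(tmp_filled[:, 0])
```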

tmp_filled needs to have shape (107940, 4). How should I correct my code?

Thank you for reading my question!


Solution

  • The code below gives the result I want. If anyone has a more efficient solution, please feel free to answer.

    for edge_index_class in torch.unique(batch.edge_index_class):
        # Positions of the edges whose class matches the current class
        indices = (batch.edge_index_class == edge_index_class).nonzero(as_tuple=True)[0]

        # Global IDs of all nodes (both endpoints) touched by those edges
        n_id = torch.unique(batch.n_id[batch.edge_index[:, indices]])

        # Copy the rows for those nodes from the tensor of the current class
        tmp_filled[n_id] = tmp[int(edge_index_class.item())][n_id]
    
    
    tmp_filled[1624]
    # tensor([0.6071, 0.9668, 0.9829, 0.1886], device='cuda:0')
    
    tmp[3][1624]
    # tensor([0.6071, 0.9668, 0.9829, 0.1886], device='cuda:0')
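
A possible vectorized alternative, offered as a sketch under stated assumptions rather than a definitive answer: stack the list into one `(C, N, F)` tensor and pick the `(class, node)` pairs in a single advanced-indexing step. The helper name `fill_by_edge_class` and the toy data are hypothetical. It assumes a node never appears in edges of different classes; if it does, the row that wins is not guaranteed when the index contains duplicates, especially on CUDA. Stacking also copies all tensors in `tmp`, which costs extra memory.

```python
import torch

def fill_by_edge_class(tmp, node_ids, edge_class):
    """Hypothetical helper.

    tmp:        list of C tensors, each of shape (N, F)
    node_ids:   (2, E) long tensor of global node ids per edge
    edge_class: (E,) tensor with one class per edge
    """
    stacked = torch.stack(tmp)                                # (C, N, F)
    nodes = node_ids.reshape(-1)                              # (2E,)
    # Repeat each edge's class for both of its endpoints.
    cls = edge_class.long().expand_as(node_ids).reshape(-1)   # (2E,)
    out = torch.zeros_like(tmp[0])
    out[nodes] = stacked[cls, nodes]                          # one gather, one scatter
    return out

# Toy data: tmp[c] is filled with the value c.
tmp = [torch.full((6, 4), float(c)) for c in range(3)]
node_ids = torch.tensor([[4, 5, 2],
                         [0, 0, 1]])
edge_class = torch.tensor([2., 2., 1.])

tmp_filled = fill_by_edge_class(tmp, node_ids, edge_class)
print(tmp_filled[:, 0])
```

With the data from the question, the call would presumably look like `fill_by_edge_class(tmp, batch.n_id[batch.edge_index], batch.edge_index_class)`. This avoids the Python loop over classes and the repeated `.item()` calls, each of which forces a GPU-to-CPU synchronization.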