[SOLVED] Splitting DBP15K from PyTorch Geometric into train, test, and validation sets

My code loads the DBP15K dataset like this:

from torch_geometric.datasets import DBP15K

data = DBP15K(path, args.category, transform=SumEmbedding())[0].to(device)

However, according to the PyTorch Geometric documentation, this dataset is only split into train and test sets.

I tried to split it myself with the function "train_test_split_edges".

Nothing I tried worked, so I would like to know whether anyone has already split this dataset into train, validation, and test sets.

Solution

In the end, I only needed to split either the train set or the test set to obtain a validation set.

I did it like this:

import torch

data = DBP15K(path, args.category, transform=SumEmbedding())[0].to(device)
# Split the columns of train_y 80/20 into train and validation parts
# (sequential split along dim=1, no shuffling)
split_index = int(0.8 * data.train_y.shape[1])
train_y, val_y = torch.split(data.train_y, [split_index, data.train_y.shape[1] - split_index], dim=1)

# Display tensor shapes
print(train_y.shape)  # torch.Size([2, 3296])
print(val_y.shape)    # torch.Size([2, 825])
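
The split above takes the first 80% of the columns in their stored order. If that order is not random, a shuffled split may be preferable. Below is a minimal sketch of one way to do it, using only the standard library to pick the column indices. The helper name split_indices, its seed parameter, and the default ratio are hypothetical and not part of PyTorch Geometric; the count 4121 is taken from the shapes printed above.

```python
import random


def split_indices(num_pairs, train_ratio=0.8, seed=0):
    """Return (train_idx, val_idx): shuffled, disjoint lists of column indices.

    Hypothetical helper for illustration; not part of PyTorch Geometric.
    """
    indices = list(range(num_pairs))
    # A local RNG keeps the split reproducible without touching global state.
    random.Random(seed).shuffle(indices)
    cut = int(train_ratio * num_pairs)
    return indices[:cut], indices[cut:]


# 4121 is the number of columns in data.train_y reported in the post.
train_idx, val_idx = split_indices(4121)
print(len(train_idx), len(val_idx))  # 3296 825
```

Assuming data.train_y has shape [2, N] as in the post, the index lists could then be applied with train_y = data.train_y[:, train_idx] and val_y = data.train_y[:, val_idx]. torch.randperm would be an alternative way to generate the shuffled indices directly as a tensor.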