I'm using PyTorch Geometric. My data is of the class torch_geometric.data.Data. Most tutorials I see use torch_geometric.utils.train_test_split_edges (deprecated now; torch_geometric.transforms.RandomLinkSplit is recommended instead). Anyway, both of these work to split my data. However, my data has a time component and I'd like to do a train/test split using a date as a threshold. How can I accomplish this?
My data object looks like:
Data(x=[17815, 13], edge_index=[2, 62393], edge_attr=[62393], edge_time=[62393], edge_label=[62393], input_id=[1], batch_size=1)
I can get my own train_mask and test_mask by doing something like:
train_mask = (data.edge_time < time_threshold)
test_mask = (data.edge_time >= time_threshold)
But again, this would take some work to filter all the components of Data, and it does not give me negative edge indices. My model needs positive and negative edge indices, just like the ones torch_geometric.utils.train_test_split_edges returns.
Does anyone know how to accomplish this? Thanks so much!!
You can, in principle, simply use the edge mask (it has one entry per edge, not per node) to build train and test edge_index tensors:
edge_index_train = data.edge_index[:, train_mask]
edge_attr_train = data.edge_attr[train_mask]
and replace train_mask with ~train_mask (or test_mask) to get the test set.