I have a dataset stored in a deque buffer, and I want to load random batches from it with a DataLoader. The buffer starts empty. Data will be added to the buffer before the buffer is sampled from.
self.buffer = deque([], maxlen=capacity)
self.batch_size = batch_size
self.loader = DataLoader(self.buffer, batch_size=batch_size, shuffle=True, drop_last=True)
However, this causes the following error:
File "env/lib/python3.8/site-packages/torch_geometric/loader/dataloader.py", line 78, in __init__
super().__init__(dataset, batch_size, shuffle,
File "env/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 268, in __init__
sampler = RandomSampler(dataset, generator=generator)
File "env/lib/python3.8/site-packages/torch/utils/data/sampler.py", line 102, in __init__
raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0
It turns out that the RandomSampler class checks that num_samples is positive when it is initialised, which causes the error.
if not isinstance(self.num_samples, int) or self.num_samples <= 0:
    raise ValueError("num_samples should be a positive integer "
                     "value, but got num_samples={}".format(self.num_samples))
Why does it check for this here, even though RandomSampler does support datasets which change in size at runtime?
One workaround is to use an IterableDataset, but I want to use the shuffle functionality of DataLoader.
Can you think of a nice way to use a DataLoader with a deque? Much appreciated!
The problem here is neither the use of a deque nor the fact that the dataset can grow dynamically. The problem is that you are constructing the DataLoader on a dataset of size zero, which RandomSampler rejects as invalid when it is initialised.
The easiest solution would be to start with an arbitrary placeholder object in the deque, so that the size check passes, and remove it once the DataLoader has been constructed.
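As a minimal sketch of that workaround: the snippet below seeds the deque with a throwaway item, builds the loader, then clears the placeholder before real data arrives. It assumes the plain torch.utils.data.DataLoader rather than the torch_geometric one from the traceback, and the capacity, batch size, and tensor items are made up for illustration.

```python
from collections import deque

import torch
from torch.utils.data import DataLoader

# Illustrative values, not taken from the question.
capacity, batch_size = 100, 4

# Seed the buffer with a throwaway item so the sampler sees a non-empty dataset.
buffer = deque([None], maxlen=capacity)
loader = DataLoader(buffer, batch_size=batch_size, shuffle=True, drop_last=True)

# The loader keeps a reference to the deque itself, so the placeholder can go now.
buffer.clear()

# Add real data before sampling; the buffer must not be empty when iterated.
for i in range(10):
    buffer.append(torch.tensor([float(i)]))

# The dataset length is read again at iteration time, so all 10 items are seen.
batches = list(loader)
```

With 10 items, a batch size of 4, and drop_last=True, each pass over the loader should give two shuffled batches. Just make sure the buffer is non-empty again before you iterate, since the sampler has nothing to draw from otherwise.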