I was implementing a DCGAN application based on the lsun-bedroom dataset and planned to use tfds, since lsun is in its catalog. Because the full dataset contains 42.7 GB of images, I only wanted to load a portion (10%) of it, and used the following code to load the data according to the manual. Unfortunately, I got an error saying there was not enough disk space. Is there a possible solution with tfds, or should I use another API to load the data?
tfds.load('lsun/bedroom', split='train[:10%]')
Not enough disk space. Needed: 42.77 GiB (download: 42.77 GiB, generated: Unknown size)
I was testing on Google Colab.
TFDS downloads the dataset from the original author's website. As datasets are often published as monolithic archives (e.g. lsun.zip), it is unfortunately impossible for TFDS to download/install only part of the dataset.
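If you want to check up front how much TFDS will have to fetch, a minimal sketch (using the lsun/bedroom config from your question; download_size and dataset_size are standard DatasetInfo fields):

import tensorflow_datasets as tfds

builder = tfds.builder('lsun/bedroom')
# Full archive size TFDS must download, regardless of the split you request (~42.77 GiB here).
print(builder.info.download_size)
# Size of the generated record files; may be unknown before generation, as in your error message.
print(builder.info.dataset_size)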
The split argument only filters the dataset after it has been fully generated. Note: you can see the download size of each dataset in the catalog: https://www.tensorflow.org/datasets/catalog/overview
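Since the slice is applied only after generation, train[:10%] and train both require the full 42.77 GiB download first. If Colab's local disk is the limiting factor, one possible workaround (a sketch, assuming you have enough Google Drive quota; the path below is illustrative) is to point data_dir at a mounted Drive:

from google.colab import drive
import tensorflow_datasets as tfds

drive.mount('/content/drive')  # make Google Drive available in Colab

# Both the downloaded archive and the generated files live under data_dir,
# so they land on Drive instead of Colab's small local disk.
ds = tfds.load(
    'lsun/bedroom',
    split='train[:10%]',
    data_dir='/content/drive/MyDrive/tensorflow_datasets',
)

This does not avoid the full download, but it moves the disk-space requirement to a volume that can hold it.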