tensorflow, large-data

Not enough disk space when loading dataset with TFDS


I was implementing a DCGAN application based on the lsun-bedroom dataset and planned to use tfds, since lsun is in its catalog. Because the full dataset contains 42.7 GB of images, I only wanted to load a portion (10%) of the data, and used the following code according to the manual. Unfortunately, it failed with an error saying there is not enough disk space. Is there a possible solution with tfds, or should I use another API to load the data?

import tensorflow_datasets as tfds

# train[:10%] selects the first 10% of the training split
ds = tfds.load('lsun/bedroom', split='train[:10%]')

Not enough disk space. Needed: 42.77 GiB (download: 42.77 GiB, generated: Unknown size)

I was testing on Google Colab.


Solution

  • TFDS downloads the dataset from the original authors' website. As datasets are often published as monolithic archives (e.g. lsun.zip), it is unfortunately impossible for TFDS to download/prepare only part of the dataset.

    The split argument only filters the dataset after it has been fully downloaded and generated; it does not reduce the download size (see the sketch below). Note: you can check the download size of each dataset in the catalog: https://www.tensorflow.org/datasets/catalog/overview
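
    As a minimal sketch, here is how you could check the download size up front and only then trigger the download; tfds.builder, DatasetInfo.download_size, download_and_prepare, and as_dataset are all standard tensorflow_datasets APIs:

        import tensorflow_datasets as tfds

        # Reads only the dataset metadata; nothing is downloaded yet.
        builder = tfds.builder('lsun/bedroom')

        # Size of the source archives TFDS must fetch in full;
        # the split argument does not reduce this number.
        print(builder.info.download_size)  # ~42.77 GiB for lsun/bedroom

        # Only if enough disk is available: download and prepare the
        # full dataset, then read just a slice of it.
        # builder.download_and_prepare()
        # ds = builder.as_dataset(split='train[:10%]')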