google-colaboratory huggingface-datasets huggingface-hub

Colab cannot find HuggingFace dataset


When I run the following code to load a dataset from the Hugging Face Hub in Google Colab, I get a FileNotFoundError:

! pip install transformers datasets
from datasets import load_dataset
cv_13 = load_dataset("mozilla-foundation/common_voice_13_0", "en", split="train")

<ipython-input-9-4d772f75be89> in <cell line: 3>()
      1 from datasets import load_dataset
      2 
----> 3 cv_13 = load_dataset("mozilla-foundation/common_voice_13_0", "en", split="train")

2 frames
/usr/local/lib/python3.10/dist-packages/datasets/load.py in dataset_module_factory(path, revision, download_config, download_mode, dynamic_modules_path, data_dir, data_files, **download_kwargs)
   1505                     raise e1 from None
   1506                 if isinstance(e1, FileNotFoundError):
-> 1507                     raise FileNotFoundError(
   1508                         f"Couldn't find a dataset script at {relative_to_absolute_path(combined_path)} or any data file in the same directory. "
   1509                         f"Couldn't find '{path}' on the Hugging Face Hub either: {type(e1).__name__}: {e1}"

FileNotFoundError: Couldn't find a dataset script at /content/mozilla-foundation/common_voice_13_0/common_voice_13_0.py or any data file in the same directory. Couldn't find 'mozilla-foundation/common_voice_13_0' on the Hugging Face Hub either: FileNotFoundError: Dataset 'mozilla-foundation/common_voice_13_0' doesn't exist on the Hub. If the repo is private or gated, make sure to log in with `huggingface-cli login`.

The dataset exists on the Hugging Face Hub and loads successfully in my local JupyterLab. What should I do?


Solution

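  • The Common Voice dataset at https://huggingface.co/datasets/mozilla-foundation/common_voice_13_0 is a gated dataset: you have to accept its terms on the dataset page and then authenticate before it can be downloaded. Without credentials the Hub reports the repository as missing, which is why the error says it "doesn't exist". It most likely loads on your local machine because a token is already saved there, while a fresh Colab runtime has none. Log in, e.g. using:

    huggingface-cli login

    In a Colab cell, prefix the command with ! so that it runs in the shell rather than as Python.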
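
If you would rather stay in Python than use the CLI, the same idea can be sketched as below. This is only a minimal sketch under a few assumptions: the helper names resolve_hf_token and load_gated_dataset are hypothetical, the token is assumed to be stored in an HF_TOKEN environment variable, and the token= argument of load_dataset is assumed to be available (older releases of datasets call it use_auth_token instead).

```python
import os


def resolve_hf_token(env=None):
    """Return a Hugging Face access token from the environment, or None.

    Hypothetical helper: it only looks at the HF_TOKEN variable.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def load_gated_dataset(repo_id, config, split, token=None):
    """Load a gated dataset, passing an access token explicitly."""
    token = token or resolve_hf_token()
    if token is None:
        raise RuntimeError(
            "No access token found: set HF_TOKEN or pass token=... "
            "(and accept the dataset terms on its Hub page first)."
        )
    # Imported here so the token check above works even before
    # `pip install datasets` has been run.
    from datasets import load_dataset

    # Assumption: this datasets release accepts token=; older ones
    # expect use_auth_token= instead.
    return load_dataset(repo_id, config, split=split, token=token)


# Example call, once a token is available in the runtime:
# cv_13 = load_gated_dataset("mozilla-foundation/common_voice_13_0", "en", "train")
```

Another option is to run notebook_login() from huggingface_hub in its own cell, paste the token into the prompt it shows, and only then call load_dataset in a later cell.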