Tags: python, ssl-certificate, google-colaboratory, tensorflow-datasets

SSL:CERTIFICATE_VERIFY_FAILED for TensorFlow dataset loading


Issue

I have previously loaded this dataset in Google Colab without problems, but the same code now fails with an SSL certificate error.

Code

!pip install --upgrade 'tensorflow_data_validation[visualization]<2'
import tensorflow as tf
import tensorflow_datasets as tfds

ratings, info_ratings = tfds.load("movielens/100k-ratings", split="train", with_info=True)

Error

WARNING:urllib3.connectionpool:Retrying (Retry(total=9, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1016)'))': /datasets/movielens/ml-100k.zip

Fixes Tried

I have tried the following fixes, which I found in previous related questions on Stack Overflow, but none of them solved the issue:

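# Attempt 1: environment variable intended to turn off certificate checks in TFDS
import os
os.environ['TFDS_HTTP_VERIFY'] = '0'

# Attempt 2: refresh the CA bundle
pip install --upgrade certifi

# Attempt 3: make the default HTTPS context skip verification
import ssl
ssl._create_default_https_context = ssl._create_unverified_context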

I suspected the recent Colab runtime upgrade to Python 3.12, so I also ran a copy of the notebook in VS Code (in a WSL2 environment), but I get exactly the same error there.

I have also used SSL Checker to check the certificate dates for the TensorFlow datasets website, and the certificate appears to be within its validity period.
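
The same check can be run from Python against whichever host actually serves the file, since the path in the error (/datasets/movielens/ml-100k.zip) may belong to a different host than the TensorFlow datasets website. The sketch below uses only the standard library. The host name `files.grouplens.org` is an assumption inferred from the MovieLens path, not something the error log confirms, and `check_host` and `days_until_expiry` are hypothetical helper names.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days between `now` (epoch seconds) and a certificate's notAfter string."""
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400.0


def check_host(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Attempt a verified TLS handshake with `host` and describe the outcome."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                not_after = tls.getpeercert()["notAfter"]
    except ssl.SSLCertVerificationError as exc:
        # verify_message carries the reason, e.g. "certificate has expired"
        return f"{host}: verification failed: {exc.verify_message}"
    return f"{host}: certificate valid for {days_until_expiry(not_after):.1f} more days"
```

Calling `print(check_host("files.grouplens.org"))` (assumed host) and `print(check_host("www.tensorflow.org"))` side by side should show whether the two hosts behave differently. The call needs network access, so it is left out of the block above.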

Please could someone advise me as to what the issue could be? I have scoured the GitHub issues to see if it is a known issue, but I have not found anything. However, I may not be looking in the right places.

I am aware that I can manually import the data and then create a dataset, but if anyone could get the code above to work, I would be most grateful.
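
For reference, the manual route might look like the sketch below. It assumes the archive has already been downloaded and unpacked by some other means, and that the ratings file is the tab-separated `u.data` (user id, item id, rating, timestamp). The function names and the dictionary keys are my own choices and are not guaranteed to match the feature names that `tfds.load` produces.

```python
import csv
import io


def parse_ratings(text):
    """Parse tab-separated rating rows: user id, item id, rating, timestamp."""
    columns = {"user_id": [], "movie_id": [], "user_rating": [], "timestamp": []}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:  # skip blank lines
            continue
        user, item, rating, timestamp = row
        columns["user_id"].append(user)          # ids kept as strings
        columns["movie_id"].append(item)
        columns["user_rating"].append(float(rating))
        columns["timestamp"].append(int(timestamp))
    return columns


def to_dataset(columns):
    """Wrap the parsed columns in a tf.data.Dataset of per-rating dictionaries."""
    import tensorflow as tf  # imported here so parsing works without TensorFlow

    return tf.data.Dataset.from_tensor_slices(columns)
```

With the file unpacked to a hypothetical local path such as `ml-100k/u.data`, usage would be `ratings = to_dataset(parse_ratings(open("ml-100k/u.data").read()))`.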


Solution

  • This issue has now been resolved. I raised an issue in the googlecolab/colabtools GitHub repository, and when I reran the same code this morning, the dataset loaded fine.