python, google-cloud-platform, google-cloud-storage, gcp-ai-platform-training

Accessing files on Google Storage from a Google Cloud Python job


I am loosely following a tutorial to train a TensorFlow estimator on Google Cloud AI Platform.

I would like to access a directory that contains my training and evaluation data, and to this end I have copied my data files recursively to Google Storage like this:

gsutil cp -r data gs://name-of-my-bucket/data

This works fine, and gsutil ls gs://name-of-my-bucket/data correctly returns:

gs://name-of-my-bucket/data/test.json
gs://name-of-my-bucket/data/test
gs://name-of-my-bucket/data/train

However, calling os.listdir(data_dir) from a Python script raises a FileNotFoundError for any value of data_dir that I've tried so far, including 'data/' and 'name-of-my-bucket/data/'. Why?

I know that my Python script is being executed from the directory /root/.local/lib/python3.7/site-packages/trainer/.
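A minimal diagnostic snippet like the following, placed at the top of the script, is enough to check the working directory and which files are actually visible locally (this is just a sketch, not part of my training code):

import os

# Diagnostic: show where the job is running from and what the local
# filesystem actually contains.
print('cwd:', os.getcwd())
print('entries:', os.listdir('.'))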

Python code where the issue arises (edit)

Here is the code (with its imports) that precedes the line where the error arises, from the __main__ section of my Python script:

import argparse
import os

import tensorflow as tf
from tensorflow.contrib.training.python.training import hparam  # TF 1.x import path

PARSER = argparse.ArgumentParser()
PARSER.add_argument('--job-dir', ...)
PARSER.add_argument('--eval-steps', ...)
PARSER.add_argument('--export-format', ...)

ARGS = PARSER.parse_args()
tf.logging.set_verbosity('INFO')
os.environ['TF_CPP_MIN_LOG_LEVEL'] = str(tf.logging.__dict__['INFO'] / 10)

HPARAMS = hparam.HParams(**ARGS.__dict__)

Here is the line of code where the error arises (first line of a separate function that gets invoked right after the lines of code I have reported above):

mug_dirs = [f for f in os.listdir(image_dir) if not f.startswith('.')]

Logs (edit)

My logs for this job are a series of INFO messages (plus 5 TensorFlow-related deprecation warnings), followed by an error from the master-replica-0 task:

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/.local/lib/python3.7/site-packages/trainer/final_task.py", line 114, in <module>
    train_model(HPARAMS)
  File "/root/.local/lib/python3.7/site-packages/trainer/final_task.py", line 55, in train_model
    (train_data, train_labels) = data.create_data_with_labels("data/train/")
  File "/root/.local/lib/python3.7/site-packages/trainer/data.py", line 13, in create_data_with_labels
    mug_dirs = [f for f in os.listdir(image_dir) if not f.startswith('.')]
FileNotFoundError: [Errno 2] No such file or directory: 'data/train/'

... followed by another error from the same task (reporting a non-zero exit status from my Python command), then two INFO messages about clean-up, and finally an error from the service task:

The replica master 0 exited with a non-zero status of 1.
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/.local/lib/python3.7/site-packages/trainer/final_task.py", line 114, in <module>
    train_model(HPARAMS)
  File "/root/.local/lib/python3.7/site-packages/trainer/final_task.py", line 55, in train_model
    (train_data, train_labels) = data.create_data_with_labels("data/train/")
  File "/root/.local/lib/python3.7/site-packages/trainer/data.py", line 13, in create_data_with_labels
    mug_dirs = [f for f in os.listdir(image_dir) if not f.startswith('.')]
FileNotFoundError: [Errno 2] No such file or directory: 'data/train/'
To find out more about why your job exited please check the logs: https://console.cloud.google.com/logs/viewer?project=1047296516162&resource=ml_job%2Fjob_id%2Fml6_run_25&advancedFilter=resource.type%3D%22ml_job%22%0Aresource.labels.job_id%3D%22ml6_run_25%22

Solution

  • Cloud Storage uses a flat namespace: objects are not actually contained in folders. To provide a more user-friendly experience, gsutil and the Google Cloud Storage UI create the illusion of a hierarchical file tree. More information can be found in the documentation.

    Now, if you are trying to read a file object hosted on Cloud Storage, you may want to follow the documentation for downloading an object to your local directory using the Cloud Storage client libraries (see the sketch below). Alternatively, you can use the gsutil cp command, which lets you copy data between your local directory and Cloud Storage buckets, among other options.

    Once you have downloaded a copy of the object from the GCS bucket to your local directory, you will be able to manipulate that file as needed.
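    For illustration, here is a minimal sketch of such a download using the google-cloud-storage client library (the bucket and object names are taken from the question; the local destination path is an assumption):

    from google.cloud import storage

    # Download a local copy of one object so that regular file APIs
    # (open, os.listdir, etc.) can be used on it afterwards.
    client = storage.Client()
    bucket = client.bucket('name-of-my-bucket')
    blob = bucket.blob('data/test.json')  # full object name, including the 'data/' prefix
    blob.download_to_filename('/tmp/test.json')  # assumed destination path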

    Update - Referencing a Cloud Storage object: do not use os.listdir to access objects in a GCS bucket.

    Because Cloud Storage is a flat namespace, the bucket gs://my-bucket contains an object called data/test.json stored at the root of gs://my-bucket. Note that the object name includes / characters. Therefore, if you would like to access, for instance, your test.json file, you can follow the documentation above and use data/test.json as the object reference - the concept of a folder does not exist per se. Similarly, if you needed to access your train file object, you would use data/train as the reference.
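    Along the same lines, if you need the equivalent of a directory listing, a minimal sketch using list_blobs with a prefix (rather than os.listdir) could look like this, again assuming the bucket name from the question:

    from google.cloud import storage

    # List all objects whose names start with 'data/train/' - the
    # flat-namespace equivalent of listing the 'train' "folder".
    client = storage.Client()
    for blob in client.list_blobs('name-of-my-bucket', prefix='data/train/'):
        print(blob.name)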