I have a pretty standard CI pipeline using Cloud Build for training my Machine Learning model, based on containers:
Now in Machine Learning it is impossible to validate a model without testing it on real data. Normally we add 2 extra checks:
This allows us to catch issues inside the model's code. In my setup, my Cloud Build runs in a build GCP project while the data sits in another GCP project.
Q1: has anybody managed to use the AI Platform training service from Cloud Build to train on data sitting in another GCP project?
Q2: how do I tell Cloud Build to wait until the AI Platform training job has finished, and check its status (succeeded/failed)? Looking at the documentation link, the only option seems to be --stream-logs, but it looks suboptimal (using that option, I saw some huge delays).
When you submit an AI Platform training job, you can specify a service account email to use.
Make sure that service account has enough permissions in the other project to read the data there.
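As a minimal sketch of what that setup could look like (all project, bucket, and account names below are placeholders, not from your setup; this assumes the training data lives in a Cloud Storage bucket and that your gcloud version supports the `--service-account` flag on job submission):

```shell
# Hypothetical names -- replace with your own.
# 1. In the data project, grant the training service account
#    read access to the bucket holding the training data.
gcloud projects add-iam-policy-binding data-project \
    --member="serviceAccount:training-sa@build-project.iam.gserviceaccount.com" \
    --role="roles/storage.objectViewer"

# 2. Reference that service account when submitting the job.
gcloud ai-platform jobs submit training my_job \
    --service-account=training-sa@build-project.iam.gserviceaccount.com \
    <other params>
```

If you grant the role on the whole project as above, the account can read every bucket in it; granting the role on the specific bucket instead is tighter.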
For your second question, you have 2 solutions:
--stream-logs, as you mentioned. If you don't want the logs in your Cloud Build output, you can redirect stdout and/or stderr to /dev/null:
- name: 'gcr.io/cloud-builders/gcloud'
entrypoint: 'bash'
args:
- -c
- |
gcloud ai-platform jobs submit training <your params> --stream-logs >/dev/null 2>/dev/null
Or you can create a polling loop that checks the status:
- name: 'gcr.io/cloud-builders/gcloud'
entrypoint: 'bash'
args:
- -c
- |
JOB_NAME=<UNIQUE Job NAME>
gcloud ai-platform jobs submit training $${JOB_NAME} <your params>
# test the job status every 60 seconds
while [ -z "$$(gcloud ai-platform jobs describe $${JOB_NAME} | grep SUCCEEDED)" ]; do sleep 60; done
My test here is simple, but you can customize the status check to match your requirements.
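For example, one weakness of the loop above is that it never exits if the job ends in FAILED or CANCELLED state. A sketch of a stricter check, written as a pure-bash helper so the decision logic can be tested without gcloud (in the real loop, the state string would come from `gcloud ai-platform jobs describe "$JOB_NAME" --format='value(state)'`; the helper name is mine, not an API):

```shell
#!/bin/bash
# Map an AI Platform job state to a polling decision.
# States follow the Job.state enum: QUEUED, PREPARING, RUNNING,
# SUCCEEDED, FAILED, CANCELLING, CANCELLED, ...
job_decision() {
  case "$1" in
    SUCCEEDED)        echo "done" ;;
    FAILED|CANCELLED) echo "error" ;;
    *)                echo "wait" ;;
  esac
}

# Hypothetical polling loop using the helper (gcloud call illustrative):
# while :; do
#   STATE=$(gcloud ai-platform jobs describe "$JOB_NAME" --format='value(state)')
#   case "$(job_decision "$STATE")" in
#     done)  exit 0 ;;   # let the build step succeed
#     error) exit 1 ;;   # fail the build step
#     wait)  sleep 60 ;;
#   esac
# done

echo "$(job_decision SUCCEEDED) $(job_decision FAILED) $(job_decision RUNNING)"
```

Exiting non-zero on a failed job is what makes Cloud Build mark the whole build as failed, which is usually what you want in CI.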
Don't forget to set the Cloud Build timeout accordingly.