I have a .py file containing all the instructions to generate predictions for some data. The data are read from BigQuery, and the predictions should be inserted into another BigQuery table. Right now the code is running on an AI Platform Notebook, but I want to schedule its execution every day. Is there any way to do it?
I ran into AI Platform Jobs, but I can't understand what my code should do and how the code should be structured. Is there any step-by-step guide to follow?
You can schedule a Notebook execution using different options:
nbconvert: Several variants of the same technology exist; at its core, nbconvert executes the cells of an .ipynb file and saves the results, which you can then trigger on a schedule (e.g., from a daily cron job).
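As a minimal sketch of nbconvert's Python API (the notebook filenames are placeholders), wrapping the execution in a small script gives you something you can schedule:

```python
import nbformat
from nbconvert.preprocessors import ExecutePreprocessor

# Load the notebook, run every cell in order, and save the executed copy
nb = nbformat.read("predictions.ipynb", as_version=4)
executor = ExecutePreprocessor(timeout=600, kernel_name="python3")
executor.preprocess(nb, {"metadata": {"path": "."}})
nbformat.write(nb, "predictions_out.ipynb")
```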
KubeFlow Fairing: A Python package that makes it easy to train and deploy ML models on Kubeflow. It can also be extended to train or deploy on other platforms; currently, it has been extended to train on Google AI Platform.
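A minimal sketch of the Fairing pattern, based on its documented high-level API (the train_fn body is a placeholder, and the package/GCP project are assumed to be set up):

```python
from kubeflow.fairing import TrainJob
from kubeflow.fairing.backends import GCPManagedBackend

def train_fn():
    # Placeholder: read from BigQuery, generate predictions,
    # and write them back to the destination table here.
    pass

# Package the function into a container and submit it to
# AI Platform Training (the GCP-managed backend).
job = TrainJob(train_fn, backend=GCPManagedBackend())
job.submit()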
AI Platform Notebook Executor: There are two core functions of the Scheduler extension:
- Submitting a Notebook to run on AI Platform's Machine Learning Engine as a training job with a custom container image. This lets you experiment and write your training code in a cost-effective single-VM environment, but scale out to an AI Platform job to take advantage of superior resources (e.g., GPUs, TPUs).
- Scheduling a Notebook for recurring runs. This follows exactly the same sequence of steps, but requires a crontab-formatted schedule option (for example, 0 6 * * * runs the notebook every day at 06:00).
Nova Plugin: The predecessor of the Notebook Scheduler project. It allows you to execute notebooks directly from your Jupyter UI.
Notebook training: A Python package that allows users to run a Jupyter notebook on Google Cloud AI Platform Training Jobs.
GCP runner: Allows running any Jupyter notebook function on Google Cloud Platform.
- Unlike all the other solutions listed above, it allows you to run training for the whole project, not just a single Python file or Jupyter notebook.
- It allows running any function with parameters; moving from local execution to the cloud is just a matter of wrapping the function in a gcp_runner.run_cloud(<function_name>, …) call (see the sketch after this list).
- The project is production-ready without any modifications.
- It supports execution in local (for testing purposes), AI Platform, and Kubernetes environments.
A full end-to-end example can be found here: https://www.github.com/vlasenkoalexey/criteo_nbdev
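Following the call pattern quoted above, a minimal sketch (the function name and its body are placeholders; any additional configuration arguments are documented in the project's README):

```python
import gcp_runner

def generate_predictions():
    # Placeholder: read from BigQuery, predict, write results back.
    pass

# Run locally first for testing...
generate_predictions()

# ...then run the same function on Google Cloud by wrapping it,
# as described above. Extra arguments, if any, come from the README.
gcp_runner.run_cloud(generate_predictions)
```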
tensorflow_cloud (Keras for GCP): Provides APIs that allow you to easily go from debugging and training your Keras and TensorFlow code in a local environment to distributed training in the cloud.
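A minimal sketch, assuming the tensorflow-cloud package is installed; the entry_point and requirements_txt paths are placeholders for your own files:

```python
import tensorflow_cloud as tfc

# Package the local training script and submit it to run on GCP;
# tfc.run handles containerization and job submission.
tfc.run(
    entry_point="train.py",               # placeholder: your training script or notebook
    requirements_txt="requirements.txt",  # placeholder: its dependencies
)
```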
Update July 2021:
The recommended option in GCP is the Notebook Executor, which is already available in EAP (Early Access Program).