google-cloud-platform · airflow · google-cloud-composer · gcp-ai-platform-notebook · gcp-ai-platform-training

How do you schedule GCP AI Platform notebooks via Google Cloud Composer?


I've been tasked with automating the daily runs of some AI Platform notebooks via the Papermill operator, but actually doing this through Cloud Composer is giving me some trouble.

Any help is appreciated!


Solution

  • The first step is to create a JupyterLab notebook. If you want to use additional libraries, install them and restart the kernel (the Restart Kernel and Clear All Outputs option). Then, define the processing inside your notebook.

    When it's ready, remove all the runs, peeks, and dry runs before you start the scheduling phase.

    Now, you need to set up a Cloud Composer environment (remember to install the additional packages that you defined in the first step). To schedule the workflow, go back to JupyterLab and create a second notebook which generates the DAG from the workflow.
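    The DAG-generating notebook can be as simple as a cell that renders an Airflow DAG file which runs your workflow notebook with the PapermillOperator. A minimal sketch is below; the DAG id, schedule, and notebook paths are placeholders you would adapt to your environment:

```python
# Sketch of a "DAG-generating" notebook cell: it renders the source of a
# small Airflow DAG that runs a notebook via the PapermillOperator.
# dag_id, input_nb, and output_nb are hypothetical -- substitute your own.
from string import Template

DAG_TEMPLATE = Template('''\
from datetime import datetime
from airflow import DAG
from airflow.providers.papermill.operators.papermill import PapermillOperator

with DAG(
    dag_id="$dag_id",
    schedule_interval="@daily",
    start_date=datetime(2023, 1, 1),
    catchup=False,
) as dag:
    run_notebook = PapermillOperator(
        task_id="run_notebook",
        input_nb="$input_nb",
        output_nb="$output_nb",
        parameters={"execution_date": "{{ ds }}"},
    )
''')

def render_dag(dag_id: str, input_nb: str, output_nb: str) -> str:
    """Return the text of a DAG file for the given notebook paths."""
    return DAG_TEMPLATE.substitute(
        dag_id=dag_id, input_nb=input_nb, output_nb=output_nb
    )

if __name__ == "__main__":
    # Example (hypothetical GCS paths): write the DAG file to disk so it
    # can be uploaded to the Composer DAGs folder afterwards.
    dag_source = render_dag(
        "daily_notebook_run",
        "gs://my-bucket/notebooks/workflow.ipynb",
        "gs://my-bucket/notebooks/out/workflow-{{ ds }}.ipynb",
    )
    with open("daily_notebook_run.py", "w") as f:
        f.write(dag_source)
```

    Generating the DAG as text keeps the notebook itself free of an Airflow dependency; the rendered file only needs Airflow (with the papermill provider package installed) on the Composer side.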

    The final step is to upload the zipped workflow to the Cloud Composer DAGs folder. You can then manage your workflow from the Airflow UI.
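    With gsutil, the upload might look like the following; the bucket name is a placeholder, since each Composer environment has its own bucket (shown on the environment's page in the console):

```shell
# Zip the generated DAG file (plus any helper modules) and copy it to the
# Composer environment's DAGs folder. The bucket name is hypothetical.
zip workflow.zip daily_notebook_run.py
gsutil cp workflow.zip gs://us-central1-my-env-12345-bucket/dags/
```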

    I recommend taking a look at this article.

    Another solution you can use is Kubeflow, which aims to make running ML workloads on Kubernetes simple and portable. Kubeflow adds resources to your cluster that assist with a variety of tasks, including training and serving models and running Jupyter Notebooks. You can find an interesting tutorial on codelabs.

    I hope you find the above information useful.