I want to deploy an application to Airflow that accepts a config file as a parameter, pulls the Git repository specified by that config, builds it into a Docker image, and then uploads that image to GCP's Artifact Registry. What is the best practice for building a Docker image inside an Airflow DAG?
I have tried orchestrating a manually triggered Cloud Build run via Airflow, but I have not been able to pass the necessary substitutions into the cloudbuild.yaml file using the CloudBuildCreateBuildOperator, nor have I been able to specify the workspace.
I have also created a Docker image that can itself build new Docker images (when the docker.sock file is mounted as a volume). However, calling this via a KubernetesPodOperator seems to go against the design philosophy of Airflow, since the task would affect the host machine by building new Docker images directly on it.
It's not Airflow's responsibility to handle this kind of use case. Airflow is a pipeline and task orchestrator based on DAGs (directed acyclic graphs). Your need corresponds to a usual CI/CD pipeline, so it's better to delegate this work to a tool like Cloud Build or GitLab CI, for example.
From Cloud Build, you can apply and automate all the actions specified in your question: pulling the Git repository, building the Docker image, and pushing it to Artifact Registry.
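As an illustration, a minimal cloudbuild.yaml sketch for those steps could look like the following. This is not your exact file: the _REGION, _REPO, _IMAGE, and _TAG substitutions are placeholder values to adapt, and the source is assumed to be checked out by a Cloud Build trigger connected to the Git repository.

```yaml
# Sketch of a Cloud Build config: build the image from the repository
# source, then push it to Artifact Registry.
steps:
  # Build the Docker image from the Dockerfile at the repository root
  - name: 'gcr.io/cloud-builders/docker'
    args:
      - 'build'
      - '-t'
      - '${_REGION}-docker.pkg.dev/$PROJECT_ID/${_REPO}/${_IMAGE}:${_TAG}'
      - '.'

# Images listed here are pushed to the registry when the build succeeds
images:
  - '${_REGION}-docker.pkg.dev/$PROJECT_ID/${_REPO}/${_IMAGE}:${_TAG}'

# Default values for user-defined substitutions (keys must start with "_")
substitutions:
  _REGION: europe-west1
  _REPO: my-repo
  _IMAGE: my-app
  _TAG: latest
```

If you run the build manually rather than from a trigger, the same substitutions can be overridden on the command line, e.g. gcloud builds submit --config cloudbuild.yaml --substitutions=_IMAGE=my-app,_TAG=v2 .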
Once your image has been built in the CI/CD part, you can then use that Docker image in the Airflow DAG, with a KubernetesPodOperator if needed.
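On the Airflow side, here is a minimal sketch of such a task, assuming the apache-airflow-providers-cncf-kubernetes package is installed; the DAG id, image path, namespace, and command are placeholders to adapt.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)

# Sketch only: the image is assumed to have been built and pushed to
# Artifact Registry by Cloud Build beforehand.
with DAG(
    dag_id="use_prebuilt_image",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,  # triggered manually
    catchup=False,
) as dag:
    run_app = KubernetesPodOperator(
        task_id="run_app",
        name="run-app",
        namespace="default",
        # Image built and pushed by the CI/CD pipeline above
        image="europe-west1-docker.pkg.dev/my-project/my-repo/my-app:latest",
        cmds=["python", "main.py"],  # placeholder entrypoint
        get_logs=True,
    )
```

The DAG then only consumes images; it never builds them, so no docker.sock mount or host-level side effects are involved.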
This would be more coherent, because each concern is put in the right place and handled by the right tool.