I'm trying to add a git-sync sidecar container to my Airflow deployment YAML so that my private GitHub repo is synced to my Airflow Kubernetes environment every time I push a change.
So far, the deployment successfully creates a git-sync container alongside the scheduler, worker, and web server containers, one in each pod (e.g., the scheduler pod contains a scheduler container and a git-sync container).
The git-sync container logs show that it connects to my private repo (using a personal access token) and prints success logs every time I change the repo:
INFO: detected pid 1, running init handler
I0411 20:50:31.009097 12 main.go:401] "level"=0 "msg"="starting up" "pid"=12 "args"=["/git-sync","-wait=60","-repo=https://github.com/jorgeavelar98/AirflowProject.git","-branch=master","-root=/opt/airflow/dags","-username=jorgeavelar98","-password-file=/etc/git-secret/token"]
I0411 20:50:31.029064 12 main.go:950] "level"=0 "msg"="cloning repo" "origin"="https://github.com/jorgeavelar98/AirflowProject.git" "path"="/opt/airflow/dags"
I0411 20:50:31.031728 12 main.go:956] "level"=0 "msg"="git root exists and is not empty (previous crash?), cleaning up" "path"="/opt/airflow/dags"
I0411 20:50:31.894074 12 main.go:760] "level"=0 "msg"="syncing git" "rev"="HEAD" "hash"="18d3c8e19fb9049b7bfca9cfd8fbadc032507e03"
I0411 20:50:31.907256 12 main.go:800] "level"=0 "msg"="adding worktree" "path"="/opt/airflow/dags/18d3c8e19fb9049b7bfca9cfd8fbadc032507e03" "branch"="origin/master"
I0411 20:50:31.911039 12 main.go:860] "level"=0 "msg"="reset worktree to hash" "path"="/opt/airflow/dags/18d3c8e19fb9049b7bfca9cfd8fbadc032507e03" "hash"="18d3c8e19fb9049b7bfca9cfd8fbadc032507e03"
I0411 20:50:31.911065 12 main.go:865] "level"=0 "msg"="updating submodules"
However, despite there being no errors in the git-sync container logs, I could not find any of the repo's files in the directory the Airflow containers use as the sync destination (/opt/airflow/dags). As a result, no DAGs appear in the Airflow UI.
This is our scheduler containers/volumes YAML definition for reference. We have something similar for the workers and the web server:
containers:
  - name: airflow-scheduler
    image: <redacted>
    imagePullPolicy: IfNotPresent
    envFrom:
      - configMapRef:
          name: "AIRFLOW_SERVICE_NAME-env"
    env:
      <redacted>
    resources:
      requests:
        memory: RESOURCE_MEMORY
        cpu: RESOURCE_CPU
    volumeMounts:
      - name: scripts
        mountPath: /home/airflow/scripts
      - name: dags-data
        mountPath: /opt/airflow/dags
        subPath: dags
      - name: dags-data
        mountPath: /opt/airflow/plugins
        subPath: plugins
      - name: variables-pools
        mountPath: /home/airflow/variables-pools/
      - name: airflow-log-config
        mountPath: /opt/airflow/config
    command:
      - "/usr/bin/dumb-init"
      - "--"
    args:
      <redacted>
  - name: git-sync
    image: registry.k8s.io/git-sync/git-sync:v3.6.5
    args:
      - "-wait=60"
      - "-repo=<repo>"
      - "-branch=master"
      - "-root=/opt/airflow/dags"
      - "-username=<redacted>"
      - "-password-file=/etc/git-secret/token"
    volumeMounts:
      - name: git-secret
        mountPath: /etc/git-secret
        readOnly: true
      - name: dags-data
        mountPath: /opt/airflow/dags
volumes:
  - name: scripts
    configMap:
      name: AIRFLOW_SERVICE_NAME-scripts
      defaultMode: 493
  - name: dags-data
    emptyDir: {}
  - name: variables-pools
    configMap:
      name: AIRFLOW_SERVICE_NAME-variables-pools
      defaultMode: 493
  - name: airflow-log-config
    configMap:
      name: airflow-log-configmap
      defaultMode: 493
  - name: git-secret
    secret:
      secretName: github-token
What could the issue be? I couldn't find much documentation to help me investigate further. Any help or guidance would be greatly appreciated!
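It looks like my issue was that my worker, scheduler, and web server containers mounted the dags-data volume differently from my git-sync container. git-sync mounted the root of the volume at /opt/airflow/dags and wrote the repo there, while the Airflow containers mounted only the volume's dags subdirectory (subPath: dags) at that same path, so they were looking at a different, empty directory.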
This is what I had:
containers:
  - name: airflow-scheduler
    image: <redacted>
    imagePullPolicy: IfNotPresent
    envFrom:
      - configMapRef:
          name: "AIRFLOW_SERVICE_NAME-env"
    env:
      <redacted>
    resources:
      requests:
        memory: RESOURCE_MEMORY
        cpu: RESOURCE_CPU
    volumeMounts:
      - name: scripts
        mountPath: /home/airflow/scripts
      - name: dags-data
        mountPath: /opt/airflow/dags
        subPath: dags
      - name: dags-data
        mountPath: /opt/airflow/plugins
        subPath: plugins
      - name: variables-pools
        mountPath: /home/airflow/variables-pools/
      - name: airflow-log-config
        mountPath: /opt/airflow/config
The following edits made it work. I removed the subPath from the dags volume mount and removed the plugins volume mount entirely:
containers:
  - name: airflow-scheduler
    image: <redacted>
    imagePullPolicy: IfNotPresent
    envFrom:
      - configMapRef:
          name: "AIRFLOW_SERVICE_NAME-env"
    env:
      <redacted>
    resources:
      requests:
        memory: RESOURCE_MEMORY
        cpu: RESOURCE_CPU
    volumeMounts:
      - name: scripts
        mountPath: /home/airflow/scripts
      - name: dags-data
        mountPath: /opt/airflow/dags
      - name: variables-pools
        mountPath: /home/airflow/variables-pools/
      - name: airflow-log-config
        mountPath: /opt/airflow/config
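To make the fix easier to see, here is a trimmed sketch of one pod that shows only the fields that matter for the sync. It is a fragment for illustration, not a complete manifest: the image, env, resources, and other mounts from the definitions above are omitted, and the comments are my reading of the problem rather than anything taken from the git-sync documentation.

```yaml
# Fragment only: unrelated fields (image, env, resources, other mounts) are omitted.
containers:
  - name: airflow-scheduler
    volumeMounts:
      - name: dags-data
        mountPath: /opt/airflow/dags   # no subPath, so this is the volume root
  - name: git-sync
    image: registry.k8s.io/git-sync/git-sync:v3.6.5
    args:
      - "-root=/opt/airflow/dags"      # git-sync writes its worktree under this path
    volumeMounts:
      - name: dags-data
        mountPath: /opt/airflow/dags   # same volume, same root, same path
volumes:
  - name: dags-data
    emptyDir: {}                       # shared by both containers in the pod
```

With both containers mounting the root of dags-data, whatever git-sync writes under /opt/airflow/dags is visible to the Airflow container at the same path. As the logs above show, git-sync places the checkout in a subdirectory named after the commit hash, which is fine because Airflow scans its DAGs folder recursively.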