github kubernetes airflow sidecar git-sync

Git-sync sidecar container is not syncing GitHub repo DAGs into Airflow Kubernetes cluster properly


I'm attempting to add a git-sync sidecar container to my Airflow deployment YAML so that my private GitHub repo is synced to my Airflow Kubernetes environment every time I make a change in the repo.

So far, the deployment successfully creates a git-sync container alongside our scheduler, worker, and web server containers, each in its respective pod (e.g., the scheduler pod contains a scheduler container and a git-sync container).

I looked at the git-sync container logs and it looks like it successfully connects with my private repo (using a personal access token) and prints success logs every time I make a change to my repo.

INFO: detected pid 1, running init handler
I0411 20:50:31.009097      12 main.go:401] "level"=0 "msg"="starting up" "pid"=12 "args"=["/git-sync","-wait=60","-repo=https://github.com/jorgeavelar98/AirflowProject.git","-branch=master","-root=/opt/airflow/dags","-username=jorgeavelar98","-password-file=/etc/git-secret/token"]
I0411 20:50:31.029064      12 main.go:950] "level"=0 "msg"="cloning repo" "origin"="https://github.com/jorgeavelar98/AirflowProject.git" "path"="/opt/airflow/dags"
I0411 20:50:31.031728      12 main.go:956] "level"=0 "msg"="git root exists and is not empty (previous crash?), cleaning up" "path"="/opt/airflow/dags"
I0411 20:50:31.894074      12 main.go:760] "level"=0 "msg"="syncing git" "rev"="HEAD" "hash"="18d3c8e19fb9049b7bfca9cfd8fbadc032507e03"
I0411 20:50:31.907256      12 main.go:800] "level"=0 "msg"="adding worktree" "path"="/opt/airflow/dags/18d3c8e19fb9049b7bfca9cfd8fbadc032507e03" "branch"="origin/master"
I0411 20:50:31.911039      12 main.go:860] "level"=0 "msg"="reset worktree to hash" "path"="/opt/airflow/dags/18d3c8e19fb9049b7bfca9cfd8fbadc032507e03" "hash"="18d3c8e19fb9049b7bfca9cfd8fbadc032507e03"
I0411 20:50:31.911065      12 main.go:865] "level"=0 "msg"="updating submodules"

 

However, despite there being no errors in my git-sync container logs, I could not find any of the repo's files in the destination directory the repo is supposed to be synced into (/opt/airflow/dags). As a result, no DAGs appear in the Airflow UI.

This is our scheduler's containers/volumes YAML definition for reference. We have something similar for the workers and web server:

      containers:
        - name: airflow-scheduler
          image: <redacted>
          imagePullPolicy: IfNotPresent
          envFrom:
            - configMapRef:
                name: "AIRFLOW_SERVICE_NAME-env"
          env:            
            <redacted>
          resources: 
            requests:
              memory: RESOURCE_MEMORY
              cpu: RESOURCE_CPU
          volumeMounts:
            - name: scripts
              mountPath: /home/airflow/scripts
            - name: dags-data
              mountPath: /opt/airflow/dags
              subPath: dags
            - name: dags-data
              mountPath: /opt/airflow/plugins
              subPath: plugins
            - name: variables-pools
              mountPath: /home/airflow/variables-pools/
            - name: airflow-log-config
              mountPath: /opt/airflow/config
          command:
            - "/usr/bin/dumb-init"
            - "--"
          args:
            <redacted>
        - name: git-sync
          image: registry.k8s.io/git-sync/git-sync:v3.6.5
          args:
            - "-wait=60"
            - "-repo=<repo>"
            - "-branch=master"
            - "-root=/opt/airflow/dags"
            - "-username=<redacted>"
            - "-password-file=/etc/git-secret/token"
          volumeMounts:
            - name: git-secret
              mountPath: /etc/git-secret
              readOnly: true
            - name: dags-data
              mountPath: /opt/airflow/dags
      volumes:
        - name: scripts
          configMap:
            name: AIRFLOW_SERVICE_NAME-scripts
            defaultMode: 493
        - name: dags-data
          emptyDir: {}
        - name: variables-pools
          configMap:
            name: AIRFLOW_SERVICE_NAME-variables-pools
            defaultMode: 493
        - name: airflow-log-config
          configMap:
            name: airflow-log-configmap
            defaultMode: 493
        - name: git-secret
          secret:
            secretName: github-token

What can be the issue? I couldn't find much documentation that could help me further investigate. Any help and guidance would be greatly appreciated!


Solution

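  • It looks like my issue was that my worker, scheduler, and web server containers mounted the DAG volume differently from my git-sync container. The Airflow containers mounted dags-data at /opt/airflow/dags with subPath: dags, which exposes only the dags subdirectory of the volume, while git-sync mounted the volume root at /opt/airflow/dags and wrote the repo there. The containers were therefore looking at different directories of the same emptyDir.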

    This is what I had:

    containers:
            - name: airflow-scheduler
              image: <redacted>
              imagePullPolicy: IfNotPresent
              envFrom:
                - configMapRef:
                    name: "AIRFLOW_SERVICE_NAME-env"
              env:            
                <redacted>
              resources: 
                requests:
                  memory: RESOURCE_MEMORY
                  cpu: RESOURCE_CPU
              volumeMounts:
                - name: scripts
                  mountPath: /home/airflow/scripts
                - name: dags-data
                  mountPath: /opt/airflow/dags
                  subPath: dags
                - name: dags-data
                  mountPath: /opt/airflow/plugins
                  subPath: plugins
                - name: variables-pools
                  mountPath: /home/airflow/variables-pools/
                - name: airflow-log-config
                  mountPath: /opt/airflow/config
    

    The following edits made it work. I removed the subPath from the dags mount and removed the plugins volume mount:

    containers:
            - name: airflow-scheduler
              image: <redacted>
              imagePullPolicy: IfNotPresent
              envFrom:
                - configMapRef:
                    name: "AIRFLOW_SERVICE_NAME-env"
              env:            
                <redacted>
              resources: 
                requests:
                  memory: RESOURCE_MEMORY
                  cpu: RESOURCE_CPU
              volumeMounts:
                - name: scripts
                  mountPath: /home/airflow/scripts
                - name: dags-data
                  mountPath: /opt/airflow/dags
                - name: variables-pools
                  mountPath: /home/airflow/variables-pools/
                - name: airflow-log-config
                  mountPath: /opt/airflow/config
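
    To make the relationship between the two containers easier to see, here is a condensed sketch of the fixed pod spec. It keeps only the fields that matter for the shared volume and uses the names from the definitions above; everything else (env, resources, the other mounts, the remaining git-sync args) is omitted, so treat it as an illustration and not a complete manifest.

    ```yaml
    # Condensed sketch: only the fields relevant to the shared DAG volume are shown.
    containers:
      - name: airflow-scheduler
        volumeMounts:
          - name: dags-data
            mountPath: /opt/airflow/dags   # no subPath, so this is the volume root
      - name: git-sync
        image: registry.k8s.io/git-sync/git-sync:v3.6.5
        args:
          - "-root=/opt/airflow/dags"      # git-sync places its checkout under this path
        volumeMounts:
          - name: dags-data
            mountPath: /opt/airflow/dags   # same volume root as the Airflow container
    volumes:
      - name: dags-data
        emptyDir: {}
    ```

    Both containers now mount the root of dags-data at the same path, so whatever git-sync writes is visible to Airflow. Note that, as the git-sync log above shows, the worktree is created in a hash-named directory under the root, so the DAG files sit one level below /opt/airflow/dags instead of directly in it.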