Tags: azure, airflow, kubernetes-helm, airflow-2.x, airflow-webserver

Airflow webserver pod unable to get logs from worker pod


I have deployed Airflow 2.9.3 in AKS, but when executing DAGs I get the error below. I am using the Helm chart (1.15.0) for the deployment and have no clue what needs to be updated. Please help.

Error

Could not read served logs: HTTPConnectionPool(host='file-sensor-waiting-for-trigger-file-in-nprd-vgcdjyz8', port=8793): Max retries exceeded with url: /log/dag_id=file_sensor/run_id=manual__2024-11-06T03:10:17.703058+00:00/task_id=waiting_for_trigger_file_in_nprd/attempt=4.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8f23131760>: Failed to establish a new connection: [Errno -2] Name or service not known'))
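
A quick way to check whether the webserver pod can resolve that worker hostname at all is to resolve it from inside the webserver pod. This is only a hedged sketch: the deployment name assumes the chart's default naming for a release called airflow, and the dfs-test namespace is taken from the helm upgrade command further below.

kubectl exec -n dfs-test deploy/airflow-webserver -- \
    python3 -c "import socket; print(socket.gethostbyname('file-sensor-waiting-for-trigger-file-in-nprd-vgcdjyz8'))"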

Update: I made the following changes.

I created a Connection of type "Azure Blob Storage" named "adlsgen2" from the web UI. The connection tested successfully.

[screenshot: Azure Blob Storage connection "adlsgen2"]
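
For reference, the same connection can also be created from the Airflow CLI inside the webserver pod. The connection string below is a placeholder, and the deployment name again assumes the chart's default naming:

kubectl exec -n dfs-test deploy/airflow-webserver -- \
    airflow connections add adlsgen2 \
    --conn-type wasb \
    --conn-extra '{"connection_string": "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.windows.net"}'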

Then I updated the logging section in values.yaml with the code below:

config:
  core:
    dags_folder: '{{ include "airflow_dags" . }}'
    # This is ignored when used with the official Docker image
    load_examples: 'False'
    executor: '{{ .Values.executor }}'
    # For Airflow 1.10, backward compatibility; moved to [logging] in 2.0
    colored_console_log: 'False'
    remote_logging: '{{- ternary "True" "False" .Values.elasticsearch.enabled }}'
  logging:
    # remote_logging: '{{- ternary "True" "False" .Values.elasticsearch.enabled }}'
    remote_logging: 'True'
    remote_log_conn_id: 'adlsgen2'
    remote_base_log_folder: 'wasb://airflowlogs@testadlsgen.blob.core.windows.net/logs'
    delete_local_logs: 'True'
    colored_console_log: 'False'
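
These [logging] options can equivalently be supplied as environment variables on the Airflow pods (Airflow maps AIRFLOW__<SECTION>__<KEY> onto airflow.cfg), which is the approach the solution below ends up using. A sketch of the equivalent variables:

export AIRFLOW__LOGGING__REMOTE_LOGGING='True'
export AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID='adlsgen2'
export AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER='wasb://airflowlogs@testadlsgen.blob.core.windows.net/logs'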

Then I executed the commands below to roll out (restart) the cluster DNS.

[screenshot: CoreDNS rollout commands]
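
(The exact commands are only in the screenshot; presumably this was the CoreDNS restart also shown in the solution below, i.e. something like:)

kubectl rollout restart deployment coredns -n kube-system
kubectl get pods -n kube-system -l k8s-app=kube-dns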

Then I ran the command below to redeploy Airflow with Helm, and it deployed successfully.

helm upgrade airflow apache-airflow/airflow --namespace dfs-test -f values.yaml

[screenshot: pod details after installation]
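
(The pod listing in that screenshot was presumably captured with something like:)

kubectl get pods -n dfs-test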

But I am getting the warning below when looking at the webserver pod:

[screenshot: startup probe failed warning]

When I access the webserver from the browser I am able to log in, and the earlier error message is gone, but no logs are shown. Also, no logs are written to the Blob Storage account that I created the logging connection for.

[screenshot: webserver not showing any logs]


Solution

  • This error typically arises from misconfigured DNS within the cluster: the webserver cannot resolve the worker pod's hostname to fetch its served logs.

    Create a dedicated airflow namespace to isolate the environment:

    kubectl create namespace airflow
    kubectl config set-context --current --namespace=airflow
    

    Add the Apache Airflow Helm repository:

    helm repo add apache-airflow https://airflow.apache.org
    helm repo update
    

    Create a config file, for example airflow-values.yaml, to specify the configuration of the Airflow components, including the executor type, ingress, and logging settings:

    executor: "CeleryExecutor"
    airflowVersion: "2.9.3"
    
    web:
      service:
        type: LoadBalancer
      defaultUser:
        enabled: true
        username: admin
        password: admin
      secretKeyRef: airflow-webserver-secret-key  
    
    workers:
      replicas: 2
    
    ingress:
      web:
        enabled: true
        ingressClassName: nginx  
        hosts:
          - name: "airflow.example.com"  
      flower:
        enabled: false
    
    logging:
      remote_logging: true
      remote_log_conn_id: "logfile"
      worker_log_server_port: 8793
      remote_log_fetch_timeout: 60  
      remote_log_fetch_retries: 5   
    
    

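    The web.secretKeyRef line above points at an existing Kubernetes secret, so that secret has to exist before installing. A minimal sketch for creating it, assuming the data key webserver-secret-key that the official chart expects:

    kubectl create secret generic airflow-webserver-secret-key \
        --namespace airflow \
        --from-literal="webserver-secret-key=$(python3 -c 'import secrets; print(secrets.token_hex(16))')"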

    Deploy Airflow using Helm with this customized configuration

    helm install airflow apache-airflow/airflow --namespace airflow -f airflow-values.yaml
    


    After deployment, verify that the release and its pods are healthy.

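    For example (release name airflow assumed from the install command above):

    helm status airflow -n airflow
    kubectl get pods -n airflow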

    Sometimes CoreDNS may not resolve names correctly. Restart CoreDNS to ensure it's functioning properly:

    kubectl rollout restart deployment coredns -n kube-system
    kubectl get pods -n kube-system -l k8s-app=kube-dns
    


    To validate that DNS resolution is working correctly, run an nslookup from a throwaway utility pod inside the cluster:

    kubectl run -i --tty dnsutils --image=tutum/dnsutils --rm=true -- bash
    nslookup airflow-worker.airflow.svc.cluster.local
    


    As you can see, DNS resolution inside the cluster is working as expected: the airflow-worker.airflow.svc.cluster.local name correctly resolves to multiple IP addresses.
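
    Beyond name resolution, you can also check that the worker's log-serving port (8793, as configured above) accepts connections from the webserver pod. A minimal sketch, assuming the chart's default worker StatefulSet and headless service names for a release called airflow:

    kubectl exec -n airflow deploy/airflow-webserver -- \
        python3 -c "import socket; socket.create_connection(('airflow-worker-0.airflow-worker', 8793), timeout=5); print('worker log port reachable')"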

    Update: in addition to the troubleshooting above, make sure permissions and access are configured correctly for the storage account and AKS.

    Create an AKS cluster:

    az aks create \
        --resource-group arkorg \
        --name arkoAKSCluster \
        --node-count 2 \
        --enable-managed-identity \
        --generate-ssh-keys \
        --location eastus
    


    Create a storage account:

    az storage account create \
        --name arkoairflowstorage \
        --resource-group arkorg \
        --location eastus \
        --sku Standard_LRS
    

    Create blob storage container for logs

    az storage container create \
        --account-name arkoairflowstorage \
        --name airflowlogs
    


    Retrieve storage account key

    az storage account keys list \
        --resource-group arkorg \
        --account-name arkoairflowstorage \
        --query "[0].value" -o tsv
    


    Here is the updated values file

    executor: "CeleryExecutor"
    airflowVersion: "2.9.3"
    
    web:
      service:
        type: LoadBalancer
      defaultUser:
        enabled: true
        username: admin
        password: admin
      env:
        - name: AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER
          value: "wasb://airflowlogs@arkoairflowstorage.blob.core.windows.net/logs"
        - name: AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID
          value: "azure_blob_storage"
    
    workers:
      replicas: 2
      env:
        - name: AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER
          value: "wasb://airflowlogs@arkoairflowstorage.blob.core.windows.net/logs"
        - name: AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID
          value: "azure_blob_storage"
    
    scheduler:
      env:
        - name: AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER
          value: "wasb://airflowlogs@arkoairflowstorage.blob.core.windows.net/logs"
        - name: AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID
          value: "azure_blob_storage"
    
    logging:
      remote_logging: true
      remote_log_conn_id: "azure_blob_storage"
      worker_log_server_port: 8793
      remote_log_fetch_timeout: 60
      remote_log_fetch_retries: 5
    
    connections:
      - id: "azure_blob_storage"
        type: "wasb"
        extra: |
          {
            "connection_string": "DefaultEndpointsProtocol=https;AccountName=arkoairflowstorage;AccountKey=abcdefghijklmnopthisisasamplefakestringuseyourownoriginaloneyougetfromabovecommand==;EndpointSuffix=core.windows.net"
          }
    

    A few modifications I made: the CeleryExecutor should be specified as executor at the top level of values.yaml, outside the config section. I also added a web section that defines the service type, credentials, and environment variables for remote logging.
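
    To confirm those environment variables actually took effect in the running pods, you can query the resolved Airflow config (the deployment name again assumes the chart's default naming):

    kubectl exec -n airflow deploy/airflow-webserver -- airflow config get-value logging remote_logging
    kubectl exec -n airflow deploy/airflow-webserver -- airflow config get-value logging remote_base_log_folder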

    Add the Airflow Helm Repository

    helm repo add apache-airflow https://airflow.apache.org
    helm repo update
    

    Deploy Airflow to AKS

    helm install airflow apache-airflow/airflow \
        --namespace airflow \
        --create-namespace \
        -f airflow-values.yaml
    


    Verify Deployment

    kubectl get pods -n airflow
    


    From the Airflow UI, confirm the connection under Admin > Connections, and confirm logging to Blob Storage by running a sample DAG to generate logs.

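    If the UI still shows nothing, one hedged way to check whether task logs are landing in the container at all is to list the blobs directly, using the account key retrieved earlier:

    az storage blob list \
        --account-name arkoairflowstorage \
        --container-name airflowlogs \
        --prefix logs/ \
        --account-key "<key-from-earlier>" \
        --output table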