I have deployed Airflow 2.9.3 in AKS using Helm chart 1.15.0. When executing DAGs I get the error below, and I have no clue what needs to be updated. Please help.
Error
Could not read served logs: HTTPConnectionPool(host='file-sensor-waiting-for-trigger-file-in-nprd-vgcdjyz8', port=8793): Max retries exceeded with url: /log/dag_id=file_sensor/run_id=manual__2024-11-06T03:10:17.703058+00:00/task_id=waiting_for_trigger_file_in_nprd/attempt=4.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8f23131760>: Failed to establish a new connection: [Errno -2] Name or service not known'))
Update: I made the following changes.
I created a Connection for "Azure Blob Storage" from the web UI with the name "adlsgen2". The connection tested successfully.
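To double-check that the connection is visible outside the web session as well, it can be queried from the scheduler pod; the deployment and namespace names below are assumptions based on a release named airflow in namespace dfs-test:

kubectl exec -n dfs-test deploy/airflow-scheduler -- airflow connections get adlsgen2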
Then I updated the logging section in values.yaml with the code below:
config:
  core:
    dags_folder: '{{ include "airflow_dags" . }}'
    # This is ignored when used with the official Docker image
    load_examples: 'False'
    executor: '{{ .Values.executor }}'
    # For Airflow 1.10 backward compatibility; moved to [logging] in 2.0
    colored_console_log: 'False'
    remote_logging: '{{- ternary "True" "False" .Values.elasticsearch.enabled }}'
  logging:
    # remote_logging: '{{- ternary "True" "False" .Values.elasticsearch.enabled }}'
    remote_logging: 'True'
    remote_log_conn_id: 'adlsgen2'
    remote_base_log_folder: 'wasb://airflowlogs@testadlsgen.blob.core.windows.net/logs'
    delete_local_logs: 'True'
    colored_console_log: 'False'
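A quick way to confirm the chart actually rendered these settings into the running configuration (again assuming release airflow in namespace dfs-test):

kubectl exec -n dfs-test deploy/airflow-webserver -- airflow config get-value logging remote_logging
kubectl exec -n dfs-test deploy/airflow-webserver -- airflow config get-value logging remote_log_conn_id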
Then I restarted cluster DNS and redeployed Airflow with Helm, and the deployment succeeded:
helm upgrade airflow apache-airflow/airflow --namespace dfs-test -f values.yaml
Now when I access the web server from the browser, I can log in and the original error message is gone, but no task logs are displayed. No logs are written to the Blob Storage container my connection points to, either.
This error typically arises from misconfigured DNS within the cluster: the webserver tries to fetch task logs from the worker's log server on port 8793 using the worker's hostname, and that hostname lookup is failing.
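To confirm this failure mode, try resolving a worker hostname from inside the webserver pod. Python ships in the official Airflow image; the deployment, service, and namespace names here follow the chart's defaults and are assumptions:

kubectl exec -n airflow deploy/airflow-webserver -- \
  python -c "import socket; print(socket.gethostbyname('airflow-worker-0.airflow-worker.airflow.svc.cluster.local'))"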
Deploy Airflow in its own namespace to isolate the environment:
kubectl create namespace airflow
kubectl config set-context --current --namespace=airflow
Add the Apache Airflow Helm repo:
helm repo add apache-airflow https://airflow.apache.org
helm repo update
Create a config file, for example airflow-values.yaml, to specify the configuration of the Airflow components, including the executor type, ingress, and logging settings:
executor: "CeleryExecutor"
airflowVersion: "2.9.3"

webserver:
  service:
    type: LoadBalancer
  defaultUser:
    enabled: true
    username: admin
    password: admin

webserverSecretKeySecretName: airflow-webserver-secret-key

workers:
  replicas: 2

ingress:
  web:
    enabled: true
    ingressClassName: nginx
    hosts:
      - name: "airflow.example.com"

flower:
  enabled: false

# airflow.cfg options go under the chart's config section
config:
  logging:
    remote_logging: 'True'
    remote_log_conn_id: "logfile"
    worker_log_server_port: 8793
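Before installing, the chart can be rendered locally so obvious values mistakes surface early; this only exercises the templating, not the cluster:

helm template airflow apache-airflow/airflow -f airflow-values.yaml > /dev/null && echo "values render OK"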
Deploy Airflow using Helm with this customized configuration
helm install airflow apache-airflow/airflow --namespace airflow -f airflow-values.yaml
After deployment, verify that the release came up:
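The webserver, scheduler, and worker pods should all reach Running state:

kubectl get pods -n airflow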
Sometimes CoreDNS may not resolve names correctly. Restart CoreDNS to ensure it's functioning properly:
kubectl rollout restart deployment coredns -n kube-system
kubectl get pods -n kube-system -l k8s-app=kube-dns
To validate that DNS resolution is working correctly, run an nslookup from a throwaway pod inside the cluster:
kubectl run -i --tty dnsutils --image=tutum/dnsutils --rm=true -- bash
nslookup airflow-worker.airflow.svc.cluster.local
If DNS resolution inside the cluster is working as expected, the airflow-worker.airflow.svc.cluster.local domain resolves to multiple IP addresses, one per worker replica.
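Note that the hostname in the original error is an individual worker pod, not the service. Per-pod records exist because the chart's worker StatefulSet is backed by a headless service, so a record like the following should also resolve from the same dnsutils pod (the pod name is an assumption based on the chart's default naming):

nslookup airflow-worker-0.airflow-worker.airflow.svc.cluster.local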
Update: in addition to the troubleshooting above, make sure permissions are configured correctly for the storage account and AKS.
Create an AKS cluster:
az aks create \
--resource-group arkorg \
--name arkoAKSCluster \
--node-count 2 \
--enable-managed-identity \
--generate-ssh-keys \
--location eastus
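Then point kubectl at the new cluster:

az aks get-credentials --resource-group arkorg --name arkoAKSCluster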
Create a storage account:
az storage account create \
--name arkoairflowstorage \
--resource-group arkorg \
--location eastus \
--sku Standard_LRS
Create a blob storage container for the logs:
az storage container create \
--account-name arkoairflowstorage \
--name airflowlogs
Retrieve the storage account key:
az storage account keys list \
--resource-group arkorg \
--account-name arkoairflowstorage \
--query "[0].value" -o tsv
Here is the updated values file
executor: "CeleryExecutor"
airflowVersion: "2.9.3"

webserver:
  service:
    type: LoadBalancer
  defaultUser:
    enabled: true
    username: admin
    password: admin

workers:
  replicas: 2

# Top-level env is injected into every Airflow container
# (webserver, scheduler, workers), so it only needs to be set once.
env:
  - name: AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER
    value: "wasb://airflowlogs@arkoairflowstorage.blob.core.windows.net/logs"
  - name: AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID
    value: "azure_blob_storage"
  # Airflow resolves connections from AIRFLOW_CONN_<CONN_ID> variables;
  # the value below is a JSON-serialized connection.
  - name: AIRFLOW_CONN_AZURE_BLOB_STORAGE
    value: '{"conn_type": "wasb", "extra": {"connection_string": "DefaultEndpointsProtocol=https;AccountName=arkoairflowstorage;AccountKey=abcdefghijklmnopthisisasamplefakestringuseyourownoriginaloneyougetfromabovecommand==;EndpointSuffix=core.windows.net"}}'

config:
  logging:
    remote_logging: 'True'
    remote_log_conn_id: "azure_blob_storage"
    worker_log_server_port: 8793
A few modifications I made: CeleryExecutor is specified as the executor at the top level of values.yaml, outside the config section. The webserver section defines the service type and default credentials, and the remote-logging environment variables, including the connection itself as AIRFLOW_CONN_AZURE_BLOB_STORAGE, are set once at the top level so every Airflow container picks them up.
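Since the connection string embeds the account key, a safer variant is to keep the connection in a Kubernetes Secret rather than in plain values (a sketch; the secret name is arbitrary and the connection string is a placeholder):

kubectl create secret generic airflow-conn-wasb \
  --namespace airflow \
  --from-literal=AIRFLOW_CONN_AZURE_BLOB_STORAGE='{"conn_type": "wasb", "extra": {"connection_string": "<your-connection-string>"}}'

The secret can then be referenced through the chart's extraEnvFrom value instead of putting the key into env directly.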
Add the Airflow Helm Repository
helm repo add apache-airflow https://airflow.apache.org
helm repo update
Deploy Airflow to AKS
helm install airflow apache-airflow/airflow \
--namespace airflow \
--create-namespace \
-f airflow-values.yaml
Verify Deployment
kubectl get pods -n airflow
From the Airflow UI, confirm the connection under Admin > Connections, then run a sample DAG to generate logs and confirm they appear in Blob Storage.
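To verify from the Azure side, list the blobs in the log container (assuming the account key captured in the earlier step):

az storage blob list \
  --account-name arkoairflowstorage \
  --container-name airflowlogs \
  --account-key "$ACCOUNT_KEY" \
  --output table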