I am using Airflow with the Kubernetes executor and testing it locally (using minikube). While I was able to get it up and running, I can't seem to store my logs in S3. I have tried all the suggested solutions and I am still getting the following error:
*** Log file does not exist: /usr/local/airflow/logs/example_python_operator/print_the_context/2020-03-30T16:02:41.521194+00:00/1.log
*** Fetching from: http://examplepythonoperatorprintthecontext-5b01d602e9d2482193d933e7d2:8793/log/example_python_operator/print_the_context/2020-03-30T16:02:41.521194+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='examplepythonoperatorprintthecontext-5b01d602e9d2482193d933e7d2', port=8793): Max retries exceeded with url: /log/example_python_operator/print_the_context/2020-03-30T16:02:41.521194+00:00/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd00688a650>: Failed to establish a new connection: [Errno -2] Name or service not known'))
I implemented a custom Logging class as mentioned in this answer and still no luck.
My airflow.yaml looks like this:
airflow:
  image:
    repository: airflow-docker-local
    tag: 1
  executor: Kubernetes
  service:
    type: LoadBalancer
  config:
    AIRFLOW__CORE__EXECUTOR: KubernetesExecutor
    AIRFLOW__CORE__TASK_LOG_READER: s3.task
    AIRFLOW__CORE__LOAD_EXAMPLES: True
    AIRFLOW__CORE__FERNET_KEY: ${MASKED_FERNET_KEY}
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://postgres:airflow@airflow-postgresql:5432/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://postgres:airflow@airflow-postgresql:5432/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:airflow@airflow-redis-master:6379/0
    # S3 Logging
    AIRFLOW__CORE__REMOTE_LOGGING: True
    AIRFLOW__CORE__REMOTE_LOG_CONN_ID: s3://${AWS_ACCESS_KEY_ID}:${AWS_ACCESS_SECRET_KEY}@S3
    AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: s3://${BUCKET_NAME}/logs
    AIRFLOW__CORE__S3_LOG_FOLDER: s3://${BUCKET_NAME}/logs
    AIRFLOW__CORE__LOGGING_LEVEL: INFO
    AIRFLOW__CORE__LOGGING_CONFIG_CLASS: log_config.LOGGING_CONFIG
    AIRFLOW__CORE__ENCRYPT_S3_LOGS: False
    # End of S3 Logging
    AIRFLOW__WEBSERVER__EXPOSE_CONFIG: True
    AIRFLOW__WEBSERVER__LOG_FETCH_TIMEOUT_SEC: 30
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_REPOSITORY: airflow-docker-local
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG: 1
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_IMAGE_PULL_POLICY: Never
    AIRFLOW__KUBERNETES__WORKER_SERVICE_ACCOUNT_NAME: airflow
    AIRFLOW__KUBERNETES__DAGS_VOLUME_CLAIM: airflow
    AIRFLOW__KUBERNETES__NAMESPACE: airflow
    AIRFLOW__KUBERNETES__DELETE_WORKER_PODS: True
    AIRFLOW__KUBERNETES__KUBE_CLIENT_REQUEST_ARGS: '{\"_request_timeout\":[60,60]}'
persistence:
  enabled: true
  existingClaim: ''
  accessMode: 'ReadWriteMany'
  size: 5Gi
logsPersistence:
  enabled: false
workers:
  enabled: true
postgresql:
  enabled: true
redis:
  enabled: true
I have tried setting up the connection via the UI and creating the connection via airflow.yaml, and nothing seems to work. I have been trying this for 3 days now with no luck; any help would be much appreciated.
I have attached a screenshot for reference.
I am pretty certain this issue is because the S3 logging configuration has not been set on the worker pods. Worker pods are not given configuration that is set using environment variables such as AIRFLOW__CORE__REMOTE_LOGGING: True. If you wish to set this variable in the worker pod, you must copy the variable and prepend AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__ to the copied environment variable name: AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__REMOTE_LOGGING: True.
In this case you would need to duplicate all of your variables specifying config for S3 logging and add the AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__ prefix to the copies.
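For the airflow.yaml in the question, the duplicated entries might look roughly like the fragment below. This is an untested sketch. It assumes the new keys sit under airflow.config next to the existing ones, that the chart passes them through as environment variables in the same way, and that the values are simply copied from the question's S3 logging block without checking whether each one is correct.

```yaml
airflow:
  config:
    # Keep the existing AIRFLOW__CORE__* entries as they are; these are additional copies.
    # Each copy is the original variable name with the
    # AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__ prefix added, so that it is
    # also set inside the worker pods.
    AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__REMOTE_LOGGING: True
    AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__REMOTE_LOG_CONN_ID: s3://${AWS_ACCESS_KEY_ID}:${AWS_ACCESS_SECRET_KEY}@S3
    AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: s3://${BUCKET_NAME}/logs
    AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__S3_LOG_FOLDER: s3://${BUCKET_NAME}/logs
    AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__LOGGING_LEVEL: INFO
    AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__LOGGING_CONFIG_CLASS: log_config.LOGGING_CONFIG
    AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__ENCRYPT_S3_LOGS: False
```

If the worker pods pick these up, the task logs should be written to the S3 folder by the worker itself, so the webserver no longer has to fetch them from a pod that has already been deleted.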