python, python-3.x, airflow, hashicorp-vault

How can one use HashiCorp Vault in Airflow?


I am starting to use Apache Airflow and I am wondering how to effectively make it use secrets and passwords stored in Vault. Unfortunately, searching does not turn up meaningful answers beyond a yet-to-be-implemented hook in the Airflow project itself.

I can always use Python's hvac module to access Vault generically from a PythonOperator, but I was wondering whether there is a better way or a good practice (e.g. an Airflow plugin I may have missed).
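
For reference, the generic hvac approach I have in mind would look roughly like this (a minimal sketch; the Vault URL, token, secret path, and key are placeholders, and a KV v2 secrets engine is assumed):

    import hvac


    def read_vault_secret(**kwargs):
        # Placeholder dev-mode Vault address and token; in practice these would
        # come from the environment or a proper auth method rather than hard-coding.
        client = hvac.Client(url="http://127.0.0.1:8200", token="<vault-token>")
        # KV v2 read; the "secret" mount point and "myapp" path are hypothetical.
        response = client.secrets.kv.v2.read_secret_version(path="myapp", mount_point="secret")
        return response["data"]["data"]["password"]

The callable would then be wired into a PythonOperator like any other task.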


Solution

  • Airflow >= 1.10.10 supports Secrets Backends, which can fetch Airflow Connections and Variables from HashiCorp Vault.

    More details in the Airflow docs: https://airflow.apache.org/docs/stable/howto/use-alternative-secrets-backend.html#hashicorp-vault-secrets-backend

    If you want to test it locally check the tutorial at https://www.astronomer.io/docs/astro/secrets-backend/hashicorp-vault

    Set the following config in the [secrets] section of airflow.cfg and adjust it for your environment (on Airflow 2, the backend class is airflow.providers.hashicorp.secrets.vault.VaultBackend from the apache-airflow-providers-hashicorp package):

    [secrets]
    backend = airflow.contrib.secrets.hashicorp_vault.VaultBackend
    backend_kwargs = {"connections_path": "connections", "variables_path": "variables", "mount_point": "airflow", "url": "http://127.0.0.1:8200"}
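
    With this config, the backend looks up a connection at airflow/connections/<conn_id> and expects the secret to carry a conn_uri key. To seed a test connection for the DAG below, something along these lines should work (a sketch, assuming a dev-mode Vault with a KV v2 engine already enabled at the airflow mount point; the token and SMTP URI are placeholders):

    import hvac

    # Dev-mode Vault from the config above; the root token is a placeholder.
    # Assumes a KV v2 engine at the "airflow" mount point, e.g. enabled with:
    #   vault secrets enable -path=airflow -version=2 kv
    client = hvac.Client(url="http://127.0.0.1:8200", token="<root-token>")

    # Store the smtp_default connection as a conn_uri, matching connections_path="connections".
    client.secrets.kv.v2.create_or_update_secret(
        path="connections/smtp_default",
        mount_point="airflow",
        secret={"conn_uri": "smtps://user:password@relay.example.com:465"},
    )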
    

    Example DAG to test the integration:

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator
    from datetime import datetime
    from airflow.hooks.base_hook import BaseHook
    
    
    def get_secrets(**kwargs):
        # With the Vault backend configured, get_connection resolves the conn_id
        # from Vault (airflow/connections/<conn_id>) instead of the metadata DB.
        conn = BaseHook.get_connection(kwargs['my_conn_id'])
        print(f"Password: {conn.password}, Login: {conn.login}, URI: {conn.get_uri()}, Host: {conn.host}")
    
    with DAG('example_secrets_dags', start_date=datetime(2020, 1, 1), schedule_interval=None) as dag:
        test_task = PythonOperator(
            task_id='test-task',
            python_callable=get_secrets,
            op_kwargs={'my_conn_id': 'smtp_default'},
        )
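
    Variables resolve the same way: with variables_path set to "variables", a secret stored at airflow/variables/<key> whose payload sits under a value key is returned by Variable.get. A quick sketch (the variable name is illustrative):

    from airflow.models import Variable

    # Looked up in Vault at airflow/variables/hello; the secret must store its
    # payload under a "value" key, e.g. {"value": "world"}.
    print(Variable.get("hello"))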