airflow, kubernetespodoperator

How do I install dependency modules for an Airflow DAG task (or Python code)? "Failed to import module" in an Airflow DAG when using KubernetesExecutor


I have an Airflow DAG "example_ml.py" with a task "train_ml_model" that runs a Python script, "training.py".

- Dags/example_ml.py
- Dags/training.py

When I run the DAG, it fails to import the modules the training script needs; the error is on importing the sklearn module.

Code snippet for DAG task:

    from airflow.operators.python import PythonOperator

    train_model = PythonOperator(
        task_id='train_model',
        python_callable=training,  # defined in Dags/training.py
        dag=dag,
    )
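
For context, the training script presumably imports sklearn at module level, along these lines (a hypothetical sketch; the question does not show the script itself):

    # Dags/training.py -- hypothetical sketch of the script the task runs
    from sklearn.linear_model import LinearRegression  # the import that fails on the worker pod

    def training():
        model = LinearRegression()
        # ... fit, evaluate, and persist the model here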

PS: Airflow is running in a k8s cluster, and the executor is set to KubernetesExecutor, so each time a task is triggered a new pod is spun up to run it.


Solution

  • I have encountered the same issue with two dependencies (praw and supabase).

    Here's my solution:

    1. Add dependencies to a requirements.txt file:

      To do this, list your installed packages with pip freeze, select the dependencies you want available in Airflow, and copy each one with its pinned version. For example: supabase==2.3.3
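
      A resulting requirements.txt for the packages in this thread might look like the following (the versions are illustrative; pin whatever pip freeze reports in your environment):

      praw==7.7.1
      scikit-learn==1.4.2
      supabase==2.3.3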

    2. Modify your Dockerfile by adding the following to copy the requirements and install them:

      COPY requirements.txt /requirements.txt
      RUN pip install --user --upgrade pip
      RUN pip install --no-cache-dir --user -r /requirements.txt
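
      These lines assume a Dockerfile that extends the official image, i.e. they sit under a FROM line such as the one below (the tag is an assumption; match the Airflow version you already run):

      # base image for the extended build; the tag is illustrative
      FROM apache/airflow:2.8.1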
      
      
    3. Build a new Docker image:

      docker build . --tag extending_airflow:latest
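
      For the next step to pick this image up, the Airflow services in your docker-compose.yaml have to reference the new tag. With the official compose file that is done through an environment variable (an assumption about your setup):

      # .env next to docker-compose.yaml, read by the official compose file
      AIRFLOW_IMAGE_NAME=extending_airflow:latest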

    4. Finally, bring up your environment:

      docker-compose up -d --no-deps --build airflow-webserver airflow-scheduler
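
      Since the question runs Airflow with KubernetesExecutor rather than docker-compose, the equivalent final step there is to push the image to a registry the cluster can pull from and point the worker pods at it. With the official Airflow Helm chart that is set via values like these (the repository path is an assumption; the value names come from the chart):

      # values.yaml for the official Airflow Helm chart
      images:
        airflow:
          repository: my-registry.example.com/extending_airflow
          tag: latest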

    I hope this helps you!