I have an Airflow DAG, "example_ml.py", with a task "train_ml_model" that runs a Python script, "training.py".
- Dags/example_ml.py
- Dags/training.py
When I run the DAG, it fails to import the modules that the training script needs to execute.
Code snippet for the DAG task:
from airflow.operators.python import PythonOperator
from training import training  # training() lives in Dags/training.py, next to the DAG file

train_model = PythonOperator(
    task_id='train_model',
    python_callable=training,
    dag=dag,
)
PS: I'm running Airflow on a k8s cluster with the executor set to KubernetesExecutor, so each time the DAG is triggered a new pod is spun up to run the task.
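For reference, the executor is set in airflow.cfg (or via the AIRFLOW__CORE__EXECUTOR environment variable); a minimal excerpt matching my setup:

[core]
executor = KubernetesExecutor

Because each task pod is created from the worker image, any module that training.py imports has to already exist in that image.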
I have encountered the same issue with two dependencies (praw and supabase).
Here's my solution:
Add your dependencies to a requirements.txt file:
First, list your installed packages with pip freeze. Then pick out the dependencies you want to install in Airflow and copy each one with its pinned version, for example: supabase==2.3.3
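For example, to pull just the pinned lines you need out of pip freeze (the grep pattern is illustrative, matching my two packages):

pip freeze | grep -iE 'praw|supabase'

The output is already in package==version form, so you can paste it straight into requirements.txt.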
Modify your Dockerfile so it extends the base Airflow image, copies the requirements file, and installs the packages:
# Extend the official Airflow image; choose the tag that matches your deployment
FROM apache/airflow:latest

COPY requirements.txt /requirements.txt
RUN pip install --user --upgrade pip
RUN pip install --no-cache-dir --user -r /requirements.txt
Build a new Docker image:
docker build . --tag extending_airflow:latest
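As an optional sanity check (assuming the tag from the build step above), you can confirm the packages import inside the new image; the official Airflow image's entrypoint runs python directly when it is the first argument:

docker run --rm extending_airflow:latest python -c "import praw, supabase"

If this exits without an ImportError, the dependencies are baked into the image.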
Finally, bring up your environment:
docker-compose up -d --no-deps --build airflow-webserver airflow-scheduler
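Note that for --build to take effect, the compose file must build from your Dockerfile instead of pulling a stock image. A sketch of the relevant excerpt, assuming the standard Airflow docker-compose.yaml layout:

x-airflow-common:
  &airflow-common
  build: .
  image: extending_airflow:latest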
I hope this helps you!