I am following the tutorial mentioned in this link - download_rocket_launches.py . Since I am running this in Cloud Composer, I want to use the native path, i.e. /home/airflow/gcs/dags, but it fails with a "path not found" error.
What path can I give for this command to work? Here is the task I am trying to execute:
download_launches = BashOperator(
    task_id="download_launches",
    bash_command="curl -o /tmp/launches.json -L 'https://ll.thespacedevs.com/2.0.0/launch/upcoming'",  # noqa: E501
    dag=dag,
)
This worked on my end:
import json
import pathlib

import airflow.utils.dates
import requests
import requests.exceptions as requests_exceptions
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

dag = DAG(
    dag_id="download_rocket_launches",
    description="Download rocket pictures of recently launched rockets.",
    start_date=airflow.utils.dates.days_ago(14),
    schedule_interval="@daily",
)

download_launches = BashOperator(
    task_id="download_launches",
    bash_command="curl -o /home/airflow/gcs/data/launches.json -L 'https://ll.thespacedevs.com/2.0.0/launch/upcoming' ",  # note the space between the single quote and the double quote
    dag=dag,
)
download_launches
The key was to put a space between the single quote (') and the double quote (") at the end of your bash command.
Also, it is recommended to use the data folder for your output file, as stated in the Cloud Composer documentation:
gs://bucket-name/data, mapped to /home/airflow/gcs/data: stores the data that tasks produce and use. This folder is mounted on all worker nodes.
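Because that folder is mounted on every worker, a downstream task can read the file the BashOperator wrote no matter which worker it lands on. As a minimal sketch (the task name and callable below are my own, not part of the tutorial, and it assumes the Launch Library response keeps its launches under a "results" key; json, pathlib, and PythonOperator are already imported at the top of the DAG file):

def _count_upcoming_launches():
    # /home/airflow/gcs/data maps to gs://bucket-name/data and is mounted on all workers,
    # so this task can read the file produced by download_launches.
    launches_path = pathlib.Path("/home/airflow/gcs/data/launches.json")
    launches = json.loads(launches_path.read_text())
    print(f"Found {len(launches['results'])} upcoming launches.")

count_launches = PythonOperator(
    task_id="count_launches",
    python_callable=_count_upcoming_launches,
    dag=dag,
)

download_launches >> count_launches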