I submitted a Hive job through a Dataproc Workflow Template, using the Airflow operator DataprocWorkflowTemplateInstantiateInlineOperator (written in Python). Once the job is submitted, it is assigned a generated jobId (example: job0-abc2def65gh12).
Since I could not retrieve the jobId, I tried passing a jobId as a parameter through the REST API, but that isn't working.
Can I fetch the jobId or, if that's not possible, can I pass a jobId as a parameter?
The jobId is available in the metadata field of the Operation object that is returned from the Instantiate operation. See this article [1] for how to work with the metadata.
The Airflow operator only polls [2] the Operation and does not return the final Operation object. You could try adding a return to execute.
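If you do get hold of the finished Operation (for example as the JSON returned by the REST API), pulling the job IDs out of its metadata could look roughly like the sketch below. It assumes the metadata follows the WorkflowMetadata layout, where graph.nodes lists each step with its stepId and jobId. The helper name job_ids_by_step and the sample operation are hypothetical and only for illustration.

```python
from typing import Any, Dict


def job_ids_by_step(operation: Dict[str, Any]) -> Dict[str, str]:
    """Map each workflow step ID to the Dataproc job ID it ran as.

    Assumes `operation` is the finished Operation as a plain dict whose
    `metadata` field has the WorkflowMetadata shape (graph -> nodes).
    """
    nodes = operation.get("metadata", {}).get("graph", {}).get("nodes", [])
    # A step that never started has no jobId yet, so skip such nodes.
    return {
        node["stepId"]: node["jobId"]
        for node in nodes
        if "stepId" in node and "jobId" in node
    }


# Hypothetical, trimmed-down Operation; real responses carry more fields.
sample_operation = {
    "done": True,
    "metadata": {
        "graph": {
            "nodes": [
                {"stepId": "job0", "jobId": "job0-abc2def65gh12", "state": "COMPLETED"},
                {"stepId": "job1", "state": "BLOCKED"},  # not started, no jobId
            ]
        }
    },
}

print(job_ids_by_step(sample_operation))
```

Using .get with defaults means an Operation that has no graph yet simply yields an empty dict instead of raising.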
Another option would be to use the Dataproc REST API [3] after the workflow finishes. Any labels assigned to the workflow itself are propagated to its clusters and jobs, so you can make a list-jobs call filtered by label. For example, the filter parameter could look like: filter = labels.my-label=12345
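As a rough sketch of the label-based lookup, the snippet below builds the jobs.list URL with a label filter and reads the job IDs out of a response body. The project, region, and label values are placeholders, the helper names are hypothetical, and authentication and the HTTP request itself are left out. It assumes the response lists jobs under "jobs", each with its ID at reference.jobId, and writes the filter with spaces around "=" as in the examples in the jobs.list reference.

```python
from typing import Any, Dict, List
from urllib.parse import urlencode


def jobs_list_url(project_id: str, region: str, label_key: str, label_value: str) -> str:
    """Build a jobs.list URL that filters on a single label."""
    label_filter = f"labels.{label_key} = {label_value}"
    base = (
        "https://dataproc.googleapis.com/v1/"
        f"projects/{project_id}/regions/{region}/jobs"
    )
    # urlencode escapes the spaces and '=' inside the filter expression.
    return f"{base}?{urlencode({'filter': label_filter})}"


def job_ids_from_response(response: Dict[str, Any]) -> List[str]:
    """Collect job IDs from a parsed jobs.list response body."""
    return [job["reference"]["jobId"] for job in response.get("jobs", [])]


# Placeholder values; substitute your own project, region, and label.
url = jobs_list_url("my-project", "us-central1", "my-label", "12345")
print(url)

# Hypothetical response body, trimmed to the fields used here.
sample_response = {"jobs": [{"reference": {"projectId": "my-project", "jobId": "job0-abc2def65gh12"}}]}
print(job_ids_from_response(sample_response))
```

The URL would then be requested with an authorized HTTP client of your choice, and the parsed JSON body passed to job_ids_from_response.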
[1] https://cloud.google.com/dataproc/docs/concepts/workflows/debugging#using_workflowmetadata
[2] https://github.com/apache/airflow/blob/master/airflow/contrib/operators/dataproc_operator.py#L1376
[3] https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.jobs/list