
Difference between BeamRunPythonPipelineOperator and DataFlowPythonOperator in apache airflow


I am trying to run a job in Airflow that executes a Dataflow job. I noticed there are two operators, BeamRunPythonPipelineOperator and DataFlowPythonOperator, and both can submit jobs to Dataflow, but I am not sure how they differ. Is there any difference between them? Any help would be highly appreciated.


Solution

  • DataFlowPythonOperator was deprecated and replaced by DataflowCreatePythonJobOperator, which was in turn deprecated and replaced by BeamRunPythonPipelineOperator.

    TL;DR: as of July 2022, use BeamRunPythonPipelineOperator.