
What's the difference between Dataflow SQL and Beam SQL (ZetaSQL or Calcite SQL)?


While browsing, I just came across Dataflow SQL. Is it any different from Beam SQL?


Solution

  • Apache Beam SQL is a feature of Apache Beam that lets you run SQL queries directly inside your pipeline.

    Beam SQL offers two SQL dialects: Beam Calcite SQL and Beam ZetaSQL. The advantage of ZetaSQL is that it's very similar to BigQuery's syntax, so it's useful in pipelines that read from or write to BigQuery.
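As a rough illustration of choosing between the two dialects, here is a minimal sketch using the Beam Python SDK's `SqlTransform`, which accepts a `dialect` argument. The helper names `pick_dialect` and `run_query`, the field names `word_id` and `label`, and the sample data are all made up for this example. Note that `SqlTransform` in Python is a cross-language transform, so it needs a Java runtime available on the machine that builds the pipeline.

```python
def pick_dialect(uses_bigquery: bool) -> str:
    """Return the Beam SQL dialect name to pass to SqlTransform.

    Hypothetical helper: picks ZetaSQL when the pipeline reads from or
    writes to BigQuery, and Beam Calcite SQL otherwise.
    """
    return "zetasql" if uses_bigquery else "calcite"


def run_query(query: str, uses_bigquery: bool = False) -> None:
    """Run `query` over a tiny in-memory PCollection and print the rows."""
    # Imported here so pick_dialect stays usable without Beam installed.
    import apache_beam as beam
    from apache_beam.transforms.sql import SqlTransform

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "CreateNumbers" >> beam.Create([1, 2, 10])
            # SqlTransform needs schema-aware input; beam.Row provides one.
            | "ToRows" >> beam.Map(lambda x: beam.Row(word_id=x, label=str(x)))
            # A single unnamed input is exposed to the query as PCOLLECTION.
            | "Query" >> SqlTransform(query, dialect=pick_dialect(uses_bigquery))
            | "Print" >> beam.Map(print)
        )
```

For instance, `run_query("SELECT word_id, label FROM PCOLLECTION WHERE word_id > 1", uses_bigquery=True)` would run the query with the ZetaSQL planner, assuming Beam and Java are installed.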

    Dataflow SQL is a feature of Dataflow that lets you create pipelines directly from a BigQuery query. According to the documentation, it supports the ZetaSQL syntax (BigQuery syntax).

    To create a new Dataflow job from the BigQuery console, do the following:

    1. Go to the BigQuery console
    2. Just under the Query editor, click More and then Query settings
    3. Select Cloud Dataflow engine in the first option

    After that, you can click Create Cloud Dataflow job, and your query will become a job in Dataflow.
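If you would rather script this than click through the console, the gcloud CLI has a `gcloud dataflow sql query` command that submits a query as a Dataflow job. The sketch below only builds the argument list and does not run anything. It assumes that command is available in your gcloud installation, and the function name `dataflow_sql_command`, the job name, region, project, dataset, and table values are placeholders for illustration.

```python
import shlex
from typing import List, Optional


def dataflow_sql_command(
    query: str,
    job_name: str,
    region: str,
    dataset: str,
    table: str,
    project: Optional[str] = None,
) -> List[str]:
    """Build the argv for `gcloud dataflow sql query` with a BigQuery sink."""
    args = [
        "gcloud", "dataflow", "sql", "query", query,
        f"--job-name={job_name}",
        f"--region={region}",
        f"--bigquery-dataset={dataset}",
        f"--bigquery-table={table}",
    ]
    if project is not None:
        # Optional: project that owns the output table.
        args.append(f"--bigquery-project={project}")
    return args


# Placeholder values; replace them with your own resources.
command = dataflow_sql_command(
    query="SELECT * FROM bigquery.table.`my-project`.my_dataset.source_table",
    job_name="my-sql-job",
    region="us-central1",
    dataset="my_dataset",
    table="my_output_table",
)
# Print a shell-quoted version that can be pasted into a terminal.
print(shlex.join(command))
```

Keeping the command as a list means it can be passed straight to `subprocess.run(command, check=True)` without worrying about shell quoting of the query.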

    I hope it helps