azure-functions azure-data-factory azure-databricks

Orchestration control in Data Pipeline in Azure


I am designing a data pipeline that consumes data from Salesforce using the Bulk API endpoint (pull mechanism).

The data lands in an ADLS Gen2 Bronze layer.

Next, a transformation job starts, cleans the data, and pushes it to the Silver layer in ADLS Gen2; the transformation is performed by Databricks.

Once the clean records are in the ADLS Gen2 Silver layer, I use Databricks to push them to another Databricks environment.

My questions are:

Could someone please suggest how to achieve this?

Which option is scalable, reliable, and able to handle high throughput?

Image: Logical Flow

Thanks a lot.


Solution

  • For your scenario, the best approach is Option 1: use an Azure Function for ingestion, orchestrated end-to-end with Azure Data Factory (ADF), then transform Bronze to Silver using Databricks.

    This pattern keeps ingestion, transformation, and orchestration loosely coupled but fully automated, which is ideal for high-throughput pipelines. Minimal sketches of each piece follow the references below.

    References:
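A minimal sketch of the ingestion piece, assuming a timer-triggered Azure Function on the Python v2 programming model. The Salesforce instance URL, access token, SOQL query, schedule, and container/path names are all placeholders, not details from your setup:

```python
import time

import azure.functions as func
import requests
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

app = func.FunctionApp()

SF_INSTANCE = "https://<your-instance>.my.salesforce.com"  # placeholder
SF_TOKEN = "<access-token>"  # in practice, read from Key Vault / app settings
ADLS_URL = "https://<account>.dfs.core.windows.net"        # placeholder

@app.schedule(schedule="0 */15 * * * *", arg_name="timer")  # every 15 minutes
def ingest_salesforce(timer: func.TimerRequest) -> None:
    headers = {"Authorization": f"Bearer {SF_TOKEN}",
               "Content-Type": "application/json"}

    # 1. Create a Bulk API 2.0 query job (the pull mechanism).
    job = requests.post(
        f"{SF_INSTANCE}/services/data/v58.0/jobs/query",
        headers=headers,
        json={"operation": "query",
              "query": "SELECT Id, Name FROM Account"},  # placeholder SOQL
    ).json()

    # 2. Poll until the job finishes (simplified; add backoff and a timeout).
    while True:
        state = requests.get(
            f"{SF_INSTANCE}/services/data/v58.0/jobs/query/{job['id']}",
            headers=headers).json()["state"]
        if state == "JobComplete":
            break
        if state in ("Failed", "Aborted"):
            raise RuntimeError(f"Bulk query job ended in state {state}")
        time.sleep(10)

    # 3. Download the results (one page shown for brevity; large extracts
    #    are paginated) and land them unmodified in the Bronze layer.
    results = requests.get(
        f"{SF_INSTANCE}/services/data/v58.0/jobs/query/{job['id']}/results",
        headers={"Authorization": f"Bearer {SF_TOKEN}"})

    bronze = (DataLakeServiceClient(ADLS_URL, credential=DefaultAzureCredential())
              .get_file_system_client("bronze")
              .get_file_client(f"salesforce/accounts/{job['id']}.csv"))
    bronze.upload_data(results.content, overwrite=True)
```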
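The orchestration pipeline itself is authored in ADF (a Function activity followed by a Databricks notebook activity), but a run can also be kicked off programmatically. A sketch using the azure-mgmt-datafactory SDK, where the subscription, resource group, factory, pipeline, and parameter names are all hypothetical:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Hypothetical names -- substitute your own.
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "rg-data-platform"
FACTORY_NAME = "adf-salesforce"
PIPELINE_NAME = "pl_bronze_to_silver"

adf = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Start the orchestration pipeline; ADF in turn runs the Databricks
# notebook activity that promotes Bronze to Silver.
run = adf.pipelines.create_run(
    RESOURCE_GROUP, FACTORY_NAME, PIPELINE_NAME,
    parameters={"load_date": "2024-01-01"},  # hypothetical parameter
)
print(f"Started pipeline run {run.run_id}")
```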
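Finally, the Bronze-to-Silver promotion as a Databricks (PySpark) notebook sketch. The paths, the Id key column, and the cleaning rules are assumptions about your data, not a definitive implementation:

```python
# Databricks notebook cell -- `spark` is provided by the runtime.
from pyspark.sql import functions as F

# Placeholder paths; point these at your own storage account and containers.
bronze_path = "abfss://bronze@<account>.dfs.core.windows.net/salesforce/accounts/"
silver_path = "abfss://silver@<account>.dfs.core.windows.net/salesforce/accounts/"

raw = (spark.read
            .option("header", "true")
            .csv(bronze_path))  # Bulk API results land in Bronze as CSV

clean = (raw
         .filter(F.col("Id").isNotNull())  # drop records without a key
         .dropDuplicates(["Id"])           # successive extracts can overlap
         .withColumn("_ingested_at", F.current_timestamp()))

(clean.write
      .format("delta")
      .mode("append")
      .save(silver_path))
```

Writing Silver as Delta is a deliberate choice here: ACID appends let repeated ADF-triggered runs land safely at high throughput, and the Silver table can then be shared with the second Databricks environment (for example via Delta Sharing) instead of copying files again.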