azure-data-factoryazure-sql

Azure Data Factory - Inner join a table with a SQL query inside stored procedure activity


Situation:

I have two databases DB1 and DB2. DB1 has 9 tables (out of these 9, any of the tables could get updated with only column to track that change i.e TIME_STAMP column).

Now I am trying to bring only the Updated/New records from these 9 tables from DB1 to DB2 (each table in DB1 map to one table in DB2) using Azure Data Factory and then run my final query on the tables in DB2. I have only read access on DB1.

Logic:

  1. Find the set of the Ids using UNION which got updated/inserted in these 9 tables
  2. Inner join the result of the UNION query with each of the 9 tables individually

Union Query Example:

SELECT DISTINCT idcolumn 
FROM table1
WHERE TIME_STAMP BETWEEN '2020-11-06T13:14:13.807Z' AND  '2020-11-07T13:14:13.807Z' 

UNION 

SELECT DISTINCT idcolumn 
FROM table2
WHERE TIME_STAMP BETWEEN '2020-11-06T13:14:13.807Z' AND  '2020-11-07T13:14:13.807Z'

UNION 

SELECT DISTINCT idcolumn 
FROM table3
WHERE TIME_STAMP BETWEEN '2020-11-06T13:14:13.807Z' AND  '2020-11-07T13:14:13.807Z'

UNION 
    
SELECT DISTINCT idcolumn 
FROM table4
WHERE TIME_STAMP BETWEEN '2020-11-06T13:14:13.807Z' AND  '2020-11-07T13:14:13.807Z' 

And similarly for 5 other tables.

Problem:

Is there a way in ADF with which I can find the union query result only once and then join the resultant data with all the 9 tables in one go rather than running the union query 9 times with each table in 9 different copy activity?


Solution

  • You can achieve that with Data Flow. For example,

    1. Create a Source 1: run the query to get source dataset.
    2. Create Source 2.
    3. Add a Join active to join with Source 1 and Source 2.

    Here's the data flow overview: enter image description here

    Just with Copy active in ADF pipeline, it's impossible. We can not join the A Copy active source to B copy active's source.