Tags: azure-data-factory, oracle-adf, fuzzy-logic

How to parameterise Data flow source 1 and source 2 pointing to two different tables in the same SQL DB


I am building a Data flow that will be called from a control flow pipeline.

The solution I am trying to achieve is to implement fuzzy logic on two master tables to find relevant matches. As shown in the screen capture below, I am building the Data flow to connect to the underlying tables in the same database, so the connection is the same. I want to parameterise the table names so that I can reuse the same generic connection rather than creating a new one for each table.


My issues are:

  1. How can I send the table names from the control flow to the data flow via a parameter?
  2. How can I browse the tables' column names in the join condition, as shown in the screen capture?

I tried creating parameters on the control flow, and it appeared to work; however, I can't browse the tables to pick the join columns from table 1 (source 1) and table 2 (source 2).

Please see below where it is using Blob storage.


Solution

  • You need to use dataset parameters for this scenario.

Go to your dataset and create a string-type parameter.


In the dataset, click Edit and use the parameter for the table name like this: @dataset().table_name.
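    In JSON form, a parameterised Azure SQL dataset looks roughly like the sketch below. The dataset name and parameter name are placeholders, and the exact typeProperties shape can vary by dataset type, so treat this as an illustration rather than a copy-paste definition.

    ```json
    {
      "name": "GenericSqlTable",
      "properties": {
        "type": "AzureSqlTable",
        "parameters": {
          "table_name": { "type": "string" }
        },
        "typeProperties": {
          "tableName": {
            "value": "@dataset().table_name",
            "type": "Expression"
          }
        },
        "schema": []
      }
    }
    ```

    Note that "schema" is left empty here on purpose, as described in the next step.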


Make sure the schema in the above dataset is empty. Now use this dataset for the two sources in the data flow.

For the join, the incoming schema should not be empty, which means you need to import the schema in the sources.

To do that, turn on Data flow debug, go to Parameters, and supply your table names for the parameters.


Now go to the sources of the data flow and import the mapping.


Do the same for the second source as well.

Give the columns in the join condition and set the fuzzy matching options as per your requirement.
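    In the data flow script behind the designer, a fuzzy join is expressed roughly as below. This is a sketch: the column name CustomerName and the stream names are hypothetical, and the exact option names (matchType, similarityThreshold) are assumptions based on how mapping data flow scripts typically render the fuzzy join settings, so verify against the script view of your own data flow.

    ```
    source1, source2 join(
        source1@CustomerName == source2@CustomerName,
        joinType: 'inner',
        matchType: 'fuzzy',
        similarityThreshold: 60
    ) ~> FuzzyJoin
    ```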


You can see the result in the data preview of the join transformation. Then configure your sink dataset. All of this is for debugging the join transformation.

To run the data flow from a pipeline, supply your table names for the parameters in the Data flow activity in the pipeline.
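    In the pipeline JSON, the Data flow activity passes the dataset parameters via datasetParameters on the data flow reference, roughly as sketched below. The activity name, data flow name, source stream names, and table names are all placeholders for illustration.

    ```json
    {
      "name": "RunFuzzyMatch",
      "type": "ExecuteDataFlow",
      "typeProperties": {
        "dataflow": {
          "referenceName": "FuzzyMatchDataflow",
          "type": "DataFlowReference",
          "datasetParameters": {
            "source1": { "table_name": "dbo.MasterTable1" },
            "source2": { "table_name": "dbo.MasterTable2" }
          }
        }
      }
    }
    ```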


Run the pipeline, and the data flow will succeed like mine.