I am building a Data Flow that is invoked from a pipeline (control flow).
What I am trying to achieve is fuzzy matching between two master tables to find relevant matches. As shown in the screen capture below, I am building the Data Flow to connect to the underlying tables, which all live in the same database, so the connection is the same for every table. I want to parameterise the table names so I can reuse one generic connection rather than creating a new one for each table.
My issue is:
I tried creating parameters on the pipeline and it appears to work; however, I can't browse the tables to pick the join columns from table 1 (source 1) and table 2 (source 2).
Please see below, where it is using Blob storage.
You need to use dataset parameters for this scenario.
Go to your dataset and create a String-type parameter like below.
In the dataset, click on Edit and use the parameter for the table name, like this: @dataset().table_name.
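Assuming an Azure SQL dataset, the parameterised dataset JSON would look roughly like this (the dataset, linked service, and parameter names here are placeholders, not from the original post):

```json
{
  "name": "GenericSqlTable",
  "properties": {
    "type": "AzureSqlTable",
    "linkedServiceName": {
      "referenceName": "AzureSqlLinkedService",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "table_name": { "type": "string" }
    },
    "typeProperties": {
      "tableName": {
        "value": "@dataset().table_name",
        "type": "Expression"
      }
    }
  }
}
```

The key point is that `tableName` is an Expression referencing the dataset parameter, so the same dataset can point at any table.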
Make sure the Schema tab in the above dataset is empty. Now, assign this dataset to the two sources in the data flow.
For the join, the incoming schema must not be empty. That means you need to import the schema in the sources.
To do that, turn on Data flow debug, go to Debug settings -> Parameters, and enter your table names for the parameters.
Now, go to the sources of the data flow and import the projection like below.
Similarly, do the same for the second source as well.
Give the columns in the join like below and configure the fuzzy matching as per your requirement.
You can see the result in the data preview of the join transformation. Then configure your sink dataset. All of this process is for debugging the join transformation.
To run the data flow from a pipeline, you need to supply your table names for the dataset parameters in the Data flow activity of the pipeline, like below.
Run the pipeline, and the data flow will succeed like mine.
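In the pipeline JSON, those values are passed through `datasetParameters` on the data flow reference inside the Execute Data Flow activity. A rough sketch (the activity, data flow, stream, and table names below are illustrative; `source1`/`source2` must match the stream names in your data flow):

```json
{
  "name": "RunFuzzyMatch",
  "type": "ExecuteDataFlow",
  "typeProperties": {
    "dataflow": {
      "referenceName": "FuzzyMatchDataflow",
      "type": "DataFlowReference",
      "datasetParameters": {
        "source1": { "table_name": "MasterTable1" },
        "source2": { "table_name": "MasterTable2" }
      }
    }
  }
}
```

This is the same thing the Parameters tab of the activity does in the UI, so you can also set the values there instead of editing the JSON.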