azureazure-data-factorygigya

How to dynamically select given columns from a SAP CDC linked service?


In Azure Data Factory, I have a dynamic pipeline that extracts data from our SAP systems. The metadata for each given table (which are around 2000) is extracted via a lookup from which I get the name, ODP context, key columns and connection related details, like so: Metadata output

Afterwards, I have a data flow that extracts the data and sinks it to a .parquet file: Pipeline DF - Source settings DF - Source settings Sink Settings Sink mapping

I can figure out how to assign most of the parameters dynamically, but I struggle to figure out how to make a selection only of the columns given in my metadata (KeyColumns) and not the entire object. I am running only full loads.

I have tried to run it as Full load then incremental, where I parsed the key columns to the Key Columns parameter, but not surprising, the whole object was loaded since this parameter indicates columns used for de-duplication.


Solution

  • To make a selection only of the columns given in the metadata and not the entire object, you need to use select transformation. Giving the expression in Key columns, does not help you in selecting the desired columns dynamically. Key columns are used for incremental data loading. Below is the way to do using select transformation.

    Here replace <array_parameter_name> with parameter name which contains the array of column names. name is the keyword which is used here to check if any of the column names matches with value in the array.