snowflake-cloud-data-platformazure-data-factorygoogle-cloud-dataflow

Azure Data Factory Data Flows - Forcing headers into upper case for mapping


I have a pipeline in Data Factory that will be using a data flow. The plan is:

Load .xlsx file into data flow from blob storage

Pass the data into a snowflake sink table

The problem I have is, this data flow will be within a ForEach container, to run the data flow against each sheet within the .xlsx file. The headers in the .xlsx file are not all upper case, whereas the column headers in Snowflake are all upper case, so the data flow can't automatically map them as its case sensitive.

Is there a way to amend the column headers from the .xlsx file to be all upper case, and overwrite the headers that are already there, instead of making new headers? I have checked online and this should be able to do it. A derived column between the source and the sink that reads each column, and rewrites them to upper()

enter image description here

However it doesn't seem to work. When I look at the 'Inspect' tab it shows the columns have been marked as updated, but they are not in upper case:

enter image description here

This means as its not upper case, it can't auto-map to the snowflake table headers that are all upper case.

Has anyone come across something like this, and a way to solve it?


Solution

  • You are using wrong transformation for your requirement. The derived column with above configurations will convert all the string rows in those columns to the upper case. It won't change the case of those column names. To deal with the column names dynamically, use a Select transformation with a Rule-based mapping as shown below.

    Take select transformation after the source and uncheck the Auto mapping. Now, click on Add mapping. If there is a schema imported in the source projection, it will show two options which are Fixed mapping and Rule-based mapping. In these select Rule-based mapping. If source schema is not imported and is empty, then it will automatically show the Rule-based mapping by default.

    In the Rule-based mapping, as you want to change the case of all columns case to upper, you can give the expression true() in the condition and give expression upper($$) as shown below.

    enter image description here

    This will change the case of all columns which you can check in the inspect option of the select transformation.

    enter image description here

    Here, column AGE is already in upper case, so it wasn't updated. If you want the case change for only string columns, you can use this expression type=="string" in the select condition. Add your snowflake sink after this and it will be mapped as expected.