Tags: azure-data-factory, wildcard, azure-synapse, dynamic-content

Synapse pipeline - extract year and country from a filename in a wildcard path


I have files named File 2024 US adj.csv. A copy activity uses a wildcard path to ingest all files in an ADLS location, with the year passed in as a wildcard. I'm trying to add two new columns, year and country, but I don't know how to write the dynamic content that extracts these values from the file name. Any help is greatly appreciated.


Solution

    1. Get Metadata1: This Get Metadata activity points at the ADLS folder, with Child items in its field list, and returns the folder's contents in its childItems output.

    2. Filter1:

    Expression:

    Items: @activity('Get Metadata1').output.childItems

    Condition: @startswith(item().name,'2024')

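    This activity keeps only the child items whose names start with "2024". It assumes the file names begin with the year (for example 2024 US adj.csv); if they carry a prefix, adjust the condition to match.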

    3. ForEach1:

    Expression: Items: @activity('Filter1').output.value

    This activity loops over the filtered files and runs the copy activity below once per file, writing the data to the sink dataset.

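    3.1. Copy activity: Add a copy activity inside the ForEach activity. In the source settings, set the wildcard file name (or a file name parameter on the source dataset) to @item().name so each iteration reads only the current file, then add the additional columns.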

    Additional columns expression:

    year: @substring(item().name,0,4)

    country: @substring(item().name,5,2)

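    substring takes a zero-based start index and a length, so for 2024 US adj.csv these return 2024 and US. The year and country columns are appended to every row the copy activity reads from the source and are written to the sink. The fixed offsets only work when the name starts with the year; for names with a prefix, such as File 2024 US adj.csv, splitting on the space is more robust: @split(item().name,' ')[1] for the year and @split(item().name,' ')[2] for the country.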
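To sanity-check the offsets before running the pipeline, the Filter condition and the additional-column expressions can be mimicked locally. This is a Python approximation, not Data Factory code. It assumes childItems entries are dictionaries with name and type keys (the shape Get Metadata returns); the sample file names and helper names (filter_child_items, columns_by_offset, columns_by_split) are hypothetical. It also compares the fixed-offset expressions with a space-split alternative, @split(item().name,' ')[1] and [2], for names that carry a prefix such as File 2024 US adj.csv.

```python
# Local approximation of the pipeline expressions; not Data Factory code.
# Hypothetical sample of Get Metadata childItems output (name and type keys).
CHILD_ITEMS = [
    {"name": "2024 US adj.csv", "type": "File"},
    {"name": "2024 UK adj.csv", "type": "File"},
    {"name": "2023 US adj.csv", "type": "File"},
]


def filter_child_items(child_items, prefix="2024"):
    """Mimic Filter1: @startswith(item().name,'2024')."""
    return [c for c in child_items if c["name"].startswith(prefix)]


def substring(text, start, length):
    """Mimic ADF substring(text, startIndex, length) with a zero-based start.

    Note: ADF raises an error when the range is out of bounds;
    Python slicing silently truncates instead.
    """
    return text[start:start + length]


def columns_by_offset(name):
    """year: @substring(item().name,0,4), country: @substring(item().name,5,2)"""
    return {"year": substring(name, 0, 4), "country": substring(name, 5, 2)}


def columns_by_split(name):
    """year: @split(item().name,' ')[1], country: @split(item().name,' ')[2]"""
    parts = name.split(" ")
    return {"year": parts[1], "country": parts[2]}


if __name__ == "__main__":
    # Names that start with the year: fixed offsets line up.
    for item in filter_child_items(CHILD_ITEMS):
        print(item["name"], columns_by_offset(item["name"]))
    # Names with a prefix: fixed offsets break, splitting on spaces still works.
    print(columns_by_offset("File 2024 US adj.csv"))
    print(columns_by_split("File 2024 US adj.csv"))
```

For File 2024 US adj.csv the offset version yields year File and country 20, which is why the split form is the safer choice when a prefix is present. The split form, in turn, assumes the year and country are always the second and third space-separated tokens.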