azure-data-factoryexpressionbuilder

azure data factory, expression builder for substring_index not working


I am trying to build a return a value in azure data factory, pipeline expression builder, and its not working. From what I understand the engine behind it is apache sparx scala. if I look at the documentation for scala, substring_Index is one of the options. when I try and use substring_index, I get an error stating.... 'substring_index' is not a recognized function.

substringIndex('hello.com','.',1)

what am I not understanding here?

I've tried different variations of spellings for the function and nothing

screenshot


Solution

  • Even though ADF runs on the Apache Spark Engine, it doesn't support all the functions of Spark. It has its own Expressions and functions, and currently, substring_index() is not supported in ADF expressions.

    However, you can build an equivalent expression like below and use it as a workaround.

    I have stored the input string 'www.apache.org' in a string variable mystring and count in an integer variable count.

    Use this expression to get the desired results:

    @if(
        greaterOrEquals(int(replace(string(variables('count')),'-','')),length(split(variables('mystring'),'.'))),
            variables('mystring'),
                if(greater(variables('count'),0),
                    join(take(split(variables('mystring'),'.'),variables('count')),'.'),
                        join(skip(split(variables('mystring'),'.'),add(length(split(variables('mystring'),'.')),variables('count'))),'.')))
    

    enter image description here

    Result when count value is 2:

    enter image description here

    I have checked all the input test cases like negative count and count>count of matched chars, and it is giving expected results like substring_index() function in Spark.