scalaapache-sparkapache-spark-sqlinstr

Instr function in Spark with 3 arguments


I have a problem with using instr() function in Spark. The definition of function looks like below: instr(Column str, String substring)

I want to use instr in the same way as it is in Impala like:

instr(Column str, String substring, Int [position]) - return index position

In spark we option to give only 2 parameters, but i need to use 3rd parameter with int value basically (-1)

Col has value like

SNNNN NNNNN NNSNN SNSNS NNNNS

Expected code:- instr("ColName", "S", -1) Expected result :- 1 0 3 5 5


Solution

  • If you wanted to go with -ve position number, substring_index + length might be helpful as below-

     val frame = Seq("SNNNN","NNNNN","NNSNN","SNSNS","NNNNS").toDF("values")
        frame.withColumn("x", length($"values") - length(substring_index($"values", "S", -1)))
          .show(false)
        /**
          * +------+---+
          * |values|x  |
          * +------+---+
          * |SNNNN |1  |
          * |NNNNN |0  |
          * |NNSNN |3  |
          * |SNSNS |5  |
          * |NNNNS |5  |
          * +------+---+
          */