
Scala: variadic UDF


I have a DataFrame with many columns. I also have a function

def getFeatureVector(features:Array[String]) : Vector

that is fairly complex, but takes an array of strings and returns a Spark MLlib vector.

Now, I want to look at some columns in the DF (I don't know which beforehand), pass them to getFeatureVector, and add a new column containing the resulting vectors.

I have access to an array of the names of the columns I want to use, and I wrote a function that casts each of those columns to string and combines them into a single array column:

val colNamesToEncode = Array("col1", "col2", "col3", "col4")
def getColsToEncode:Column = {
    val cols = colNamesToEncode.map(x => col(x).cast("string"))
    array(cols:_*)
}

Finally, I try to make a udf and apply it to the DF:

val encoderUDF = udf(getFeatureVector _)
val cols = getColsToEncode()
data.withColumn(featuresColName,encoderUDF(cols))

but when I run that, I get java.lang.RuntimeException: Unsupported literal type class scala.runtime.BoxedUnit ()

How can I apply the function to the DF?

PS: I was using this answer (Spark UDF with varargs) as a guide while writing my code.


Solution

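  • Remove the () from the line below; that resolves the error.

    From

    val cols = getColsToEncode()

    To

    val cols = getColsToEncode

    getColsToEncode is declared without a parameter list, so getColsToEncode() does not call the method with zero arguments. It calls apply on the Column that the method returns. Column.apply takes a single argument of type Any, and Scala 2 adapts the empty argument list by passing the Unit value (). Spark cannot turn that value into a literal, which is the BoxedUnit named in the error message.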
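
For reference, here is a minimal end-to-end sketch of the corrected code. Several things in it are assumptions and not part of the question: the body of getFeatureVector is a hypothetical stand-in (string lengths as features), the vector type is taken to be org.apache.spark.ml.linalg.Vector (swap in org.apache.spark.mllib.linalg if that is what you use), and the names EncodeColumns and addFeatures, the sample data, and the local SparkSession are made up for illustration. The UDF is also written to take Seq[String], because Spark hands array columns to Scala UDFs as a Seq, and converts to Array before calling the function.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, udf}

object EncodeColumns {
  // Hypothetical stand-in for the real, more complex getFeatureVector:
  // maps each string to its length (null becomes 0.0).
  def getFeatureVector(features: Array[String]): Vector =
    Vectors.dense(features.map(s => if (s == null) 0.0 else s.length.toDouble))

  val colNamesToEncode: Array[String] = Array("col1", "col2", "col3", "col4")

  // Declared without a parameter list, so it must be referenced without ().
  def getColsToEncode: Column = {
    val cols = colNamesToEncode.map(x => col(x).cast("string"))
    array(cols: _*)
  }

  // Assumption: accept Seq[String] in the UDF and convert to Array,
  // since array columns arrive in Scala UDFs as a Seq.
  val encoderUDF = udf((xs: Seq[String]) => getFeatureVector(xs.toArray))

  // Adds the vector column; note getColsToEncode has no trailing ().
  def addFeatures(data: DataFrame, featuresColName: String): DataFrame =
    data.withColumn(featuresColName, encoderUDF(getColsToEncode))

  def main(args: Array[String]): Unit = {
    // Local session and sample rows are for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("encode-columns")
      .getOrCreate()
    import spark.implicits._

    val data = Seq((1, "a", 2.5, true), (22, "bc", 0.5, false))
      .toDF("col1", "col2", "col3", "col4")

    addFeatures(data, "features").show(false)
    spark.stop()
  }
}
```

If you would rather keep udf(getFeatureVector _) unchanged, changing the parameter type of getFeatureVector to Seq[String] is another option.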