
Convert a String expression to an actual working instance


I am trying to convert a Scala expression that is saved in a database as a String back into working code.

I have tried the reflection ToolBox, Groovy, etc., but I can't seem to achieve what I need.

Here's what I tried:


import scala.reflect.runtime.universe._
import scala.reflect.runtime.currentMirror
import scala.tools.reflect.ToolBox

val toolbox = currentMirror.mkToolBox()
val code1 = q"""StructType(StructField(id,IntegerType,true), StructField(name,StringType,true), StructField(tstamp,TimestampType,true), StructField(date,DateType,true))"""
val sType = toolbox.compile(code1)().asInstanceOf[StructType]

I need the sType instance so that I can pass it as a custom schema when reading a CSV file into a DataFrame, but the code above fails.

Is there any way to convert the string representation of a StructType into an actual StructType instance? Any help would be appreciated.


Solution

  • If StructType is Spark's org.apache.spark.sql.types.StructType and you just want to convert a String to a StructType, you don't need reflection. You can try this:

    import org.apache.spark.sql.catalyst.parser.LegacyTypeStringParser
    import org.apache.spark.sql.types.{DataType, StructType}
    
    import scala.util.Try
    
    def fromString(raw: String): StructType =
      Try(DataType.fromJson(raw)).getOrElse(LegacyTypeStringParser.parse(raw)) match {
        case t: StructType => t
        case _             => throw new RuntimeException(s"Failed parsing: $raw")
      }
    
    val code1 =
      """StructType(Array(StructField(id,IntegerType,true), StructField(name,StringType,true), StructField(tstamp,TimestampType,true), StructField(date,DateType,true)))"""
    fromString(code1) // res0: org.apache.spark.sql.types.StructType
    

    The code is taken from the companion object of org.apache.spark.sql.types.StructType in Spark. You cannot call the original directly because it is package-private. Moreover, it relies on LegacyTypeStringParser, so I'm not sure it is good enough for production code.
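If you control how the schema is written to the database, one way to avoid the legacy parser is to store the schema's JSON form instead of its toString form and read it back with DataType.fromJson, which the function above already tries first. The following is only a minimal sketch, assuming spark-sql is on the classpath; the names SchemaJsonRoundTrip and schemaFromJson are illustrative and not part of Spark.

```scala
import org.apache.spark.sql.types._

// Illustrative wrapper object; the name is an assumption, not a Spark API.
object SchemaJsonRoundTrip {
  // The schema from the question, written with quoted field names.
  val original: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = true),
    StructField("name", StringType, nullable = true),
    StructField("tstamp", TimestampType, nullable = true),
    StructField("date", DateType, nullable = true)
  ))

  // The string you would store in the database.
  val stored: String = original.json

  // DataType.fromJson returns a DataType, so narrow it to StructType.
  def schemaFromJson(raw: String): StructType =
    DataType.fromJson(raw) match {
      case s: StructType => s
      case other =>
        throw new IllegalArgumentException(
          s"Expected a struct schema, got: ${other.typeName}")
    }

  // Parse the stored string back into a StructType instance.
  val restored: StructType = schemaFromJson(stored)
}
```

The restored value can then be handed to the CSV reader, for example spark.read.schema(SchemaJsonRoundTrip.restored).csv(path), where spark is an existing SparkSession and path is your file location (both assumed here). This only helps for schemas you can re-save as JSON; strings already stored in the StructType(...) form still need the fromString approach above.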