scala, apache-spark, spark-core, spark-redshift

Check whether a Spark format exists or not


Context

The Spark reader has a format function, which is used to specify the data source type, for example JSON, CSV, or a third-party source such as com.databricks.spark.redshift.
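
For illustration, a minimal sketch of how format is typically used (the paths and connection options below are placeholders, not taken from the question):

    // Built-in source, referenced by its short alias
    val jsonDf = spark.read.format("json").load("/path/to/input.json")

    // Third-party source, referenced by its fully qualified class name
    val redshiftDf = spark.read
      .format("com.databricks.spark.redshift")
      .option("url", "jdbc:redshift://host:5439/db?user=u&password=p") // placeholder
      .option("dbtable", "my_table")                                   // placeholder
      .option("tempdir", "s3n://bucket/tmp")                           // placeholder
      .load()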

Help

How can I check whether a third-party format exists or not?

What I tried

I am looking for a proper and reliable solution.


Solution

  • Maybe this answer will help you.

    To just check whether a Spark format exists or not, wrapping

    spark.read.format("..").load()

    in a try/catch is enough (see the sketch below).
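
    For example, a minimal sketch (an assumption on my part: a missing format typically surfaces as a ClassNotFoundException, but the exact exception type can vary across Spark versions, so treat this as a heuristic):

    import scala.util.{Failure, Success, Try}
    import org.apache.spark.sql.SparkSession

    // Hypothetical helper, not a Spark API: attempt a read with the given
    // format and interpret the failure mode.
    def formatExists(spark: SparkSession, fmt: String): Boolean =
      Try(spark.read.format(fmt).load()) match {
        case Success(_)                         => true  // format resolved, read succeeded
        case Failure(_: ClassNotFoundException) => false // source class not on classpath
        case Failure(_)                         => true  // class resolved; the read itself
                                                         // failed (e.g. missing path/options)
      }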

    And as all data sources usually register themselves using the DataSourceRegister interface (and use shortName to provide their alias):

    You can use Java's ServiceLoader.load method to find all registered implementations of the DataSourceRegister interface.

    import java.util.ServiceLoader
    import scala.collection.JavaConverters._
    import org.apache.spark.sql.sources.DataSourceRegister

    // Discover every DataSourceRegister implementation on the classpath
    val formats = ServiceLoader.load(classOf[DataSourceRegister])

    // Print the alias each registered data source advertises
    formats.asScala.map(_.shortName).foreach(println)
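
    Building on that, you can test for a specific alias. For example (assuming the connector in question implements DataSourceRegister at all — the "redshift" alias below is an assumption; some releases may only be addressable by the fully qualified class name):

    // True if any registered data source advertises the given alias
    def aliasRegistered(alias: String): Boolean =
      formats.asScala.exists(_.shortName.equalsIgnoreCase(alias))

    aliasRegistered("csv")      // built-in source since Spark 2.x, so true there
    aliasRegistered("redshift") // assumed alias for spark-redshift, if registered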