sql-server · scala · apache-spark · spark-jdbc

Check whether a table exists with Spark JDBC


I am reading some data into a DataFrame from Microsoft SQL Server using Spark JDBC. When the table does not exist (for example, because it was accidentally dropped), I get an exception: com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object name 'TestAllData'.

I would like to check first whether the table exists and only then read the data. Is there a way to do that with Spark JDBC? I tried SQL Server's if exists construct, but it does not work when querying through Spark.

Currently, my code for reading the data looks like this:

    def getDataQuery() = {
      s"(select * from TestData) as subq"
    }


    // Partition settings default to empty so the same helper also works for
    // non-partitioned reads (the original signature required all five arguments,
    // which made single-argument calls like jdbcOptions(getDataQuery()) fail to compile).
    def jdbcOptions(dataQuery: String,
                    partitionColumn: String = "",
                    lowerBound: String = "",
                    upperBound: String = "",
                    numPartitions: String = ""): Map[String, String] = {
      val base = Map[String, String](
        "driver" -> config.getString("sqlserver.db.driver"),
        "url" -> config.getString("sqlserver.db.url"),
        "user" -> config.getString("sqlserver.db.user"),
        "password" -> config.getString("sqlserver.db.password"),
        "customSchema" -> config.getString("sqlserver.db.custom_schema"),
        "dbtable" -> dataQuery
      )
      if (partitionColumn.isEmpty) base
      else base ++ Map(
        "partitionColumn" -> partitionColumn,
        "lowerBound" -> lowerBound,
        "upperBound" -> upperBound,
        "numPartitions" -> numPartitions
      )
    }

    val dataDF = sparkSession
      .read
      .format("jdbc")
      .options(jdbcOptions(getDataQuery()))
      .load()

Solution

  • You can first run a query that lists the tables, then guard the actual read with an if:

    // "show tables in default" is Spark/Hive SQL and fails when sent to
    // SQL Server over JDBC; query INFORMATION_SCHEMA instead.
    def tableExist() = {
      s"(select table_name from information_schema.tables) as tbls"
    }
    
    val existDF = sparkSession
      .read
      .format("jdbc")
      .options(jdbcOptions(tableExist()))
      .load()
    
    val tableNames = existDF.select("TABLE_NAME").collect().map(_.getString(0))
    
    val dataDF = if (tableNames.contains("TestData"))
      sparkSession
        .read
        .format("jdbc")
        .options(jdbcOptions(getDataQuery()))
        .load()
    else
      sparkSession.emptyDataFrame // without an else branch the if yields Any, not DataFrame