
Scala/Spark: determine the path of an external table


I have an external table on a GCS (gs://) bucket, and to run some compaction logic I want to determine the full path at which the table was created.

val tableName="stock_ticks_cow_part"
val primaryKey="key"
val versionPartition="version"
val datePartition="dt"
val datePartitionCol=new org.apache.spark.sql.ColumnName(datePartition)

import spark.implicits._

val compactionTable = spark.table(tableName).withColumnRenamed(versionPartition, "compaction_version").withColumnRenamed(datePartition, "date_key")
compactionTable. <code for determining the path>

Does anyone know how to determine the table path in Scala?


Solution

  • I think you can use .inputFiles, whose documentation says it

    Returns a best-effort snapshot of the files that compose this Dataset

    Be aware that this returns an Array[String], so you should loop through it to get all the information you're looking for.

    So actually just call

    compactionTable.inputFiles
    

    and look at each element of the Array
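
    For reference, here is a minimal sketch of that loop. It assumes a Hive-style layout partitioned one level deep (e.g. .../dt=2018-08-31/part-00000.parquet); the two getParent calls are tied to that assumption and would need adjusting for a different partition depth.

    import org.apache.hadoop.fs.Path

    // Best-effort list of the files backing the Dataset, e.g.
    // gs://bucket/warehouse/stock_ticks_cow_part/dt=2018-08-31/part-00000.parquet
    val files: Array[String] = compactionTable.inputFiles

    // Strip the file name and the (assumed single) partition directory
    // to recover the table's base path; distinct collapses the per-file
    // duplicates down to the table root(s)
    val basePaths = files
      .map(f => new Path(f).getParent.getParent.toString)
      .distinct

    basePaths.foreach(println)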