apache-spark, apache-spark-sql, apache-spark-2.0

Where is table data stored in Spark?


Hi, I'm trying to find out where Spark SQL stores table metadata. If it is not in the Hive metastore by default, where is it stored?


Solution

  • Here is the explanation from the Spark 2.2.0 documentation:

    When not configured by hive-site.xml, the context automatically creates metastore_db in the current directory and creates a directory configured by spark.sql.warehouse.dir, which defaults to the directory spark-warehouse in the current directory that the Spark application is started. Note that the hive.metastore.warehouse.dir property in hive-site.xml is deprecated since Spark 2.0.0. Instead, use spark.sql.warehouse.dir to specify the default location of database in warehouse.

    Here is the link: https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html
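
    As a rough illustration of the passage quoted above, here is a minimal PySpark sketch. The helper names default_locations and build_session, the application name, and any path passed in are hypothetical and chosen only for this example. The sketch assumes there is no hive-site.xml on the classpath and that PySpark is installed when build_session is called.

```python
import os


def default_locations(start_dir):
    """Return the default paths described in the quoted documentation.

    Assumes no hive-site.xml and no spark.sql.warehouse.dir override.
    start_dir is the directory the Spark application is started from.
    """
    return {
        "metastore_db": os.path.join(start_dir, "metastore_db"),
        "warehouse": os.path.join(start_dir, "spark-warehouse"),
    }


def build_session(warehouse_dir):
    """Create a Hive-enabled session with an explicit warehouse directory."""
    # Imported lazily so the path helper above works without PySpark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName("warehouse-dir-example")  # hypothetical application name
        .config("spark.sql.warehouse.dir", warehouse_dir)
        .enableHiveSupport()
        .getOrCreate()
    )
```

    Calling default_locations(os.getcwd()) shows where metastore_db and spark-warehouse would be expected by default. Calling build_session with a directory of your choice overrides the warehouse location, and spark.conf.get("spark.sql.warehouse.dir") on the returned session should report the value in effect. The setting generally needs to be supplied before the first session is created.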