
How to get rid of derby.log, metastore_db from Spark Shell


When I run spark-shell, it creates a derby.log file and a metastore_db folder in the current directory. How do I configure Spark to put these somewhere else?

For derby.log, I've tried the approach from "Getting rid of derby.log", like so:

    spark-shell --driver-memory 10g --conf "-spark.driver.extraJavaOptions=Dderby.stream.info.file=/dev/null"

I tried this with a couple of different properties, but Spark ignores them.

Does anyone know how to get rid of these or specify a default directory for them?


Solution

  • The hive.metastore.warehouse.dir property has been deprecated since Spark 2.0.0 in favor of spark.sql.warehouse.dir; see the docs.

    As hinted by this answer, the real culprit behind both the metastore_db directory and the derby.log file appearing in whatever directory spark-shell is launched from is the derby.system.home property, which defaults to . (the current working directory).

    Thus, a default location for both can be specified by adding the following line to spark-defaults.conf (read from $SPARK_HOME/conf unless SPARK_CONF_DIR points elsewhere):

    spark.driver.extraJavaOptions -Dderby.system.home=/tmp/derby
    

    where /tmp/derby can be replaced by the directory of your choice.
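If you set this up on several machines, the edit can be scripted. Below is a minimal POSIX shell sketch, not an official Spark tool: the function name add_derby_home is hypothetical, and it assumes you pass it the directory that holds spark-defaults.conf. It refuses to touch a file that already sets spark.driver.extraJavaOptions, because the file holds only one value per key, so a second line would override the first instead of merging with it.

```shell
# Hypothetical helper: add_derby_home CONF_DIR DERBY_DIR
# Appends "spark.driver.extraJavaOptions -Dderby.system.home=DERBY_DIR"
# to CONF_DIR/spark-defaults.conf, creating both directories if needed.
add_derby_home() {
  conf_dir="$1"
  derby_dir="$2"
  conf_file="$conf_dir/spark-defaults.conf"

  # Create the config directory and the Derby home up front.
  mkdir -p "$conf_dir" "$derby_dir" || return 1

  # Only one value per key survives, so don't silently clobber
  # existing driver JVM options; ask the user to merge them by hand.
  if grep -q '^spark\.driver\.extraJavaOptions' "$conf_file" 2>/dev/null; then
    echo "spark.driver.extraJavaOptions is already set in $conf_file;" \
         "add -Dderby.system.home=$derby_dir to it by hand" >&2
    return 1
  fi

  printf 'spark.driver.extraJavaOptions -Dderby.system.home=%s\n' "$derby_dir" >> "$conf_file"
}
```

Typical use would be add_derby_home "$SPARK_HOME/conf" /tmp/derby. For a single session, the same JVM option can instead be passed on the command line with spark-shell --driver-java-options "-Dderby.system.home=/tmp/derby".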