I am using Spark v2.4.4 via the Python API.
According to the Spark documentation, I can force Spark to download all the Hive jars needed to interact with my Hive metastore by setting the following config:
spark.sql.hive.metastore.version=${my_version}
spark.sql.hive.metastore.jars=maven
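For reference, the same two settings can also be passed on the spark-submit command line instead of in code; a sketch, assuming my script is named myapp.py and ${my_version} is filled in:

spark-submit \
  --conf spark.sql.hive.metastore.version=${my_version} \
  --conf spark.sql.hive.metastore.jars=maven \
  myapp.py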
However, when I run the following Python code, no jar files are downloaded from Maven:
from pyspark.sql import SparkSession
from pyspark import SparkConf

conf = (
    SparkConf()
    .setAppName("myapp")
    .set("spark.sql.hive.metastore.version", "2.3.3")
    .set("spark.sql.hive.metastore.jars", "maven")
)
spark = (
    SparkSession
    .builder
    .config(conf=conf)
    .enableHiveSupport()
    .getOrCreate()
)
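As a quick sanity check that the session actually picked up these settings (a sketch; the values should be readable back through spark.conf):

print(spark.conf.get("spark.sql.hive.metastore.version"))  # expect "2.3.3"
print(spark.conf.get("spark.sql.hive.metastore.jars"))     # expect "maven"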
According to this I should see an INFO-level log when Spark interacts with Maven, so I set

log4j.logger.org.apache.spark.api.python.PythonGatewayServer=INFO

in $SPARK_HOME/conf/log4j.properties, but I can see no logging which says that Spark is interacting with Maven.

For anyone else trying to solve this: the jars are not downloaded when the SparkSession is created. Spark only fetches them from Maven the first time it actually talks to the Hive metastore, e.g. when running

spark.catalog.listDatabases()
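If you want to actually watch the jars being fetched, a further sketch (my assumption, not something the docs spell out): Spark's isolated Hive client loader lives in the org.apache.spark.sql.hive package, so raising that package to INFO in $SPARK_HOME/conf/log4j.properties should surface the download messages:

log4j.logger.org.apache.spark.sql.hive=INFO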