Tags: hive, pyspark, metastore, hive-metastore

Can't access external Hive metastore with Pyspark


I am trying to run a simple piece of code that just shows the databases I previously created on my hive2-server. (Note: I have tried this in both Python and Scala, with the same results.)

If I log into a hive shell and list my databases, I see a total of 3 databases.

When I start the Spark shell (2.3) via pyspark, I do the usual and add the following property to my SparkSession:

sqlContext.setConf("hive.metastore.uris","thrift://*****:9083")

And then I restart the SparkContext within my session.
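For reference, the pattern usually recommended is to supply hive.metastore.uris while the Hive-enabled SparkSession is first being built, since settings applied after the session and its catalog are initialised may not be picked up. Roughly (a sketch only, with metastore-host as a placeholder for my real address):

from pyspark.sql import SparkSession

# Sketch: set the metastore URI before the Hive-enabled session is created.
# "metastore-host" is a placeholder; stop any existing shell session first so
# the new configuration is actually applied.
spark = (SparkSession.builder
         .appName("Hive_Test")
         .config("hive.metastore.uris", "thrift://metastore-host:9083")
         .enableHiveSupport()  # without this, Spark falls back to its in-memory catalog
         .getOrCreate())

spark.sql("SHOW DATABASES").show()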

If I run either of the following lines to see all the configs:

pyspark.conf.SparkConf().getAll()
spark.sparkContext._conf.getAll()
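(Filtering the dump for the key is a bit quicker than scanning all of it; a small sketch, assuming spark is the active session from the shell:)

# Sketch: only print config entries that mention the metastore,
# assuming "spark" is the active SparkSession from the pyspark shell.
for key, value in spark.sparkContext._conf.getAll():
    if "metastore" in key:
        print(key, value)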

I can indeed see that the parameter has been added. I then start a new HiveContext:

hiveContext = pyspark.sql.HiveContext(sc)

But if I list my databases:

hiveContext.sql("SHOW DATABASES").show()

it does not show the same results as the hive shell.
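One check that seems relevant here (a sketch, assuming spark is the active session): if the session's catalog implementation is "in-memory" rather than "hive", the metastore URI is never consulted and only the local default database shows up, and the catalog API can be compared against the hive shell output directly:

# Sketch of a quick diagnostic, assuming "spark" is the active session.
# "in-memory" here would mean the Hive metastore is not being used at all.
print(spark.conf.get("spark.sql.catalogImplementation"))

# Cross-check what Spark's catalog reports against the hive shell listing.
for db in spark.catalog.listDatabases():
    print(db.name, db.locationUri)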

I'm a bit lost. For some reason it looks like the config parameter is being ignored, even though I am sure the URI I'm using points to my metastore: the address I get from running:

hive -e "SET" | grep metastore.uris

is the same address I get if I run:

ses2 = spark.builder.master("local").appName("Hive_Test").config('hive.metastore.uris','thrift://******:9083').getOrCreate()
ses2.sql("SET").show()

Could it be a permission issue, e.g. some databases/tables are not visible outside the hive shell/user?
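Before digging into permissions, it may also be worth ruling out plain connectivity to the metastore's thrift port; a minimal sketch using only the Python standard library, with metastore-host as a placeholder:

import socket

# Sketch: simple TCP reachability check against the metastore's thrift port.
# "metastore-host" is a placeholder for the real address.
def metastore_reachable(host, port=9083, timeout=5):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(metastore_reachable("metastore-host"))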

Thanks


Solution

  • Managed to solve the issue. It turned out to be a communication problem: Hive was not hosted on that machine. I corrected the metastore address in the code and everything works fine.
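For completeness, the corrected builder call presumably looks something like this (a sketch; correct-metastore-host stands in for the machine that actually runs the metastore):

# Sketch of the corrected setup: same builder pattern as before, but with
# "hive.metastore.uris" pointing at the host that actually runs the metastore.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("local")
         .appName("Hive_Test")
         .config("hive.metastore.uris", "thrift://correct-metastore-host:9083")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("SHOW DATABASES").show()  # should now match the 3 databases from the hive shell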