I'm facing the following problem:
def my_func(table, usr, psswrd):
    from pyspark import SparkContext, SparkConf
    from pyspark.sql import HiveContext

    sconf = SparkConf()
    sconf.setAppName('TEST')
    sconf.set("spark.master", "local[2]")
    sc = SparkContext(conf=sconf)
    hctx = HiveContext(sc)
    ## Initialize variables (url, driver, ...)
    df = hctx.read.format("jdbc").options(url=url,
                                          user=usr,
                                          password=psswrd,
                                          driver=driver,
                                          dbtable=table).load()
    pd_df = df.toPandas()
    sc.stop()
    return pd_df
The problem here is the persistence of the HiveContext (i.e. if I do hctx._get_hive_ctx() it still returns JavaObject id=Id). So if I use my_func several times in the same script, it fails the second time. I would like to remove the HiveContext, which is apparently not deleted when I stop the SparkContext.
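For example (table names and credentials are placeholders), the second call is where it breaks:

pd_df1 = my_func("schema.table_a", "my_user", "my_password")   # works
pd_df2 = my_func("schema.table_b", "my_user", "my_password")   # fails, HiveContext state lingers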
Thanks
Removing the HiveContext is not possible: some of its state persists even after sc.stop(), which is what breaks subsequent calls.
But there is a workaround (caution!! it's dangerous) if it is feasible for you: delete the metastore_db directory every time you start/stop your SparkContext. Again, check whether that is acceptable in your setup. The code below is in Java (in your case you would have to rewrite it in Python).
import java.io.File;
import org.apache.commons.io.FileUtils;

File hiveLocalMetaStorePath = new File("metastore_db");
FileUtils.deleteDirectory(hiveLocalMetaStorePath);
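Since your function is in Python, a rough equivalent of the same idea might look like this. This is only a minimal sketch: clear_local_metastore is my own naming, and it assumes the metastore_db directory is created in the current working directory (the default); adjust the path if you configured it elsewhere.

import os
import shutil

def clear_local_metastore(path="metastore_db"):
    # Remove the local Derby metastore that HiveContext leaves behind,
    # so the next SparkContext/HiveContext starts from a clean state.
    if os.path.isdir(path):
        shutil.rmtree(path)

You would then call clear_local_metastore() right after sc.stop() inside my_func, before the next invocation creates a new SparkContext.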
You can better understand it from the following links.