I am trying to list all the databases using HiveContext in Spark 1.6, but it's giving me just the default database.
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext.getOrCreate()
sqlContext = HiveContext(sc)  # Hive-aware SQL context (Spark 1.x API)
sqlContext.sql("SHOW DATABASES").show()
+-------------+
|       result|
+-------------+
|      default|
+-------------+
Invoking SHOW DATABASES in SQL is the right approach in Spark < 2.0.
In Spark 2.0 or later you should use pyspark.sql.catalog.Catalog.listDatabases:
spark.catalog.listDatabases()
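For the Spark 2.0+ route, a minimal sketch might look like the following. It assumes PySpark is installed and a Hive metastore is reachable. The helper names build_hive_session and list_database_names are made up for illustration; SparkSession.builder.enableHiveSupport() and spark.catalog.listDatabases() are the actual Spark calls.

```python
def build_hive_session(app_name="list-databases"):
    """Create (or reuse) a SparkSession with Hive support enabled.

    Hypothetical helper; app_name is an arbitrary illustrative default.
    """
    # Imported lazily so this module can be loaded without PySpark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .enableHiveSupport()  # by default a plain session uses the in-memory catalog
        .getOrCreate()
    )


def list_database_names(spark):
    """Return the names of all databases visible to the session's catalog."""
    # Each entry returned by listDatabases() exposes a `name` attribute.
    return [db.name for db in spark.catalog.listDatabases()]
```

With a working setup, list_database_names(build_hive_session()) should return the databases registered in the metastore rather than only default.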
If you don't see expected databases it typically means one of two things:
- hive-site.xml is not present on Spark's classpath (see Custom Hadoop/Hive Configuration in the Spark Configuration Guide).
- Spark is running without Hive support (for example, SQLContext is used instead of HiveContext in case of 1.6).
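Both causes can be checked roughly as sketched below. The helper names are hypothetical. The directory lookup assumes the common layout where Spark reads configuration from SPARK_CONF_DIR or $SPARK_HOME/conf; a deployment may place hive-site.xml elsewhere on the classpath. The catalog check relies on the spark.sql.catalogImplementation setting and applies to Spark 2.0 or later only.

```python
import os


def default_conf_dirs(env=None):
    """Guess the directories Spark usually loads configuration from.

    Assumption: SPARK_CONF_DIR first, then $SPARK_HOME/conf.
    """
    env = os.environ if env is None else env
    dirs = []
    if env.get("SPARK_CONF_DIR"):
        dirs.append(env["SPARK_CONF_DIR"])
    if env.get("SPARK_HOME"):
        dirs.append(os.path.join(env["SPARK_HOME"], "conf"))
    return dirs


def find_hive_site(conf_dirs):
    """Return the path of every hive-site.xml found in the given directories."""
    found = []
    for conf_dir in conf_dirs:
        candidate = os.path.join(conf_dir, "hive-site.xml")
        if os.path.isfile(candidate):
            found.append(candidate)
    return found


def hive_support_enabled(spark):
    """Return True if a Spark 2.0+ session is backed by the Hive catalog."""
    # The value is "hive" with Hive support and "in-memory" without it.
    impl = spark.conf.get("spark.sql.catalogImplementation", "in-memory")
    return impl == "hive"
```

An empty result from find_hive_site(default_conf_dirs()) points to the first cause; hive_support_enabled(spark) returning False points to the second.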