Tags: apache-spark, apache-spark-sql, apache-spark-1.6, hivecontext

How to list All Databases using HiveContext in PySpark 1.6


I am trying to list all the databases using HiveContext in Spark 1.6, but it's only giving me the default database.

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext.getOrCreate()
sqlContext = HiveContext(sc)
sqlContext.sql("SHOW DATABASES").show()
+-------------+
|       result|
+-------------+
|      default|
+-------------+

Solution

  • Invoking SHOW DATABASES through sqlContext.sql is the right approach in Spark < 2.0.

    In Spark 2.0 or later you should use pyspark.sql.catalog.Catalog.listDatabases:

    spark.catalog.listDatabases()
    
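    Note that in 2.x listDatabases() returns a list of Database entries (with name, description, and locationUri fields) rather than a DataFrame. A minimal sketch of pulling out just the names, assuming a local session with Hive support available:

    ```python
    from pyspark.sql import SparkSession

    # Sketch for Spark 2.x+: enableHiveSupport() makes the session use
    # the configured Hive metastore (or a local embedded one if none
    # is configured).
    spark = (SparkSession.builder
             .master("local[1]")
             .appName("list-databases")
             .enableHiveSupport()
             .getOrCreate())

    # listDatabases() returns Database entries, not a DataFrame,
    # so extract the .name field from each.
    names = [db.name for db in spark.catalog.listDatabases()]
    print(names)
    ```

    Even a freshly created metastore will contain the default database, so the list should never be empty.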

    If you don't see the expected databases, it typically means one of two things:

      □ hive-site.xml is not on Spark's classpath, so Spark falls back to a local embedded Derby metastore that contains only the default database.
      □ The databases simply don't exist in the metastore Spark is connected to.
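    A frequent cause is that Spark cannot find hive-site.xml and silently falls back to a local Derby metastore, which only contains default. A minimal hive-site.xml sketch pointing Spark at a remote Thrift metastore (the hostname and port below are placeholders for your environment):

    ```xml
    <!-- hive-site.xml: place in $SPARK_HOME/conf/ (or otherwise on the
         driver classpath) so Spark connects to the real Hive metastore
         instead of spinning up a local Derby one. -->
    <configuration>
      <property>
        <name>hive.metastore.uris</name>
        <!-- placeholder host/port; point at your Hive metastore service -->
        <value>thrift://metastore-host:9083</value>
      </property>
    </configuration>
    ```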