Spark Java append data to Hive table


I'm having a problem appending data to a Hive table. The session is set up correctly, since I can read data from the table:

SparkSession spark = SparkSession
        .builder()
        .appName("Java Spark SQL basic example")
        .config("hive.metastore.uris", "thrift://localhost:9083")
        .enableHiveSupport()
        .master("local[*]")
        .getOrCreate();

When I try to append data with df.write().mode(SaveMode.Append).saveAsTable("sample.test_table"); I get:

Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: hive. Please find packages at http://spark.apache.org/third-party-projects.html

What am I missing?

EDIT: Using df.write().insertInto("prova2.test_table"); works, but I don't understand why.


Solution

  • saveAsTable() does not seem to work the way you intend here. Try this instead:

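    Register the DataFrame as a temporary view. registerTempTable() is deprecated since Spark 2.0 in favour of createOrReplaceTempView(), and a temporary view name cannot carry a database prefix, so use a plain name. Here spark is the SparkSession from the question:

    df.createOrReplaceTempView("temptable");

    Create the table and insert the data if the table does not already exist:

    spark.sql("CREATE TABLE IF NOT EXISTS sample.test_table AS SELECT * FROM temptable");

    Or simply create it empty, so that you can rerun the code without any exceptions. A bare CREATE TABLE with no column list is normally rejected, so this variant copies the columns from the view and selects no rows:

    spark.sql("CREATE TABLE IF NOT EXISTS sample.test_table AS SELECT * FROM temptable WHERE 1 = 0");

    Insert the data (the table must exist):

    spark.sql("INSERT INTO TABLE sample.test_table SELECT * FROM temptable");

    Drop the temporary view:

    spark.sql("DROP VIEW IF EXISTS temptable");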
    

    Read more on temporary table usage
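
The SQL steps above can be bundled into one small helper so they always run in the same order. This is only a sketch: the class name HiveAppendSteps and both method names are hypothetical, and it assumes the DataFrame has already been registered as a temporary view. It uses the "create empty" variant (with WHERE 1 = 0, an assumption of this sketch, so the new table takes its columns from the view) because running create-as-select and then the insert would load the rows twice on the first run. The helper takes the SQL executor as a parameter so it compiles without Spark on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of Spark: it only assembles and dispatches SQL text.
final class HiveAppendSteps {
    private HiveAppendSteps() {}

    // Returns the statements in the order they have to run.
    static List<String> statements(String tempView, String targetTable) {
        List<String> sql = new ArrayList<>();
        // Create an empty table with the view's columns if it is missing (assumed variant).
        sql.add("CREATE TABLE IF NOT EXISTS " + targetTable
                + " AS SELECT * FROM " + tempView + " WHERE 1 = 0");
        // Append the rows of the temporary view.
        sql.add("INSERT INTO TABLE " + targetTable + " SELECT * FROM " + tempView);
        // Clean up the temporary view.
        sql.add("DROP VIEW IF EXISTS " + tempView);
        return sql;
    }

    // Hands each statement to the caller-supplied executor, one at a time.
    static void appendViaTempView(Consumer<String> runSql, String tempView, String targetTable) {
        for (String statement : statements(tempView, targetTable)) {
            runSql.accept(statement);
        }
    }
}
```

With the SparkSession from the question, the executor would presumably be the method reference spark::sql, for example HiveAppendSteps.appendViaTempView(spark::sql, viewName, "sample.test_table"), where viewName is whatever name the DataFrame was registered under. The names are concatenated into the SQL as-is, so they should come from trusted code, not from user input.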