apache-spark, google-cloud-dataproc, livy

How to include BigQuery Connector inside Dataproc using Livy


I'm trying to run my application through Livy on GCP Dataproc, but I'm getting this error: "Caused by: java.lang.ClassNotFoundException: bigquery.DefaultSource"

I'm able to run hadoop fs -ls gs://xxxx on the Dataproc cluster, and I checked that Spark points to the right location to find gcs-connector.jar, so that part is fine.

I installed Livy on Dataproc using the Livy initialization action (https://github.com/GoogleCloudDataproc/initialization-actions/blob/master/livy/).

How can I include bigquery-connector in Livy's classpath? Could you help me, please? Thank you all!


Solution

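  • It looks like your application depends on the BigQuery connector, not the GCS connector. The missing class, bigquery.DefaultSource, is what Spark looks for when it cannot find a data source registered for the "bigquery" format.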

    The GCS connector should always be included in the Hadoop classpath by default, but you have to add the BigQuery connector jar to your application yourself.

    Assuming this is a Spark application, you can set the spark.jars property so that the BigQuery connector jar is pulled in from GCS at runtime: spark.jars='gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar'

    For more installation options, see https://github.com/GoogleCloudDataproc/spark-bigquery-connector/blob/master/README.md
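
Since the job goes through Livy rather than spark-submit, one way to pass that property is in the "conf" field of the request body sent to Livy's POST /batches endpoint. The following is only a minimal sketch under assumptions: the helper names build_batch_request and submit_batch, the Livy URL, and the application path in the usage note are hypothetical and not from the question. The jar path is the one given above and should match your cluster's Scala version.

```python
import json
import urllib.request

# Connector jar from the answer above (Scala 2.12 build); adjust to your cluster.
BIGQUERY_CONNECTOR_JAR = "gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar"


def build_batch_request(app_file, class_name=None, args=None):
    """Build a JSON-serializable body for Livy's POST /batches (hypothetical helper)."""
    body = {
        "file": app_file,
        # Ask Spark to fetch the BigQuery connector jar at runtime.
        "conf": {"spark.jars": BIGQUERY_CONNECTOR_JAR},
    }
    if class_name:
        body["className"] = class_name
    if args:
        body["args"] = list(args)
    return body


def submit_batch(livy_url, body):
    """POST the batch to Livy and return its parsed JSON response (hypothetical helper)."""
    request = urllib.request.Request(
        livy_url.rstrip("/") + "/batches",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, submit_batch("http://localhost:8998", build_batch_request("gs://my-bucket/my-app.jar", class_name="com.example.Main")) would submit a batch to a Livy server on its default port 8998. The bucket, jar name, and class name here are placeholders.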