I'm using Azure Databricks to connect to Cassandra. My Cassandra instance is exposed at a specific port and accessible from cqlsh.
Running SHOW VERSION in cqlsh returns:
[cqlsh 6.0.0 | Cassandra 3.11.10 | CQL spec 3.4.4 | Native protocol v4]
I've created a cluster that runs on Databricks Runtime 7.3 LTS (includes Apache Spark 3.0.1, Scala 2.12).
I've installed the following libraries:
com.datastax.oss:java-driver-core:4.12.0
com.datastax.spark:spark-cassandra-connector_2.12:3.0.1
Now I'm trying to execute a simple query to load data with DataFrames:
spark.read
  .format("org.apache.spark.sql.cassandra")
  .option("spark.cassandra.connection.host", ...)
  .option("spark.cassandra.auth.username", ...)
  .option("spark.cassandra.auth.password", ...)
  .option("table", ...)
  .option("keyspace", ...)
  .load()
In response, I'm getting:
java.io.IOException: Failed to open native connection to Cassandra at :: Could not initialize class com.datastax.oss.driver.internal.core.config.typesafe.TypesafeDriverConfig
How can I correctly initialize the connection?
You need to use spark-cassandra-connector-assembly (Maven Central) instead of spark-cassandra-connector. The reason: the Spark Cassandra Connector uses a newer version of the Typesafe Config library than the one shipped with the Databricks runtime, and the assembly version bundles all necessary libraries as shaded versions, so the two don't clash. You also don't need to install java-driver-core separately - it will be pulled in as a dependency automatically.
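
For example, for your cluster (Spark 3.0.1, Scala 2.12) the matching coordinate would be com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.0.1. A minimal sketch of the read once that library is installed - the host, port, credentials, keyspace, and table below are placeholder values to substitute with your own:

// Minimal sketch; assumes spark-cassandra-connector-assembly_2.12:3.0.1
// is the only Cassandra library installed on the cluster.
// All connection values below are placeholders.
val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .option("spark.cassandra.connection.host", "cassandra.example.com")
  .option("spark.cassandra.connection.port", "9042")
  .option("spark.cassandra.auth.username", "my_user")
  .option("spark.cassandra.auth.password", "my_password")
  .option("keyspace", "my_keyspace")
  .option("table", "my_table")
  .load()

df.show()

The same spark.cassandra.* settings can alternatively be set once in the cluster's Spark config, so they don't have to be repeated on every read.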