apache-spark ibm-cloud compose-db scylla analytics-engine

WARN Session: Error creating pool to /xxx.xxx.xxx.xxx:28730


I'm trying to connect to a ScyllaDB database running on IBM Cloud from Spark 2.3 running on IBM Analytics Engine.

I start the Spark shell like so:

$ spark-shell --master local[1] \
       --files jaas.conf \
       --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0,datastax:spark-cassandra-connector:2.3.0-s_2.11,commons-configuration:commons-configuration:1.10 \
       --conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf" \
       --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf" \
       --conf spark.cassandra.connection.host=xxx1.composedb.com,xxx2.composedb.com,xxx3.composedb.com \
       --conf spark.cassandra.connection.port=28730 \
       --conf spark.cassandra.auth.username=scylla \
       --conf spark.cassandra.auth.password=SECRET \
       --conf spark.cassandra.connection.ssl.enabled=true \
       --num-executors 1  \
       --executor-cores 1 

Then I execute the following Spark Scala code:

import com.datastax.spark.connector._
import org.apache.spark.sql.cassandra._

val stocksRdd = sc.cassandraTable("stocks", "stocks")

stocksRdd.count()

However, I see a bunch of warnings:

18/08/23 10:11:01 WARN Cluster: You listed xxx1.composedb.com/xxx.xxx.xxx.xxx:28730 in your contact points, but it wasn't found in the control host's system.peers at startup
18/08/23 10:11:01 WARN Cluster: You listed xxx1.composedb.com/xxx.xxx.xxx.xxx:28730 in your contact points, but it wasn't found in the control host's system.peers at startup
18/08/23 10:11:06 WARN Session: Error creating pool to /xxx.xxx.xxx.xxx:28730
com.datastax.driver.core.exceptions.ConnectionException: [/xxx.xxx.xxx.xxx:28730] Pool was closed during initialization
...

However, after the stacktrace in the warning, I see the output I am expecting:

res2: Long = 4 

If I navigate to the Compose UI, I see a JSON address map:

[
  {"xxx.xxx.xxx.xxx:9042":"xxx1.composedb.com:28730"},
  {"xxx.xxx.xxx.xxx:9042":"xxx2.composedb.com:28730"},
  {"xxx.xxx.xxx.xxx:9042":"xxx3.composedb.com:28730"}
]

It seems the warning is related to the map file.

What are the implications of the warning? Can I ignore it?


NOTE: I've seen a similar question, but I believe this one is different because of the map file, and because I have no control over how the ScyllaDB cluster has been set up by Compose.


Solution

  • This is just a warning. It appears because the addresses Spark is trying to reach (the public Compose hostnames on port 28730) are not known to Scylla itself, so the driver cannot find your contact points in system.peers. Spark is apparently connecting to the cluster and retrieving the expected result, so you should be fine.
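
To illustrate the mismatch, here is a minimal sketch in plain Scala of the kind of lookup the Compose map describes. It is not part of the Spark connector or the driver. The object name, the method names, and the 10.0.0.x addresses are assumptions made for illustration (the real internal IPs are masked in the question). It only shows why a public contact point does not match anything Scylla lists for its own nodes.

```scala
// Sketch only: the 10.0.0.x addresses are hypothetical stand-ins for the
// masked xxx.xxx.xxx.xxx values in the Compose map.
object ComposeAddressMap {
  // Internal "ip:port" (how Scylla knows its nodes) -> public "host:port".
  val internalToPublic: Map[String, String] = Map(
    "10.0.0.1:9042" -> "xxx1.composedb.com:28730",
    "10.0.0.2:9042" -> "xxx2.composedb.com:28730",
    "10.0.0.3:9042" -> "xxx3.composedb.com:28730"
  )

  // Public endpoint for an internal address, if the map has an entry.
  def toPublic(internal: String): Option[String] =
    internalToPublic.get(internal)

  // True only if the address is one Scylla itself would list for a node.
  def knownToScylla(address: String): Boolean =
    internalToPublic.contains(address)

  def main(args: Array[String]): Unit = {
    println(toPublic("10.0.0.1:9042"))                 // Some(xxx1.composedb.com:28730)
    println(knownToScylla("xxx1.composedb.com:28730")) // false
  }
}
```

The second call returns false for the same reason the driver logs the warning: the contact point is a public endpoint, while the cluster only knows its nodes by their internal addresses.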