I have a generic question about Apache Spark :
We have some spark streaming scripts that consume Kafka messages. Problem : they are failing randomly without a specific error...
Some script does nothing while they are working when I run them manually, one is failing with this message :
ERROR SparkUI: Failed to bind SparkUI java.net.BindException: Address already in use: Service 'SparkUI' failed after 16 retries!
So I'm wondering if there is maybe a specific way to run the scripts in parallel ?
They are all in the same jar and I run them with Supervisor. Spark is installed on Cloudera Manager 5.4 on Yarn.
Here is how I launch a script :
sudo -u spark spark-submit --class org.soprism.kafka.connector.reader.TwitterPostsMessageWriter /home/soprism/sparkmigration/data-migration-assembly-1.0.jar --master yarn-cluster --deploy-mode client
Thanks for your help !
Update : I changed the command and now run this (it stops with now specific message) :
root@ns6512097:~# sudo -u spark spark-submit --class org.soprism.kafka.connector.reader.TwitterPostsMessageWriter --master yarn --deploy-mode client /home/soprism/sparkmigration/data-migration-assembly-1.0.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/avro-tools-1.7.6-cdh5.4.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/09/28 16:14:21 INFO Remoting: Starting remoting
15/09/28 16:14:21 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@ns6512097.ip-37-187-69.eu:52748]
15/09/28 16:14:21 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@ns6512097.ip-37-187-69.eu:52748]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/avro-tools-1.7.6-cdh5.4.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
This issue occurs if multiple users tries to start spark session at the same time or existing spark session are not property closed
There are two ways to fix this issue.
Start new spark session on a different port as follow
spark-submit --conf spark.ui.port=5051 <other arguments>`<br>`spark-shell --conf spark.ui.port=5051
Find all spark session using ports from 4041 to 4056 and kill process using kill command, netstat and kill command can be used to find process which are occupying the port and kill the process respectively. Here's the usage:
sudo netstat -tunalp | grep LISTEN| grep 4041
Above command will produce output as below, last column is process id, in this case PID is 32028
tcp 0 0 :::4040 :::* LISTEN 32028/java
Once you find out the process id(PID) you can kill the spark process(spark-shell or spark-submit) using the below command
sudo kill -9 32028