I have installed Spark 2.1.1 on 2 machines, but at different paths: on one machine it is installed on an NTFS drive, and on the other it is installed on an ext4 drive. I am trying to start a cluster in standalone mode with 1 master and 2 slaves, running the master and 1 slave on one machine and the other slave on the second machine.
When I try to start this cluster via the start-all.sh script on the master node, I get the following error:
192.168.1.154: bash: line 0: cd: /home/<somePath>/spark-2.1.1-bin-hadoop2.7: No such file or directory
I have set the proper SPARK_HOME in the respective bashrc files. Below is my slaves file (on the machine running 1 master + 1 slave):
localhost
192.168.1.154
I can log in remotely to the other slave machine via ssh, and I am able to run a Spark cluster on each machine individually.
My understanding is that when I try to start a slave remotely from my master machine via the start-all.sh script, it tries to cd to the location where Spark is installed on the master node, and since Spark is installed at a different location on the slave node, it fails. Can anyone please tell me how I can fix this problem?
In start-all.sh you can find the following:
if [ -z "${SPARK_HOME}" ]; then
export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
fi
# Load the Spark configuration
. "${SPARK_HOME}/sbin/spark-config.sh"
# Start Master
"${SPARK_HOME}/sbin"/start-master.sh
# Start Workers
"${SPARK_HOME}/sbin"/start-slaves.sh
which has nothing to do with the Spark installation on the standalone master. start-all.sh simply takes whatever SPARK_HOME you've defined globally and uses it across all nodes in the cluster, for the standalone master and the workers alike.
In your case, I'd recommend writing a custom startup script that would start the standalone Master and the workers according to their respective SPARK_HOME env vars.
start-slaves.sh simply does the following:
cd "${SPARK_HOME}" \; "${SPARK_HOME}/sbin/start-slave.sh" "spark://$SPARK_MASTER_HOST:$SPARK_MASTER_PORT"
So there is not much magic going on: it just uses ssh to connect to every node and executes that command line.
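A custom startup script along those lines could look roughly like the sketch below. It is only an illustration, not what Spark ships: the function names (master_url, worker_cmd, start_worker) are made up here, the 7077 and hostname fallbacks are assumptions for when SPARK_MASTER_PORT and SPARK_MASTER_HOST are unset, and the worker path in the usage comments is a placeholder you would replace with the real install location on that machine.

```shell
#!/usr/bin/env bash
# Sketch: start each worker with that host's own Spark install path
# instead of reusing the master's SPARK_HOME everywhere.

# Master URL the workers should register with.
# Assumption: fall back to the local hostname and port 7077 when unset.
master_url() {
  printf 'spark://%s:%s' "${SPARK_MASTER_HOST:-$(hostname)}" "${SPARK_MASTER_PORT:-7077}"
}

# Build the command line to run on a worker whose Spark lives at $1.
worker_cmd() {
  local remote_home="$1"
  printf 'cd "%s" && "%s/sbin/start-slave.sh" "%s"' \
    "$remote_home" "$remote_home" "$(master_url)"
}

# start_worker HOST REMOTE_SPARK_HOME
# Runs the command built above on HOST over ssh.
start_worker() {
  ssh "$1" "$(worker_cmd "$2")"
}

# Example usage (the second path is a hypothetical placeholder):
#   "${SPARK_HOME}/sbin/start-master.sh"
#   start_worker localhost "${SPARK_HOME}"
#   start_worker 192.168.1.154 /path/to/spark/on/that/worker
```

The only real difference from start-slaves.sh is that the install path is passed in per host rather than taken from the master's environment, so each worker gets a cd into a directory that actually exists on it.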
I think I'd even use Ansible for this.