I have three physical nodes with Docker installed on them, and one Docker container with Mesos, Marathon, Hadoop and Flink. I configured the master node and the slave nodes for Mesos, ZooKeeper and Marathon, step by step. First, on the master node, I enter the Docker container with this command:
docker run -v /home/user/.ssh:/root/.ssh --privileged -p 5050:5050 -p 5051:5051 -p 5052:5052 -p 2181:2181 -p 8082:8081 -p 6123:6123 -p 8080:8080 -p 50090:50090 -p 50070:50070 -p 9000:9000 -p 2888:2888 -p 3888:3888 -p 4041:4040 -p 7077:7077 -p 52222:22 -e WEAVE_CIDR=10.32.0.2/12 -e MESOS_EXECUTOR_REGISTRATION_TIMEOUT=5mins -e LIBPROCESS_IP=10.32.0.2 -e MESOS_RESOURCES=ports*:[11000-11999] -ti hadoop_marathon_mesos_flink_2 /bin/bash
Then I start ZooKeeper and the Mesos master:
/home/zookeeper-3.4.14/bin/zkServer.sh restart
/home/mesos-1.7.2/build/bin/mesos-master.sh --ip=10.32.0.1 --hostname=10.32.0.1 --roles=marathon,flink --quorum=1 --work_dir=/var/run/mesos --log_dir=/var/log/mesos
After that, I run Marathon in the same container:
/home/marathon-1.7.189-48bfd6000/bin/marathon --master 10.32.0.1:5050 --zk zk://10.32.0.1:2181/marathon --hostname 10.32.0.1 --webui_url 10.32.0.1:8080 --logging_level debug
Finally, I start Hadoop (HDFS):
/opt/hadoop/sbin/start-dfs.sh
Marathon, Mesos and Hadoop run without any problems. The most important part of my work is running Flink on Marathon. I configured Flink (flink-conf.yaml) in the Docker container like this:
env.java.home: /opt/java
jobmanager.rpc.address: 10.32.0.1
high-availability: zookeeper
high-availability.storageDir: hdfs:///flink/ha/
high-availability.zookeeper.quorum: 10.32.0.1:2181,10.32.0.2:2181
recovery.zookeeper.path.mesos-workers: /mesos-workers
In the Marathon UI, I create an application with the following JSON, but it fails:
{
  "id": "flink",
  "cmd": "/home/flink-1.7.0/bin/mesos-appmaster.sh -Dmesos.master=10.32.0.1:5050,10.32.0.2:5050 -Dmesos.initial-tasks=1",
  "cpus": 1.0,
  "mem": 1024
}
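JSON strings cannot contain raw line breaks, so the whole cmd value has to stay on a single line. As an alternative to relying only on flink-conf.yaml, a Marathon app definition can carry an env object whose variables are passed to the task. The sketch below assumes Java is installed at /opt/java on every agent; the file name flink.json and the variables MARATHON_URL and SUBMIT are hypothetical. It writes the app definition and only posts it to Marathon's /v2/apps REST endpoint when SUBMIT=1.

```shell
#!/usr/bin/env bash
# Sketch: write the Marathon app definition with JAVA_HOME passed to the task.
# Assumptions: Java lives at /opt/java on every agent; flink.json is a hypothetical file name.
MARATHON_URL="${MARATHON_URL:-http://10.32.0.1:8080}"

# Quoted 'EOF' keeps the shell from expanding anything inside the JSON.
cat > flink.json <<'EOF'
{
  "id": "flink",
  "cmd": "/home/flink-1.7.0/bin/mesos-appmaster.sh -Dmesos.master=10.32.0.1:5050,10.32.0.2:5050 -Dmesos.initial-tasks=1",
  "cpus": 1.0,
  "mem": 1024,
  "env": {
    "JAVA_HOME": "/opt/java"
  }
}
EOF

# Submit only when explicitly requested, e.g. SUBMIT=1 ./submit-flink.sh
if [ "${SUBMIT:-0}" = "1" ]; then
  curl -sS -X POST -H 'Content-Type: application/json' \
    --data-binary @flink.json "$MARATHON_URL/v2/apps"
fi
```

Creating the app through the REST API instead of the UI makes it easy to delete and resubmit the same definition while debugging.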
The Flink task fails in the Mesos UI with this log:
I0428 06:01:39.586699 6155 exec.cpp:162] Version: 1.7.2
I0428 06:01:39.596458 6154 exec.cpp:236] Executor registered on agent 984595ae-e811-48fb-a9f5-ca6128e1cc1a-S0
I0428 06:01:39.598870 6157 executor.cpp:188] Received SUBSCRIBED event
I0428 06:01:39.599761 6157 executor.cpp:192] Subscribed executor on 10.32.0.3
I0428 06:01:39.599963 6157 executor.cpp:188] Received LAUNCH event
I0428 06:01:39.601236 6157 executor.cpp:697] Starting task flink.16a7cc18-697b-11e9-928f-ce235caa831e
I0428 06:01:39.613719 6157 executor.cpp:712] Forked command at 6163
I0428 06:01:39.787395 6157 executor.cpp:1013] Command exited with status 1 (pid: 6163)
I0428 06:01:40.791885 6162 process.cpp:927] Stopped the socket accept loop
The strange thing is that stdout shows the following message, even though I set JAVA_HOME in /etc/environment and env.java.home in flink-conf.yaml:
Please specify JAVA_HOME. Either in Flink config ./conf/flink-conf.yaml or as system-wide JAVA_HOME.
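This message comes from Flink's startup scripts, which run on the agent that received the task (the executor subscribed on 10.32.0.3 in the log), not on the master. A rough way to check what that node offers is the hypothetical helper below. It looks at env.java.home in flink-conf.yaml, the JAVA_HOME variable, and java on the PATH; it is not Flink's actual code, and the lookup order is an assumption. The default conf path is the one used in the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Flink: report where a Java installation
# could be found on this node. The lookup order below is an assumption.
find_java() {
  local conf="$1" home=""
  # Last env.java.home entry in flink-conf.yaml, key and trailing spaces stripped.
  if [ -r "$conf" ]; then
    home=$(sed -n 's/^[[:space:]]*env\.java\.home:[[:space:]]*//p' "$conf" | tail -n 1 | sed 's/[[:space:]]*$//')
  fi
  if [ -n "$home" ]; then
    echo "conf:$home"
  elif [ -n "${JAVA_HOME:-}" ]; then
    home="$JAVA_HOME"
    echo "env:$home"
  elif command -v java >/dev/null 2>&1; then
    echo "path:$(command -v java)"
    return 0
  else
    # Nothing found: this is the situation in which Flink prints "Please specify JAVA_HOME...".
    echo "none"
    return 1
  fi
  # Warn on stderr if the chosen directory does not contain an executable bin/java.
  [ -x "$home/bin/java" ] || echo "warning: $home/bin/java not found or not executable" >&2
  return 0
}

# Run this on the agent that executed the task, e.g. the node at 10.32.0.3.
find_java "${FLINK_CONF:-/home/flink-1.7.0/conf/flink-conf.yaml}"
```

If it prints "none" on the agent, the task's environment has no way to find Java, which matches the error above.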
What should I do to fix this problem?
Many thanks.
You can check the Flink log on the slave node where the task ran. It is also better to change your JSON file like this, which makes the application easier to follow:
{
  "id": "flink",
  "cmd": "/home/flink-1.7.0/bin/mesos-appmaster.sh -Djobmanager.heap.mb=1024 -Djobmanager.rpc.port=6123 -Drest.port=8081 -Dmesos.resourcemanager.tasks.mem=1024 -Dtaskmanager.heap.mb=1024 -Dtaskmanager.numberOfTaskSlots=2 -Dparallelism.default=2 -Dmesos.resourcemanager.tasks.cpus=1",
  "cpus": 1.0,
  "mem": 1024,
  "fetch": [
    {
      "uri": "/home/flink-1.7.0/bin/mesos-appmaster.sh",
      "executable": true
    }
  ]
}
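To read the Flink log on the slave node, you can search the agent's work directory for the task sandbox. Mesos keeps each run's stdout and stderr under <work_dir>/slaves/<agent-id>/frameworks/<framework-id>/executors/<executor-id>/runs/, and for a Marathon command task the executor ID normally matches the task ID (flink.<uuid>, as in "Starting task flink.16a7cc18-..." in the log). The agent work directory is an assumption: the question only shows --work_dir=/var/run/mesos for the master, so the sketch reads it from a hypothetical AGENT_WORK_DIR variable.

```shell
#!/usr/bin/env bash
# Sketch: list stdout/stderr files of Flink task sandboxes on a Mesos agent.
# AGENT_WORK_DIR is hypothetical; point it at the agent's --work_dir.
list_flink_sandbox_logs() {
  local work_dir="$1" app_id="${2:-flink}"
  [ -d "$work_dir" ] || { echo "no such directory: $work_dir" >&2; return 1; }
  # Executor directories of Marathon command tasks are assumed to be named <app-id>.<uuid>.
  # runs/latest is a symlink, which find does not follow, so each file is listed once.
  find "$work_dir/slaves" -type f \( -name stdout -o -name stderr \) \
    -path "*/executors/${app_id}.*/runs/*" 2>/dev/null | sort
}

if [ -n "${AGENT_WORK_DIR:-}" ]; then
  list_flink_sandbox_logs "$AGENT_WORK_DIR" flink
fi
```

The stderr file of the most recent run usually contains the full Flink startup output, which is more detailed than the summary shown in the Mesos UI.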
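Also, Flink reads flink-conf.yaml on the node where the task actually runs (the slave at 10.32.0.3 in your log), so add JAVA_HOME to flink-conf.yaml on every node, master and slaves alike: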
env.java.home: /opt/java
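If you want to script this step, the sketch below sets env.java.home in a flink-conf.yaml file idempotently: it removes any existing entry and appends the desired one, so running it twice leaves exactly one line. The function name is hypothetical; run it inside the container on each node (for example over SSH) and adjust the Java path if it differs on your machines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure flink-conf.yaml contains exactly one
# env.java.home entry pointing at the given Java directory.
set_flink_java_home() {
  local conf="$1" java_dir="$2" tmp
  [ -f "$conf" ] || { echo "missing $conf" >&2; return 1; }
  tmp=$(mktemp) || return 1
  # Copy every line except existing env.java.home entries, then append the new one.
  grep -v '^[[:space:]]*env\.java\.home:' "$conf" > "$tmp"
  printf 'env.java.home: %s\n' "$java_dir" >> "$tmp"
  # Write back through cat so the original file keeps its owner and permissions.
  cat "$tmp" > "$conf" && rm -f "$tmp"
}

# Example (run on every node, master and slaves):
# set_flink_java_home /home/flink-1.7.0/conf/flink-conf.yaml /opt/java
```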
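Once JAVA_HOME is set this way, the error no longer appears in stdout. Setting it in /etc/environment is usually not enough: that file is read by PAM when a user logs in, so processes started inside the Docker container, such as the Mesos agent and its tasks, typically do not see it.
I hope this helps.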