I started a 3-node H2O instance on Hadoop (YARN). During the night something went wrong on the cluster: I can see that container attempt 1 of the job was killed and YARN started a new attempt (attempt 2) on a different node. The job's 3 mappers are running. How can I find the H2O UI now? http://different_node:54321/ is not working, and the mapper nodes are not responding on port 54321 either. It looks like H2O is unable to restore the UI application after a failure.
An H2O-3 cluster requires a static environment. It cannot recover when one of its nodes is killed: after such an event, the state of the in-memory storage is corrupted. The newly created node won't join the cluster, because the cluster is locked after its first use. If this happens, the whole cluster must be killed and a new one started.
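To decide whether a cluster is in this broken state, you could poll a node's REST endpoint and compare the reported size with the number of nodes you asked for. Below is a minimal sketch, not an official tool. It assumes the /3/Cloud endpoint returns JSON with the fields cloud_size, cloud_healthy and consensus; the function names are made up for illustration.

```python
import json
from urllib.request import urlopen


def cluster_is_usable(cloud, expected_nodes):
    """Return True only if all expected nodes are present and healthy.

    `cloud` is the parsed JSON from GET /3/Cloud. The field names used
    here are an assumption about that response, not a guaranteed contract.
    """
    return (
        cloud.get("cloud_size") == expected_nodes
        and bool(cloud.get("cloud_healthy"))
        and bool(cloud.get("consensus"))
    )


def fetch_cloud_status(base_url, timeout=10):
    """Fetch and parse /3/Cloud from one H2O node (hypothetical helper)."""
    with urlopen(base_url.rstrip("/") + "/3/Cloud", timeout=timeout) as resp:
        return json.load(resp)
```

For example, cluster_is_usable(fetch_cloud_status("http://some_node:54321"), 3) would return False if a node has dropped out. If the request itself fails or the check returns False, the cluster should be treated as lost and restarted.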
If you want a deterministic URL for the Flow UI, add the -proxy parameter to your hadoop jar command:
hadoop jar h2odriver.jar -n 3 -mapperXmx 10g -proxy
This starts a simple HTTP proxy on the Hadoop edge node that forwards all traffic to one of the H2O nodes (the leader) running on a Hadoop compute node.