java cluster-computing h2o local-network

Setting up a multinode h2o cluster via R: Trouble when joining with the second node


I tried to set up a local H2O cluster with just 2 nodes. I started the machines from the terminal as described in the multinode deployment guide ( https://h2o-release.s3.amazonaws.com/h2o/rel-lambert/5/docs-website/deployment/multinode.html ). The second node reports for a short time that it has joined, but it exits right afterwards with the error "Attempting to join an H2O cloud that is no longer accepting new H2O nodes from".

I would really appreciate your help, since I am quite new to this area.

Running macOS 10.15.4, R version 4.0.0, Spark version (not given). Please have a look at the terminal output:

```
Cannot load library from path lib/osx_64/libxgboost4j_gpu.dylib
Cannot load library from path lib/libxgboost4j_gpu.dylib
Failed to load library from both native path and jar!
Cannot load library from path lib/osx_64/libxgboost4j_omp.dylib
Cannot load library from path lib/libxgboost4j_omp.dylib
Failed to load library from both native path and jar!
05-21 22:40:23.640 192.168.1.168:54321   2272         main  INFO water.default: ----- H2O started  -----
05-21 22:40:23.641 192.168.1.168:54321   2272         main  INFO water.default: Build git branch: master
05-21 22:40:23.641 192.168.1.168:54321   2272         main  INFO water.default: Build git hash: d3d24b7c6059f15c6b6333a84ccb70e70bc5d3dc
05-21 22:40:23.641 192.168.1.168:54321   2272         main  INFO water.default: Build git describe: jenkins-master-5076-39-gd3d24b7
05-21 22:40:23.642 192.168.1.168:54321   2272         main  INFO water.default: Build project version: 3.31.0.5077
05-21 22:40:23.642 192.168.1.168:54321   2272         main  INFO water.default: Build age: 16 hours and 8 minutes
05-21 22:40:23.642 192.168.1.168:54321   2272         main  INFO water.default: Built by: 'jenkins'
05-21 22:40:23.642 192.168.1.168:54321   2272         main  INFO water.default: Built on: '2020-05-21 06:31:36'
05-21 22:40:23.642 192.168.1.168:54321   2272         main  INFO water.default: Found H2O Core extensions: [XGBoost, KrbStandalone]
05-21 22:40:23.642 192.168.1.168:54321   2272         main  INFO water.default: Processed H2O arguments: [-flatfile, flatfile.txt, -port, 54321]
05-21 22:40:23.643 192.168.1.168:54321   2272         main  INFO water.default: Java availableProcessors: 8
05-21 22:40:23.643 192.168.1.168:54321   2272         main  INFO water.default: Java heap totalMemory: 123,0 MB
05-21 22:40:23.643 192.168.1.168:54321   2272         main  INFO water.default: Java heap maxMemory: 17,78 GB
05-21 22:40:23.643 192.168.1.168:54321   2272         main  INFO water.default: Java version: Java 1.8.0_251 (from Oracle Corporation)
05-21 22:40:23.643 192.168.1.168:54321   2272         main  INFO water.default: JVM launch parameters: [-Xmx20g]
05-21 22:40:23.643 192.168.1.168:54321   2272         main  INFO water.default: JVM process id: 2272@Mihais-iMac-2.local
05-21 22:40:23.643 192.168.1.168:54321   2272         main  INFO water.default: OS version: Mac OS X 10.15.4 (x86_64)
05-21 22:40:23.644 192.168.1.168:54321   2272         main  INFO water.default: Machine physical memory: 8,00 GB
05-21 22:40:23.644 192.168.1.168:54321   2272         main  INFO water.default: Machine locale: de_DE
05-21 22:40:23.644 192.168.1.168:54321   2272         main  INFO water.default: X-h2o-cluster-id: 1590093617567
05-21 22:40:23.644 192.168.1.168:54321   2272         main  INFO water.default: User name: 'Max'
05-21 22:40:23.644 192.168.1.168:54321   2272         main  INFO water.default: IPv6 stack selected: false
05-21 22:40:23.645 192.168.1.168:54321   2272         main  INFO water.default: Network address/interface is not reachable in 150ms: /fe80:0:0:0:8045:3098:f44:71b2%utun1/name:utun1 (utun1)
05-21 22:40:23.645 192.168.1.168:54321   2272         main  INFO water.default: Network address/interface is not reachable in 150ms: /fe80:0:0:0:2118:84f2:fc7a:7ea1%utun0/name:utun0 (utun0)
05-21 22:40:23.645 192.168.1.168:54321   2272         main  INFO water.default: Possible IP Address: llw0 (llw0), fe80:0:0:0:6084:87ff:fe44:8adf%llw0
05-21 22:40:23.645 192.168.1.168:54321   2272         main  INFO water.default: Network address/interface is not reachable in 150ms: /fe80:0:0:0:6084:87ff:fe44:8adf%awdl0/name:awdl0 (awdl0)
05-21 22:40:23.645 192.168.1.168:54321   2272         main  INFO water.default: Possible IP Address: en1 (en1), fe80:0:0:0:95:3412:ed28:553f%en1
05-21 22:40:23.645 192.168.1.168:54321   2272         main  INFO water.default: Possible IP Address: en1 (en1), 192.168.1.87
05-21 22:40:23.646 192.168.1.168:54321   2272         main  INFO water.default: Possible IP Address: en0 (en0), fe80:0:0:0:8ca:d39f:9e33:5b4d%en0
05-21 22:40:23.646 192.168.1.168:54321   2272         main  INFO water.default: Possible IP Address: en0 (en0), 192.168.1.168
05-21 22:40:23.646 192.168.1.168:54321   2272         main  INFO water.default: Possible IP Address: lo0 (lo0), fe80:0:0:0:0:0:0:1%lo0
05-21 22:40:23.646 192.168.1.168:54321   2272         main  INFO water.default: Possible IP Address: lo0 (lo0), 0:0:0:0:0:0:0:1%lo0
05-21 22:40:23.646 192.168.1.168:54321   2272         main  INFO water.default: Possible IP Address: lo0 (lo0), 127.0.0.1
05-21 22:40:23.646 192.168.1.168:54321   2272         main  WARN water.default: Multiple local IPs detected:
05-21 22:40:23.647 192.168.1.168:54321   2272         main  WARN water.default:   /192.168.1.87  /192.168.1.168
05-21 22:40:23.647 192.168.1.168:54321   2272         main  WARN water.default: Attempting to determine correct address...
05-21 22:40:23.647 192.168.1.168:54321   2272         main  WARN water.default: Using /192.168.1.168
05-21 22:40:23.647 192.168.1.168:54321   2272         main  INFO water.default: H2O node running in unencrypted mode.
05-21 22:40:23.648 192.168.1.168:54321   2272         main  INFO water.default: Internal communication uses port: 54322
05-21 22:40:23.649 192.168.1.168:54321   2272         main  INFO water.default: Listening for HTTP and REST traffic on http://192.168.1.168:54321/
05-21 22:40:23.653 192.168.1.168:54321   2272         main  WARN water.default: -flatfile specified but not found: flatfile.txt
05-21 22:40:23.653 192.168.1.168:54321   2272         main  INFO water.default: H2O cloud name: 'Max' on /192.168.1.168:54321, static configuration based on -flatfile flatfile.txt
05-21 22:40:23.654 192.168.1.168:54321   2272         main  INFO water.default: If you have trouble connecting, try SSH tunneling from your local machine (e.g., via port 55555):
05-21 22:40:23.654 192.168.1.168:54321   2272         main  INFO water.default:   1. Open a terminal and run 'ssh -L 55555:localhost:54321 Max@192.168.1.168'
05-21 22:40:23.654 192.168.1.168:54321   2272         main  INFO water.default:   2. Point your browser to http://localhost:55555
05-21 22:40:24.102 192.168.1.168:54321   2272         main  INFO water.default: Log dir: '/tmp/h2o-Max/h2ologs'
05-21 22:40:24.102 192.168.1.168:54321   2272         main  INFO water.default: Cur dir: '/Users/Max/Downloads/h2o-3.31.0.5077'
05-21 22:40:24.108 192.168.1.168:54321   2272         main  INFO water.default: Subsystem for distributed import from HTTP/HTTPS successfully initialized
05-21 22:40:24.108 192.168.1.168:54321   2272         main  INFO water.default: HDFS subsystem successfully initialized
05-21 22:40:24.111 192.168.1.168:54321   2272         main  INFO water.default: S3 subsystem successfully initialized
05-21 22:40:24.122 192.168.1.168:54321   2272         main  INFO water.default: GCS subsystem successfully initialized
05-21 22:40:24.123 192.168.1.168:54321   2272         main  INFO water.default: Flow dir: '/Users/Max/h2oflows'
05-21 22:40:24.134 192.168.1.168:54321   2272         main  INFO water.default: Cloud of size 1 formed [/192.168.1.168:54321]
05-21 22:40:24.141 192.168.1.168:54321   2272         main  INFO water.default: Registered parsers: [GUESS, ARFF, XLS, SVMLight, AVRO, PARQUET, CSV]
05-21 22:40:24.141 192.168.1.168:54321   2272         main  INFO water.default: XGBoost extension initialized
05-21 22:40:24.142 192.168.1.168:54321   2272         main  INFO water.default: KrbStandalone extension initialized
05-21 22:40:24.142 192.168.1.168:54321   2272         main  INFO water.default: Registered 2 core extensions in: 300ms
05-21 22:40:24.143 192.168.1.168:54321   2272         main  INFO water.default: Registered H2O core extensions: [XGBoost, KrbStandalone]
05-21 22:40:24.272 192.168.1.168:54321   2272         main  INFO hex.tree.xgboost.XGBoostExtension: Found XGBoost backend with library: xgboost4j_minimal
05-21 22:40:24.272 192.168.1.168:54321   2272         main  WARN hex.tree.xgboost.XGBoostExtension: Your system supports only minimal version of XGBoost (no GPUs, no multithreading)!
05-21 22:40:24.350 192.168.1.168:54321   2272         main  INFO water.default: Registered: 214 REST APIs in: 207ms
05-21 22:40:24.351 192.168.1.168:54321   2272         main  INFO water.default: Registered REST API extensions: [Amazon S3, XGBoost, Algos, AutoML, Core V3, TargetEncoder, Core V4]
05-21 22:40:24.474 192.168.1.168:54321   2272         main  INFO water.default: Registered: 284 schemas in 123ms
05-21 22:40:24.474 192.168.1.168:54321   2272         main  INFO water.default: H2O started in 6901ms
05-21 22:40:24.474 192.168.1.168:54321   2272         main  INFO water.default: 
05-21 22:40:24.474 192.168.1.168:54321   2272         main  INFO water.default: Open H2O Flow in your web browser: http://192.168.1.168:54321
05-21 22:40:24.474 192.168.1.168:54321   2272         main  INFO water.default: 
05-21 22:40:40.217 192.168.1.168:54321   2272   1.91:54321 ERROR water.default: Attempting to join an H2O cloud that is no longer accepting new H2O nodes from /192.168.1.91:54321
05-21 22:40:40.220 192.168.1.168:54321   2272   1.91:54321 FATAL water.default: Exiting.```

Solution

  • The documentation you are following is for version 2.6 (the copyright at the bottom is 2013!), but you are running version 3.31.

    I normally start H2O-related searches here: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/index.html

    To set up a multinode cluster, I've used the flatfile approach and then started each node from the command line. Make sure all your nodes have started up and found each other (you can see this by watching the logs) before you try to connect to any node in the cluster; otherwise you will see the error message you reported.
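
As a rough sketch of the flatfile approach (not a definitive recipe): the flatfile lists one `ip:port` per line, and the same file is used on every node. The two addresses are taken from the log in the question; the jar name `h2o.jar` and the `-Xmx4g` heap size are assumptions to adjust for your setup. Note that the log in the question contains `WARN ... -flatfile specified but not found: flatfile.txt`, so the file has to exist in the directory you launch from.

```shell
#!/bin/sh
# Run this on EVERY node, from the unpacked H2O directory.

# One ip:port per line; addresses taken from the log in the question.
cat > flatfile.txt <<'EOF'
192.168.1.168:54321
192.168.1.91:54321
EOF

# Heap size is an assumption: the log shows -Xmx20g on a machine with
# 8 GB of physical memory, so a smaller value is used here.
if [ -f h2o.jar ]; then
  java -Xmx4g -jar h2o.jar -flatfile flatfile.txt -port 54321
else
  echo "h2o.jar not found in $(pwd); run this from the unpacked H2O directory"
fi
```

Start both nodes this way and wait until the log reports a cloud containing both of them (the question's log only ever shows `Cloud of size 1 formed`) before opening Flow or connecting from R.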