
How to solve "Connection refused" error in MPJ Express?


I ran my MPJ Express program on 5 PCs that all have the same hostname (DESKTOP-J49PIF5) but different IP addresses. It ran successfully in one laboratory. However, when I ran the same program with the same configuration in a new laboratory (a different location), I got a "Connection refused" error.

Some more information that may help: the same problem happened with my Apache Spark program, but I solved it by adding --conf "spark.driver.host=<master_ip>" to the configuration. Someone said the program could not find the driver host, so that extra setting is needed. Note that in the previous laboratory I did not add that setting, and both my MPJ and Spark programs worked.
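Spark's configuration documentation lists the local hostname as the default for spark.driver.host, which is consistent with the explicit master IP fixing the Spark job when every PC shares one hostname. To check which address a machine actually advertises, a minimal Java sketch like the one below (the class and method names are made up for illustration) prints what the local hostname resolves to and lists the machine's non-loopback IPv4 addresses. Running it on each PC is one way to spot identical hostnames or a hostname that resolves to an unexpected address.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

public class LocalAddresses {

    // Collects the IPv4 addresses of all interfaces that are up and are not loopback.
    static List<String> localIPv4Addresses() throws SocketException {
        List<String> result = new ArrayList<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return result; // older JDKs return null when no interface is configured
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            if (!nic.isUp() || nic.isLoopback()) {
                continue; // skip disabled interfaces and 127.0.0.1
            }
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                if (addr instanceof Inet4Address) {
                    result.add(addr.getHostAddress());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        InetAddress local = InetAddress.getLocalHost();
        System.out.println("Hostname:             " + local.getHostName());
        System.out.println("Hostname resolves to: " + local.getHostAddress());
        System.out.println("Interface IPv4 addrs: " + localIPv4Addresses());
    }
}
```

If several PCs print the same hostname, anything that looks nodes up by name (as Spark does by default for the driver) can end up contacting the wrong machine.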

So my question is: why do I get a "Connection refused" error in my MPJ program? If the cause is the same as with Apache Spark, how do I configure MPJ Express? Perhaps by specifying the master IP, as in Spark? I don't know how to do that.

Output


The same error is repeated for all 5 PCs.
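"Connection refused" generally means the TCP connection attempt reached a machine, but nothing was listening on the target port there (or the connection was actively rejected). A minimal Java sketch for probing each node is shown below. The port number 10000 and the host list are assumptions for illustration only; substitute the entries from your machines file and the port your MPJ Express daemons are actually configured to use.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {

    // Tries to open a TCP connection to host:port within timeoutMs.
    // Returns true if a listener accepted the connection; false if the
    // connection was refused, timed out, or the hostname could not be resolved.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical defaults: pass real hostnames or IPs as arguments.
        String[] hosts = args.length > 0 ? args : new String[] {"localhost"};
        int port = 10000; // assumed daemon port, adjust to your configuration
        for (String host : hosts) {
            boolean ok = canConnect(host, port, 2000);
            System.out.println(host + ":" + port + " -> " + (ok ? "reachable" : "NOT reachable"));
        }
    }
}
```

Probing the same node once by IP and once by hostname can show whether the failure is in name resolution rather than in the network itself.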


Solution

  • After struggling for a few days, I finally found the answer: the problem was the hostnames. Each PC had a different IP address, and I could ping all of them. However, in cluster computing the nodes contact each other by hostname rather than by IP address, so every PC must have a unique hostname. After I gave each PC its own hostname, the program ran fine.
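To catch this problem before launching a job, the hostnames of all nodes can be checked for collisions. The following is a minimal Java sketch that assumes you have already gathered each node's IP address and hostname (for example by running `hostname` on every PC); the class, method, and sample addresses are hypothetical. Names are compared case-insensitively because Windows computer names and DNS names are not case-sensitive.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HostnameCheck {

    // Given node IP -> reported hostname, returns each hostname that is
    // shared by more than one IP, together with the IPs that share it.
    static Map<String, List<String>> duplicateHostnames(Map<String, String> ipToHostname) {
        Map<String, List<String>> byHost = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (Map.Entry<String, String> e : ipToHostname.entrySet()) {
            byHost.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        byHost.values().removeIf(ips -> ips.size() < 2); // keep only collisions
        return byHost;
    }

    public static void main(String[] args) {
        // Hypothetical inventory of three nodes, two of which share a name.
        Map<String, String> nodes = new LinkedHashMap<>();
        nodes.put("192.168.1.11", "DESKTOP-J49PIF5");
        nodes.put("192.168.1.12", "DESKTOP-J49PIF5");
        nodes.put("192.168.1.13", "NODE-03");
        // prints {DESKTOP-J49PIF5=[192.168.1.11, 192.168.1.12]}
        System.out.println(duplicateHostnames(nodes));
    }
}
```

An empty result means every node has a distinct name, which is the condition the fix above establishes.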