java rmi rmiregistry

How to quickly check an RMI registry?


I'm trying to implement the Raft Consensus Algorithm for a Distributed System project.

I need a very quick way to know whether server A is reachable from server B AND A's distributed system is up. In other words, A may be reachable from B while A's distributed system isn't up yet. So I think that InetAddress.getByName(ip).isReachable(timeout); isn't enough.
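One cheap intermediate check between isReachable() and a full RMI call is a plain TCP connect to the registry port with an explicit timeout. The following is only a minimal sketch: the class name PortProbe and the method isListening are hypothetical, and the port to probe is assumed to be the registry port (1099 by default, Registry.REGISTRY_PORT). It tells you that something accepts connections on that port, not that the server's stub is already bound.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    /**
     * Returns true if a TCP connection to host:port can be opened
     * within timeoutMillis, false on refusal, timeout, or any other I/O error.
     */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // connect() with an explicit timeout never blocks longer than timeoutMillis
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

A caller could skip a peer as soon as isListening(address, 1099, 100) returns false, and only then pay for the RMI round trip on the peers that pass.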

Since each server's stub is bound under the server's name, I thought I would get the server's registry and check whether a stub with the server's name exists: if not, skip to the next server; otherwise perform the lookup (which can take a long time). This is part of the code:

try {
    System.out.println("Getting "+clusterElement.getId()+"'s registry");
    Registry registry = LocateRegistry.getRegistry(clusterElement.getAddress());
    System.out.println("Checking contains:");
    if(!Arrays.asList(registry.list()).contains(clusterElement.getId())) {
        System.out.println("Server "+clusterElement.getId()+" not bound (maybe down?)!");
        continue;
    }
    System.out.println("Looking up "+clusterElement.getId()+"'s stub");
    ServerInterface stub = (ServerInterface) registry.lookup(clusterElement.getId());
    System.out.println("Asking vote to "+clusterElement.getId());
    // here methods are called on the stub (using a custom SocketFactory)
} catch (NoSuchObjectException | java.rmi.ConnectException | java.rmi.ConnectIOException e){
    System.err.println("Candidate "+serverRMI.id+" cannot request vote to "+clusterElement.getId()+" because not reachable");
} catch (UnmarshalException e) {
    System.err.println("Candidate " + serverRMI.id + " timeout requesting vote to " + clusterElement.getId());
} catch (RemoteException e) {
    e.printStackTrace();
} catch (NotBoundException e) {
    System.out.println("Candidate "+serverRMI.id+" NotBound "+clusterElement.getId());
}

Now the problem is that the server gets stuck on the contains() line: the message "Checking contains:" is printed, while "Looking up..." isn't.

Why does this happen? Is there any way to speed up the process? This algorithm is FULL of timeouts, so any suggestion would be really appreciated!

UPDATE: I tried every VM property I could find for RMI timeouts, such as -Dsun.rmi.transport.tcp.responseTimeout=1 -Dsun.rmi.transport.proxy.connectTimeout=1 -Dsun.rmi.transport.tcp.handshakeTimeout=1, and didn't see any difference at all, even though an exception should have been thrown on every RMI operation (since each timeout is set to 1 ms!).

The only solution I found for this problem is to use this RMISocketFactory reimplementation:

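import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.rmi.server.RMISocketFactory;

// Used for both the connect timeout and the read timeout, in milliseconds.
final int timeoutMillis = 100;
RMISocketFactory.setSocketFactory(new RMISocketFactory() {
    @Override
    public Socket createSocket(String host, int port) throws IOException {
        Socket socket = new Socket();
        // Bound every blocking read on this socket.
        socket.setSoTimeout(timeoutMillis);
        // Bound the time spent establishing the connection.
        socket.connect(new InetSocketAddress(host, port), timeoutMillis);
        return socket;
    }

    @Override
    public ServerSocket createServerSocket(int port) throws IOException {
        return new ServerSocket(port);
    }
});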

Solution

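  • It gets stuck in Registry.list(), which is itself a remote call to the registry; contains() only runs on the array it returns. The call will time out eventually.

    You'd be better off just calling lookup() without this prior step, which doesn't add any value: lookup() already throws NotBoundException when the name isn't bound. You should also investigate all the timeout options mentioned in the two properties pages linked from the RMI Home Page.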
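As a rough illustration of calling lookup() directly with bounded waits, here is a minimal sketch. The class name DirectLookup and the helpers timeoutFactory and tryLookup are hypothetical; the sketch assumes a per-registry client socket factory passed to LocateRegistry.getRegistry(host, port, csf) rather than a JVM-wide RMISocketFactory, and it collapses every failure into a null return, which real election code would probably want to distinguish.

```java
import java.net.InetSocketAddress;
import java.net.Socket;
import java.rmi.NotBoundException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.RMIClientSocketFactory;

public class DirectLookup {
    /** Client socket factory that bounds both the connect and each read. */
    static RMIClientSocketFactory timeoutFactory(int timeoutMillis) {
        return (host, port) -> {
            Socket socket = new Socket();
            socket.setSoTimeout(timeoutMillis);
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return socket;
        };
    }

    /** Returns the stub bound under name, or null if unreachable or not bound. */
    static Remote tryLookup(String host, int port, String name, int timeoutMillis) {
        try {
            // getRegistry() only builds a local reference; no network I/O happens yet
            Registry registry =
                    LocateRegistry.getRegistry(host, port, timeoutFactory(timeoutMillis));
            // lookup() is the single remote call; no list() beforehand
            return registry.lookup(name);
        } catch (NotBoundException e) {
            return null; // registry is up, but the server has not bound itself yet
        } catch (RemoteException e) {
            return null; // connection refused, timeout, or another transport failure
        }
    }
}
```

Note that the factory passed to getRegistry() only governs connections to the registry itself. Calls made on the returned stub use whatever client socket factory the server exported the object with, or the global RMISocketFactory if none was given, so the timeout on the vote request still has to be configured there.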