java azure akka akka-cluster akka-remote-actor

Akka remote routees hostname configuration issue


I am experimenting with the Akka remoting feature for a tool I am building. So far, I have been able to make the core and remote systems work on the same host with different ports. Note that my remote servers run behind a router, as explained in the Akka docs.

Now I am trying to use several Azure virtual machines to run a better experiment, but I am experiencing some issues.

The core application has the following configuration (I've changed some names for security reasons):

akka.actor.deployment {
  /querierActor/querierPool {
    router = round-robin-pool
    nr-of-instances = 12
    target.nodes = [
       "akka.tcp://SYSTEM@remote-srv01.cloudapp.net:2560"
      ,"akka.tcp://SYSTEM@remote-srv02.cloudapp.net:2560"
      ,"akka.tcp://SYSTEM@remote-srv03.cloudapp.net:2560"

    ]
  }
}

// remote configuration. Use it for multiple machines calculation
akka {
  actor {
    provider = "akka.remote.RemoteActorRefProvider"
  }
  remote {
    enabled-transports = ["akka.remote.netty.tcp"]
    netty.tcp {
      maximum-frame-size = 100MiB
      port = 2552
      hostname = "0.0.0.0"
    }
  }
}

The remote hosts have the following configuration:

akka.actor.deployment {
  /querierActor/querierPool {
    router = balancing-pool
    nr-of-instances = 15
  }
}

akka {
  actor {
    provider = "akka.remote.RemoteActorRefProvider"
  }
  remote {
    enabled-transports = ["akka.remote.netty.tcp"]
    netty.tcp {
      maximum-frame-size = 100MiB
      hostname = "0.0.0.0"
      port = 2560
    }
  }
}

With this configuration, the server and remote hosts are apparently able to communicate, but the remote host starts logging errors:

 [ERROR] [01/17/2015 12:55:05.734] [SYSTEM-akka.remote.default-remote-dispatcher-16] [akka.tcp://SYSTEM@0.0.0.0:2560/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FSYSTEM%400.0.0.0%3A2552-0/endpointWriter] dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://SYSTEM@remote-srv01.cloudapp.net:2560/]] arriving at [akka.tcp://SYSTEM@remote-srv01.cloudapp.net:2560] inbound addresses are [akka.tcp://SYSTEM@0.0.0.0:2560]

After a while, both the server and the remote host start logging errors and freeze.

Server error:

[WARN] [01/17/2015 12:21:05.658] [CRAWLER-LD-akka.remote.default-remote-dispatcher-7] [akka.tcp://SYSTEM@0.0.0.0:2552/system/remote-watcher] Detected unreachable: [akka.tcp://SYSTEM@remote-srv01.cloudapp.net:2560]
[WARN] [01/17/2015 12:21:05.664] [SYSTEM-akka.remote.default-remote-dispatcher-17] [Remoting] Association to [akka.tcp://SYSTEM@remote-srv01.cloudapp.net:2560] with unknown UID is reported as quarantined, but address cannot be quarantined without knowing the UID, gating instead for 5000 ms.

(...)

[INFO] [01/17/2015 12:21:05.712] [SYSTEM-akka.actor.default-dispatcher-6] [akka://SYSTEM/user/querierActor/querierPool] Message [akka.dispatch.sysmsg.DeathWatchNotification] from Actor[akka://SYSTEM/user/querierActor/querierPool#-1217916605] to Actor[akka://SYSTEM/user/querierActor/querierPool#-1217916605] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.

(...) 

Remote error (similar lines several times):

(...)
[ERROR] [01/17/2015 14:21:16.371] [SYSTEM-akka.remote.default-remote-dispatcher-16] [akka.tcp://SYSTEM@0.0.0.0:2560/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FSYSTEM%400.0.0.0%3A2552-2/endpointWriter] dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://SYSTEM@remote-srv01.cloudapp.net:2560/]] arriving at [akka.tcp://SYSTEM@remote-srv01.cloudapp.net:2560] inbound addresses are [akka.tcp://SYSTEM@0.0.0.0:2560]
[ERROR] [01/17/2015 14:21:17.388] [SYSTEM-akka.remote.default-remote-dispatcher-16] [akka.tcp://SYSTEM@0.0.0.0:2560/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FSYSTEM%400.0.0.0%3A2552-2/endpointWriter] dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://SYSTEM@remote-srv01.cloudapp.net:2560/]] arriving at [akka.tcp://SYSTEM@remote-srv01.cloudapp.net:2560] inbound addresses are [akka.tcp://SYSTEM@0.0.0.0:2560]
[WARN] [01/17/2015 14:21:17.465] [SYSTEM-akka.remote.default-remote-dispatcher-16] [akka.tcp://SYSTEM@0.0.0.0:2560/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FSYSTEM%400.0.0.0%3A2552-2] Association with remote system [akka.tcp://SYSTEM@0.0.0.0:2552] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
[INFO] [01/17/2015 14:21:17.467] [SYSTEM-akka.actor.default-dispatcher-21] [akka://SYSTEM/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FSYSTEM%40186.228.120.115%3A56044-3] Message [akka.remote.transport.AssociationHandle$Disassociated] from Actor[akka://SYSTEM/deadLetters] to Actor[akka://SYSTEM/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FSYSTEM%40186.228.120.115%3A56044-3#-2070785548] was not delivered. [6] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
[INFO] [01/17/2015 14:21:17.468] [SYSTEM-akka.actor.default-dispatcher-21] [akka://SYSTEM/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FSYSTEM%40186.228.120.115%3A56044-3] Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://SYSTEM/deadLetters] to Actor[akka://SYSTEM/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FSYSTEM%40186.228.120.115%3A56044-3#-2070785548] was not delivered. [7] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.

(...)

I figured that the problem might be in the hostname configuration, so I tried setting the hostname explicitly on both the server and the remote host. In that case, however, the system does not even start:

Exception in thread "main" org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'FacadeMemory' defined in file [D:\data\development\git\semantic-web-crawler\crawlerld.core\target\classes\net\dovale\websemantics\linkedDataRecommender\facade\memory\FacadeMemory.class]: Instantiation of bean failed; nested exception is org.springframework.beans.BeanInstantiationException: Could not instantiate bean class [facade.memory.FacadeMemory]: Constructor threw exception; nested exception is org.jboss.netty.channel.ChannelException: Failed to bind to: /remote-srv01.cloudapp.net:2560
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateBean(AbstractAutowireCapableBeanFactory.java:1077)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1022)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:504)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:475)
    at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:302)
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:228)
    at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:298)
    at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:193)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:706)
    at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:762)
    at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:482)
    at org.springframework.boot.context.embedded.EmbeddedWebApplicationContext.refresh(EmbeddedWebApplicationContext.java:109)
    at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:691)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:320)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:952)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:941)
    at facade.memory.GUIMain.main(GUIMain.java:23)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
Caused by: org.springframework.beans.BeanInstantiationException: Could not instantiate bean class [facade.memory.FacadeMemory]: Constructor threw exception; nested exception is org.jboss.netty.channel.ChannelException: Failed to bind to: /remote-srv01.cloudapp.net:2560
    at org.springframework.beans.BeanUtils.instantiateClass(BeanUtils.java:164)
    at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:89)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateBean(AbstractAutowireCapableBeanFactory.java:1070)
    ... 21 more
Caused by: org.jboss.netty.channel.ChannelException: Failed to bind to: /remote-srv01.cloudapp.net:2560
    at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
    at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
    at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
    at scala.util.Success$$anonfun$map$1.apply(Try.scala:236)
    at scala.util.Try$.apply(Try.scala:191)
    at scala.util.Success.map(Try.scala:236)
    at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
    at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
    at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
    at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.net.BindException: Cannot assign requested address: bind
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:436)
    at sun.nio.ch.Net.bind(Net.java:428)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:372)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:296)
    at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

I don't know what I am doing wrong. I searched for information about this issue, but nothing I found relates to my problem. I have also opened the ports in the Azure configuration.

How can I enable my server host to communicate properly with my remote hosts?


Solution

  • I was able to address the problem.

    After some fruitless research, I had to try some different things. I am making some assumptions that could be wrong as I didn't find any other information. If you are reading this answer and find any error, please let me know.

    The problem was that the underlying socket layer (sun.nio.ch.Net.bind0 apparently, but I didn't find many docs about it) accepts only the following addresses for binding: 0.0.0.0 (to accept connections on any network interface of the machine), 127.0.0.1 (to handle only local requests, I guess), or the IP address of one of the computer's network interfaces. In this last case, requests are only accepted on that specific interface.
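The bind restriction described above can be checked outside Akka. The following is a minimal sketch, not anything Akka itself runs: the class `BindCheck` and the method `isLocallyBindable` are hypothetical names, and the check assumes that an address is bindable only if it is the wildcard address or is assigned to one of the machine's network interfaces. The address 192.0.2.1 is a documentation-only address used here as a stand-in for a public IP or DNS name that no local interface owns.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.NetworkInterface;

public class BindCheck {
    /**
     * Returns true if the host is the wildcard address (0.0.0.0) or resolves
     * to an address assigned to a local network interface. Returns false if
     * the name cannot be resolved or the interfaces cannot be queried.
     */
    static boolean isLocallyBindable(String host) {
        try {
            InetAddress address = InetAddress.getByName(host);
            if (address.isAnyLocalAddress()) {
                return true; // wildcard: listen on every interface
            }
            // null means no local interface owns this address
            return NetworkInterface.getByInetAddress(address) != null;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 192.0.2.1 is a hypothetical example of an address not owned by this machine
        for (String host : new String[] {"0.0.0.0", "127.0.0.1", "192.0.2.1"}) {
            System.out.println(host + " locally bindable: " + isLocallyBindable(host));
        }
    }
}
```

Running this on the VM with its public DNS name as the argument should show whether that name maps to a local interface at all; if it does not, a bind error like the one in the question is the expected result.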

    The other problem is that the "hostname" property is also used to address Akka's remote nodes. That is, when the host node calls a remote node, this information identifies where the result needs to be sent once the work is finished. Also, if you set the hostname property to 0.0.0.0 and try to reach this node by its DNS name (which may not be associated with any network interface), it will fail. You have to identify the machine by the same IP as one of its network interfaces.
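To make the mismatch concrete, here is a simplified sketch of the kind of comparison suggested by the log line "dropping message ... for non-local recipient ... inbound addresses are [...]". It is not Akka's implementation: the class `AddressMatch` and its methods are hypothetical, and it assumes the recipient is compared with the node's own advertised addresses purely by system name, host and port, with no DNS resolution. The IP 10.0.0.5 is an invented example.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

public class AddressMatch {
    /** Reduces an Akka-style address to "system@host:port"; any actor path is ignored. */
    static String key(String address) {
        URI uri = URI.create(address);
        return uri.getUserInfo() + "@" + uri.getHost() + ":" + uri.getPort();
    }

    /** True if the recipient's system, host and port equal one of the inbound addresses. */
    static boolean isLocalRecipient(String recipient, List<String> inboundAddresses) {
        String wanted = key(recipient);
        return inboundAddresses.stream().map(AddressMatch::key).anyMatch(wanted::equals);
    }

    public static void main(String[] args) {
        // Situation from the question: the node advertises 0.0.0.0 but is addressed by DNS name
        System.out.println(isLocalRecipient(
                "akka.tcp://SYSTEM@remote-srv01.cloudapp.net:2560/",
                Arrays.asList("akka.tcp://SYSTEM@0.0.0.0:2560"))); // prints "false"

        // Hypothetical fixed setup: advertised hostname equals the address used by the caller
        System.out.println(isLocalRecipient(
                "akka.tcp://SYSTEM@10.0.0.5:2560/user/worker",
                Arrays.asList("akka.tcp://SYSTEM@10.0.0.5:2560"))); // prints "true"
    }
}
```

Under this assumption, the string written in target.nodes on the caller has to equal the hostname and port the remote node is configured with, which is why the same IP appears on both sides in the configuration that follows.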

    So, my setup changed slightly:

    For the host node, I made this change:

    (...)
    akka.actor.deployment {
      /sparqlQuerierMasterActor/sparqlQuerierPool {
        router = round-robin-pool
        nr-of-instances = 12
        target.nodes = [
           "akka.tcp://SYSTEM@XXX.XXX.XXX.XXX:2560"
          ,"akka.tcp://SYSTEM@YYY.YYY.YYY.YYY:2560"
          ,"akka.tcp://SYSTEM@ZZZ.ZZZ.ZZZ.ZZZ:2560"
    
        ]
      }
    }
    (...)
    

    XXX, YYY and ZZZ are reachable IPs of the remote nodes, each of which is also assigned to a network interface on its node.

    The configuration of the remote node changed to:

    (...)
      remote {
        enabled-transports = ["akka.remote.netty.tcp"]
        netty.tcp {
          maximum-frame-size = 100MiB
          hostname = "YYY.YYY.YYY.YYY"
          port = 2560
        }
      }
    (...)
    

    I didn't test if I can maintain the previous 0.0.0.0 configuration. Maybe it is possible.

    This solution allowed the host and remote nodes to communicate flawlessly =)