
Apache Ignite : Near Cache communication with Data node


We have near caches configured alongside the main caches (which live on the data nodes). The documentation says: "Near caches are fully transactional and get updated or invalidated automatically whenever the data changes on the server nodes."

I am trying to understand how this communication of automatic updates works.

We saw an issue where data nodes were continuously trying to connect to a client node, with this line in the logs:

org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3322) [ignite-core-2.9.1.jar:2.9.1]

We suspect that the data nodes were blocked because they were continuously trying to send messages to near caches. This is based on the stack trace we saw.

Full stack trace:

at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:191) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3229) [ignite-core-2.9.1.jar:na]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:3013) [ignite-core-2.9.1.jar:na]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2960) [ignite-core-2.9.1.jar:na]
at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2100) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:2365) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1964) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1935) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1917) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendNotification(GridContinuousProcessor.java:1324) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.addNotification(GridContinuousProcessor.java:1261) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.onEntryUpdate(CacheContinuousQueryHandler.java:1059) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.access$600(CacheContinuousQueryHandler.java:90) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler$2.onEntryUpdated(CacheContinuousQueryHandler.java:459) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:447) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2495) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2657) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:2118) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1935) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1734) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3322) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:141) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:273) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:268) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1907) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1528) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.managers.communication.GridIoManager.access$5300(GridIoManager.java:241) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.managers.communication.GridIoManager$9.execute(GridIoManager.java:1421) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:55) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:565) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.9.1.jar:2.9.1]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_212]

Caused by: org.apache.ignite.spi.communication.tcp.NodeForceEvictException: Node evicted forcefully from topology.
at org.apache.ignite.spi.communication.tcp.IBTcpCommunicationSpi.createTcpClient(IBTcpCommunicationSpi.java:60) ~[ib-compute-grid-23.2.10.jar:na]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3375) [ignite-core-2.9.1.jar:na]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3180) [ignite-core-2.9.1.jar:na]
... 36 common frames omitted
Caused by: org.apache.ignite.spi.communication.tcp.internal.NodeUnreachableException: Failed to connect to all addresses of node 4ae96cc6-d3ba-4bb4-94f8-4c116d5bd9eb: [/10.228.30.249:47000]; inverse connection will be requested.
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3982) [ignite-core-2.9.1.jar:na]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3635) [ignite-core-2.9.1.jar:na]


Solution

  • I'm not sure that it's about near caching at all.

    CacheWriteSynchronizationMode is about synchronization between primary and backup partitions, whereas near caching keeps a local on-heap copy of recently accessed data on the node that uses it. Near caching makes sense mostly for client nodes, or for non-replicated caches on server nodes.

    This line:

    org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3322) [ignite-core-2.9.1.jar:2.9.1]
    

    is not about a near cache, though the naming can be confusing. It is just a regular cache update on a local node. Remember that a near cache has to be enabled explicitly in the configuration, as highlighted in the docs, for it to work.

    The stack trace is indeed a CQ (continuous query) notification routine.

    a) Here a local update (more precisely - DELETE) happens:

    at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2495) [ignite-core-2.9.1.jar:2.9.1]
    at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2657) [ignite-core-2.9.1.jar:2.9.1]
    at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:2118) [ignite-core-2.9.1.jar:2.9.1]
    

    GridDhtAtomicCache implies that the update is happening to a non-transactional (atomic) cache. GridCacheMapEntry is the base adapter for all entries, whereas GridNearCacheEntry is used for real near cache updates. In other words, I see an update to a regular off-heap entry.

    b) Then Ignite checks whether there are any active CQs, detects one, and sends a notification, waiting for an ACK:
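    The distinction above can be checked mechanically. The following is a minimal sketch, not an Ignite API: `FrameClassifier` and its `Kind` enum are hypothetical names introduced here for illustration. It labels each frame by the internal class it mentions, matching only class names that appear in this thread, and shows that a frame such as processNearAtomicUpdateRequest is not classified as near cache work just because its method name contains "Near".

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative helper (not part of Ignite): labels stack frames by the internal class they mention. */
final class FrameClassifier {
    enum Kind { NEAR_CACHE_ENTRY, REGULAR_ENTRY, CONTINUOUS_QUERY, COMMUNICATION, OTHER }

    /** Classifies a single "at ..." frame by simple substring matching on class or package names. */
    static Kind classify(String frame) {
        // Only an entry class named GridNearCacheEntry indicates a real near cache update.
        if (frame.contains(".GridNearCacheEntry.")) return Kind.NEAR_CACHE_ENTRY;
        // GridCacheMapEntry is the base adapter, i.e. a regular entry update.
        if (frame.contains(".GridCacheMapEntry.")) return Kind.REGULAR_ENTRY;
        // Both continuous-query packages seen in the trace contain ".continuous.".
        if (frame.contains(".continuous.")) return Kind.CONTINUOUS_QUERY;
        if (frame.contains(".TcpCommunicationSpi.")) return Kind.COMMUNICATION;
        return Kind.OTHER;
    }

    /** Returns true if at least one frame in the trace has the given kind. */
    static boolean hasKind(List<String> frames, Kind kind) {
        return frames.stream().anyMatch(f -> classify(f) == kind);
    }

    public static void main(String[] args) {
        // A few frames copied from the stack trace in the question.
        List<String> frames = Arrays.asList(
            "at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3229)",
            "at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendNotification(GridContinuousProcessor.java:1324)",
            "at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2495)",
            "at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3322)");

        for (String frame : frames) {
            System.out.println(classify(frame) + "  <-  " + frame);
        }
        System.out.println("near cache entry involved: " + hasKind(frames, Kind.NEAR_CACHE_ENTRY));
        System.out.println("continuous query involved: " + hasKind(frames, Kind.CONTINUOUS_QUERY));
    }
}
```

    Substring matching is deliberately crude; it is only meant to make the reading of the trace explicit, not to replace a proper thread dump analysis.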

        at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:191) ~[ignite-core-2.9.1.jar:2.9.1]
        ...
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3229) [ignite-core-2.9.1.jar:na]
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:3013) [ignite-core-2.9.1.jar:na]
        ...
        at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:447) [ignite-core-2.9.1.jar:2.9.1]
    

    c) At the very bottom there is an interesting message about a connectivity issue:

    Failed to connect to all addresses of node 4ae96cc6-d3ba-4bb4-94f8-4c116d5bd9eb: [/10.228.30.249:47000]; inverse connection will be requested
    

    I assume that the CQ could not notify its listener due to network issues, and so it hangs. There have been improvements to the CQ logic since 2.9.1 that could solve the hang.

    d) To reach a more precise conclusion, you need to check the logs (thread dumps could help as well) from the other nodes.
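    As a first check for the connectivity problem in c), it may help to confirm whether the data node host can open a TCP connection to the address from the log at all. The sketch below uses only the JDK; `PortProbe` is a hypothetical helper name, the 2-second timeout is an arbitrary assumption, and the default host and port are simply the ones printed in the NodeUnreachableException message. It would have to be run from the data node's host to be meaningful.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Illustrative helper (not part of Ignite): checks plain TCP reachability of a host and port. */
final class PortProbe {
    /** Returns true if a TCP connection to host:port can be opened within timeoutMs milliseconds. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolvable hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are taken from the log message in the question; override them via arguments.
        String host = args.length > 0 ? args[0] : "10.228.30.249";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 47000;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 2000));
    }
}
```

    A successful probe only shows that the port accepts TCP connections right now; it does not prove that the Ignite communication handshake would succeed, so the logs from the client node are still needed.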