We have a cluster of 15 servers running Spring Boot, Hibernate, and an Infinispan cache in invalidation mode.
When we upgrade Infinispan (by updating Spring Boot), our cluster does not start up properly because of incompatible Infinispan or JGroups versions.
We were running Spring Boot 3.2.3 and tried to upgrade to 3.2.4.
With spring-boot:3.2.3 we had (docs, pom.xml)
With spring-boot:3.2.4 we have (docs, pom.xml)
Let's say we have servers A and B. We instruct our load balancer to send all traffic to server B, which is still running Spring Boot 3.2.3.
Now we can deploy the new version with Spring Boot 3.2.4 to server A. But server A doesn't start up. When it tries to detect the cluster, it fails:
2024-04-08T10:11:29.584+02:00 ERROR org.infinispan.CLUSTER : ISPN000475: Error processing response for request 2 from kt139-51832
java.io.IOException: Unknown type: 28
at org.infinispan.marshall.core.GlobalMarshaller.readNonNullableObject(GlobalMarshaller.java:720)
at org.infinispan.marshall.core.GlobalMarshaller.readNullableObject(GlobalMarshaller.java:357)
at org.infinispan.marshall.core.BytesObjectInput.readObject(BytesObjectInput.java:32)
at org.infinispan.topology.CacheTopology$Externalizer.doReadObject(CacheTopology.java:269)
at org.infinispan.topology.CacheTopology$Externalizer.doReadObject(CacheTopology.java:250)
at org.infinispan.commons.marshall.InstanceReusingAdvancedExternalizer.readObject(InstanceReusingAdvancedExternalizer.java:102)
at org.infinispan.marshall.core.GlobalMarshaller.readWithExternalizer(GlobalMarshaller.java:727)
at org.infinispan.marshall.core.GlobalMarshaller.readNonNullableObject(GlobalMarshaller.java:708)
at org.infinispan.marshall.core.GlobalMarshaller.readNullableObject(GlobalMarshaller.java:357)
at org.infinispan.marshall.core.BytesObjectInput.readObject(BytesObjectInput.java:32)
at org.infinispan.topology.CacheStatusResponse$Externalizer.readObject(CacheStatusResponse.java:98)
at org.infinispan.topology.CacheStatusResponse$Externalizer.readObject(CacheStatusResponse.java:85)
at org.infinispan.marshall.core.GlobalMarshaller.readWithExternalizer(GlobalMarshaller.java:727)
at org.infinispan.marshall.core.GlobalMarshaller.readNonNullableObject(GlobalMarshaller.java:708)
at org.infinispan.marshall.core.GlobalMarshaller.readNullableObject(GlobalMarshaller.java:357)
at org.infinispan.marshall.core.BytesObjectInput.readObject(BytesObjectInput.java:32)
at org.infinispan.remoting.responses.SuccessfulResponse$Externalizer.readObject(SuccessfulResponse.java:71)
at org.infinispan.remoting.responses.SuccessfulResponse$Externalizer.readObject(SuccessfulResponse.java:63)
at org.infinispan.marshall.core.GlobalMarshaller.readWithExternalizer(GlobalMarshaller.java:727)
at org.infinispan.marshall.core.GlobalMarshaller.readNonNullableObject(GlobalMarshaller.java:708)
at org.infinispan.marshall.core.GlobalMarshaller.readNullableObject(GlobalMarshaller.java:357)
at org.infinispan.marshall.core.GlobalMarshaller.objectFromObjectInput(GlobalMarshaller.java:191)
at org.infinispan.marshall.core.GlobalMarshaller.objectFromByteBuffer(GlobalMarshaller.java:220)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processResponse(JGroupsTransport.java:1562)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1471)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.up(JGroupsTransport.java:1663)
at org.jgroups.JChannel.up(JChannel.java:734)
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:936)
at org.jgroups.protocols.FRAG3.up(FRAG3.java:134)
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:846)
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:226)
at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1086)
at org.jgroups.protocols.UNICAST3.addMessage(UNICAST3.java:825)
at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:807)
at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:456)
at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:680)
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:132)
at org.jgroups.protocols.FailureDetection.up(FailureDetection.java:180)
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:294)
at org.jgroups.protocols.MERGE3.up(MERGE3.java:274)
at org.jgroups.protocols.Discovery.up(Discovery.java:294)
at org.jgroups.protocols.TP.passMessageUp(TP.java:1184)
at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:107)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
The complete startup process is blocked. Three minutes later, it recovers as soon as the second server restarts with the new version, because the first server then gets a new cluster view. In the meantime, users can't access the website: the first server is in an error state waiting for the new cluster view, and the second server is still starting up.
For what are only minor version updates, this is very annoying. The only solution I can see is to change the JGroups TCP port on every update and build a new cluster for the new deployment.
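To make the workaround above concrete, here is a minimal sketch of deriving a cluster name and TCP port from the Infinispan version, so that nodes on different versions never try to join the same cluster. Everything in it is an assumption for illustration: the class name `VersionedCluster`, the base port 7800, the range of 100 ports, and the placeholder version strings are hypothetical, and how the resulting values get passed to the JGroups stack depends on your own configuration.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper: not part of Infinispan, JGroups, or Spring Boot.
class VersionedCluster {
    // Assumed values for illustration; pick a range that is free on your hosts.
    static final int BASE_PORT = 7800;
    static final int PORT_RANGE = 100;

    // Build a cluster name that differs for every library version.
    static String clusterName(String baseName, String version) {
        return baseName + "-" + version.replace('.', '_');
    }

    // Map a version string deterministically to a port in
    // [BASE_PORT, BASE_PORT + PORT_RANGE). Two versions can collide on the
    // same port, so the version-specific cluster name is still needed.
    static int bindPort(String version) {
        CRC32 crc = new CRC32();
        crc.update(version.getBytes(StandardCharsets.UTF_8));
        return BASE_PORT + (int) (crc.getValue() % PORT_RANGE);
    }

    public static void main(String[] args) {
        // Placeholder version strings, not the real Infinispan versions.
        for (String version : new String[] {"1.0.0", "1.0.1"}) {
            System.out.println(clusterName("app", version) + " -> port " + bindPort(version));
        }
    }
}
```

The trade-off is that the old and new deployments do not share a cache cluster during the rollout, so invalidation messages do not cross between them until all servers run the new version.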
Unfortunately, Infinispan does not provide compatibility between different versions.
In this particular case, a new boolean field was added to the topology command, which breaks the wire format.
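As a rough illustration of why one extra field is enough to break things, the sketch below writes a message in a "new" layout and reads it with an "old" reader that does not know about the added boolean. This is not Infinispan's actual marshalling code; the class `WireFormatDemo`, the field names, and the layout are hypothetical. It only shows the general effect: in a positional binary format the old reader interprets the boolean's byte as the next field, here a type tag, and ends up with a value it never expected.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo of a positional wire format; not Infinispan code.
class WireFormatDemo {
    // "New" writer: topology id, the newly added boolean, then a type tag.
    static byte[] writeNew(int topologyId, boolean newFlag, int typeTag) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(topologyId);
            out.writeBoolean(newFlag);
            out.writeByte(typeTag);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // "Old" reader: expects the type tag directly after the topology id.
    static int readOldTypeTag(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            in.readInt();                 // topology id
            return in.readUnsignedByte(); // actually the boolean's byte
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] message = writeNew(7, true, 3);
        // The writer sent type tag 3, but the old reader sees the boolean instead.
        System.out.println("old reader sees type tag " + readOldTypeTag(message)); // prints 1
    }
}
```

Once the reader is misaligned like this, every later field is decoded from the wrong offset, which is the kind of situation in which an error such as `Unknown type: 28` can surface.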