We are using Ignite embedded in a Java application that is deployed to Kubernetes. I am currently investigating the feasibility of an upgrade to Ignite 3.
One critical issue is the scaling of the application in Kubernetes. With Ignite 2.x we could scale up and down at any time, and newly added nodes were treated the same as the nodes that were already present when the cluster was initialized.
Ignite 3 introduced the concept of "meta storage nodes" that are used, among other things, for split-brain detection. When initializing the cluster, you have to specify which nodes are supposed to be part of the meta storage group:
InitParameters initParameters = InitParameters.builder()
        .metaStorageNodeNames(nodeNames)
        .clusterName("cluster")
        .build();
ignite.initCluster(initParameters);
I have not found any way to add nodes to the meta storage nodes later. Thus my question: is this even possible, and if so, how?
If it is not possible, how do you handle the situation where a meta storage node goes down and has to be restarted? Is there a way for that node to re-join the meta storage nodes?
A little note first: in Ignite 3, there are two system Raft groups: cluster management group (CMG) and Metastorage group (MG). If you only specify MG nodes on cluster init, CMG will use the same nodes.
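For completeness, the init call can pin the two groups to explicit node sets. A minimal configuration sketch, with hypothetical node names "node1".."node3"; current Ignite 3 builds expose a `cmgNodeNames` setter on the builder alongside `metaStorageNodeNames`, but check the `InitParameters` API of your version:

```java
import org.apache.ignite.InitParameters;

// Configuration sketch: pin the Metastorage group (MG) and the cluster
// management group (CMG) to explicit (here: identical) node sets.
// The node names are placeholders for nodes in your physical topology.
InitParameters initParameters = InitParameters.builder()
        .metaStorageNodeNames("node1", "node2", "node3") // MG voting members
        .cmgNodeNames("node1", "node2", "node3")         // CMG voting members
        .clusterName("cluster")
        .build();
ignite.initCluster(initParameters);
```

If you omit `cmgNodeNames`, the CMG simply reuses the Metastorage node set, as noted above.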
"how do you handle the situation where a meta storage node goes down and has to be restarted"
Raft is majority-based, so for any of these groups to function (including the MG), a majority of its nodes must be online. If you need a group to tolerate one node being down (e.g. during a restart), it needs 3 nodes; so you should specify 3 Metastorage nodes during cluster init.
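The sizing arithmetic behind that recommendation can be sketched as follows (a plain illustration of Raft quorum math, not Ignite code): a group of n voting members needs floor(n/2) + 1 of them online, so it tolerates floor((n - 1) / 2) failures.

```java
// Illustration of Raft majority sizing (not Ignite code).
public class RaftSizing {
    // Smallest number of nodes that forms a majority of the group.
    static int quorum(int groupSize) {
        return groupSize / 2 + 1; // integer division
    }

    // How many nodes may be down while the group stays functional.
    static int toleratedFailures(int groupSize) {
        return (groupSize - 1) / 2;
    }

    public static void main(String[] args) {
        // 1 metastorage node: any restart makes the MG unavailable.
        System.out.println(toleratedFailures(1)); // 0
        // 3 metastorage nodes: the group survives one node restarting.
        System.out.println(toleratedFailures(3)); // 1
        // 5 nodes would survive two, at the cost of a larger quorum.
        System.out.println(quorum(5));            // 3
    }
}
```

This is why a single-node Metastorage group cannot survive even a planned rolling restart of that node.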
"I have not found any way to add nodes to the meta storage nodes later"
Currently, there is no way to reconfigure the CMG or MG gracefully. The corresponding Jira issues are https://issues.apache.org/jira/browse/IGNITE-22378 and https://issues.apache.org/jira/browse/IGNITE-22379.
But you could use the disaster recovery mechanism for the Metastorage group to change its topology. Disclaimer: this is a last-resort mechanism intended to save you after a disaster in which you have completely lost some nodes; use it at your own risk. However, if all your nodes are actually online (and you keep them online), it should be safe to use it just to change the MG topology, even without any disaster.
Example:
cli recovery cluster reset --url http://your-host:10800 --metastorage-replication-factor 3
This will initiate an MG 'repair', which resets the cluster (the CMG state will be rewritten); the new MG will have 3 nodes, chosen automatically. All cluster nodes will be restarted during the repair.