I have a citus cluster with 1 coordinator and 3 workers.
Recently, the server which hosted coordinator is completely down which cannot be recovered.
Now the cluster only does Query and Some DML, but not DDL, which can only be executed on coordinator!
So, how could I promote a metadatasynced worker to coordinator?
Firstly, there is no accepted solution we currently have for such scenarios. Typically, this would have to be resolved by restoring the coordinator using a backup. It is highly recommended to setup your database to take periodic backups and be able to restore them in case of such unrecoverable crashes.
This solution does not promote a worker but establishes a new coordinator. Complications might arise (your database might break) if you apply this to your own cluster because this involves messing with citus' metadata. This solution is experimental, so I highly recommend you try to apply this to a fork of your cluster or to snapshot your disks to ensure no damage is done to your data.
This solution has been tested in a limited scope for citus 11
in a local cluster containing a distributed and a reference table. If you have other distributed objects in your database like views or collations this solution may fail (haven't tested).
Throughout this you will need to lookup groupid-s and nodeid-s with SELECT * FROM pg_dist_node;
Some steps need to be included/excluded if the old coordinator is in the pg_dist_node table. I will mark such steps with [if CIM include] for steps that need to be included and [if CIM exclude] for steps that need to be excluded if the old coordinator is in the pg_dist_table. (CIM = coordinator in metadata)
Create a new node and install citus.
[if CIM include] Run in all your nodes except the new one you just created:
-- removes old coordinator from citus metadata
DELETE FROM pg_dist_node WHERE groupid = 0;
-- removes old coordinator placements from citus metadata
DELETE FROM pg_dist_placement WHERE groupid = 0;
-- temporarily mark this node as the coordinator in the metadata
UPDATE pg_dist_local_group SET groupid = 0;
UPDATE pg_dist_node SET groupid = 0 WHERE groupid = 1;
UPDATE pg_dist_placement SET groupid = 0 WHERE groupid = 1;
-- adjust the metadata as if we are ready to add new node
SELECT max(groupid) as groupid FROM pg_dist_node \gset
SELECT setval('pg_dist_groupid_seq', :groupid, true);
SELECT max(nodeid) as nodeid FROM pg_dist_node \gset
SELECT setval('pg_dist_node_nodeid_seq', :nodeid, true);
SELECT max(placementid) as placementid FROM pg_dist_placement \gset
SELECT setval('pg_dist_placement_placementid_seq', :placementid, true);
-- add the new node
SELECT citus_add_node('NEW_NODE_HOST', NEW_NODE_PORT);
-- set back the original metadata
UPDATE pg_dist_local_group SET groupid = 1;
UPDATE pg_dist_node SET groupid = 1 WHERE groupid = 0;
UPDATE pg_dist_placement SET groupid = 1 WHERE groupid = 0;
-- look up your new node's group id
-- this value is important and I will refer to it in future steps as NEW_NODE_GROUP_ID
SELECT * FROM pg_dist_node;
-- restore the metadata of your temporary coodinator
UPDATE pg_dist_node SET groupid = 1 WHERE groupid = 0;
UPDATE pg_dist_placement SET groupid = 1 WHERE groupid = 0;
-- set the new node as the coordinator
UPDATE pg_dist_local_group SET groupid = 0;
-- set the metadata for the new coordinator
SELECT max(groupid) as groupid FROM pg_dist_node \gset
SELECT setval('pg_dist_groupid_seq', :groupid, true);
SELECT max(nodeid) as nodeid FROM pg_dist_node \gset
SELECT setval('pg_dist_node_nodeid_seq', :nodeid, true);
SELECT max(placementid) as placementid FROM pg_dist_placement \gset
SELECT setval('pg_dist_placement_placementid_seq', :placementid, true);
SELECT max(shardid) as shardid FROM pg_dist_shard \gset
SELECT setval('pg_dist_shardid_seq', :shardid, true);
-- if the coordinator is not in the metadata, all shards in the new node which will be the coordinator
-- need to be dropped
SET citus.enable_manual_changes_to_shards TO true;
DO $$
DECLARE
row record;
BEGIN
FOR row IN
SELECT CONCAT(logicalrelid, '_', shardid) AS shard_name
FROM pg_dist_placement NATURAL JOIN pg_dist_shard
WHERE groupid = NEW_NODE_GROUP_ID
LOOP
EXECUTE 'DROP TABLE ' || quote_ident(row.shard_name);
RAISE INFO 'Dropped shard: %', quote_ident(row.shard_name);
END LOOP;
END;
$$;
RESET citus.enable_manual_changes_to_shards;
-- if coordinator is not in the metadata remove the new node from the pg_dist_node
-- and remove its shard placements from the metadata
DELETE FROM pg_dist_node WHERE groupid = NEW_NODE_GROUP_ID; -- [if CIM exclude]
DELETE FROM pg_dist_placement WHERE groupid = NEW_NODE_GROUP_ID; -- [if CIM exclude]
-- if the coordinator is in the metadata set the new node as coordinator, set shouldhaveshars
-- for the coordinator to False, and update the coordinator placement group ids
UPDATE pg_dist_node SET groupid = 0, shouldhaveshards = False WHERE groupid = NEW_NODE_GROUP_ID; -- [if CIM include]
UPDATE pg_dist_placement SET groupid = 0 WHERE groupid = NEW_NODE_GROUP_ID; -- [if CIM include]
This should be it, your new node is now the coordinator. Good luck :)