postgresqlworkercitus

How to promote metadatasynced worker to coordinator in Citus, if coordinator completely lost


I have a citus cluster with 1 coordinator and 3 workers.

Recently, the server which hosted coordinator is completely down which cannot be recovered.

Now the cluster only does Query and Some DML, but not DDL, which can only be executed on coordinator!

So, how could I promote a metadatasynced worker to coordinator?


Solution

  • Firstly, there is no accepted solution we currently have for such scenarios. Typically, this would have to be resolved by restoring the coordinator using a backup. It is highly recommended to setup your database to take periodic backups and be able to restore them in case of such unrecoverable crashes.

    This solution does not promote a worker but establishes a new coordinator. Complications might arise (your database might break) if you apply this to your own cluster because this involves messing with citus' metadata. This solution is experimental, so I highly recommend you try to apply this to a fork of your cluster or to snapshot your disks to ensure no damage is done to your data.

    -- removes old coordinator from citus metadata
    DELETE FROM pg_dist_node WHERE groupid = 0;
    
    -- removes old coordinator placements from citus metadata
    DELETE FROM pg_dist_placement WHERE groupid = 0;
    
    -- temporarily mark this node as the coordinator in the metadata
    UPDATE pg_dist_local_group SET groupid = 0;
    UPDATE pg_dist_node SET groupid = 0 WHERE groupid = 1;
    UPDATE pg_dist_placement SET groupid = 0 WHERE groupid = 1;
    
    -- adjust the metadata as if we are ready to add new node
    SELECT max(groupid) as groupid FROM pg_dist_node \gset
    SELECT setval('pg_dist_groupid_seq', :groupid, true);
    SELECT max(nodeid) as nodeid FROM pg_dist_node \gset
    SELECT setval('pg_dist_node_nodeid_seq', :nodeid, true);
    SELECT max(placementid) as placementid FROM pg_dist_placement \gset
    SELECT setval('pg_dist_placement_placementid_seq', :placementid, true);
    
    -- add the new node
    SELECT citus_add_node('NEW_NODE_HOST', NEW_NODE_PORT);
    
    -- set back the original metadata
    UPDATE pg_dist_local_group SET groupid = 1;
    UPDATE pg_dist_node SET groupid = 1 WHERE groupid = 0;
    UPDATE pg_dist_placement SET groupid = 1 WHERE groupid = 0;
    
    -- look up your new node's group id
    -- this value is important and I will refer to it in future steps as NEW_NODE_GROUP_ID
    SELECT * FROM pg_dist_node;
    
    -- restore the metadata of your temporary coodinator
    UPDATE pg_dist_node SET groupid = 1 WHERE groupid = 0;
    UPDATE pg_dist_placement SET groupid = 1 WHERE groupid = 0;
    
    -- set the new node as the coordinator
    UPDATE pg_dist_local_group SET groupid = 0;
    
    -- set the metadata for the new coordinator
    SELECT max(groupid) as groupid FROM pg_dist_node \gset
    SELECT setval('pg_dist_groupid_seq', :groupid, true);
    SELECT max(nodeid) as nodeid FROM pg_dist_node \gset
    SELECT setval('pg_dist_node_nodeid_seq', :nodeid, true);
    SELECT max(placementid) as placementid FROM pg_dist_placement \gset
    SELECT setval('pg_dist_placement_placementid_seq', :placementid, true);
    SELECT max(shardid) as shardid FROM pg_dist_shard \gset
    SELECT setval('pg_dist_shardid_seq', :shardid, true);
    
    
    -- if the coordinator is not in the metadata, all shards in the new node which will be the coordinator
    -- need to be dropped
    SET citus.enable_manual_changes_to_shards TO true;
    DO $$
    DECLARE
        row record;
    BEGIN
        FOR row IN 
        SELECT CONCAT(logicalrelid, '_', shardid) AS shard_name 
        FROM pg_dist_placement NATURAL JOIN pg_dist_shard 
        WHERE groupid = NEW_NODE_GROUP_ID
        LOOP
            EXECUTE 'DROP TABLE ' || quote_ident(row.shard_name);
            RAISE INFO 'Dropped shard: %', quote_ident(row.shard_name);
        END LOOP;
    END;
    $$;
    RESET citus.enable_manual_changes_to_shards;
    
    -- if coordinator is not in the metadata remove the new node from the pg_dist_node
    -- and remove its shard placements from the metadata
    DELETE FROM pg_dist_node WHERE groupid = NEW_NODE_GROUP_ID; -- [if CIM exclude]
    DELETE FROM pg_dist_placement WHERE groupid = NEW_NODE_GROUP_ID; -- [if CIM exclude]
    
    -- if the coordinator is in the metadata set the new node as coordinator, set shouldhaveshars
    -- for the coordinator to False, and update the coordinator placement group ids
    UPDATE pg_dist_node SET groupid = 0, shouldhaveshards = False WHERE groupid = NEW_NODE_GROUP_ID; -- [if CIM include]
    UPDATE pg_dist_placement SET groupid = 0 WHERE groupid = NEW_NODE_GROUP_ID; -- [if CIM include]
    

    This should be it, your new node is now the coordinator. Good luck :)