mongodb, replication, wiredtiger, database

How to migrate from MMAPv1 to WiredTiger with minimal downtime and without mongodump/mongorestore


Most guides recommend using mongodump/mongorestore, but for large production databases the downtime can be very long.


Solution

  • You can do this with replication and an additional server (or the same server, if the load allows).

    1. You need three running MongoDB instances:

      • The server you want to upgrade (remember that WiredTiger has been supported since MongoDB 3.0).
      • A second MongoDB instance, which can run on an additional server; the database will be temporarily copied to it by replication.
      • A third MongoDB instance, the arbiter, which stores no data and only participates in primary elections. The arbiter can run on the additional server on a separate port.
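      Since WiredTiger is only available from MongoDB 3.0 on, it is worth confirming the version of your binaries before going further. A small sketch of the comparison (supports_wiredtiger is a made-up helper; feed it the version number printed by “mongod --version”; it relies on GNU “sort -V”):

```shell
# Returns success when the given mongod version is >= 3.0, the first release
# with WiredTiger support. Compare against the output of "mongod --version".
supports_wiredtiger() {
  [ "$(printf '3.0\n%s\n' "$1" | sort -V | head -n1)" = "3.0" ]
}

supports_wiredtiger "3.2.11" && echo "WiredTiger available"
supports_wiredtiger "2.6.12" || echo "upgrade first"
```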
    2. In any case, back up your database first. Run “mongodump” without parameters and a “./dump” directory will be created with the database dump. You can add the “--gzip“ parameter to compress the result:

      mongodump --gzip
      

      Just in case, the command to restore:

      mongorestore --gzip
      

      Run it in the same directory that contains “./dump”, and add the “--gzip“ parameter if it was used with “mongodump”.

    3. Start with the additional server. My target system is Red Hat Linux without Internet access, so I downloaded and installed MongoDB manually from RPM packages. Add this section to /etc/mongod.conf:

      replication:
         oplogSizeMB: 10240
         replSetName: REPLICA  
      

      Check that the net section looks like this, to allow access from other servers:

      net:
        bindIp: 0.0.0.0
        port: 27017
      

      and run:

      service mongod start
      
    4. Run the third MongoDB instance, the arbiter. It can run on the additional server on a different port. Create a temporary directory for the arbiter database:

      mkdir /tmp/mongo
      chmod -R 777 /tmp/mongo
      

      and run:

      mongod --dbpath /tmp/mongo --port 27001 --replSet REPLICA \
          --fork --logpath /tmp/mongo/db1.log
      
    5. Now configure the main server. Edit /etc/mongod.conf:

      replication:
         oplogSizeMB: 10240
         replSetName: REPLICA   
      

      and restart MongoDB on the main server:

      service mongod restart
      
    6. This is important! After restarting the main server, read operations may be unavailable. I was getting the following error:

      { "ok" : 0, "errmsg" : "node is recovering", "code" : 13436 }
      

      So, as quickly as possible, connect to MongoDB on the main server via the “mongo” console and run the following command to configure replication:

      rs.initiate(
      {
        _id: "REPLICA",
        members: [
          { _id: 0, host : "<IP address of main server>:27017",
                    priority: 1.0 },
          { _id: 1, host : "<IP address of additional server>:27017",
                    priority: 0.5 },
          { _id: 2, host : "<IP address of additional server(the arbiter)>:27001", 
                    arbiterOnly : true,  priority: 0.5  }
        ]
      }
      )
      

      After this operation, all MongoDB operations become available again and data synchronization starts.

      I don’t recommend running rs.initiate() on the main server without parameters, as most tutorials suggest, because the main server’s name will then default to the DNS name from /etc/hostname. That doesn’t suit me, because I use IP addresses for communication in my projects.

      To check the synchronization progress, run this from the “mongo” console:

      rs.status()
      

      Result example:

      {
         "set" : "REPLICA",
         "date" : ISODate("2017-01-19T14:30:34.292Z"),
         "myState" : 1,
         "term" : NumberLong(1),
         "heartbeatIntervalMillis" : NumberLong(2000),
         "members" : [
             {
                 "_id" : 0,
                 "name" : "<IP address of main server>:27017",
                 "health" : 1.0,
                 "state" : 1,
                 "stateStr" : "PRIMARY",
                 "uptime" : 165,
                 "optime" : {
                     "ts" : Timestamp(6377323060650835, 3),
                     "t" : NumberLong(1)
                 },
                 "optimeDate" : ISODate("2017-01-19T14:30:33.000Z"),
                 "infoMessage" : "could not find member to sync from",
                 "electionTime" : Timestamp(6377322974751490, 1),
                 "electionDate" : ISODate("2017-01-19T14:30:13.000Z"),
                 "configVersion" : 1,
                 "self" : true
             },
             {
                 "_id" : 1,
                 "name" : "<IP address of additional server>:27017",
                 "health" : 1.0,
                 "state" : 5,
                 "stateStr" : "STARTUP2",
                 "uptime" : 30,
                 "optime" : {
                     "ts" : Timestamp(0, 0),
                     "t" : NumberLong(-1)
                 },
                 "optimeDate" : ISODate("1970-01-01T00:00:00.000Z"),
                 "lastHeartbeat" : ISODate("2017-01-19T14:30:33.892Z"),
                 "lastHeartbeatRecv" : ISODate("2017-01-19T14:30:34.168Z"),
                 "pingMs" : NumberLong(3),
                 "syncingTo" : "<IP address of main server>:27017",
                 "configVersion" : 1
             },
             {
                 "_id" : 2,
                 "name" : "<IP address of additional server (the arbiter)>:27001",
                 "health" : 1.0,
                 "state" : 7,
                 "stateStr" : "ARBITER",
                 "uptime" : 30,
                 "lastHeartbeat" : ISODate("2017-01-19T14:30:33.841Z"),
                 "lastHeartbeatRecv" : ISODate("2017-01-19T14:30:30.158Z"),
                 "pingMs" : NumberLong(0),
                 "configVersion" : 1
             }
         ],
         "ok" : 1.0
      }
      

      Once the “stateStr” of the additional server changes from ”STARTUP2” to ”SECONDARY”, our servers are synchronized.
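      A convenient way to watch this without reading the whole rs.status() document is to strip out just the member states. The helper below (member_states is a hypothetical name) is plain grep/sed; on a live system you would feed it from “mongo --quiet --eval 'printjson(rs.status())'”. Here it is demonstrated on a captured fragment:

```shell
# Hypothetical helper: extract the "stateStr" values from rs.status() output.
# Live usage: mongo --quiet --eval 'printjson(rs.status())' | member_states
member_states() {
  grep '"stateStr"' | sed 's/.*"stateStr" *: *"\([A-Z0-9]*\)".*/\1/'
}

# Demonstrated on a captured fragment of rs.status() output:
member_states <<'EOF'
    "stateStr" : "PRIMARY",
    "stateStr" : "STARTUP2",
    "stateStr" : "ARBITER",
EOF
```

      As soon as the second member reports SECONDARY instead of STARTUP2, the sync is done.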

    7. While we wait for the synchronization to finish, the client applications need to be modified slightly so that they can work with all the servers in the replica set.

      • If you use a ConnectionString, replace it with something like:

        mongodb://<IP address of main server>:27017,<IP address of additional server>:27017,<IP address of additional server (the arbiter)>:27001/?replicaSet=REPLICA
        
      • If you use the legacy C++ mongo-cxx-driver, as I do, use mongo::DBClientReplicaSet instead of mongo::DBClientConnection and list all three servers in the connection parameters, including the arbiter.

      • There is a third option: simply change the MongoDB server IP in the clients after the PRIMARY/SECONDARY switch, but that’s not a very clean solution.

    8. After the synchronization has finished and the additional server’s status has settled as SECONDARY, we need to swap PRIMARY and SECONDARY by executing these commands in the “mongo” console on the main server. This is important, because the commands will not work on the additional server.

      cfg = rs.conf()
      cfg.members[0].priority = 0.5
      cfg.members[1].priority = 1
      cfg.members[2].priority = 0.5
      rs.reconfig(cfg)
      

      Then check server status by executing:

      rs.status()
      
    9. Stop MongoDB on the main server

      service mongod stop
      

      and simply delete the entire contents of the database directory. This is safe, because we have a working copy on the additional server, and we made a backup at the very beginning. Be careful: MongoDB doesn’t create the database directory by itself. If you have deleted the directory as well, you need to recreate it

      mkdir /var/lib/mongo
      

      and set its owner:

      chown -R mongod:mongod /var/lib/mongo
      
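      The wipe-and-recreate sequence above can be rehearsed on a throwaway directory first. The snippet below is only a sketch: it uses a mktemp directory instead of the real /var/lib/mongo, and skips the chown, which needs root:

```shell
# Rehearse the wipe-and-recreate on a scratch directory; on the real server the
# path is /var/lib/mongo and the owner must be set to mongod:mongod afterwards.
DBPATH="$(mktemp -d)"
touch "$DBPATH/local.0" "$DBPATH/mongod.lock"   # stand-ins for old MMAPv1 files
rm -rf "$DBPATH"                                # delete the old database files
mkdir -p "$DBPATH"                              # mongod will NOT recreate this itself
[ -z "$(ls -A "$DBPATH")" ] && echo "dbpath is empty and ready"
```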
    10. Check that the wiredTiger storage engine is configured in /etc/mongod.conf. Since 3.2 it is the default:

      storage:
          ...
          engine: wiredTiger
          ...
      

      And run MongoDB:

      service mongod start
      

      The main server will pick up the replica set configuration from the secondary automatically, and the data will be synced back into the WiredTiger storage.

    11. After the synchronization is finished, switch the PRIMARY back. This operation should be performed on the additional server, because it is the PRIMARY now.

      cfg = rs.conf()
      cfg.members[0].priority = 1
      cfg.members[1].priority = 0.5
      cfg.members[2].priority = 0.5
      rs.reconfig(cfg)
      
    12. Roll the database clients back to the old version, or change the ConnectionString back.

    13. Now turn off replication, if necessary. Remove the two replica members from the main server:

      rs.remove("<IP address of additional server>:27017")
      rs.remove("<IP address of additional server (the arbiter)>:27001")
      

      Remove the entire “replication” section from /etc/mongod.conf and restart MongoDB:

      service mongod restart
      

      After this, we get the following warning when connecting via the “mongo” console:

      2017-01-19T12:26:51.948+0300 I STORAGE  [initandlisten] ** WARNING: mongod started without --replSet yet 1 documents are present in local.system.replset
      2017-01-19T12:26:51.948+0300 I STORAGE  [initandlisten] **          Restart with --replSet unless you are doing maintenance and  no other clients are connected.
      2017-01-19T12:26:51.948+0300 I STORAGE  [initandlisten] **          The TTL collection monitor will not start because of this.
      

      To get rid of it, you need to remove the “local” database. In the default state this database contains only the “startup_log” collection, so you can do this without fear via the “mongo” console

      use local
      db.dropDatabase()
      

      and restart MongoDB:

      service mongod restart
      

      If you drop the “local” database before removing the “replication” section from /etc/mongod.conf, it is immediately recreated. That is why I could not get away with a single MongoDB restart.

    14. Perform the same actions on the additional server:

      • remove “replication“ section from /etc/mongod.conf
      • restart MongoDB
      • drop the “local“ database
      • restart MongoDB once more
    15. Finally, stop the arbiter and remove its data:

      pkill -f /tmp/mongo
      rm -r /tmp/mongo