database-backup, tokumx

Best way to make a hot backup of a MongoDB (TokuMX) database instance


I have a couple of questions about backing up the remote database of my TokuMX server running in production (there is no sharding or replication). The single constraint is that I must not stop the running TokuMX instance.

  1. What's the best way to make a hot backup of a running TokuMX server (other than TokuMX Hot Backup in the Enterprise version)?

  2. A question regarding the suggested backup approach for MongoDB:

    [backup-host]# mongodump --host mongodb-host --port 27017 --db mongodevdb --username mongouser --password mongopwd
    
    • Is this command the preferred way to make hot backups?
    • What port should I use when issuing this command?
    • Is it a good approach to run this command every day via cron?
    • Are there any pitfalls with this command?

Solution

  • Disclaimer: I work at Tokutek, I'm an engineer working on TokuMX.

    There is no "best" way to make a backup of TokuMX; each application is different, and it's best to understand all the options and make your own decision.

    The backup options for TokuMX are these:

    1. Enterprise hot backup.
    2. Filesystem-level snapshot (LVM, EBS, xfs_freeze) to copy out everything in the dbpath and logDir (a sketch follows this list).
    3. Using mongodump.
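
    As a rough illustration of option 2, here is a minimal sketch using LVM; the volume group, logical volume, mount point, and snapshot size are hypothetical and need to be adapted, and EBS or xfs_freeze based variants follow the same pattern.

    # Assumes dbpath and logDir both live on the (hypothetical) logical volume /dev/vg0/tokumx.
    lvcreate --snapshot --size 10G --name tokumx-snap /dev/vg0/tokumx
    mkdir -p /mnt/tokumx-snap
    mount -o ro /dev/vg0/tokumx-snap /mnt/tokumx-snap
    # Copy the serialized, compressed data files out of the snapshot.
    rsync -a /mnt/tokumx-snap/ "/var/lib/backup/tokumx-snapshot-$(date +%Y%m%d)/"
    # Release the snapshot so it stops consuming copy-on-write space.
    umount /mnt/tokumx-snap
    lvremove -f /dev/vg0/tokumx-snap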

    Please note that fsyncLock does not work, as background threads will still write to the filesystem even if client threads aren't doing anything. Using fsyncLock alone can give you a corrupt backup.

    Filesystem snapshots and enterprise hot backup both have the advantage that you're copying serialized, compressed data, so you're avoiding the cost of querying all the collections and transferring uncompressed BSON data over the wire. Additionally, those options won't destroy the information in the cachetable about what data is most important, whereas mongodump will cause everything to be paged in, possibly evicting data that's useful for your application.

    Enterprise hot backup has the additional advantages over filesystem-level snapshots that it is less expensive (you don't need to reserve extra space like you would for a snapshot), it can be throttled to meet I/O quotas, and the resulting state of the backup is the state at the time when the backup completes, rather than when it starts. So if it takes 12 hours to copy data out for the backup, a filesystem-level snapshotted backup will be 12 hours behind the equivalent backup taken with the hot backup plugin.

    For simple uses, mongodump may be the best option if you aren't concerned about performance, cache invalidation, network bandwidth, or recency. It is also the only option that supports backing up a single database or collection.
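
    For example, a single-database dump reusing the connection details from the question (hostname, credentials, and output path are placeholders to adapt) might look like this; adding --collection narrows it to one collection.

    # Dump only the mongodevdb database; add --collection <name> for a single collection.
    mongodump --host mongodb-host --port 27017 --db mongodevdb \
        --username mongouser --password mongopwd \
        -o "/var/lib/backup/mongodevdb-$(date +%Y%m%d)"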

    For mongodump, its usage is the same as for MongoDB: you need to use the host and port on which your server is running. The default port is 27017, and if your server uses the default you don't need to specify the --port option.

    You can definitely run it every day with cron; I suggest something like this:

    SHELL=/bin/bash
    # Nightly dump at midnight; note that '%' must be escaped in crontab entries.
    0 0 * * * /usr/bin/mongodump --host <host> -o "/var/lib/backup/tokumx-backup-$(date +\%Y\%m\%d)"
    

    The main pitfalls of mongodump are just that it is more expensive and it destroys the information in the cachetable that says what data is important. It also won't get a perfectly consistent snapshot across multiple collections like hot backup and filesystem-level snapshot backups will. A mongodump may contain the effects of some writes in one collection and not contain the effects of earlier writes in a different collection.

    You'll also want to define a scheme for expiring old backups, I expect.
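
    One possible scheme (the 30-day retention window and backup path below are just illustrative) is a second cron entry that removes dump directories older than N days:

    # Hypothetical retention policy: remove backup directories older than 30 days.
    30 0 * * * /usr/bin/find /var/lib/backup -maxdepth 1 -name 'tokumx-backup-*' -mtime +30 -exec rm -rf {} +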