If I have a MonetDB database running on RHEL which is in the region of hundreds of million rows (hundreds of GB) with dozens (but not hundreds) of tables and I'm interested in implementing a decent backup strategy for it, specifically with regards to the mechanics of doing so.
Up until now, I've been using a bash script to call msqldump iteratively, once per table in the database and piping the data out to a file like:
msqldump -u [username] -t [tablename] -d [dbname] > /path/[tablename].sql.gz
I've got a .monetdb file set up so I am not prompted for a password on the execution of each call to msqldump, so this script can be called and allowed to complete unattended.
This seems to work in so far as I get a set of files which contain all of the data and schema necessary to restore those tables into a MonetDB database, but it does seem both quite crude (and time-consuming to execute), so I wonder if there is a "better" way?
Should I be concerned about what happens if the content of the database is changing during the msqldump operation for example? Is there a cleaner and/or faster way to obtain a full backup of a MonetDB database, perhaps by stopping the db/farm and just taking copies of the data files themselves and if so, what is the exact methodology for achieving this? Are there people/organisations out there using MonetDB in any kind of large-scale or enterprise fashion and how do they achieve the same kind of backup strategy which might be implemented for an MSSQL database or similar?
I've searched around quite a lot online and here on StackOverflow and haven't been able to find much in the way of guidance on this subject, so I hope someone here might be able to help.
Thanks in advance.
msqldump
is the preferred way to take clean databases snapshots into serialized SQL scripts. Perhaps though, you would prefer to call it only once for the entire database, rather than once per table. This will be faster but more importantly more consistent, should your database schema change over time.
You can also make a physical copy of the database in its binary format, as you hinted. Each database is stored under a folder with the same name, in <dbfarm location>/<dbname>
. This requires the database to be stopped (monetdb stop
) and locked (monetdb lock
) to prevent users from starting it automatically. To restore the database, simply copy it back into the <dbfarm location>
and unlock it (monetdb release
). No further database registration / setup is required. If you want to restore it with a different name, simply change the name of its folder.
With serialized and binary being your two strategies, consider the following:
My preferred approach, with the assumption of a production database being used rather frequently, is the same you propose, but for the whole database at once (and piping it to gzip
, which you forgot in your example):
msqldump -u [username] -d [dbname] | gzip > /path/[databasename].sql.gz
Useful links:
UPDATE: Starting from release Jun2020 (11.37.7) a "hot backup" feature allows to take binary snapshots without stopping the database: https://www.monetdb.org/Documentation/ServerAdministration/Backup/HotBackup