I have a database with:
On-disk size: 19.032 GB (using the show dbs command)
Data size: 56 GB (using the db.collectionName.stats(1024*1024*1024).size command)
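For reference, those numbers come from shell calls along these lines (myDb stands in for the actual database name):
> show dbs
myDb    19.032GB
> use myDb
switched to db myDb
> db.collectionName.stats(1024*1024*1024).size
56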
When taking a dump with the mongodump command, we can set the --gzip parameter. These are my observations with and without this flag.
command | time taken to dump | dump size | restoration time | observation
---|---|---|---|---
with gzip | 30 min | 7.5 GB | 20 min | in mongostat the insert rate ranged from 30k to 80k per sec
without gzip | 10 min | 57 GB | 50 min | in mongostat the insert rate was very erratic, ranging from 8k to 20k per sec
The dump was taken from an 8-core, 40 GB RAM machine (Machine B) onto a 12-core, 48 GB RAM machine (Machine A), and then restored from Machine A to a 12-core, 48 GB RAM machine (Machine C), to make sure there was no resource contention between the mongod, mongodump, and mongorestore processes. Mongo version 4.2.0.
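For context, the two runs correspond roughly to the following commands (hostnames and the database name are placeholders; mongodump and mongorestore were both run from Machine A):
% mongodump --host=machineB:27017 --db=myDb --out=/dump/plain
% mongodump --host=machineB:27017 --db=myDb --gzip --out=/dump/gzip
% mongorestore --host=machineC:27017 /dump/plain
% mongorestore --host=machineC:27017 --gzip /dump/gzip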
I have a few questions:
Update: 2. Can the plain bson dump be zipped to compress it?
Yes:
% ./mongodump -d=test
2022-11-16T21:02:24.100+0530 writing test.test to dump/test/test.bson
2022-11-16T21:02:24.119+0530 done dumping test.test (10000 documents)
% gzip dump/test/test.bson
% ./mongorestore --db=test8 --gzip dump/test/test.bson.gz
2022-11-16T21:02:51.076+0530 The --db and --collection flags are deprecated for this use-case; please use --nsInclude instead, i.e. with --nsInclude=${DATABASE}.${COLLECTION}
2022-11-16T21:02:51.077+0530 checking for collection data in dump/test/test.bson.gz
2022-11-16T21:02:51.184+0530 restoring test8.test from dump/test/test.bson.gz
2022-11-16T21:02:51.337+0530 finished restoring test8.test (10000 documents, 0 failures)
2022-11-16T21:02:51.337+0530 10000 document(s) restored successfully. 0 document(s) failed to restore.
I am no MongoDB expert, but I have good experience working with MongoDB backup and restore activities and will answer to the best of my knowledge.
The mongodump command without the --gzip option will save each and every document to a file in BSON format.
This will significantly reduce the time taken for backup and restore operations, since it just reads the BSON file and inserts the documents; the trade-off is the size of the .bson dump file.
However, when we pass the --gzip option, the BSON data is compressed before being dumped to the file. This will significantly increase the time taken by mongodump and mongorestore, but the backup file will be much smaller due to compression.
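As a rough illustration (using the same test collection as in the Update above), the flag only changes what lands in the dump directory:
% ./mongodump -d=test --gzip
% ls dump/test
test.bson.gz    test.metadata.json.gz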
Yes, it can be zipped further. But you will be spending additional time, since you have to compress the already compressed file and extract it again before the restore operation, increasing the overall time taken. Do it only if the resulting file is much smaller than the plain gzip output.
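A rough sketch of that extra round trip (the archive name is arbitrary):
% mongodump -d=test --gzip --out=dump
% tar -czf test-dump.tar.gz dump          # second layer of compression for storage
# ...later, before restoring:
% tar -xzf test-dump.tar.gz
% mongorestore --gzip dump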
EDIT:
As @best wishes pointed out, I completely misread this question.
The gzip performed by mongodump is just a gzip performed on the mongodump side. It is literally the same as compressing the original BSON file manually from our end.
For instance, if you extract the .bson.gz file with any compression application, you will get the actual BSON backup file.
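For example, decompressing a --gzip dump by hand gives back a plain BSON file that bsondump (or a non-gzip mongorestore) can read:
% gunzip dump/test/test.bson.gz           # produces dump/test/test.bson
% bsondump dump/test/test.bson | head -1  # plain BSON, readable like any other dump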
Note that zip and gzip are not the same (in terms of compression), since they use different compression algorithms even though both compress files. So you will get different file sizes when comparing a mongodump --gzip dump with a manual zip of the files.
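You can see the difference by compressing the same BSON file both ways and comparing the sizes (the exact numbers will depend entirely on your data):
% gzip -k dump/test/test.bson                         # writes test.bson.gz, keeps the original
% zip -q dump/test/test.bson.zip dump/test/test.bson
% ls -lh dump/test/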
Whenever you take a dump, the mongodump tool creates a <Collection-Name>.metadata.json file. This basically contains all the indexes, followed by the collection name, uuid, colmod, dbUsersAndRoles, and so on.
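As an illustration, the metadata file for a small test collection with only the default _id index looks roughly like this (field order and exact formatting vary between tool versions):
% cat dump/test/test.metadata.json
{"options":{},"indexes":[{"v":2,"key":{"_id":1},"name":"_id_"}],"uuid":"...","collectionName":"test","type":"collection"}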
The number and type of indexes in the collection will not have an impact during the mongodump operation. However, after restoring the data, the mongorestore command will go through all the indexes in the metadata file and try to recreate them.
The time taken by this operation depends on the number of indexes and the number of documents in your collection; in short, (No. of Indexes * No. of Documents). The type of index (even if it's unique) doesn't have a major impact on performance. If the indexes were applied to the original collection using the background: true option, it's going to take even more time to rebuild them while restoring.
You can skip the indexing operation during mongorestore by passing the --noIndexRestore option on the command line. You can build the indexes later, when required.
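A minimal sketch of that flow (database, collection, and field names are just examples):
% mongorestore --gzip --noIndexRestore dump/
# ...later, once the data is in and you actually need the indexes:
% mongo myDb --eval 'db.myCollection.createIndex({ userId: 1 })'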
In my company's production backup environment, indexing the keys takes more time than the restoration of the data itself.
The solution depends...
If network bandwidth is not an issue (example: moving data between two instances running in the cloud), don't use any compression, since it will save you time.
If the data on the newly moved instance won't be accessed immediately, perform the restoration with the --noIndexRestore flag.
If the backup is for cold storage or saving data for later use, apply gzip compression, manual zip compression, or both (whatever works best for you).
Choose whichever scenario works best for you, but primarily you have to find the right balance between time and space, and secondly decide whether to apply indexes or not.
In my company, we usually take non-compressed backups and restores for P-1, use gzip compression for weeks-old prod backups, and further manually compress backups that are months old.
Another option is to directly copy the data path pointed to by your MongoDB instance and change the DB path in the MongoDB instance on the migrated machine. Again, I don't recommend this method, as there are many things that could go wrong, although I had no issues with this methodology on my end. But I can't guarantee the same for you. Do this at your own risk if you decide to.
If by a smaller dump you mean limiting the data to be dumped using the --query flag, then it certainly will help, since the data to be backed up and restored is far less. Remember the No. of Indexes * No. of Documents rule.
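For example, a query-limited dump looks something like this (database, collection, and filter are only placeholders; from 4.2 onwards the --query value has to be extended JSON):
% mongodump --db=myDb --collection=myCollection \
    --query='{ "createdAt": { "$gte": { "$date": "2022-01-01T00:00:00Z" } } }' \
    --gzip --out=/dump/partial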
Hope this helped answer your questions. Let me know if you have any more.