I'm using the Java driver (although this question is not language-specific) to write partial updates to MongoDB documents, because with the MMAPv1 storage engine documents are edited in place (in memory), which provides better performance. This adds considerable development complexity: I could instead save the entire document at once and not worry about exactly which fields were updated. After upgrading to WiredTiger, I learned that this newer storage engine does not edit documents in place (in memory) but instead allocates new memory for each write (it is unclear whether this means a full copy of the document or just a diff). Does this mean that it makes no performance difference whether I do a full document write or a partial one?
After updating to WiredTiger I learned that this newer storage engine does not edit documents in place (in memory) but instead allocates new memory for each write (unclear if this means full copy of the document or just diff).
WiredTiger uses Multiversion Concurrency Control (MVCC) to maintain multiple views of data for the lifetime of readers. WiredTiger’s in-memory format is different from the on-disk format: in-memory it stores diffs to a document, but a full version of the document is constructed when flushed to the data files as part of periodic checkpoints.
Does this mean that it makes no performance difference whether I do a full document write vs a partial one?
Irrespective of how different MongoDB storage engines handle persisting updates to disk, there are still performance benefits in using partial updates rather than full updates where possible (particularly if you are setting field values which are small relative to overall document size).
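One concrete way to see the size effect is to compare how much data each kind of update sends. The sketch below builds a hypothetical document with a small `status` field and a large `history` array, then compares the bytes for replacing the whole document against a `$set` of only the changed field. It uses JSON text as a stand-in for BSON and plain Java strings rather than the MongoDB driver, so the byte counts are only indicative; the document shape and helper names are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;

/**
 * Rough comparison of a full-document replacement payload versus a partial
 * $set payload. JSON text stands in for BSON, so sizes are approximate.
 * The document shape below is hypothetical.
 */
final class UpdatePayloadSize {

    // Hypothetical document: a small "status" field plus a large "history" array.
    static String fullDocument(String status, int historyEntries) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"_id\": 1, \"status\": \"").append(status).append("\", \"history\": [");
        for (int i = 0; i < historyEntries; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append("\"event-").append(i).append('"');
        }
        sb.append("]}");
        return sb.toString();
    }

    // Partial update: only the changed field is sent, wrapped in a $set operator.
    static String setUpdate(String status) {
        return "{\"$set\": {\"status\": \"" + status + "\"}}";
    }

    // Size of a payload when encoded as UTF-8 bytes.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String full = fullDocument("done", 1000);
        String partial = setUpdate("done");
        System.out.println("full replacement: " + utf8Bytes(full) + " bytes");
        System.out.println("partial $set:     " + utf8Bytes(partial) + " bytes");
    }
}
```

With the Java driver, the two cases correspond roughly to `collection.replaceOne(filter, document)` versus `collection.updateOne(filter, Updates.set("status", "done"))`; only the latter avoids resending the unchanged `history` array.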
For example, consider:
If you are sending full document updates each time, you also create scenarios where the order in which updates reach the server matters, even when the changes are to distinct sets of fields. You could add application logic such as optimistic versioning to ensure you don't accidentally overwrite field values, but depending on your use case this may add unnecessary complexity.
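The ordering problem can be simulated without a database. In this sketch a plain `Map` stands in for a single stored document, and two hypothetical clients each change a different field. Full-document writes lose one change (last write wins), field-level writes keep both, and a version check rejects the stale full write. The `version` field and all helper names are illustrative assumptions, not MongoDB APIs. In a real deployment the version check and the write would need to be a single atomic operation, for example by including the expected version in the update filter, rather than a separate read followed by a write as simulated here.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Single-threaded, in-memory simulation (no MongoDB involved) of two clients
 * editing different fields of the same document. All names are hypothetical.
 */
final class ConcurrentWrites {

    // Full-document write: the client's whole copy replaces the stored document.
    static void replace(Map<String, Object> stored, Map<String, Object> clientCopy) {
        stored.clear();
        stored.putAll(clientCopy);
    }

    // Partial write, similar in spirit to $set: only the named field changes.
    static void set(Map<String, Object> stored, String field, Object value) {
        stored.put(field, value);
    }

    // Optimistic versioning: replace only if the version the client read is still current.
    static boolean replaceIfVersion(Map<String, Object> stored, Map<String, Object> clientCopy) {
        Object readVersion = clientCopy.get("version");
        if (!stored.get("version").equals(readVersion)) {
            return false; // another write happened first; caller must re-read and retry
        }
        Map<String, Object> next = new HashMap<>(clientCopy);
        next.put("version", (Integer) readVersion + 1);
        replace(stored, next);
        return true;
    }

    static Map<String, Object> initial() {
        Map<String, Object> doc = new HashMap<>();
        doc.put("name", "old name");
        doc.put("email", "old@example.com");
        doc.put("version", 0);
        return doc;
    }

    public static void main(String[] args) {
        // 1. Full-document writes: both clients read, edit different fields, then replace.
        Map<String, Object> stored = initial();
        Map<String, Object> a = new HashMap<>(stored);
        Map<String, Object> b = new HashMap<>(stored);
        a.put("name", "new name");
        b.put("email", "new@example.com");
        replace(stored, a);
        replace(stored, b); // b's stale copy silently undoes a's name change
        System.out.println("full writes:    " + stored.get("name") + ", " + stored.get("email"));

        // 2. Partial writes: each client sends only the field it changed.
        stored = initial();
        set(stored, "name", "new name");
        set(stored, "email", "new@example.com");
        System.out.println("partial writes: " + stored.get("name") + ", " + stored.get("email"));

        // 3. Full writes guarded by a version field: the stale write is rejected.
        stored = initial();
        a = new HashMap<>(stored);
        b = new HashMap<>(stored);
        a.put("name", "new name");
        b.put("email", "new@example.com");
        boolean aOk = replaceIfVersion(stored, a);
        boolean bOk = replaceIfVersion(stored, b);
        System.out.println("versioned:      a=" + aOk + ", b=" + bOk);
    }
}
```

The versioned case shows the extra work this approach pushes onto the application: the rejected client has to re-read the document, reapply its change, and retry, which field-level updates avoid when clients touch different fields.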