mongodb transactions spring-data

Mongodb doesn't "lock" a document, does it?


I'm trying to check that MongoDB behaves the way I expect and locks a document when it's modified in a transaction. I'm using

<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>3.3.1</version>

and spring-data with a replicaset of

mongo 7.0.5 community

My goal is to do the following:

I'm using this code:

@Getter
@Setter
class DummyDocument {
    @MongoId
    private String uuid;
    private String lock;
    private List<String> simpleCollection = new ArrayList<>();

    public DummyDocument(String uuid) {
        this.uuid = uuid;
    }
}

public class ConcurrentDocumentAccess {

    @Autowired
    private MongoClient client;

    @Autowired
    private MongoTemplate mongoTemplate;

    @After
    public void cleanup() {
        DummyDocument doc = mongoTemplate.findById("test", DummyDocument.class);
        mongoTemplate.remove(doc);
    }

    @Test
    public void documentLockingTest() {

        // Create a document
        DummyDocument doc = new DummyDocument("test");
        mongoTemplate.save(doc);

        // A query to find the doc
        Query findDoc = new Query().addCriteria(Criteria.where("uuid").is("test"));

        // An update to change the lock value in the doc
        Update lock = new Update().set("lock", "locked");

        // An update of the doc
        Update updateCollection = new Update().addToSet("simpleCollection", "something");

        try (ClientSession session = client.startSession()) {

            session.startTransaction();

            // Acquire lock on doc by writing on it
            UpdateResult lockResult = mongoTemplate.withSession(session).updateFirst(findDoc, lock,
                    DummyDocument.class);
            assertThat(lockResult.getModifiedCount() == 1L).isTrue();

            // Try to update the collection out of the transaction
            UpdateResult changeResult = mongoTemplate.updateFirst(findDoc, updateCollection, DummyDocument.class);
            // assertThat(changeResult.getModifiedCount() == 0L).isTrue();
            
            session.commitTransaction();
        } catch (MongoCommandException e) {
            e.printStackTrace();
        }
        DummyDocument updatedDoc = mongoTemplate.findOne(findDoc, DummyDocument.class);
        assertThat(updatedDoc.getLock()).isEqualTo("locked");
        assertThat(updatedDoc.getSimpleCollection()).doesNotContain("something");
    }
}

What I observe is that the update outside the transaction is forced onto the document, then the transaction hangs until it fails with this response: {"errorLabels": ["TransientTransactionError"], "ok": 0.0, "errmsg": "Transaction with { txnNumber: 2 } has been aborted.", "code": 251, "codeName": "NoSuchTransaction"

Am I doing something wrong, or is this the expected behavior?

Update

I ran another test and the behavior is still mysterious. When I also use a session for the document update, the behavior changes. This is the code I use:

    // Create a document
    DummyDocument doc = new DummyDocument("test");
    mongoTemplate.save(doc);

    // A query to find the doc
    Query findDoc = new Query().addCriteria(Criteria.where("uuid").is("test"));
    // findDoc.getQueryObject().toBsonDocument();

    // An update to change the lock value in the doc
    Update lock1 = new Update().set("lock", "lock1");
    Update lock2 = new Update().set("lock", "lock2");

    // An update of the doc
    Update update1 = new Update().addToSet("simpleCollection", "update1");
    Update update2 = new Update().addToSet("simpleCollection", "update2");

    // Create a first session
    ClientSession session1 = client.startSession();
    session1.startTransaction();

    // Create a second session
    ClientSession session2 = client.startSession();
    session2.startTransaction();

    // Acquire lock on doc by writing on it
    try {
        UpdateResult lock1Result = mongoTemplate.withSession(session1).updateFirst(findDoc, lock1,
                DummyDocument.class);
        assertThat(lock1Result.getMatchedCount()).isOne();
        assertThat(lock1Result.getModifiedCount()).isOne();
    } catch (MongoCommandException e) {
        e.printStackTrace();
    } catch (UncategorizedMongoDbException e) {
        e.printStackTrace();
    } catch (Exception e) {
        e.printStackTrace();
    }

    // Try to update the collection out of the transaction
    UpdateResult changeResult = null;
    try {
        changeResult = mongoTemplate.withSession(session2).updateFirst(findDoc, update1, DummyDocument.class);
    } catch (MongoCommandException e) {
        e.printStackTrace();
    } catch (UncategorizedMongoDbException e) {
        // This catches a write error
        e.printStackTrace();
    } catch (Exception e) {
        e.printStackTrace();
    }
    assertThat(changeResult).isNull();

    // Try to commit the change on the collection before 1st session commit
    try {
        session2.commitTransaction();
    } catch (MongoCommandException e) {
        // This catches a "NoSuchTransaction" error
        e.printStackTrace();
    } catch (UncategorizedMongoDbException e) {
        e.printStackTrace();
    } catch (Exception e) {
        e.printStackTrace();
    }

    // Uncomment this to timeout session 1
    // Thread.sleep(120005L);

    // Try to commit the changes of session 1, this should work
    try {
        session1.commitTransaction();
    } catch (MongoCommandException e) {
        e.printStackTrace();
    } catch (UncategorizedMongoDbException e) {
        e.printStackTrace();
    } catch (Exception e) {
        e.printStackTrace();
    }

    session1.close();
    session2.close();

    DummyDocument updatedDoc = mongoTemplate.findOne(findDoc, DummyDocument.class);
    assertThat(updatedDoc.getLock()).isEqualTo("lock1");
    assertThat(updatedDoc.getSimpleCollection()).doesNotContain("update1");

Here the document update from session2 is refused because of a write conflict, which is what I expected. So why do the two approaches behave differently? Can I configure the client so that the first implementation behaves like the second?

Here are the logs of the test:

{"t":{"$date":"2024-07-03T19:36:01.127+00:00"},"s":"I",  "c":"WRITE",    "id":51803,   "ctx":"conn218","msg":"Slow query","attr":{"type":"update","ns":"test.dummyDocument","command":{"q":{"_id":"test"},"u":{"_id":"test","simpleCollection":[],"_class":"com.test.concurrency.DummyDocument"},"multi":false,"upsert":true},"planSummary":"IDHACK","totalOplogSlotDurationMicros":142,"keysExamined":0,"docsExamined":0,"nMatched":0,"nModified":0,"nUpserted":1,"keysInserted":1,"numYields":0,"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":2}},"FeatureCompatibilityVersion":{"acquireCount":{"w":2}},"ReplicationStateTransition":{"acquireCount":{"w":2}},"Global":{"acquireCount":{"w":2}},"Database":{"acquireCount":{"w":2}},"Collection":{"acquireCount":{"w":2}}},"flowControl":{"acquireCount":1},"readConcern":{"provenance":"implicitDefault"},"storage":{},"cpuNanos":362193,"remote":"172.18.0.1:34942","durationMillis":0}}
{"t":{"$date":"2024-07-03T19:36:01.220+00:00"},"s":"I",  "c":"WRITE",    "id":51803,   "ctx":"conn218","msg":"Slow query","attr":{"type":"update","ns":"test.dummyDocument","command":{"q":{"_id":"test"},"u":{"$set":{"lock":"lock1"}},"multi":false,"upsert":false},"planSummary":"IDHACK","keysExamined":1,"docsExamined":1,"nMatched":1,"nModified":1,"nUpserted":0,"keysInserted":0,"keysDeleted":0,"numYields":0,"locks":{"Database":{"acquireCount":{"w":1}},"Collection":{"acquireCount":{"w":1}}},"flowControl":{"acquireCount":1},"readConcern":{"level":"local","provenance":"implicitDefault"},"storage":{},"cpuNanos":316767,"remote":"172.18.0.1:34942","durationMillis":0}}
{"t":{"$date":"2024-07-03T19:36:01.281+00:00"},"s":"I",  "c":"WRITE",    "id":51803,   "ctx":"conn218","msg":"Slow query","attr":{"type":"update","ns":"test.dummyDocument","command":{"q":{"_id":"test"},"u":{"$addToSet":{"simpleCollection":"update1"}},"multi":false,"upsert":false},"planSummary":"IDHACK","numYields":0,"ok":0,"errMsg":"Caused by :: Write conflict during plan execution and yielding is disabled. :: Please retry your operation or multi-document transaction.","errName":"WriteConflict","errCode":112,"locks":{"Database":{"acquireCount":{"w":1}},"Collection":{"acquireCount":{"w":1}}},"flowControl":{"acquireCount":2},"readConcern":{"level":"local","provenance":"implicitDefault"},"storage":{},"cpuNanos":391742,"remote":"172.18.0.1:34942","durationMillis":0}}

Solution

  • Yes, it is expected behaviour documented at https://www.mongodb.com/docs/manual/core/transactions/#transactions-and-atomicity, particularly:

    Until a transaction commits, the data changes made in the transaction are not visible outside the transaction.

    MongoDB does lock documents; in fact there are 6 different types of locks (https://www.mongodb.com/docs/manual/faq/concurrency/), but they have nothing to do with multi-document transactions. The locks are used internally to deal with concurrent updates at the document level.

    The error you are facing in the first scenario ("transaction with a concurrent non-transactional write") comes down to how MongoDB handles non-transactional writes. It implicitly starts a transaction for each individual update, so every write is a transaction at the WiredTiger (WT) level.

    WT in turn places an exclusive lock on all documents modified in a transaction, and issues a WriteConflict error on any attempt to write to a locked document. For non-transactional writes the lock is assumed to be temporary, and MongoDB tries to resolve the conflict by retrying the write.

    The update command calls writeConflictRetry:

    namespace mongo {
    
    UpdateResult update(OperationContext* opCtx,
                        CollectionAcquisition& coll,
                        const UpdateRequest& request) {
         .....
        // The update stage does not create its own collection.  As such, if the update is
        // an upsert, create the collection that the update stage inserts into beforehand.
        writeConflictRetry(opCtx, "createCollection", nsString, [&] {
    

    from https://github.com/mongodb/mongo/blob/b943a40130ad53eca379f205c830a27d41d10e86/src/mongo/db/ops/update.cpp#L72

    and writeConflictRetry keeps trying to update the locked document: it starts a WT transaction, receives a WriteConflict, rolls back the transaction, and retries:

    /**
     * Runs the argument function f as many times as needed for f to complete or throw an exception
     * other than WriteConflictException or TemporarilyUnavailableException. For each time f throws
     * one of these exceptions, logs the error, waits a spell, cleans up, and then tries f again.
     * Imposes no upper limit on the number of times to re-try f after a WriteConflictException, so any
     * required timeout behavior must be enforced within f. When retrying a
     * TemporarilyUnavailableException, f is called a finite number of times before we eventually let
     * the error escape.
     *
     * If we are already in a WriteUnitOfWork, we assume that we are being called within a
     * WriteConflictException retry loop up the call stack. Hence, this retry loop is reduced to an
     * invocation of the argument function f without any exception handling and retry logic.
     */
    template <typename F>
    auto writeConflictRetry(OperationContext* opCtx,
    

    https://github.com/mongodb/mongo/blob/69be24d5c2e37c819a755e52b56f4e5378e0ff92/src/mongo/db/concurrency/exception_util.h#L184
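    To make the retry loop concrete, here is a rough Java analogue of what the comment above describes. It is only a sketch: the class, the exception type, and the backoff values are hypothetical stand-ins and not MongoDB code. The point it illustrates is that the loop has no upper bound on attempts, so the caller only returns once the conflicting lock disappears.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class WriteConflictRetrySketch {

    /** Hypothetical stand-in for the server's WriteConflictException. */
    static final class WriteConflictException extends RuntimeException {
        WriteConflictException(String message) {
            super(message);
        }
    }

    /**
     * Calls f until it completes or throws something other than
     * WriteConflictException. There is deliberately no attempt limit.
     */
    static <T> T writeConflictRetry(Supplier<T> f) {
        int attempt = 0;
        while (true) {
            try {
                return f.get();
            } catch (WriteConflictException e) {
                attempt++;
                try {
                    // "Waits a spell": small capped backoff (values are arbitrary).
                    Thread.sleep(Math.min(attempt, 10));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while retrying", ie);
                }
            }
        }
    }

    /** Demo: the write conflicts `conflicts` times, then succeeds. Returns the attempt count. */
    static int attemptsUntilSuccess(int conflicts) {
        AtomicInteger calls = new AtomicInteger();
        return writeConflictRetry(() -> {
            int n = calls.incrementAndGet();
            if (n <= conflicts) {
                throw new WriteConflictException("document is locked by another transaction");
            }
            return n;
        });
    }

    public static void main(String[] args) {
        System.out.println(attemptsUntilSuccess(3)); // prints 4
    }
}
```

    If the conflicting lock were never released, this loop would never return, which is the situation the first test puts the non-transactional update in.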

    So what happens in the first scenario is a deadlock caused by concurrent writes in a synchronous application:

    1. the transaction started
    2. the transactional update locks the document
    3. the non-transactional update hangs retrying to write to the locked document
    4. the transaction expires, rolls back, and releases the locks
    5. the non-transactional update eventually writes to the document
    6. an attempt to commit the transaction throws the error "NoSuchTransaction"
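    The six steps above can be replayed with a toy single-threaded model. This is a sketch under loud assumptions: it uses a logical clock instead of real time, the class and variable names are made up, and it does not talk to MongoDB at all. It only shows why a single thread can never reach the commit while the retry loop is spinning.

```java
import java.util.ArrayList;
import java.util.List;

final class SyncDeadlockSketch {

    /**
     * Replays the synchronous scenario. txnLifetimeTicks is a hypothetical
     * stand-in for the server-side transaction lifetime limit.
     */
    static List<String> run(int txnLifetimeTicks) {
        List<String> events = new ArrayList<>();
        int clock = 0;               // logical time, advanced by every retry
        boolean txnOpen = true;      // step 1: the transaction started
        boolean lockedByTxn = true;  // step 2: the transactional update holds the lock
        events.add("txn started and locked the document");

        // Step 3: the same thread runs the non-transactional update, which
        // keeps retrying while the document is locked. The commit further
        // down cannot run until this loop exits.
        int retries = 0;
        while (lockedByTxn) {
            retries++;
            clock++;
            // Step 4: the transaction outlives its limit and is aborted.
            if (clock >= txnLifetimeTicks) {
                txnOpen = false;
                lockedByTxn = false;
                events.add("txn expired after " + retries + " write-conflict retries");
            }
        }

        // Step 5: with the lock gone, the non-transactional update goes through.
        events.add("non-transactional update applied");

        // Step 6: the commit arrives too late.
        events.add(txnOpen ? "commit ok" : "commit failed: NoSuchTransaction");
        return events;
    }

    public static void main(String[] args) {
        run(5).forEach(System.out::println);
    }
}
```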

    With asynchronous updates (e.g. updates from different threads or clients), the non-transactional update would still wait, but the transaction would commit successfully without expiring:

    1. the transaction started
    2. the transactional update locks the document
    3. the non-transactional update hangs retrying to write to the locked document
    4. the transaction commits, updates the document, and releases the lock
    5. the non-transactional update writes to the document
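    The same toy model with the non-transactional writer moved to its own thread follows the five steps above. Again this is only a simulation with hypothetical names (Doc, lockOwner, and so on) and no MongoDB driver involved; the latch exists purely to make sure the writer has hit at least one conflict before the transaction commits.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

final class AsyncWriterSketch {

    /** Simulated document; a null lockOwner means no transaction holds it. */
    static final class Doc {
        final AtomicReference<String> lockOwner = new AtomicReference<>();
        volatile String lock;
        final List<String> simpleCollection = new CopyOnWriteArrayList<>();
    }

    static List<String> run() {
        Doc doc = new Doc();
        List<String> events = new CopyOnWriteArrayList<>();
        CountDownLatch sawConflict = new CountDownLatch(1);

        // Steps 1-2: the transaction starts and locks the document.
        doc.lockOwner.set("txn");
        events.add("txn locks document");

        // Step 3: the non-transactional update retries on another thread.
        Thread writer = new Thread(() -> {
            while (!doc.lockOwner.compareAndSet(null, "writer")) {
                sawConflict.countDown(); // record that a conflict happened
                Thread.onSpinWait();
            }
            doc.simpleCollection.add("something");
            events.add("non-transactional write applied");
            doc.lockOwner.set(null);
        });
        writer.start();

        try {
            // Step 4: this thread is free to commit while the writer waits.
            sawConflict.await();
            doc.lock = "locked";
            events.add("txn commits");
            doc.lockOwner.set(null);

            // Step 5: the writer gets through once the lock is released.
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        events.add("final: lock=" + doc.lock + ", simpleCollection=" + doc.simpleCollection);
        return events;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

    Note that in this ordering the document ends up with both changes, so the last assertion of the original test (doesNotContain("something")) would still fail; the write is delayed, not rejected.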

    With an explicitly started transaction, as in the second scenario ("transaction with a concurrent transaction"), MongoDB does not hide the WT WriteConflict error and propagates it to the application straight away.
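    Since the error reaches the application in that case, retrying becomes the application's job, as the log message suggests ("Please retry your operation or multi-document transaction"). A minimal, driver-free sketch of a bounded retry wrapper might look like the following. Everything in it is hypothetical, including the LabeledException type, which only stands in for a driver exception carrying the "TransientTransactionError" label seen in the question. With the real Java driver the predicate would more likely be based on MongoException.hasErrorLabel.

```java
import java.util.Set;
import java.util.function.Predicate;
import java.util.function.Supplier;

final class TransactionRetrySketch {

    /** Hypothetical exception carrying server error labels. */
    static final class LabeledException extends RuntimeException {
        final Set<String> labels;

        LabeledException(String message, Set<String> labels) {
            super(message);
            this.labels = labels;
        }
    }

    /**
     * Runs the whole transaction body again while it fails with an error the
     * caller classifies as transient, up to maxAttempts times in total.
     */
    static <T> T runWithRetry(Supplier<T> txnBody,
                              Predicate<RuntimeException> isTransient,
                              int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return txnBody.get();
            } catch (RuntimeException e) {
                if (!isTransient.test(e)) {
                    throw e; // not retryable: surface it immediately
                }
                last = e;
            }
        }
        throw last; // retries exhausted
    }

    /** Demo: the body hits a write conflict `failures` times, then commits. */
    static String demo(int failures, int maxAttempts) {
        int[] calls = {0};
        return runWithRetry(
                () -> {
                    if (calls[0]++ < failures) {
                        throw new LabeledException("WriteConflict",
                                Set.of("TransientTransactionError"));
                    }
                    return "committed after " + calls[0] + " attempts";
                },
                e -> e instanceof LabeledException
                        && ((LabeledException) e).labels.contains("TransientTransactionError"),
                maxAttempts);
    }

    public static void main(String[] args) {
        System.out.println(demo(2, 5)); // prints "committed after 3 attempts"
    }
}
```

    Unlike the server-side loop for non-transactional writes, this wrapper gives up after a fixed number of attempts, so a transaction that keeps conflicting fails visibly instead of hanging.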