algorithmgoogle-docseditingoperational-transformcollaborative-editing

How does Google Docs deal with editing collisions?


I've been toying around with writing my own Javascript editor, with functionality similar to Google Docs (allowing multiple people to work on it at the same time). One thing I don't understand:

Let's say you've got User A and User B connected directly to each other with a network delay of 10ms. I'm assuming the editor uses a diff system (as I understand Docs does) where edits are represented like "insert 'text' at index 3," and that diffs are timestamped and forced to apply chronologically by all clients.

Let's start off with a document containing the text: "xyz123"

User A types "abc" at the begining of the document at timestamp 001ms, while User B types "hello" between "xyz" and "123" at timestamp 005ms.

Both users would expect the result to be: "abcxyzhello123," however, taking into account network delay:

Of course, "abchelloxyz123" is not the same as "abcxyzhello123"

Other than literally assigning each and every character its own unique ID, I can't imagine how Google would manage to solve this problem effectively.

Some possibilities I've thought of:

I'm reading through http://www.waveprotocol.org/whitepapers/operational-transform, but would love to hear any and all approaches to fixing this problem.


Solution

  • There are different possibilities for realizing concurrent changing of replicas, depending on the scenario's topology and with different trade-offs.

    Using a central server

    The most common scenario is a central server that all clients have to communicate with.

    The server could keep track of how the document of each participant looks. Both A and B then send a diff with their changes to the server. The server would then apply the changes to the respective tracking documents. Then it would perform a three-way-merge and apply the changes to the master document. It would then send the diff between the master document and the tracking documents to the respective clients. This is called differential synchronization.

    A different approach is called operation(al) transformation, which is similar to rebasing in traditional version control systems. It doesn't require a central server, but having one makes things much easier if you have more than 2 participants (see the OT FAQ). The gist is that you transform the changes in one edit so that the edit assumes that the changes of another edit already happened. E.g. A would transform B's edit insert(3, hello) against its edit insert(0, abc) with the result insert(6, hello).

    The difference between rebasing and OT is that rebasing doesn't guarantee consistency if you apply edits in different orders (e.g. if B were to rebase A's edit against theirs the other way around, this can lead to diverging document states). The promise of OT on the other hand is to allow any order if you do the right transformations.

    No central server

    OT algorithms exist that can deal with peer-to-peer scenarios (with the trade-off of increased implementation complexity on the control layer and increased memory usage). Instead of a simple timestamp, a Version vector can be used to keep track of the state an edit is based on. Then (depending on the capability of your OT algorithm, specifically transform property 2), incoming edits can be transformed to fit in the order they are received, or the version vector can be used to impose a partial order on the edits -- in this case history needs to be "rewritten", by undoing and transforming edits, so that they adhere to the order imposed by the version vectors.

    Lastly, there are a group of algorithms based on CRDT, called WOOT, Treedoc or Logoot, which try to solve the problem with specially designed data types that allow operations to commute, so the order in which they are applied doesn't matter (this is similar to your idea of an ID for each character). The trade-offs here are memory consumption and overhead in operation construction.