The operational transform stuff used in Google Wave has a rather curious document format. A document is basically just an xml subset document - characters, start tags and end tags. In addition to that, the document has "annotations", which are meta-data associated with ranges, e.g. start position and end position. The white paper justifies their presence with:
Wave document operations also support annotations. An annotation is some meta-data associated with an item range, i.e., a start position and an end position. This is particularly useful for describing text formatting and spelling suggestions, as it does not unecessarily complicate the underlying structured document format.
I can certainly see how it would be somewhat difficult if an arbitrary range from a document would be selected and for example bolded - XML tag nesting is strict and that would cause a mess of open and close tag insertions.
However, is this really a problem in practise? I mean, does one necessarily have to support such operation, if not making an editor that basically mimics the years old word processing paradigm instead of being a structured editor? Would pure XML operational transform with the document structure as simply HTML5 be that terrible? Is it a performance issue that styles would be in the document as tags? Or does the operational transform model somehow produce unsatisfactory results on text formatting if they are represented with tags?
Also, a side question - how good would the pure "insert character, remove character, retain" operational transform model be on plain text representations? For example, editing HTML5 as text - or editing Wikipedia articles?
There are fundamental problems with using a hierarchical markup language with OT. See below for a worked example: