Open Google Docs, type a sentence, and watch your coworker's cursor move in real time. Open Figma, drag a rectangle, and see another designer nudge it simultaneously. Open VS Code Live Share and pair-program across continents. Real-time collaboration is so pervasive that we've stopped thinking about how extraordinary it is. Multiple humans, separated by hundreds of milliseconds of network latency, editing the same logical data structure concurrently, and the result converges to something correct. The algorithms that make this possible — Operational Transforms and Conflict-free Replicated Data Types — are among the most subtle and error-prone in all of distributed systems.
This post is a deep technical walkthrough of both approaches. We'll cover the actual math behind transform functions, the formal properties CRDTs must satisfy, why most real-world implementations have subtle bugs, and when you should choose one over the other. If you've ever wondered why Google Docs occasionally produces garbled text during high-contention edits, or why Figma chose CRDTs but explicitly not for text, this is for you.
The Fundamental Problem: Concurrent Edits Without Global Order
The core challenge is deceptively simple to state: N users are editing the same document. Each user sees their own local copy and applies operations immediately (for responsiveness — no one wants to wait 200ms for a server round-trip before their keystroke appears). These operations propagate to other users over an unreliable network with variable latency. There is no global clock, no guaranteed ordering, and operations can arrive at different replicas in different orders. The system must ensure that all replicas converge to the same final state.
This is a specific instance of the replicated state machine problem, but with a critical UX constraint: every operation must be applied locally before it's confirmed by any other node. You can't use Paxos or Raft here — those protocols require a consensus round before an operation takes effect, which introduces unacceptable latency for character-by-character typing. The user must see their keystroke instantly. Reconciliation happens after the fact.
The deepest insight in real-time collaboration is that consistency is not about preventing conflicts — it's about resolving them deterministically after they've already occurred on every local replica.
Operational Transforms: The Server-Authoritative Approach
Operational Transforms (OT) were invented at Xerox PARC in 1989 by Ellis and Gibbs, refined in the Jupiter system at Xerox in 1995, and brought to massive scale by Google — first in Google Wave (2009, famously failed as a product but the engineering was brilliant), then in Google Docs (which has used OT since its inception as Writely in 2005). The core idea: when concurrent operations conflict, transform them so they can be applied in any order and still produce the same result.
How Transform Functions Work
Consider a string document. The two fundamental operations are insert(position, character) and delete(position). Suppose the document is 'ABC' and two users act concurrently: User 1 does insert(1, 'X') (insert X at position 1, producing 'AXBC') and User 2 does delete(2) (delete character at position 2, removing 'C', producing 'AB'). User 1 receives User 2's delete(2) — but User 1 has already inserted a character before position 2, shifting everything right. If User 1 naively applies delete(2), they'd delete 'B' instead of 'C'. The document diverges.
The transform function fixes this. Given two concurrent operations a and b that were generated against the same document state, the transform function T(a, b) produces a' — a modified version of a that accounts for b's effect. Symmetrically, T(b, a) produces b'. The critical invariant is: apply(apply(state, a), b') = apply(apply(state, b), a'). This is called the transformation property TP1.
The String OT Transform Table
For a string-based document with insert and delete, the transform function T(op1, op2) has four cases. For insert-vs-insert: if op1.pos < op2.pos, op2's position shifts right by 1 (because op1 added a character before it). If op1.pos > op2.pos, op1's position shifts right by 1. If op1.pos = op2.pos, you need a tiebreaker — typically the user with the lower ID goes first. For insert-vs-delete: if the insert position is before the delete position, the delete shifts right by 1. If after, the insert shifts left by 1. If equal, the insert happens at the same position (the deleted character is gone, the inserted one takes its conceptual place). For delete-vs-delete at the same position: one operation becomes a no-op (the character is already gone).
This sounds manageable for two operations. Now consider three concurrent operations from three users, all generated against the same state. You need to transform each pair, then transform the results against each other. For N concurrent operations, the number of transform compositions grows combinatorially. Each transform must be correct not just in isolation but in composition. This is where OT implementations start to break.
TP1, TP2, and the Diamond Property
TP1 (Transformation Property 1) requires that for any two concurrent operations a and b against state S: S * a * T(b, a) = S * b * T(a, b), where * denotes applying an operation. This ensures two-user convergence. TP2 is the harder condition, required for three or more concurrent operations: T(T(a, b), T(c, b)) = T(T(a, c), T(b, c)). Informally, TP2 says that transforming a against b, then against c-transformed-against-b, must equal transforming a against c, then against b-transformed-against-c. The result of transformation must not depend on the order in which you process concurrent operations.
TP2 is notoriously difficult to satisfy. In 2005, researchers Imine et al. published a formal proof that several published OT algorithms violated TP2 in edge cases. The violations manifest as rare convergence failures — documents that diverge after specific sequences of concurrent operations. These bugs are nearly impossible to find by testing because they require precise timing of three or more concurrent edits hitting specific position relationships.
Google's solution to the TP2 problem in Google Docs was to sidestep it entirely: use a central server that imposes a total order on operations. Every operation goes through the server, which assigns a global sequence number. Clients only need TP1 (transform against the server's canonical order), never TP2. This works, but it means Google Docs fundamentally cannot work peer-to-peer.
- Jupiter (1995): Client-server OT, TP1 only. Each client maintains a state space graph against the server. Simple, correct, but requires a central server
- Google Wave (2009): Attempted peer-to-peer OT with wavelet-based composition transforms. Ambitious engineering, but the complexity of the transform functions for rich text (not just plaintext) contributed to its struggles
- Google Docs: Server-authoritative OT based on Jupiter. The server is the single source of truth. Clients optimistically apply local operations and transform incoming server operations. Battle-tested at massive scale but architecturally centralized
- ShareDB: Open-source OT library (Node.js) using the same server-authoritative model. Powers many collaborative apps but inherits OT's complexity — the JSON OT type alone has thousands of lines of transform code
CRDTs: The Decentralized Approach
Conflict-free Replicated Data Types were formally defined by Shapiro, Preguica, Baquero, and Zawirski in their landmark 2011 paper. The core idea is fundamentally different from OT: instead of transforming operations after the fact, design the data structure itself so that concurrent operations commute by construction. If operations commute, the order of application doesn't matter, and all replicas automatically converge.
State-Based CRDTs (CvRDTs) vs. Operation-Based CRDTs (CmRDTs)
CvRDTs (Convergent Replicated Data Types) work by merging states. Each replica maintains its full state. Periodically, replicas send their entire state to peers, who merge it with their own using a join function. The merge function must form a join-semilattice: it's commutative (merge(A, B) = merge(B, A)), associative (merge(A, merge(B, C)) = merge(merge(A, B), C)), and idempotent (merge(A, A) = A). These three properties guarantee convergence regardless of message ordering, duplication, or delivery delays.
CmRDTs (Commutative Replicated Data Types) work by broadcasting operations. Instead of sending the full state, replicas send the operations they perform. The operations themselves must be commutative — applying op_a then op_b produces the same state as op_b then op_a. CmRDTs are more bandwidth-efficient (sending operations is smaller than sending full state) but require reliable causal broadcast (every operation must be delivered at least once, and causally dependent operations must arrive in order).
The CRDT Zoo: Common Data Structures
- G-Counter (Grow-only Counter): Each replica maintains its own count. The value is the sum of all replica counts. Merge takes the max of each replica's count. Simple, elegant, and the building block for more complex CRDTs
- PN-Counter (Positive-Negative Counter): Two G-Counters — one for increments, one for decrements. The value is P - N. Supports both increment and decrement while remaining conflict-free
- LWW-Register (Last-Writer-Wins Register): Each write is tagged with a timestamp. On conflict, the highest timestamp wins. Simple but lossy — concurrent writes are silently discarded. Used internally by Cassandra
- OR-Set (Observed-Remove Set): Supports both add and remove. Each add generates a unique tag. Remove only removes currently-observed tags. Concurrent add and remove of the same element results in the element being present (add wins). This is the 'correct' set CRDT for most applications
- RGA (Replicated Growable Array): A sequence CRDT for ordered lists and text. Each element has a unique ID (timestamp + replica ID). Insertions specify a position relative to an existing element's ID, not a numeric index. This makes concurrent insertions at different positions commute naturally
CRDTs in Production: Figma, Yjs, and Automerge
Figma uses CRDTs for their multiplayer canvas — but with an important caveat. Figma's CRDT operates on a tree of objects (rectangles, frames, text nodes), where each object has properties (position, color, size). The CRDT resolves conflicts at the property level using last-writer-wins semantics. This works well for graphic design: if two users move the same rectangle simultaneously, one position wins. There's no meaningful 'merge' of two positions. Crucially, Figma does NOT use CRDTs for text editing within text nodes — they use a separate system for that, because text CRDTs have severe problems we'll discuss below.
Yjs (by Kevin Jahns) is the most widely-used open-source CRDT library, powering collaborative features in tools like Jupyter, Overleaf, and many others. Yjs implements a variant of the YATA (Yet Another Transformation Approach) algorithm for sequences, which uses unique element IDs and a specific conflict resolution rule for concurrent insertions at the same position. Automerge (by Martin Kleppmann) takes a more academic approach, implementing a JSON-like CRDT with rich semantics. Automerge 2.0 rewrote the core in Rust for performance, achieving significant memory and speed improvements over the original JavaScript implementation.
Why Most Implementations Are Wrong
Both OT and CRDTs have failure modes that are subtle, hard to test for, and pervasive in production systems. The academic literature is littered with correctness proofs for algorithms that, when implemented, still produce bugs. The gap between the mathematical model and the engineering reality is where things break.
OT: Off-by-One Errors and Position Mapping
The most common OT bug is incorrect position adjustment in transform functions. Consider: User A inserts at position 5, User B deletes at position 5. The transform of A's insert against B's delete should shift A's position left by 1 (to position 4) — or should it stay at 5, since the deleted character was at position 5 and the insert should now land where it was? The answer depends on whether you define insert(pos, char) as 'insert before position pos' or 'insert after position pos,' and whether delete(pos) removes the character at pos or after pos. Different OT systems make different choices, and mixing conventions — or misunderstanding the convention in a complex transform composition — produces the infamous 'wrong character deleted' bug.
The problem compounds with rich text. An OT system for Google Docs-style editing must handle not just insert and delete, but also formatting operations: bold a range, change font of a range, add a hyperlink. The transform function for retain(5)-then-bold(3) against insert(2, 'X') must adjust the retain count and the bold range. The number of operation pairs grows quadratically with the number of operation types. Each pair needs a correct transform function, and each function needs to handle edge cases around range boundaries, empty ranges, and overlapping ranges.
CRDTs: The Interleaving Anomaly
Text CRDTs have a fundamental problem that wasn't widely recognized until 2019: the interleaving anomaly. Suppose User A types 'HELLO' and User B concurrently types 'WORLD' at the same position. A well-behaved system should produce either 'HELLOWORLD' or 'WORLDHELLO' — one user's text followed by the other's. But many CRDT algorithms (including early versions of RGA and Logoot) can produce 'HWEOLRLLOD' — the characters interleaved arbitrarily. Each individual character placement is 'correct' according to the CRDT rules, but the result is semantically garbage.
The interleaving problem occurs because character-level CRDTs have no notion of 'intent grouping.' User A intended 'HELLO' as a word, but the CRDT sees five independent insert operations. When those five operations interleave with User B's five operations, the CRDT has no basis for keeping each user's sequence contiguous. Yjs solves this with a clever trick: consecutive insertions by the same user are stored as a single 'item' internally, and the conflict resolution algorithm keeps items from the same user together. But this is a heuristic, not a guarantee — certain editing patterns can still trigger interleaving.
Tombstone Bloat and Metadata Overhead
When a character is deleted in a CRDT, it can't simply be removed from the data structure. Other replicas may have concurrent operations that reference that character's unique ID (for example, an insert 'after character X' where X is the deleted character). So deleted characters become tombstones — invisible markers that still consume memory and must be included in merge operations. In a document with heavy editing, the number of tombstones can vastly exceed the number of visible characters.
Martin Kleppmann measured that for a real-world editing trace of a 100KB document, Automerge 1.0 consumed over 300MB of memory — a 3000x overhead. The metadata per character (unique ID, parent pointer, tombstone flag, timestamp) dwarfs the character itself. Automerge 2.0 reduced this dramatically through columnar encoding and the Rust rewrite, bringing the overhead to roughly 10x. Yjs is more efficient still, typically 2-5x overhead for typical documents. But even 2x means a 1MB document consumes 2MB of CRDT state — a real concern for mobile apps and memory-constrained environments.
- Tombstone garbage collection requires consensus among all replicas that no future operation will reference the tombstone — this is equivalent to solving the 'stable state' problem, which is as hard as distributed consensus in the general case
- Undo/redo in CRDTs is an open research problem. Undoing an operation means inverting it, but the inverse may conflict with concurrent operations in ways that produce nonsensical results. The 'undo' of a delete is an insert, but where should the character be re-inserted if the surrounding characters have been modified by other users?
- Rich text CRDTs (like Peritext, published by Ink & Switch in 2021) must handle formatting spans that overlap, split, and merge. Two users bold overlapping ranges of text — the merge must produce the union of bold ranges. Two users bold and unbold the same range — who wins? Peritext defines a formal model for rich text CRDT merging, but the implementation complexity is enormous
- Memory overhead of sequence CRDTs scales with the total number of operations ever performed, not the current document size. A document that's been heavily edited for months accumulates massive state even if the final document is small
Head-to-Head: OT vs. CRDTs
- Server dependency: OT typically requires a central server for total ordering (unless you implement the much harder peer-to-peer variant with TP2). CRDTs work fully peer-to-peer with no central coordinator — any replica can accept operations independently
- Scalability: OT scales well vertically (the server processes operations sequentially) but the server is a bottleneck. CRDTs scale horizontally — each replica is independent — but the metadata overhead grows with the number of replicas and operations
- Implementation complexity: OT's transform functions are conceptually simpler but combinatorially explosive (every pair of operation types needs a correct transform). CRDTs are mathematically elegant but require careful data structure design and have subtle correctness issues like interleaving
- Latency: Both provide immediate local application. OT requires a server round-trip for confirmation. CRDTs achieve confirmation when the operation reaches all peers (no single round-trip, but eventual consistency takes longer in adversarial network conditions)
- Consistency model: OT with a central server provides a total order — there's a single canonical history. CRDTs provide strong eventual consistency — all replicas converge, but there's no single canonical ordering of operations
- Undo support: OT handles undo naturally — the server maintains a linear history and can invert operations in order. CRDT undo is an open research problem with no general solution
- Memory: OT state is proportional to the current document size (the server keeps the canonical document). CRDT state is proportional to the total operation history, which can be orders of magnitude larger
When to Use Which
The choice between OT and CRDTs is not about which is 'better' — it's about your system's constraints. If you have a reliable central server and your primary use case is text editing, OT (specifically the server-authoritative Jupiter model) is the pragmatic choice. Google Docs has proven it works at planetary scale. The server simplifies conflict resolution, provides a canonical history for undo/redo, and avoids the metadata bloat of CRDTs. Your implementation will still be complex, but the complexity is bounded.
If you need peer-to-peer synchronization, offline-first editing, or your data model is not linear text (think: graphic objects, tree structures, JSON documents, game state), CRDTs are the right foundation. Figma's choice of CRDTs for canvas objects is correct — LWW-Register semantics on object properties is natural for graphic design. Local-first applications like those advocated by Ink & Switch require CRDTs because there is no server to provide global ordering.
And sometimes the answer is neither. If your application can tolerate last-writer-wins semantics (the last save overwrites everything), don't bother with OT or CRDTs. Most CRUD applications, form editors, and configuration tools fall into this category. The complexity of OT and CRDTs is only justified when you genuinely need character-level or operation-level concurrent editing. Over-engineering your collaboration layer is a real and common failure mode.
The Future: Diamond Types, Loro, and the Convergence of Approaches
The OT vs. CRDT dichotomy is dissolving. Diamond Types (by Joseph Gentle, the former Google Wave engineer) is a Rust-based library that implements what Gentle calls 'event graph' CRDTs — a hybrid approach where the data structure maintains a DAG of operations (like a CRDT) but uses transform-like logic for conflict resolution (like OT). The result outperforms both traditional OT and traditional CRDTs on benchmarks, with significantly less memory overhead than Yjs or Automerge.
Loro is another next-generation CRDT library (Rust core, WASM bindings) that introduces 'Movable Tree' and 'Movable List' CRDTs — data structures that support move operations natively rather than decomposing them into delete-then-insert (which loses intent). Loro also implements an efficient encoding format that dramatically reduces the storage overhead of operation history. Automerge 2.0's Rust rewrite brought similar performance gains, and the team is actively researching solutions to the undo problem and tombstone garbage collection.
The trajectory is clear: the next generation of collaboration algorithms will blur the line between OT and CRDTs, combining the formal convergence guarantees of CRDTs with the operational pragmatism of OT. The math is getting better, the implementations are getting faster, and the remaining hard problems — undo, garbage collection, rich text merging, and intent preservation — are receiving serious research attention. But we're not there yet. Today, every production implementation of real-time collaboration is making tradeoffs, papering over edge cases, and praying that the rare convergence failures don't hit users at scale. The math is beautiful. The engineering is war.
Build Real-Time Collaborative Systems with Accelar
Accelar engineers distributed systems that handle the hardest concurrency and consistency problems — from real-time collaboration infrastructure to conflict resolution algorithms and eventually-consistent data architectures. Whether you're building a collaborative editor, a local-first application, or a system that needs to reconcile state across unreliable networks, we have the distributed systems expertise to get it right. Let's talk.
