Build better apps using CRDTs [eng]
Talk presentation
Have you ever wondered what makes using Linear feel so good? How can you make your app feel responsive, even on bad connections or without one at all? In this presentation, I will introduce CRDTs and explain how you can leverage them to achieve this. I will briefly introduce the libraries and tools that make building React apps using CRDTs a breeze and walk you through updating an existing app to take advantage of them. Additionally, I will share lessons I have learned from building collaborative apps throughout my career.
- Serial entrepreneur and Co-founder of Can of Soup, maintainer of slate-yjs, and active contributor to slate
- I’ve been obsessed with building collaborative apps and distributed systems for my entire career
- I maintained, implemented, and shipped multiple collaborative and offline-first projects and products at companies like Slite and Liveblocks
- Twitter, GitHub, LinkedIn
Talk transcription
Hi. It's so nice to be here. Today, I'm going to talk about how to build better apps using CRDTs. The first question you might have is, why should you care? Aren't apps good enough right now? Well, if you think about the most delightful apps you've tried over the last couple of years, a few, in particular, come to mind for most people. Apps like Linear, Figma, and Notion are memorable for specific reasons—they feel incredibly snappy, offer live collaboration where multiple coworkers can join a document or Figma canvas at once and collaborate simultaneously, are resilient (you won't lose any data even if the network connection drops), and provide reliable and global undo-redo. So, if you accidentally delete a task in Linear or something in Figma, you can confidently hit undo, and your data is back.
However, all these changes have one thing in common: they are incredibly hard to implement because they rely on conflict resolution and very complex conflict resolution at that. Implementing them is not trivial; it's quite complex. A naive implementation might involve starting with the state of a shape in JSON in your database and overwriting them. However, this approach is problematic when two clients modify the same shape simultaneously or when one client is offline and modifies the shape, leading to potential data loss. Merging these changes is challenging, especially when adding undo functionality.
To address these challenges without experiencing the difficulties faced by Google in implementing operational transform algorithms, you can use CRDTs. CRDTs, or Conflict-Free Replicated Data Types, can be thought of as JavaScript objects, numbers, and arrays with a magical merge function and built-in change tracking. There are various types of CRDTs, each suited for specific use cases, including registers, counters, sets, maps, and CRDTs for text use cases.
So, what does a CRDT look like? Let's consider a concrete example, a Last-Red-Win Map. Using a Last-Red-Win Map in the previous example would have resulted in the correct state. Instead of storing the shape as a plain JSON object, you store it as a Last-Red-Win Map, and on the database, instead of overwriting it, you call the state the current state in the database and the magical merge function of the Last-Red-Win Map. This ensures that you end up with the desired state without making extensive changes. In summary, CRDTs provide a solution to the complexity of implementing features like live collaboration, resilience, and undo-redo in apps. By understanding the specific characteristics of different CRDTs and choosing the right one for your use case, you can simplify the implementation of these advanced app features. Thank you.
So instead of just storing the shape, you store the current value. Here's the clever part for the Last-Red-Win Map: you also store a 'last updated at' value, which in this case is A1. You might expect 'last updated at' to be a plain timestamp, and while that might work, CRDTs are built for distributed systems, where you can't trust time. What I mean by that is you can't trust your clients to have the clock exactly in sync. If that happens, using a plain timestamp might lead to incorrect timing, causing issues down the road. To address this, CRDTs often use what I call Lamport timestamps.
Lamport timestamps, used by most CRDTs, reference time based on the last operation seen. In this case, it's A1, so time would be one—the first operation on the object. The next operation, such as setting the color to orange during initialization, would be two, as one was the last seen operation and now it's two. Additionally, you store the client ID, which is uniquely generated, and is used for conflict resolution. Including the client ID helps prevent conflicts when two clients have the same timestamp.
The merge function becomes relatively simple. You iterate over each entry in the two lists from the example, where one client sets the color to green, updating the timestamp to two, and the other client sets the shape to be a circle, changing the timestamp to two with client ID B. You check if B2 is greater than S1 (which it is because two is greater than one) and if A2 is greater than S1 (which it is in this case). So, you set the value accordingly, resulting in the desired state—a green circle.
To visualize, both start with the initial state of the shape, updating timestamps as changes occur. The server calls current state and merge remote state for each change. The order doesn't matter because you end up with the correct state due to the merge function. The fear around CRDTs is often because it sounds complex, but implementing a simple Last-Value CRDT in JavaScript requires only a few lines of code, as seen here.
The great thing about this abstraction is that you can easily build on top of it and optimize it. For example, currently, we send the entire state of the CRDT on every change, which is fine due to the nature of CRDTs—they allow submitting operations once and can handle reconnection well. Clients can send their current state or just timestamps and ask for changes since those timestamps on the other side. This example sends the state A1 and S2—color timestamp A2, shape timestamp 1.
The other client checks, 'Yeah, the color has a newer update, so I won't send anything over. But for the shape, I actually have a change that happened afterward, so I'm going to send it over.' This allows you to build really efficient protocols that are highly resilient. The great thing is that you don't need a central server for this. You can throw changes, randomly merge them together, and there's no need for a central authority that owns the changes. You can have small edge servers worldwide, connecting to local clients and broadcasting changes together to a central server later on to update your database. This approach allows highly performant collaboration without the issues of needing a central server for two users.
However, this was only a very simple example of a CRDT. Textual CRDTs tend to get a bit more complex, but once you dive into them, most CRDTs have a straightforward core and are actually easier to understand than you might think. All CRDTs share the properties I just mentioned, so the order in which you apply changes doesn't matter, and they are idempotent, meaning applying the same changes twice doesn't affect the outcome. These properties eliminate the need for tracking changes on a central server.
For specific CRDTs, consider the tradeoffs they offer. For counters, you could choose a G counter or a PN counter. The G counter is more efficient, but the PN counter can decrement, which the G counter lacks. The same applies to sets; there's a G set, which only allows adding items, and a two-piece set, which supports a remove operation. To implement delete in a G set, you add another G set to hold the set of deleted items.
Different maps, like the Last-Red-Win Map, face challenges when it comes to removal. For instance, you can set a key to undefined, but you can't remove a key without additional metadata, which is where the R map comes in. There are also various text and list CRDTs. Be mindful of the memory footprint of these CRDTs, especially for text CRDTs that tend to store a lot of additional metadata. For instance, with TreeDoc, you might encounter large document sizes after extensive editing, but there are highly optimized RGA implementations that mitigate this issue.
But yeah, how can I use UDTs? They sound awesome. Well, for JavaScript, there is a library called Yjs, which is a highly efficiently implemented list UDT. Around this list UDT core, there are exposed lists of shared types, similar to JavaScript types as mentioned before. There's a map type for storing object-like data, a list type, a text type for storing text, and other utility types like XML elements. Yjs has a thriving ecosystem with a lot of tooling for building great apps. It includes built-in undo-redo functionality and is incredibly fast and memory-efficient. The memory footprint of Yjs, especially for text UDTs, is optimized, making it efficient for usage. Now, let me show you how to use it. We're going to migrate to the app. The packages we're using are Yjs itself, and we're going to use Hocus Pocus, a way to broadcast changes between clients, as Yjs is primarily concerned with content resolution and Hocus Pocus is a wrapper around the Yjs sync protocol.
Let's head over to VS Code and get started. Here, you can see a simple to-do list. We can add and delete items, check them off, but it's not collaborative yet. Now, let's make it collaborative. We'll run Hocus Pocus to broadcast the changes. First, we connect to the Hocus Pocus server to get a Yjs document, which holds all the changes. We use the provider, a wrapper around the document that handles syncing changes. We get a shared type, the actual CRDT holding the changes, and in this case, it's a shared array type called 'to-do list.'
Yjs itself isn't immutable and lacks React integration, so we use a reducer to manually trigger re-renders on every change. This way, we don't have to convert the list and store it in React state. We add a listener to trigger a re-render whenever the list changes. Now, we need to change the state. Instead of updating the state directly, we add the to-do to the shared list. To do that, we create a new item." Note: The text was truncated, and the context of the application code might not be complete. If you have further questions or need more assistance, feel free to ask!
Yes, it's off. Because we don't want to just store JavaScript objects in the list, we want to share types that can be modified by multiple users at once. So, we create a new item map, which is a different shared type used for storing object state. We push that and set the same properties. It has a very suboptimal typing where we can only type the ideas of the thing. We can't use key values. To resolve this, we create a new map and specify that this map can have values of type number, string, or Boolean. For the list, we can type the values in TypeScript, which is a map of the value of a number, a string, or a Boolean. This is done for optimization purposes, allowing us to push multiple items at once, which is more efficient. If we know what to do, you will see that the to-do text works now. And if we add items, you can see that they show up collaboratively. Let me clear the list again for clarity. And because we have CRDTs and we saw the client state, those changes will be saved locally. So we have to clear both tabs and open them again so the changes won't get resynced to the server, making them resilient.
Now, if we add changes, you can see that they are broadcasted to the remote client, which is really nice. The changes won't get updated because we didn't... and that's the general gist of it. You get a provider, get a shared type, in this case, Y-ray, which stores the to-do. You can modify that in JavaScript using a JavaScript map, for example, for the map or a list in JavaScript. The syntax is different because, in this case, you pass an array for optimizations. You have some functions to trigger a re-render so that you don't have to store the entire state in React state. This function can technically be removed; it's just here to make the other functions of the example work right now. And that's how we made this React app collaborative using Yjs. The cool thing about having the re-rendering decoupled from the actual document state is that you can use the new React transition API to deprioritize those re-renders, which is really nice because you can have Yjs handle up to 100 people on the same document efficiently. The bottleneck isn't really Yjs; it's React and the re-rendering because it happens so often when you have 100 users typing in the same document at the same time. But if you combine that with the approach of manually requesting re-renders and using the transition API, you can have highly performant collaborative web apps that can handle hundreds of users.
Okay, that's it. Lessons learned from building apps with CRDTs. Everything is a trade-off. One thing I didn't touch on in this presentation is that while CRDTs are awesome, there's one significant pain point, which is migrations and schema changes. You have to think about it because if you change data and overwrite data, you will lose the additional metadata or change the additional metadata associated with those CRDTs, influencing how remote changes are merged. You can end up with some really painful migration code in the end to handle all of that.
You should definitely spend some time understanding the inner workings or at least the algorithms behind the CRDTs you're using and the trade-offs. I see a lot of people just picking up Yjs and the text type and start building editors around them. But they encounter some edge cases just because of how the conflict resolution works, which is something you have to look into with all tools that do conflict resolution. Spend some time learning about how it works and why changes are merged in a specific way. It's actually beautiful once you understand the inner workings.
Enforcing permissions is a lot harder because you often have multiple changes of multiple users on a single client, especially if you're broadcasting changes over RTC or having an edge server that matches them together and then sends them to a central server. You have to dig into the deeper levels of the protocol, for example, with Yjs, to figure out what actually changed. You need to ensure on the edge, before it gets stored on the server, that the change is allowed to happen. It's harder than in traditional apps, but it's totally doable. You just have to look into it.
Maintaining WebSockets at scale is still a huge pain. Setting up all that infrastructure and finding solutions because you can't usually use normal serverless edge functions for this, as no real edge servers support WebSockets right now. However, good news – Cloudflare has a project called Broadcast, which was launched over a year ago. It's excellent for use cases like this, providing long-running edge servers that are relatively cheap and can accept WebSocket connections. They also have another product called Durable Objects, which lets you have a single globally unique JavaScript object that is smartly replicated at the center of all your users. If you use these alongside Cloudflare Broadcast, you can build something highly performant and relatively quickly.
However, there are still some unsolved issues in the CRDT space because it's relatively new. For example, if you have a JSON document, reparenting is not efficiently solved. There isn't a CRDT that doesn't store a huge amount of additional metadata to allow you to say, 'I want to move from key A.B to key C.E.' You run into additional metadata issues quickly, especially with circular references. Moving and splitting text is also unsolved, not just in CRDTs but also in operational transform because of numerous edge cases. Cleverly structuring your data can help you get around these issues, but you need to understand the inner workings and look into it. That's it from me.