The Life of a FaunaDB Query
FaunaDB is a transactional NoSQL database for mission critical data. The engineering team at Fauna, tasked with building the next generation of high-performance scalable databases, identified three criteria in FaunaDB’s design phase:
- An Internet-first communication protocol;
- Low-latency and consistent read operations;
- High-throughput, minimally latent and consistent write operations.
Each of these three key features can be observed in action by watching a query propagate through a FaunaDB cluster.
The recommended way of interfacing with a FaunaDB cluster is by writing transaction queries with one of our host language query drivers. These drivers sit in between application code and the network, marshalling data traveling over the wire and providing an idiomatic interface tailored for each language. However, we will be demonstrating queries directly in our wire format, which we chose to be JSON. An example of JSON-flavored input and output is shown in Figure 1.
The reasons for choosing JSON are numerous. In addition to being human-readable and flexible, JSON is the lingua franca of the modern Internet. Speaking JSON over HTTP means clients and apps can interface with a FaunaDB cluster, such as our Managed Cloud Service, directly over the open Internet. This means developers need not build layers of middleware that translate between web requests to a cordoned-off database, or be surprised by ancient middleboxes being confused by traffic on a non-standard port. Unsurprisingly, then, the entry point to any FaunaDB node is its HTTP endpoint, which consumes JSON blobs to evaluate.
Enter the JSON
The steps required to take a JSON query and transform it into a FaunaDB transaction can be observed by ways in which the system can fail in this process. First, we validate the well-formedness of the HTTP body and confirm that it is indeed valid JSON (figure 2).
When successful, the JSON is parsed into an abstract syntax tree (AST) for our query evaluator to execute. It is here that the wire protocol's flexibility becomes a liability, however. JSON's objects are unordered key-value pairs and values cannot be constrained by type. Therefore, the components of the AST need to be validated against our language specification to ensure parameters' types match our documentation (figure 4), and that the overall shape of the expression matches a valid form in our query language (figure 3).
Our AST parser also infers the effects of the transaction to be evaluated. Certain transactions may read data (recall figure 1), write new data (figure 5), perform read and write operations (figure 6, which mutates the instance we read in figure 1); or, a query may even contain pure expressions that require no IO whatsoever (figure 7)! In this last case, the database can immediately evaluate the query and return whatever the expression evaluates to. We are not so fortunate in the other two cases, however.
From Local Evaluation to Global Fanout
Because of FaunaDB’s ACID guarantees, any replica in a cluster will have had previous write transactions already propagated to them. And, FaunaDB's strong consistency guarantees the most recently-committed version of an instance are readable. Therefore, read-only transactions don’t need to be synchronized across the entire cluster in the same way that a write transaction would be. In this case, our read interface will pull data either from local storage or nearby nodes in the same replica, yielding a document containing the result of the read, which is then rendered and sent back to the client.
FaunaDB has a similar write interface for handling transactions that require updating state. Queries pass through the write interface but, unlike with the read interface, a JSON result to be sent back to the client is not produced just yet. Instead, whenever the interpreter encounters a write operation to evaluate, the write interface creates a descriptor of each of the values to be updated, in the form of a before-and-after diff of each to-be-modified document. (Later, we will see why the write effect’s context comprises more than just the value to be written to disk.)
After the query has been evaluated, these so-called write effects are collated and passed to our transaction engine, to be applied across the cluster.
At this point, we should pause and briefly consider a larger view of the system. Up to this point, our discussion has just been with respect to a single database node. If you are running our Developer Edition of FaunaDB, that single node is the entirety of the database, offering no data sharding, failover, or redundancy. For local laptop development, that might be enough, but for a production-ready system we need more. [Editor's note, FaunaDB Enterprise is now available for free download.]
FaunaDB nodes running in production environments, on the other hand, form a database cluster out of a constellation of nodes; the cluster is grouped into a flexible number of geography-spanning replicas, that may in turn contain a flexible number of individual FaunaDB nodes, responsible for a subset of the total dataset. When a transaction necessitates a write effect, some substrate connecting all nodes is required in order to have that write propagated from the originating node to all nodes replicating that data. This substrate must do so in a manner that ensures consistency in the presence of crashing nodes, failing networks and other concurrently-processing writes. This is the mandate of the transaction engine.
Having been passed to the transaction engine, the write effects are added to a log of transactions concurrently executing on this particular node. The log can be thought of as the official ordering of transactions on a node; if one write effect occurs in the log before another, it means FaunaDB considers it to have happened before the other. However, every node in the cluster must be of one mind with respect to the contents of this log. For this, we use an implementation of the Raft consensus protocol, optimized for efficient operation over wide-area networks, to obtain this.
Using our Raft implementation, on a set epoch interval, nodes’ logs are disseminated to all others, and are coordinated via our deterministic concurrency control system to agree on the ordering of all the log’s entries’ write effects. Using a modern and optimized consensus protocol rather than the classical distributed transaction technique of two-phase commit with row locking drastically improves performance without sacrificing correctness; Raft requires fewer cross-cluster round trips, which is critical for a geographically-diverse deployment. Further, batching these operations on an epoch boundary rather than coordinating on every write amortizes that cost of global coordination across the system. The end result of this is that all hitherto pending transactions across the whole cluster have been replicated and been given a total ordering, and are ready to be consumed by the final piece of the puzzle, which manages where the transaction’s data physically resides.
At Long Last, Data Hits Disks
The data subsystem is where IO finally occurs: all to-be-processed write effects are pulled off the log, ordered by their logical timestamp. Our concurrency control protocol is optimistic in that it is only at this final stage that we detect whether a write effect’s read dependencies, including the write location itself, has been processed with a later timestamp. If no such writes exist, then the write effect’s diff is applied and synced to durable storage. If any do exist, however, then the transaction needs to be aborted and retried, since its diff now depends on the effects of another transaction.
Once all writes belonging to a transaction have been applied on a quorum of nodes across the cluster, the client will be informed of its success.
In this article we started with a JSON query and followed it through each of FaunaDB’s subsystems, showing you how the moving pieces contribute to a high-performance distributed database.
FaunaDB is a complex system internally, but we have worked hard to design a friendly user interface for both database operators and application developers. Scalable, secure, transactional, global, multi-cloud, multi-tenant, temporal, and highly available, FaunaDB is designed to simplify development of distributed applications while making database operations dramatically easier.
Looking for more hands-on experience with FaunaDB? Check out the recent Fauna Query Language tutorial in the Getting Started w/ FaunaDB blog series for developers. Sign up for a free FaunaDB Cloud account to begin experimenting with strongly-consistent transactions of your own. Also, if you found this post interesting, and building such a system sounds enticing, be sure to check our jobs page to see if anything strikes your fancy!
If you enjoyed this topic and want to work on systems and challenges just like this, Fauna is hiring!