Jun 09, 2018


Blog Categories

Introduction to FaunaDB Clusters

FaunaDB is a mission critical, NoSQL database architected specifically for operationally distributed environments. The database focuses on six key principles: ACID transactions, global scale, reliability, operational simplicity, security and developer friendliness.  

FaunaDB was built to scale horizontally within a datacenter for maximizing throughput while spanning globally distributed sites easily - ensuring reliability and local performance.

FaunaDB was built to be scalable and reliable without compromising operational simplicity.

This blog post will focus on understanding the operational topology and organization of FaunaDB.

Evolution of the Database

The modern database had it commercial roots in the 1960s with the release of the IMS database to customers with deep pockets, such as Caterpillar and the Apollo space program. The next big step in database technology came in the 1970s when Edgar Codd described the theory of relational databases, followed by the release of DB2 on a mainframe. While these innovations made databases more affordable for governments and fortune 500 type businesses, the price prevented a majority of businesses from utilizing databases. Business again forced technology to progress. In the 1980s Unix and the transactional client-server database came together with affordable computer hardware. This technology evolution opened the door to many new business models, such as order inventory systems, HR systems, and point of sale. In the beginning, many of these business models where operational during local business hours, and then extended business hours but, with the popularity of the internet and the availability of global information for business at the turn of the century, the need for always available databases and intense scaling was paramount. This led the way for a radical split in the database landscape between the businesses which spent money scaling transactional databases and those which utilized a new class of databases technology revolving around the CAP theorem (see Figure 1) or an eventually consistent database model.

CAP theorem
Figure 1

This recent class of database creates a networked cluster of commodity based hardware and sacrifices data correctness and the principles of ACID transactions to achieve scale. The belief is that not all business models require 100% correctness, but nearly correct would be good enough. The responsibility was moved to the application program to understand the business model and guarantee sufficient correctness for their business.    

It’s been noted that the CAP theorem leaves some things unstated. Daniel Abadi’s PACELC asks what happens during a partition vs when there are no partitions. There are other discussions that point out how different operations can fall into different categories. You can read Daniel Abadi’s take on FaunaDB’s consensus algorithm here.

Database clustering has been historically a rocket scientist's task. This was because most databases were built for a client-server deployment style. When the internet was born, the need for data distribution came along. Vendors invented complex data replication schemes that required additional hardware and software. They still do so. These techniques are a drain on your finances and impede reliability.

NoSQL databases promised to address scale (albeit at expense of enterprise capabilities like ACID compliance). They never lived up to the promise. Have you ever tried installing a Cassandra or a Mongo Cluster? It still requires rocket science.

FaunaDB was built to be scalable and reliable without compromising operational simplicity. We built clustering into the core of the system to make this work. It wasn't an afterthought. We wanted to build a database that would work just as advertised - deploy a node and go, no more managers of managers or special hardware and software.

Fauna is leading the way to the next evolution of database technology with its mesh cluster. Fauna was built with cloud, containers, and serverless in mind and utilizes new research and technology to provide the next step in database evolution. Fauna takes advantage of commodity and cloud based hardware to achieve scaling while providing 100% adherence to the principles of ACID. This removes the burden of data correctness from the application developer and places that responsibility on the database.   Application programs can focus on the usability of the application rather than the correctness logic, making Fauna a friendly database for developers.

FaunaDB Clustering Basics

FaunaDB uses a mesh-oriented approach to clustering: Deploy a node, point it to its peer, and go. The system takes care of everything else.   

The key components of a FaunaDB cluster are:

  1. Node
  2. Replica
  3. Shard/partition

The example cluster referenced in Figure 2 is what will be exploring in great detail, starting with the inside and working our way out. Along the way we will explore the interaction of each layer and how both scale-out (horizontal scaling) and scale-up (or vertical scaling) is achieved by FaunaDB.


Figure 2

Fauna physical data layout can be described entirely by three simple concepts, node, replica and cluster.

Node

A node is a single computer with its own IP address in which FaunaDB is installed and running. The Fauna nodes can run on all public cloud services, including Amazon, Azure, Bluemix, and Google, in addition to running on a private cloud, an on-premise solution, a virtual machine, a docker image, and several other platforms.

When considering sizing a Fauna system, it is important to look at the individual node sizing. While Fauna scales very well vertically, the price tag for an extremely large complex system requiring a single large server can be cost prohibitive for many companies. This is why Fauna concentrated a significant amount of effort on scaling horizontally on commodity hardware. This class of hardware generally provides a better cost per transaction throughput.

Fauna concentrated a significant amount of effort on scaling horizontally on commodity hardware.

A FaunaDB cluster does not have a single master node. Instead there are some nodes which take on a multiple responsibility. All nodes will store, retrieve, and route requests for data. In addition, one or more nodes may also handle the transaction log. If the transaction throughput is high, the ability to split the transaction workload amongst several nodes is essential. This is just one of the many ways the Fauna server demonstrates its horizontal scaling capabilities for large systems.

Replica

A replica is comprised of a group of one or more node(s). A node can belong only to a single replica. A fundamental property of a replica is that it contains a complete copy of the Fauna data, including end user data, system data, and the transaction log. Within a replica, a particular piece of data is normally found on exactly one node; there is no data redundancy provided by Fauna within a replica. Having multiple replicas, each containing the full set of data, is what provides redundancy in a Fauna database cluster. Another common term for a replica would be a datacenter.

Having multiple replicas, each containing the full set of data, is what provides redundancy in a Fauna database cluster.

A great advantage of Fauna when sizing the system is that each replica is capable of being a different type and size. A company which has a large presence in the South with its own IT infrastructure can host Fauna on-premise with more hardware and capacity. In this scenario, the same company is expanding on both the East and West coasts and the remote offices are small with limited IT infrastructure. These two replicas can be located in the cloud with a smaller hardware footprint. This allows a business to maximum cost effectiveness while enjoying “custom-fit” geographical proximity.

Cluster

A FaunaDB cluster is the highest entity. Under the cluster is both a physical layout and logical layout. To review, the physical layout it is comprised of individual nodes. Every node is a computer with a unique IP address. Nodes are grouped into replicas, with every node belonging to exactly one replica. The main significance for this grouping is that a replica contains a full copy of the data and can vary in size. Within a replica, a particular piece of data is normally found on exactly one node; there is no data redundancy within a replica.

Having multiple replicas, each containing the full set of data, is what provides redundancy in a Fauna database cluster.

The FaunaDB Cluster has a logical data model comprised of databases, collections, and documents. The data model implements a semi-structured, schema free object-relational data model, a strict superset of the relational, document, object-oriented, and graph paradigms.  The model starts with the database which can be laid out in traditional flat fashion, in a hierarchical or a combination of flat and hierarchical as seen in Figure 3.


Figure 3

While many vendors can have multiple databases in their system, few SQL or NoSQL vendors have the concept of a hierarchical database. This hierarchical organization of data provides many benefits in the multi-tenant scenario:  

  • The ability for a single company to share resources between two, three or a yet to be determine number of projects by simply creating a database under or alongside another database.   
  • The simplification of access control, resource allocation and sharing, and data layout, is kept uniform by simply inheriting the properties and resources.  
  • Eliminates the need to manage properties and resources for each database individually.

Below the database exists collections/classes which are similar to a table in a relational database. However, like a document database, full or partially shared schema within a class is optional, not mandatory. Collections/classes contain zero or more records, or semi-structured documents, which can include recursively nested objects and arrays as well as scalar types.

Conclusion

This introduction to both the physical and logical data models of FaunaDB clustering demonstrates the simple elegance behind its design. While many transactional databases increase their complexity when scaling and require specialized hardware, FaunaDB goes the extra mile to keep things simple while scaling both vertically and horizontally on all styles of hardware. It simply addresses not only a technical pain point, but also a critical business issue for all enterprises - large or small.