May 22, 2018


Unifying Relational, Document, Graph, and Temporal Data Models

Different business requirements drive the need for different data models. Consequently, databases evolved and specialized to keep pace. Today, a software system might use a relational database for transactional data, a graph database for social identity management, and a time series database for analytics, all within the same application. From an operational standpoint, this is a scenario every enterprise would love to avoid.

To address this issue, some databases offer a multi-model approach, supporting multiple modeling techniques (e.g., relational, document, graph) within the same database. However, these systems introduce model-specific interfaces that are often distinct and cannot be used in combination, preventing data from being accessed with the right approach at the right time.

FaunaDB abstracts data in a way that plays well with various models, so mixing and matching graph, relational, temporal and document access in a single query feels natural and doesn’t require context switches.

FaunaDB takes a different approach to this problem. It uses a multi-model design that unifies the ability to read and write documents with the ability to use relational, graph, and other styles of data interaction within the same query language, giving developers the flexibility to choose the right approach in context. The query language runs with ACID transactions and temporal retention on top of cloud-native distributed storage, so your workloads run with enterprise-class support, no matter which data model you’re using.

Database models have evolved over time. Let’s examine some of the current approaches available in the market.

The Relational Model

The relational model was first described in 1969 and is optimized for the concerns of an era when a megabyte of storage cost hundreds of dollars. Data stored in the relational model can be queried and viewed in formats that support application use-cases, even while the underlying data fields are stored in a normalized schema designed to prevent data duplication. By specifying the individual data items, their types, and their relations to each other, this model supports efforts to maintain data integrity. It also means that as data access patterns change over time, new indexes and queries can work with the existing data. The joins and constraints offered by the relational model facilitate a single system of record serving many different use cases. Databases built on the relational model are able to evolve with their application over time, at the cost of modeling the shape of the data in a schema.

Databases built on the relational model are able to evolve with their application over time, at the cost of modeling the shape of the data in a schema.

This strength can also be a weakness, in that changes to the data model must be thoughtfully considered and carefully applied. Simple schema changes must be coordinated across development and production environments, and when the production environment involves many servers the application tier may require coordinated upgrades. This can be especially burdensome when multiple teams are building features that use different parts of the database, as even though the teams aren’t collaborating on code, they still have to agree about the database schema.

Because relational databases encode data into tables where all records must have the same shape, applications that consume variable data from messy real world sources may see little benefit to encoding a schema at all. Instead the schema emerges downstream as the data is queried and aggregated. The document model has historically been a better fit for this.

Imagine you are building a social media app where users can post content and follow other users. The main screen of the app shows the recent content from everyone the user follows.


The data model starts simple, with a table to track users, a table to track content, and a table for the following relationship. But as soon as there are multiple types of content, the photo team and the audio team will have to coordinate schema changes to the Content table. This friction compounds each time some part of the database needs to be changed by more than one team. In a complex, fast-moving app, this friction can motivate migrating to another data model, like the document model, which is designed to accommodate varying data types. The alternative is to deepen the relational model of your application, introducing a class to capture the variation in content types.

You’ll either have to join through the class as well to find publication dates, or remember to update the publication object every time its associated content object is changed. Did I mention the upcoming requirement to keep old versions of the content around to facilitate undo? Making each of these tables temporally aware adds additional complexity that must be addressed any time the database is used.
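The join-through-the-class pattern described above can be sketched as follows. This is a minimal illustration, not a prescribed schema: the table and field names (publications, photoContents, audioContents, publishedAt) are hypothetical.

```javascript
// Rows from a generic "publications" table: each row points at a typed
// content row via (contentType, contentId), capturing the variation in
// content types behind a single class.
const publications = [
  { id: 1, authorId: 7, publishedAt: "2018-03-01", contentType: "photo", contentId: 10 },
  { id: 2, authorId: 7, publishedAt: "2018-03-05", contentType: "audio", contentId: 20 },
];

// Type-specific content tables owned by separate teams.
const photoContents = new Map([[10, { url: "/p/10.jpg", width: 1024 }]]);
const audioContents = new Map([[20, { url: "/a/20.mp3", durationSec: 90 }]]);

// Finding a publication date means joining through the publications table...
function publicationDate(pubId) {
  const pub = publications.find((p) => p.id === pubId);
  return pub ? pub.publishedAt : null;
}

// ...and loading the content requires a second, type-dispatched lookup
// (the extra join the text warns about).
function loadContent(pub) {
  const table = pub.contentType === "photo" ? photoContents : audioContents;
  return table.get(pub.contentId);
}
```

Every feature that touches content now pays for this indirection, which is the friction the paragraph above describes.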

The Document Model

In the document model, data can be stored with any structure. Typically, it is stored as it is received or presented, for ease of development. Each document is independent, with no joins, constraints, or relations between documents. There are as many query styles as there are document databases, with options ranging from map-reduce to full-text search, and no standard API. The biggest advantages of a document database are the flexibility to store data as it comes without configuring a schema first, and the performance benefits those simplified data access patterns bring. The lack of schema means developers can freely iterate on application features without coordinating with other teams about schema changes, a productivity boost that can especially benefit time-to-market. The simple data access patterns give developers clarity about what the high-performance path will be, so as long as apps are built with performance in mind, the document model can lead to a responsive user experience.

The biggest advantage of a document database is the flexibility to store data as it comes without configuring a schema first.

However, over time the ease of no schema can lead to maintenance headaches. Without relational integrity, application errors compound as database errors, manifesting as invoices without purchasers, groups without owners, and other hard-to-repair issues. The ease of use of a document database applies to the shape of the data as it is stored, but challenges arise with queries and updates that are incongruent with the originally planned data access paths. At some point the database’s contents must be migrated to a fresh format, so that conditional application logic for dealing with old document structures can be removed. Without periodic data cleanup, done by rewriting the contents of the database in the structures preferred by the latest version of the application code, the code accrues conditional logic at the cost of maintainability. This downside of the document model strikes later in the development process, so by the time you reach it, it’s too late to turn back.
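The conditional logic for old document shapes might look like the following sketch. The v1/v2 shapes and field names are hypothetical, chosen only to illustrate the pattern.

```javascript
// v1 documents stored the author's name inline as a string; v2 documents
// store a structured author object. Until every old document is rewritten,
// every reader has to handle both shapes.
function authorName(doc) {
  if (typeof doc.author === "string") {
    return doc.author;            // v1 shape
  }
  return doc.author.displayName;  // v2 shape
}

const oldDoc = { title: "Hello", author: "carol" };
const newDoc = { title: "Again", author: { id: 42, displayName: "carol" } };
```

Each schema evolution adds another branch like this somewhere in the codebase until a cleanup migration retires the old shape.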

In our social media example, a document database would cope just fine with having different types of content to publish. Where it gets challenging is creating the individualized activity feeds. Without support for joins or complex queries, application developers resort to stitching together the results of fan-out queries in code, adding delay and complexity to each page view. An alternative is precomputing the timeline view by writing new content to it as it is published. Doing the work on write instead of on read is not usually a first choice, because speculatively building data artifacts for inactive users can be relatively expensive. That making N extra copies of each data item is considered a solution shows where this approach leads. Data format changes and cleanup become even more challenging when data is copied through a pipeline. Code throughout the app might have to deal with an intermix of old and new data formats, adding unwelcome maintenance burden.
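The fan-out-and-stitch approach described above can be sketched like this. The data shapes are hypothetical stand-ins for what per-author queries would return from a document database without joins.

```javascript
// Who the reader follows, and each author's posts, as separate per-author
// query results would arrive.
const follows = { carol: ["alice", "bob"] };
const postsByAuthor = {
  alice: [{ author: "alice", ts: 3, body: "third" }],
  bob: [
    { author: "bob", ts: 1, body: "first" },
    { author: "bob", ts: 2, body: "second" },
  ],
};

// One query per followee (the fan-out), then a merge-and-sort done in
// application code instead of in the database.
function buildFeed(reader) {
  const results = follows[reader].map((author) => postsByAuthor[author] || []);
  return results.flat().sort((a, b) => b.ts - a.ts); // newest first
}
```

Every page view pays for one round trip per followee plus the in-process merge, which is the delay and complexity the paragraph describes.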

The Graph Model

The graph model is useful for finding patterns in relationships. If you’re optimizing container shipping it helps to be able to compare potential container routes on attributes like cost and speed, for instance in preparing for regulatory changes or in case of disruption to major shipping routes. Queries like this can sometimes be most efficiently executed by visiting the relevant data objects. Graph databases optimize for querying across deep relationships. They are useful to draw insights from data that might otherwise be hard to find, because they focus on how data items are connected.
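A route comparison like the one above boils down to a traversal over connected records. As a minimal sketch (the port names and route data are hypothetical), a breadth-first search finds the fewest-hops path between two ports; a graph database runs this kind of traversal natively instead of in application code:

```javascript
// Adjacency list of direct shipping legs between ports (hypothetical data).
const routes = {
  shanghai: ["singapore", "busan"],
  singapore: ["rotterdam"],
  busan: ["losangeles"],
  losangeles: ["rotterdam"],
  rotterdam: [],
};

// Breadth-first search: fewest hops from origin to destination,
// visiting only the relevant connected records.
function fewestHops(origin, dest) {
  const queue = [[origin, 0]];
  const seen = new Set([origin]);
  while (queue.length > 0) {
    const [port, hops] = queue.shift();
    if (port === dest) return hops;
    for (const next of routes[port]) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push([next, hops + 1]);
      }
    }
  }
  return -1; // unreachable
}
```

Replacing "hops" with cost or transit time turns the same traversal into the cost-versus-speed comparison mentioned above.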

Graph APIs in the same context as operational and transactional data are sufficient for many use cases, even without specialized graph storage and processing.

The graph model is typically implemented to optimize for fast searches, not scalability. Specialized graph databases rely on scale-up infrastructure, using a single server with large amounts of memory to make processing large data sets feasible. They are not suited to running on clusters of commodity hardware, so while they are useful for niche cases, they have not seen broad adoption. The recursive features of SQL have seen wider adoption, showing that graph APIs in the same context as operational and transactional data suffice for many use cases, even without specialized graph storage and processing.

The Temporal Model

Enterprise applications need to track changes, and social applications need to present the latest updates from friends. All sorts of applications can benefit from auto-expiring records, especially with requirements like the GDPR becoming common. Event and snapshot feeds can also be useful for updating external indexers and other integration tasks.

Implementing temporality in a relational database requires adding extra dimensions to your schema to track valid time and transaction time for each record. In contrast, recovering from mistakes, auditing change history, and querying old versions of the data are supported natively by FaunaDB’s temporal snapshot storage.

Tracking changes to all classes in the relational model requires adding complexity to all queries. FaunaDB’s unified support for temporal event tracking makes supporting use cases like event sourcing and audit logging as easy as querying the event APIs.
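The extra temporal dimension described above can be sketched like this. The record and field names (versions, txTime) are hypothetical; the point is that every ordinary query must carry an extra time predicate.

```javascript
// Every update appends a new version instead of overwriting, each tagged
// with its transaction time.
const versions = [
  { id: "post-1", txTime: 100, body: "draft" },
  { id: "post-1", txTime: 200, body: "published" },
  { id: "post-1", txTime: 300, body: "edited" },
];

// "As of" query: the latest version whose transaction time is at or before
// the snapshot. Versions are assumed ordered by txTime.
function asOf(id, snapshot) {
  const matches = versions.filter((v) => v.id === id && v.txTime <= snapshot);
  return matches.length > 0 ? matches[matches.length - 1] : null;
}
```

When this bookkeeping lives in the database engine rather than in the schema, queries and application code stay free of the extra predicates.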

FaunaDB’s Unified Multi-Model Interface

FaunaDB was built from the ground up to provide a unified model that encompasses document-relational, graph, and temporal paradigms, while leaving room for future additions.

Relational: FaunaDB brings the best aspects of the relational model into the cloud era, by allowing flexible documents by default, but offering relational features like constraints, indexes, and joins when you want them. This combination means you can start by simply storing your data as is, and then define indexes and begin using joins as needed. Schema changes aren’t necessary when your application starts storing a new field, so unnecessary coordination among developers is minimized. Relational features give your schema the flexibility to serve additional use cases as your application matures, so you can store data in a format that makes sense today and be assured that unanticipated queries won’t require data migrations.

FaunaDB’s model combines relational correctness and integrity with the ease of use of documents, so your data stays useful as your application evolves.

Document: FaunaDB’s approach gives developers and architects the choice about what data structures to use, and at what granularity to store and index them. Unlike the primitive document model, the combination of FaunaDB’s relational features with its flexible document storage supports applications that evolve and develop far beyond their initial use case. Adding fields requires no coordination with other developers, while at the same time the schema can enforce uniqueness and provide indexes to fit the application needs. Just as RDBMS systems have proven capable of keeping up with application changes over decades, the flexibility of FaunaDB’s unified model means your database can keep up with your application for the long term, not just through its initial development cycle. With a single system of record, efforts can focus on integrity at the source so that all queries benefit. FaunaDB’s model combines relational correctness and integrity with the ease of use of documents, so your data stays useful as your application evolves.

Graph: For graph workloads, FaunaDB object references can be followed for scalable and easy to use graph queries that integrate with OLTP data. By fitting seamlessly into the query language, graph predicates can interoperate with other features like access control and temporality. Running in the same database as the system of record means graph queries take advantage of ACID snapshot isolation, global scalability, and FaunaDB’s other operational capabilities. Graph traversal is a standard part of the query API.

Temporal: Some datasets are inherently time based, like social media feeds and event logs. FaunaDB makes traversing changes in snapshot order easy; for these applications, the time dimension can be handled by FaunaDB’s event view. The tutorial that accompanies the example query below explains how to use event queries to build an activity feed from a complex join across follower relationships and authors. Native support for this type of query can simplify your social media applications so users can keep up to date with the latest changes, without a lot of code on your part to track what they’ve seen.

FaunaDB’s approach gives developers and architects the choice about what data structures to use, and at what granularity to store and index them. 

Unlike single-model databases, or multi-model databases in which each added data model clutters the API and creates problematic trade-offs, FaunaDB abstracts data in a way that plays well with various models, so mixing and matching graph, relational, temporal, and document access in a single query feels natural and doesn’t require context switches. FaunaDB accomplishes this by building on top of a robust cluster operations system designed to keep queries and workloads from impacting each other. The ACID transaction pipeline and indexer mechanism provide a flexible foundation for supporting multiple data models, all with intuitive strong consistency guarantees. For more information about how queries are processed by FaunaDB, this blog post about the life of a query goes into details.

Example Query

This example shows a combination of document-relational, graph, and temporal data models, by querying a social graph for accounts the reader is following, joining the posts from those accounts, and using temporal events to present the latest updates to the reader. You can read the details of this query in our activity feed tutorial, where you learn to build a social news feed using FaunaDB.

q.Map(
    q.Paginate(
      q.Join(
        q.Match(
          q.Index("followees_by_follower"),
          q.Select(
            "ref",
            q.Get(q.Match(q.Index("people_by_name"), "carol")))),
        q.Index("posts_by_author")),
      { after: 1520225699165542, events: true }),
    function(event) {
      return q.Get(q.Select("resource", event));
    })

The document for the user named "carol" is looked up in the "people_by_name" index, and the users that Carol follows are loaded via a graph-style "followees_by_follower" relationship index. Those authors are joined to their posts via the "posts_by_author" index, and the set of posts is paginated using event mode. The timestamp argument to after (1520225699165542) corresponds to the snapshot as of which the user had last viewed the feed, so event pagination begins from that point, leaving out earlier articles.

Conclusion

Unifying multiple data models in a single query language requires a product that’s been designed with the right type of abstractions. Early NoSQL databases are characterized by thin abstractions exposing developers to the underlying implementation, with API features that map to physical data layouts such as B-tree indexes, column scans, and quorum CRUD operations. FaunaDB takes a different approach, abstracting the underlying cluster to provide robust quality-of-service management, simple operations, and ACID transactions for all query types. This abstraction allows the database engine to safely interleave workloads with different I/O characteristics. New query capabilities can be added without impacting the performance of existing applications. The result is a simpler stack with one unified database cluster to support all your applications.

The freedom to adapt your query model to changing requirements gives you a solid foundation for the future of your organization's data.

Choosing a database with a unified query model like FaunaDB allows you to start with the relaxed constraints of the document model, then add relational constraints and graph queries as necessary. The freedom to adapt your query model to changing requirements gives you a solid foundation for the future of your organization's data.