Relational is More than SQL
Bob Muglia|Sep 15th, 2023
SQL is a good friend to many people, myself included. My first deep dive into SQL was in 1987, just before I became the first technical person at Microsoft to work on SQL Server. It feels like SQL has been around forever. In just a few years, SQL will celebrate its 50th birthday. Developed by IBM in the 1970s and popularized by Oracle and a slew of other database systems in the ‘80s, the relational model and SQL revolutionized the database industry.
SQL and the relational model
Before introducing the relational model and SQL, the hierarchical and network databases from the late 1960s and 1970s were inflexible and difficult to use. I had struggled to use and maintain these systems, and when I learned SQL, it was a breath of fresh air. Compared to earlier databases, the flexibility of the relational model, together with SQL, made it much easier to modify the database schema. It also dramatically simplified the job of preparing reports, typically printed on huge stacks of contiguous paper. Those were the days.
Most importantly, SQL databases made supporting highly consistent ACID transactions easy. This simplified application development and, even today, SQL transactions form the basis of most business systems.
Most people think of SQL and the relational model as synonymous, which is sensible because of SQL’s long-term dominance. However, they are not the same thing. The relational model is a set of mathematical concepts applied to data of different shapes. It is the most general form of database mathematics and provides a mathematical foundation to effectively support any operation within a given domain.
On the other hand, SQL is a database language that supports an important subset of relational capabilities. SQL’s design works with structured data or tables. Decades of evolution have made SQL databases quite good at this. Relational tables are a powerful data model. They are excellent at some things, particularly analytics. Modern analytic SQL databases like Snowflake or BigQuery deliver business insights at incredible scale to thousands of simultaneous users. These SQL databases are the slicer-dicers of historical, analytic data.
Limitations of SQL for data analytics
SQL is now an industry standard for analytic operations. However, there are database problems that SQL does not address when it is awkward or impossible to model data as a table. For example, SQL struggles with complex sets of interconnected objects modeled as a graph. Graph data does not easily fit into the structured tables SQL’s design accommodates. Because SQL is inappropriate for graph data, the industry needs alternative approaches to solve this problem.
New products based on the relational model seek to address these unsolved analytic database problems. RelationalAI drives this effort in close partnership with data warehouse providers – and the Modern Data Stack – to deliver a relational co-processor for applications that require a graph data model.
SQL and operational applications
Operational databases are the data workhorses for business applications used by people worldwide. SQL products like Oracle, Microsoft SQL Server, and PostgreSQL commonly serve as operational databases. The consistent transactional semantics provided by SQL deliver a strong foundation for building business applications.
Operational applications have very different requirements than analytics. The structured table is an ideal data model for many operations when analyzing data. The combination of tables and the relational model makes it easy to query and transform data.
However, the structured table has an impedance mismatch for operational applications with modern object-oriented programming languages like Python, Javascript, Go, and others. These programming languages use a nested, object-oriented, in-memory data model. Writing this data to disk for long-term storage is most naturally modeled as hierarchical, semi-structured data. The popular Javascript Object Notation (JSON) file format perfectly maps to the Javascript memory model millions of programmers worldwide use.
The mismatch in the data model between application programming languages and SQL databases is a decades-long issue. A common solution is called an object-relational mapping layer. ORM solutions like Hibernate for Java have existed for over 20 years. These products try to bridge the gap between SQL-structured tables and modern languages' object-oriented in-memory data structures. While the industry has considerable experience using ORMs, they are an awkward solution that can cause application and performance issues.
More importantly, the rigid structure of SQL tables is inconsistent with the frequent changes required by many modern applications. Languages like TypeScript have popularized an innovative approach known as gradual typing, a combination of dynamic and static types that enables experimentation and rapid development while still providing the performance and verifiability of static typing for the more refined and stable parts of the design.
Unfortunately, if you are using SQL, the development flexibility enabled by modern languages does not extend to the database. SQL databases require a static, fully structured schema. While SQL supports schema evolution, the highly structured design of SQL tables requires planned, thoughtful database migration and updates. Because of the need to model data in structured tables, SQL database schema modifications are a bottleneck for developing many applications.
SQL is a database language that lacks a standardized approach to procedural logic. While many operational SQL databases have stored procedure extensions, the procedural logic for most applications is embedded within an application server connected to a SQL database using a stateful, persistent session. This design approach for SQL made sense 50 years ago, but it is a painful legacy for modern, connectionless cloud services. It generally requires the application code and database to co-reside in the same data center region, and it is a particular challenge for serverless or geographically distributed “edge” applications that cannot support persistent sessions.
Document databases
Because of the highly structured nature of SQL databases, they are inappropriate for applications that primarily work with dynamic, semi-structured data. Examples of this are chat and email applications that support threaded conversations. These applications are naturally hierarchical, and the objects within them have a dynamic set of properties. NoSQL databases, introduced over a decade ago, support these highly dynamic applications. There are different types of NoSQL databases, but document databases support a semi-structured data model based on JSON documents. Three popular document databases today are MongoDB, Amazon DynamoDB, and Microsoft Cosmos.
A document database stores the data within a collection of hierarchical, semi-structured documents. You can think of this as a tree with each branch potentially containing different data properties. With a document database, the atomicity of a transaction is typically a single document, often using an eventually consistent transaction model. The approach works well for intermittently connected, distributed applications like chat.
One major advantage of document databases is that they are usually much easier to scale than an operational SQL database. Organizing data into collections of tree-structured documents provides a natural grouping that simplifies sharding across multiple servers, enabling document databases to support multi-tenant applications at enormous scale.
Document databases' dynamic, semi-structured nature also makes them easier to integrate into modern programming languages. There is no need for ORM mapping layers since in-memory objects can directly serialize into JSON documents.
The simplicity and scale advantages of document databases have ensured their ongoing popularity. However, transactional atomicity and consistency limitations have prevented their broader adoption in business applications. Today’s document databases are inappropriate for the widespread, debit-credit style business transactions that require high consistency across documents and collections.
Introducing FQL, the document-relational database language
Fauna is a battle-tested database service that delivers highly consistent, strictly serialized atomic transactions across documents and multiple collections of documents. Fauna has always been a great database, and its API delivery model means that developers never have to worry about provisioning, sharding, or replication. But until recently, the database language was not easy for developers to use. The introduction of the new FQL language changes that. FQL combines the advantages of the relational model and the simplicity of semi-structured documents.
The document data model is a good match for applications written using dynamic languages like Javascript or Python. FQL is a modern, Typescript-inspired database language for semi-structured documents that integrates naturally with dynamic languages. FQL provides optional schema support to ensure correctness and enable large, complex applications written in typesafe languages like Typescript or C#. Because it is relational, FQL supports queries with join operations across document collections.
With FQL, every request to the database is one strictly serialized transaction. Each transaction supports relational queries together with procedural logic. FQL is a turing-complete database language that supports complex, multi-step transactions, encapsulating the transaction logic within a single database request.
Because each transaction is entirely self-contained and independent, there is no need to maintain any session state. This allows the database to reside in a region that is a long distance from the application. A standard HTTPS connection provides the transport, and JWT-compatible, attribute-based access control tokens ensure security. These capabilities are all designed to work seamlessly with modern developer operations processes and products.
Fauna is a multi-availability zone database that uses a Calvin-derived, consensus-based transaction protocol to ensure business continuity. With Fauna, databases replicate across three availability zones that can be co-located within a single region, a geographic area (i.e., Europe or the United States), or distributed anywhere globally. Fauna’s intelligent routing layer delivers a single global database endpoint that ensures applications always route to the closest database region. Because of Fauna’s API delivery model, all of this comes with no operational overhead for developers.
Beyond SQL
The relational model has endured because it provides a general-purpose, mathematical database foundation. SQL is the standard for working with structured tables, which will continue. But it is time to break the relational model free from tables.
For the first time, Fauna and FQL bring the benefits of the relational model together with semi-structured document data. To make this possible, SQL is the one thing that Fauna had to leave behind. But once you start working with FQL, you won’t miss it.
You can find out more about Fauna here.
If you enjoyed our blog, and want to work on systems and challenges related to globally distributed systems, and serverless databases, Fauna is hiring
Subscribe to Fauna's newsletter
Get latest blog posts, development tips & tricks, and latest learning material delivered right to your inbox.