A relational database is exactly what its name suggests: a tool for storing various types of information that are related to each other in various ways. For example, a relational database for an online store might maintain customer data and maintain related information as well, such as their various addresses, wish lists, orders, etc.
Relational databases have been around a long time, have been used in a broad range of use cases, and remain a popular choice even today because they offer a mature, stable, proven technology that has become increasingly reliable and simple to use. You can expect to find them managing data for e-commerce applications, inventory management, payroll, customer data, and more.
In this article, we’ll explain how relational databases work, discuss their benefits, compare and contrast alternative systems, and describe some more popular systems available today that can handle relational database use cases.
How Relational Databases Work
In the early years of computing, each application typically stored its data in its own unique format. Storing data in so many different ways made it difficult to maintain systems and integrate them as needed. The relational database model arose as a way to standardize how developers could work consistently with data at a high level while leaving the details of the underlying operations and storage format(s) to the software.
Relational models, typically, organize data in tables consisting of rows and columns, much like spreadsheet programs. The following illustrates how a relational database might keep track of children at a summer camp. Each row contains the information for a specific camper, including a unique ID number along with the child’s name, age, and favorite food.
The unique ID’s purpose might not be immediately obvious, but that’s what makes it possible for other tables in the database to track any related data—things like dates of attendance, cabin assignments, activities, certificates, awards, and so forth—by including the camper’s ID number in the subordinate tables. The ability to join many different tables by such IDs is what makes relational databases… well, relational.
In contrast, Relational DataBase Management Systems
(RDBMS for short) are the software systems that manage relational databases. While a relational database itself is some collection of conceptual tables, which are maintained on storage media in various formats, the RDBMS is what makes it possible to interact with the data without knowing the details. For example, RDBMS make it possible to administer multiple relational databases on one or more servers and accept commands to create, read, update, or delete data (called “CRUD” operations) in tables.
Relational databases abstract operations through what’s known as Structured Query Language (SQL). SQL has been standardized across RDBMS systems to perform CRUD (and other) operations in a consistent way, so the same SQL statements can be executed by _any _RDBMS that supports the standard. To return to our example, elementary SQL statements could be executed to insert a new row for Tom, who is 13 and likes spaghetti or update Lisa’s favorite food to ice cream. SQL could also retrieve all the rows in the table, or a filtered and sorted subset of only the children between certain age ranges who share a favorite food.
That said, SQL isn’t a requirement for a system to be classified as being relational. Modern relational databases like Fauna offer all the properties of a relational database without offering SQL as the primary query language. New languages such as GraphQL can offer similar expressiveness, (or more), and sometimes fit better into a developer’s programming environment.
Benefits of Relational Databases
Relational databases bring many benefits to the table, making them a popular choice for databases.
From a development standpoint, relational databases present a more structured view of the world. The relational structure fits well with modeling real-world business entities (e.g. customers, orders, catalog, items etc.) and the business relationships that exist between them.
SQL statements themselves are often simple enough to be human-readable, and lower the barrier to entry for application developers. . Given its age, SQL has become a common language familiar to many developers.
That said, one of the more prominent benefits relational databases provide is “ACID
” guarantees, a major value for business-critical operations. ‘ACID’ stands for Atomicity, Consistency, Isolation, and Durability, all of which refer to important properties of a transaction,
a term used to describe a unit of database work. Relational databases thrived on workloads that require ACID, just as some non-relational systems do today. Let’s explain these one at a time.
Atomicity is that property of a transaction which ensures that it’s “all or nothing”. That is, the transaction either succeeds as a complete unit of work, performing its various CRUD (and other) operations, and is logged as a single commit to the relational database, or fails similarly. In case of failure, atomicity requires that any parts of the transaction already executed are rolled back completely, leaving the relational database exactly as it was before the transaction began.
It’s easy to see why this is important. Consider a user placing an order for an item at an e-commerce store. Buying a complex item might mean that while the user orders a “single” thing, this results in three different items being pulled from internal inventory. The first two might not be a problem, but if the third is out of stock the entire transaction must fail and be rolled back. The relational model supports this type of common business logic.
Consistency goes hand in hand with atomicity, insofar as it requires the data to always be in a consistent internal state when a transaction starts and when it ends. All of the changes to the underlying data for the tables and other structures must be complete and correct, so that each successful transaction “moves” the database data from one well-defined valid state to another with nothing left hanging or yet to be done.
This might sound much like atomicity, but it’s subtly different as demonstrated by many non-relational systems that instead feature partial or eventual consistency. Such systems often reply to commands faster because the actual details of data storage might not be complete when they respond with a success code. This can be a great benefit when creating high-performance web sites that scale to serve large numbers of users simultaneously.
A complete discussion of consistency
is beyond the scope of this article, but suffice it to say that different approaches can make or break a system, depending upon the use cases it serves. Many mission-critical business operations require consistency, so it’s important to consider all the details carefully.
Now we come to isolation
and durability, which are perhaps simpler to understand. When many different users are interacting with the same relational database at the same time, each attempting various transactions against potentially the same set of tables, it’s important that each user see only
his view of the data. Reading data from a table to which a different user is simultaneously trying to write must show the existing data and not the changes to avoid things “leaking” out of a transaction that may yet fail. Isolation
ensures that the “view” across the entire system remains consistent and unchanged by other operations while a given transaction is in process, as if it were running all by itself. For a deeper dive into isolation levels, and their implications, check out this series
of blogs by renowned database expert, Dr. Daniel Abadi.
Durability requires simply that any successfully committed transaction’s details are permanently recorded once completed. The underlying data representation changes must be applied in their entirety so that no loss occurs—even in case of caching, hard drive or other system failures. This typically includes immutably logging the details of what was changed, when, and by whom as well for future auditing needs.
Relational vs Non-relational Databases
Having discussed relational databases in some detail, it’s worth spending time on an important, previously-hinted alternative: non-relational databases (or sometimes called “NoSQL
” databases - because they do not use SQL as their query language). In contrast to the highly structured tables of RDBMS, NoSQL systems instead offer a variety of storage mechanisms optimized for particular use-cases and might not offer ACID guarantees.
There are four main categories of storage offered by non-relational databases: (1) document, (2) key-value pairs, (3) wide-column, and (4) graph. Let’s consider each of these in turn.
So-called “document” databases are arguably the most free-form non-relational offering, insofar as their documents store potentially hierarchical collections of objects with properties. To revisit our previous example, whereas a relational database would store data on a single camper across a series of related tables, a document database might store all the details for that camper in a single document. Said document could contain a “Personal Information” including the properties we saw earlier (e.g., name, age, and favorite food) while other objects, perhaps named “Attendance”, “Cabin Assignments”, “Activities”, etc. could maintain other properties.
As such, the document database format is well-suited to storing less structured, more free-form information. It’s a great starting choice for any new development project if you’re not sure what the data models are going to be. Non-relational databases store just about anything and allow schemas to evolve easily over time without enforcing much structure.
As their name suggests, ”key-value databases
” store key-value pairs or KVP for short. A KVP can be any simple information dyad; i.e., some item serves as a key that is associated with some other item as a value. While the document database version of our earlier example could store the information for a single child in a single document, this would be much
trickier in a key-value database. For example, you could store Lisa’s most basic information with the following KVP:
But you’d be shoe-horning a key-value NoSQL database into a use case for which it’s ill-suited at best. Such databases are intended for more simplistic purposes such as caching temporary data for web sessions, capturing simple telemetry points from devices in the field, etc. If you’re wondering why anyone would accept such a limited data model, the answer is its primary benefit: blistering-fast performance. Hashing and other algorithms allow simple KVP structures to be read and written far faster than many alternatives.
”Wide-column stores”, sometimes called “column databases”, sit in an interesting place between document databases and relational databases. They’re not document databases in that they don’t support indefinite hierarchies of objects with properties, but they’re not relational databases in that they don’t require all the rows in a table to have the same columns. And despite their name, such databases can often be row-oriented instead of column-oriented.
A key factor to understanding the wide-column data model is a keyspace, which is essentially a collection of different quasi-table structures typically called “column families”. Each column family can have any number of rows, but each row (1) must have a unique row key that sets it apart from all others, and (2) may have any number of columns of information, each typically containing a KVP and a time stamp indicating its last update.
For example, our summer camp database might have multiple keyspaces to store the same information as before, while the specific keyspace for personal information could have one row for each child with columns for name, age, favorite food, and so forth. The challenge for storing such data in a column-store is finding the right approach for partitioning the information into different keyspaces and choosing values that truly individuate the rows.
Finally, we come to ”graph databases”, which are most easily understood in terms of their nodes and relationships. Nodes are what store data values, often as little “mini-documents” ala document databases, or “bags” of KVP, etc. Relationships connect the nodes and, as such, are effectively first-class citizens in the graph data model. Whereas a relational database joins different types of data across tables that share unique IDs, a graph database relates one node to others through explicit relationships of various types.
It would be trivially simple to store the example data we’ve been discussing in a graph database, each camper’s personal information being a single node with any related information stored in other nodes connected by the appropriate relationships. Additionally, it would be simple to store relationships between children as well, tracking who is friends with whom, which campers don’t get along with which other campers, etc.
The very structure of graph databases makes them ideally suited for many uses cases the other types of databases don’t easily handle. Examples include organizational charts, social networks, network equipment and routing diagrams, road and other maps, driving directions, etc.
Relational Database Examples
There are many different relational database offerings available in the market today. Some require on-premise hardware, which means you’re responsible for all the provisioning, maintenance, and upkeep. Others offer cloud-hosting options but effectively give you remote databases to which you can connect but must still administer and manage yourself. Still others remove all the administrative burdens, the vendor handling all the “grunt work”, so that you can focus on creating your applications.
When architecting your database, it’s important not only to consider your options for the present but provide a roadmap for easy scaling later. The following table compares some common examples in light of the various factors we’ve discussed.
||Yes, but with weaker transaction isolation
||Yes, but with the usual PostgreSQL caveats
||Scales vertically, but horizontally limited by writing only to the primary node. No automated elasticity.
||Scales vertically, but same limitations with PostgreSQL. Automated elasticity limited by its “chunky” VM-level granularity.
||Scales vertically and horizontally with fully automated elasticity.
||Flexible Server pricing in Azure
||Relational and Non-relational
Spin Up a Relational Database In Minutes
In this article we’ve discussed how relational databases
and non-relational offerings can satisfy different use cases along with their strengths and weaknesses. To summarize, relational databases are best at enforcing relational integrity while offering ACID guarantees and easy programming through SQL. But all of that comes at a cost of limited flexibility and performance.
In contrast, non-relational systems offer far greater performance and often improved flexibility at the cost of some (or all) of the ACID guarantees so important to business-critical applications. They work best for a different set of use cases.
That’s why Fauna combines _both _the flexibility and performance of non-relational systems with relational querying and transactional guarantees without compromising simplicity and scale. Fauna is a true “Data API” or “serverless” cloud database. It’s easy to setup, offers its own powerful Fauna Query Language(FQL), and can support all of the workflow we’ve previously discussed.