Let’s begin with the obvious question: what is a ‘cloud database’? Databases aren’t new, having been a go-to technology for decades, but cloud databases are a recent development. The answer is initially quite simple: while traditional database software manages information via on-premise hardware, the defining attribute of a ‘cloud database’ is that it is cloud hosted. As with other cloud technologies, this means most of the unpleasant burdens of installing, maintaining, backing up, planning/testing disaster recovery, etc. all become somebody else’s problem.
In the simplest case, a cloud database is managed by a provider who assumes responsibility for the hardware, software, maintenance, support staff, etc. Users can simply connect and work with it. A wide range of such services is available today, and they can be classified by data model (relational, document, etc.) and delivery model (hosted, DBaaS, or Data API/serverless).
The sections that follow examine the resulting decision space, evaluate the benefits and types of cloud databases, discuss migration considerations, and describe some popular cloud database systems.
What are the benefits of using a cloud database?
Cloud databases are increasingly popular because they offer clear advantages. Beyond avoiding so much of the usual “dirty work”, cloud database advantages include variable pricing with typically lower costs, superior agility, better/easier scalability (both up and down), higher overall reliability, and even better performance.
Cloud database pricing depends in part on the type of system, with remotely hosted servers typically proving more expensive than database-as-a-service (DBaaS) offerings. Providers offer a wide range of hosting and pricing options to accommodate varying requirements for total storage, input/output operations per second (IOPS) and similar metrics, data reliability, high availability, etc.
The primary cloud database advantage is pay-as-you-go or pay-per-use pricing, meaning you pay only for what you use rather than incurring larger fixed costs for each new server. Cloud databases thus make it possible to increase capacity as demand rises and reduce capacity as demand falls. This “dial-a-load” flexibility offers savings in terms of time, money, and frustration.
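To make the pay-per-use argument concrete, here is a minimal sketch comparing the two billing styles for a workload with one busy month. Every number in it (server price, server capacity, per-request price, request counts) is a made-up assumption for illustration, not a quote from any provider.

```python
# All prices and workloads below are hypothetical illustrative numbers,
# not quotes from any real provider.
FIXED_SERVER_COST = 400.0          # assumed monthly cost of one provisioned server
REQUESTS_PER_SERVER = 50_000_000   # assumed monthly capacity of one server
PRICE_PER_MILLION = 10.0           # assumed pay-per-use price per million requests


def fixed_cost(peak_requests: int) -> float:
    """Monthly cost when enough servers are provisioned to cover peak demand."""
    servers = -(-peak_requests // REQUESTS_PER_SERVER)  # ceiling division
    return servers * FIXED_SERVER_COST


def usage_cost(actual_requests: int) -> float:
    """Monthly cost when billed only for the requests actually served."""
    return actual_requests / 1_000_000 * PRICE_PER_MILLION


# Four months of demand with a single spike in the third month.
monthly_requests = [20_000_000, 35_000_000, 120_000_000, 30_000_000]
peak = max(monthly_requests)

# Fixed provisioning pays for peak capacity every month, spike or not.
fixed_total = sum(fixed_cost(peak) for _ in monthly_requests)
usage_total = sum(usage_cost(r) for r in monthly_requests)
print(f"fixed: {fixed_total:.2f}, pay-per-use: {usage_total:.2f}")
```

Under these assumed numbers the savings come from the uneven demand: a workload that keeps a server fully busy all month would actually cost less on the fixed plan, so the comparison is worth rerunning with your own figures.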
Agility and Scalability
Dynamically adjusting capacity to meet demand also allows greater business agility than ever before. Software developers and database administrators (DBAs) no longer wait days, weeks, or even months before new hardware is ready because cloud databases can be provisioned in minutes and discarded just as quickly/easily when no longer needed.
With cloud databases, your pace of innovation is no longer limited by your on-premise resources. An account with a cloud database provider is all you need to stand up whatever infrastructure is required. And of course, developers and DBAs are only one piece of the software pipeline: quality assurance, operations, compliance, and every other stage of the DevOps pipeline benefits similarly.
When it comes to databases, ‘reliability’ is a loaded term. It encompasses not only simple (or complex) data backup but also duplication of data for safety and/or geographic distribution, de-duplication of data to preserve vital storage space, disaster recovery plans, high-availability concerns, and cold storage for eventual archival.
Using mere hosted servers/databases means you are on the hook to meet all such demands to avoid potentially-company-ending failures. In contrast, cloud database services offer multiple tiers of service, allowing you to match your risk tolerance against your budget. The best type of work is the work you don’t have to do, so reliability is another clear advantage for cloud databases.
Performance and Availability
Cloud providers’ raison d’être, their reason for being, is to provide remotely hosted servers and services. Their core competency requires keeping up on the details necessary to achieve and maintain top performance. Unless your organization has a similar core competency, you’ll likely enjoy better performance leaving things to the experts.
Otherwise, you’ll need to maintain the hardware, keep it cool, intelligently patch operating systems and software, handle drive and network failures, power failures, Internet service outages… In short, a very long list of things can fail, and providing redundancies, backups, and on-site staff for every possible contingency is costly.
What are the different types of cloud databases?
Cloud databases can be classified based on their data model as well as their delivery model. In terms of data models, cloud databases of different types can accommodate, with varying success, all the usual database workflows including online transaction processing (OLTP) for daily operations, extract-transform-load (ETL) operations to build alternative views for reporting or replicate information, data warehousing, etc.
But these days, many businesses also expect “polyglot persistence”, referring to the need to bring together a variety of different data models rather than forcing everything into rows and columns. It’s important to match your needs for relational tables, documents (usually in JSON), key-value pairs (KVP), or even graph data (often used for social networking and similar applications) to providers’ offerings to find the best fit.
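As a rough illustration of how differently these data models shape the same kind of information, the sketch below uses only the Python standard library. The in-memory SQLite database, the dictionaries, and all the sample names are stand-ins chosen for this example, not any particular cloud product.

```python
import json
import sqlite3

# Relational: fixed columns queried with SQL. An in-memory SQLite database
# stands in for a cloud relational database here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")
row = conn.execute("SELECT name FROM users WHERE id = 1").fetchone()

# Document: a nested, self-describing JSON record.
document = json.loads('{"id": 1, "name": "Ada", "tags": ["admin", "beta"]}')

# Key-value: an opaque value looked up by a single key.
kv_store = {"session:1": "token-abc"}

# Graph: entities plus the relationships between them (who follows whom).
follows = {"Ada": ["Grace"], "Grace": ["Linus"], "Linus": []}

print(row[0], document["tags"], kv_store["session:1"], follows["Ada"])
```

Each shape suits a different question: SQL for joins across tables, documents for nested records read as a unit, key-value for fast lookups, and graphs for traversing relationships.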
When it comes to delivery models, cloud databases sort into three buckets: hosted databases, database as a service, and a brand new approach called serverless database or “Data API”.
The simplest deployment model is very similar to on-premise systems, the only difference being the database software runs remotely. Cloud providers offer a range of hosted databases from dedicated hardware to which you simply connect, virtual machines (VMs) with varying resources (potentially shared with other clients), and even container technologies for a more lightweight “footprint” than a VM.
The downside of this model, however, is that it leaves you stuck with much (if not all) of the administrative burden. Such providers guarantee the hardware, VMs, and containers, but it generally falls to your staff to deal with everything else, often including updates to the operating systems, container applications, etc.
Database As A Service (DBaaS)
A more convenient model is “database as a service” or DBaaS. With this approach, the provider shoulders far more of the burdens, typically including provisioning, backup, scaling, monitoring, and other management services. Users don’t have to fiddle with infrastructure concerns and can simply connect and use the systems they need, again saving time, money, and frustration.
It is important to mention, however, that DBaaS offerings still typically involve some administrative burdens. Consumers must often choose between tiers of performance or storage, tweak or tune scalability parameters as needed, etc.
Data API or Serverless
Thanks to modern Internet protocols/APIs and advances in software design, data clients are increasingly web browsers and mobile devices. This change has driven the evolution of database systems toward the notion of “Data APIs” or fully serverless systems, increasing the need for the aforementioned polyglot persistence along the way.
With this new approach, clients (browsers, serverless functions, microservices, mobile apps, etc.) aggregate and leverage functionality from any number of different cloud providers. A typical application might request data from one or more databases, leverage payment systems, and push to multiple social networks, all via shared standards rather than proprietary client-server protocols.
This provides the maximum flexibility for applications and lifts the last remaining administrative burdens. Consumers can simply expect their API requests to be serviced, no matter the workload or data model, and pay only for what they actually use.
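For a sense of what “shared standards” means in practice, the sketch below packages a database query as an ordinary HTTPS request with a JSON body. The endpoint, token, and payload shape are hypothetical placeholders rather than any real provider’s API, so the request is built but deliberately never sent.

```python
import json
import urllib.request

# Hypothetical endpoint and token: placeholders, not a real provider's API.
ENDPOINT = "https://db.example.com/v1/query"
API_TOKEN = "example-token"


def build_query_request(query: str, arguments: dict) -> urllib.request.Request:
    """Package a query as an HTTPS POST with a JSON body and a bearer token."""
    # The {"query": ..., "arguments": ...} payload shape is an assumption.
    body = json.dumps({"query": query, "arguments": arguments}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
    )


request = build_query_request("get_user", {"id": 1})
# Sending would be urllib.request.urlopen(request); it is not called here
# because the endpoint above does not exist.
print(request.get_method(), request.full_url)
```

Because the client needs nothing beyond HTTP and JSON, the same call can originate from a browser, a mobile app, or a serverless function without a database driver or a long-lived connection.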
Things to Consider
To be fair, cloud databases offer significant advantages, but they aren’t a panacea. The following four factors in particular should be examined carefully before taking the plunge.
The industry is clearly trending toward serverless architecture. But different cloud models trade control for convenience, and sometimes you just can’t give up the former.
For example, if you’re in a heavily regulated industry or face other strong compliance challenges, you might be hard-pressed to move certain systems to some cloud models or to the cloud at all. As a general rule, hosted databases offer the most control but less convenience compared to DBaaS, which itself offers more control but less convenience than serverless.
The need for control usually means greater needs for your own staffing while simultaneously diminishing business agility and dynamic scalability. As such, striking a balance between internal and external resources can be both tricky and critical for success.
The underlying details of the database technology involved can be equally important. SQL (or relational) databases were designed to offer strong guarantees for data integrity in multi-user operations but at the cost of hardware resources and performance. NoSQL databases were originally designed for flexibility, scalability, and performance but at the cost of data integrity and with a lag before all nodes become consistent, though new systems like Fauna offer a document-centric model combined with the transactional properties of a relational database.
As a result, not every type of cloud database will work for every type of application. Systems that require robust transactions, which reliably update multiple data stores or roll back completely in case of failure, need databases that support relational characteristics. Similarly, systems that must perform reliably in real time while offering a flexible schema for iterative development tend to work well with NoSQL and may not tolerate the variations in SQL performance.
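The “update everything or roll back completely” behavior can be sketched with Python’s built-in `sqlite3` module. This is a simplified stand-in: it uses two rows in a single in-memory database rather than multiple data stores, and the `accounts` table and `transfer` helper are hypothetical names invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn: sqlite3.Connection, source: str, target: str, amount: int) -> bool:
    """Move funds between two rows; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failed debit must undo an already-applied update.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # both updates are committed
failed = transfer(conn, "alice", "bob", 500)  # debit violates CHECK; credit is undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The second transfer credits bob before the debit fails, yet bob’s balance is unchanged afterward. That all-or-nothing outcome is the guarantee an application depends on when it needs robust transactions.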
With identity theft and data breaches seemingly in the news every day, security is a high priority in any migration plan. Data encryption (at rest, in memory, and in transit) is important, particularly in regulated industries, as are authentication and authorization.
But cloud databases must offer more. It’s one thing when the systems are behind your own firewall, but it’s another when they’re out on the public Internet available to the entire world. Look for cloud vendors who can offer attack detection, isolation for mission-critical data/systems, audit logs and other records to detect and trace the magnitude of data breaches, etc.
As such, a multi-step approach is advised, working from least to most important. Begin with your least mission-critical applications (or services) and rebuild those database schemas, moving those to the cloud. As the difficulty of moving data and operations increases, weigh the costs and benefits, engaging only with vendors who meet your data-security standards along the way. You might not be able to move everything to the cloud, or at least not today, but good planning will help you maximize your benefits while minimizing your risks.
Examples of Cloud Databases
PostgreSQL is a traditional, relational SQL database that may be cloud hosted. It provides the robust relational integrity some applications require but is an older technology that is non-distributed, inflexible, and less scalable. Amazon RDS is one popular provider of such a service, as is Heroku.
In contrast, MongoDB and DynamoDB are both newer NoSQL databases that offer greater flexibility and scalability at the cost of traditional relational integrity, though it bears mention that MongoDB offers some limited transactional guarantees. Examples of popular providers for NoSQL include MongoDB Atlas and Amazon DynamoDB.
Fauna is a true data API or serverless cloud database that better supports the OLTP workflows and use cases mentioned previously. It offers the best of cloud database models, combining the flexibility and scalability of NoSQL systems with the transactional guarantees of SQL systems for proper relational data integrity.
Fauna’s easy setup with out-of-the-box support for GraphQL, low multi-region latency, and transparent scaling (it’s an API!) make it ideal for modern applications with relational or document-centric workloads, enabling developers to ship code more quickly without wasting precious time managing infrastructure.
Fauna’s globally distributed backbone also keeps applications performant by ensuring that data is replicated across regions, available close to users, while also increasing availability, making Fauna an obvious go-to choice for almost any application.