Survive Cloud Vendor Crashes with Netlify and FaunaDB
I recently posted about FaunaDB’s unified data model, and how we use global ACID transactions as a foundation that lets users interact with their data using the query model that fits their application. This week, Werner Vogels, the CTO of Amazon.com, wrote a post about Amazon’s choice to maintain a fleet of databases, each purpose-built for a particular workload. From the vendor’s standpoint, this may make sense as a way to segment customers. For users, however, it introduces a great deal of unnecessary data movement between different databases - your application becomes a series of tubes. Set aside the question of whether you are better off with a handful of narrowly tailored databases or a general-purpose operational database. This post is about the choice between investing in a technology that binds you to a single cloud vendor and running your critical services across multiple clouds.
One of the obvious problems with keeping enterprise data in cloud services is vendor lock-in. Once you are hooked on a cloud provider, you’re at the mercy of their business decisions. Whether or not a particular decision impacts you is less important than the fact that things are out of your control. Expect vendors to make choices you can’t predict. This weakness is built into any architecture that depends on a single cloud. If you want to be above the fray when major changes happen, you need to architect your business for vendor independence.
Let’s look at a few scenarios for cloud vendor independence:
- Fully independent: Every service is deployed across multiple cloud vendors with robust failure handling. Even in the case of a complete vendor outage, your application goes on operating.
- Tightly coupled: Your app is tightly coupled to multiple clouds, with a service mix that uses best-of-breed specialty services from many vendors.
- Mission critical multi-cloud: Critical services are run across multiple cloud vendors. In this case, some application features might suffer downtime during a vendor outage, but critical services continue running.
- One cloud: Your app is tightly coupled to a single cloud vendor, using vendor specific services for critical operations.
Viewed purely through the lens of vendor risk management, being fully independent is the safest course of action. However, this option has costs derived from having to work with virtual machines instead of higher-level services. For instance, instead of using a queuing service, you are likely to end up managing a cluster of virtual machines on each cloud, running a queue. This operational complexity will be multiplied by every service your application utilizes.
All of this does not even scratch the surface of the complexity introduced by building intelligent load balancing and failover into your architecture. To become fully independent of cloud vendors, your system must not only run on multiple vendors; it must run across them all. In the end, you want a single unified system that can route around vendor failures, not multiple siloed copies of the system. Orchestrating a correct and robust multi-cloud service is an engineering challenge. Doing it for every service in your mix is a serious undertaking. The number of enterprises that complete this heavy lifting is small because, for most businesses, the risk/reward tradeoff doesn’t make sense. However, architects should think about this scenario so they can understand the delta between where they are now, and where they could be for maximum independence.
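To make "route around vendor failures" concrete, here is a minimal sketch of client-side failover, assuming each cloud exposes the same service behind a callable. The names `call_with_failover`, `cloud_a`, and `cloud_b` are hypothetical and stand in for real vendor endpoints; a production system would also need health checks, timeouts, and retry limits.

```python
from typing import Callable, List, Tuple


class AllProvidersFailed(Exception):
    """Raised when no provider could serve the request."""


def call_with_failover(providers: List[Tuple[str, Callable[[], str]]]) -> Tuple[str, str]:
    """Try each provider in order and return (provider_name, response)."""
    errors = []
    for name, handler in providers:
        try:
            return name, handler()
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for the same service deployed on two vendors.
def cloud_a() -> str:
    raise ConnectionError("simulated vendor outage")


def cloud_b() -> str:
    return "ok"


served_by, response = call_with_failover([("cloud_a", cloud_a), ("cloud_b", cloud_b)])
print(served_by, response)  # prints "cloud_b ok"
```

Even this toy version hints at the cost: every service in your mix needs an equivalent deployment on each vendor before the fallback has anywhere to go.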
Tightly Coupled Services
Tightly coupled services represent another extreme. Using best-of-breed products across available providers might offer the fastest time-to-market, but it also carries the greatest risk. Rather than depending on specialized services offered within one cloud, you are reaching across vendors in a way that multiplies rather than reduces your risk. If any one of your cloud providers suffers downtime, your business could go down.
This wild-cloud architecture might be the right choice for startups and small experiments where the biggest risk is not getting to market at all. However, once your product has traction and starts to scale, you’ll end up undoing these architectural choices as your attitude towards risk shifts. Businesses that scale up without addressing vendor risk will find that when they start to care about costs, their negotiating leverage is limited. For the best deal, you want more than one supplier.
Mission Critical Multi-Cloud
Mission critical multi-cloud is a reachable goal that offers many of the benefits of the most risk-averse option while leaving you the flexibility to use cutting-edge cloud services. The key is differentiating between mission critical services and everything else. For instance, if your ad-targeting engine fails, your app can continue to function with untargeted ads. If user-uploaded video is not being encoded due to a service outage, that is not as bad as users being unable to see their profiles or existing videos.
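The distinction between critical and non-critical services can be sketched in code. The example below is illustrative only: `render_page`, `load_profile`, and `pick_targeted_ad` are hypothetical names, and the assumption is that the profile lookup is on the critical path while ad targeting is not.

```python
from typing import Callable, Dict


def render_page(
    user_id: str,
    load_profile: Callable[[str], str],
    pick_targeted_ad: Callable[[str], str],
) -> Dict[str, str]:
    """Build a page, degrading gracefully when a non-critical service fails."""
    # Critical path: if this fails, the error propagates and the page fails.
    profile = load_profile(user_id)

    # Non-critical path: fall back to an untargeted ad on any failure.
    try:
        ad = pick_targeted_ad(user_id)
    except Exception:
        ad = "generic-ad"

    return {"profile": profile, "ad": ad}


# Hypothetical services: the profile store is up, the ad engine is down.
def load_profile(user_id: str) -> str:
    return f"profile:{user_id}"


def broken_ad_engine(user_id: str) -> str:
    raise TimeoutError("simulated ad-targeting outage")


page = render_page("alice", load_profile, broken_ad_engine)
print(page)  # prints {'profile': 'profile:alice', 'ad': 'generic-ad'}
```

Only the services that cannot fall back this way need the investment of a multi-cloud deployment.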
Once you have identified your critical services, you can focus on making them robustly multi-cloud. Recognizing and isolating the risk also gives you more room to use cutting-edge specialty cloud services outside of your critical path. If your database and web servers run on a seamless multi-cloud mix, the risk of using a vendor-specific photo classifier is lower, while the ease of integration goes up. If your critical services transparently abstract a mix of vendors, you can plug in specialty services like photo or voice recognition without incurring additional transit fees. The risk profile of a vendor outage is lower in this environment because it won’t impact mission critical services.
By architecting your application to isolate vendor risk, in the end you can be more confident exploring cutting-edge vendor capabilities because you know any downside impact will be contained to non-critical services. If your application’s image recognition feature has downtime, that’s entirely different than if your identity and authorization capability is broken. Don’t let shiny new cloud features lull you into storing your data with a single vendor.
Stuck in One Cloud
The last option is to stick to a particular vendor’s cloud and cross your fingers that none of the surprises are bad. For new teams, this can be seductive, but hopefully mere awareness of the option to run your critical services like databases and web front ends in a robust multi-cloud architecture gives you a new sense of the hidden costs of playing in one cloud.
Robust Multi-Cloud Mission Critical Services
The two most common mission critical services are databases and web servers. Luckily, there are emerging alternatives that take the elbow grease out of operating multi-cloud architectures.
Perhaps the most mission-critical component of any technology business is its database. To avoid cloud-specific databases like AWS DynamoDB or Google Firebase, you’ll either need to run something on bare virtual machine instances, or look for a database with a strong multi-cloud offering. Fauna Cloud runs on AWS and GCP, with Azure nodes on the roadmap, and FaunaDB runs anywhere you can run a JVM. Keep in mind that simply deploying to those datacenters is only a start. You also need correctness and operational simplicity, or your move to multi-cloud will have set your application back in other ways.
FaunaDB’s multi-region ACID transactions mean queries hitting different clouds will always see consistent results, so your data is safe and correct even in the case of a cloud vendor outage. Correctness at global scale makes FaunaDB a good fit for distributed ledger applications which must maintain transactionality over multiple sites. Multi-region ACID transactions also make it easy to control the locality of your data, so users can query the closest datacenter leading to snappier, more responsive applications.
Operational simplicity means one node type (running anywhere you can run a JVM) that stays fully online throughout topology changes. Single-step administrator commands include safety checks to prevent operations that could cause data loss. For example, if you try to remove a node that would create data loss, the admin tools will return an error instead of erasing data.
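The kind of safety check described above can be illustrated with a toy model. This is not FaunaDB’s admin tooling or its API; `remove_node`, `DataLossError`, and the `placement` map (partition name to the set of nodes holding a copy) are all hypothetical, chosen only to show the idea of refusing an operation that would destroy the last copy of some data.

```python
from typing import Dict, Set


class DataLossError(Exception):
    """Raised when an operation would remove the last copy of some data."""


def remove_node(placement: Dict[str, Set[str]], node: str) -> Dict[str, Set[str]]:
    """Return a new placement without `node`, or refuse if data would be lost."""
    # Partitions whose only copy lives on the node being removed.
    at_risk = sorted(p for p, holders in placement.items() if holders == {node})
    if at_risk:
        raise DataLossError(f"removing {node} would lose: {', '.join(at_risk)}")
    return {p: holders - {node} for p, holders in placement.items()}


placement = {"p1": {"n1", "n2"}, "p2": {"n1"}}

# Removing n2 is safe: every partition still has a copy on n1.
print(remove_node(placement, "n2"))

# Removing n1 is refused: p2 has no other copy.
try:
    remove_node(placement, "n1")
except DataLossError as err:
    print(err)  # prints "removing n1 would lose: p2"
```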
Your uptime depends on your web servers. If you’re running multi-cloud, then one provider’s downtime won’t impact your users. A multi-cloud architecture allows even more options about where to place web servers for maximum end-user performance. This way, best practices such as running a stateless web-tier deployed to multiple clouds with a content delivery network (CDN) are not out of reach.
Netlify is a web hosting provider that runs across multiple clouds with a high-performance CDN, and simplifies all aspects of deploying your sites. In addition to web files, Netlify supports continuous deployment from git, serverless functions-as-a-service, dynamic forms, and identity authentication. They are rapidly becoming a standard for simple sites, and are adding features to support complex web application hosting as well.
Netlify shares their story of moving to a multi-cloud architecture, and the benefits they’ve seen in this blog post:
As a result of the migration, we can now swap between different cloud providers without any user impact. This includes the databases, web servers, API servers, and object replication. We can easily move the entire brains of our service between Google, Amazon, and Rackspace in around 10 minutes with no service interruptions.
As a Netlify customer, if any one of the cloud providers goes down, Netlify’s origin servers and content delivery network will continue to serve traffic and users will be happy.
Using the latest cloud innovations is valuable, but do you want to put all your eggs in one basket? While relying on a single vendor’s cloud exposes your business to unnecessary risk, running every service in triplicate isn’t practical either. A good compromise is to identify your most critical data and services and migrate them to a multi-cloud approach first. The good news is that there are a handful of products like FaunaDB and Netlify designed for multi-cloud, and combining them with best-of-breed cloud vendor services is a good solution. This architecture combines the assurance that comes from multi-cloud critical services with the freedom to explore specialized vendor offerings that you get by having your data already stored across multiple vendors’ clouds.
With a transactionally multi-cloud database like FaunaDB, your data is available with no impact on uptime even if a replica crashes. See this blog post for a demo of FaunaDB running smoothly despite replica failures. More interested in theory? Read about how we achieve distributed ACID transactions with our consensus algorithm.
If you enjoyed this topic and want to work on systems and challenges just like this, Fauna is hiring!