What is a distributed database and when should you use one

When should you consider using a distributed database?

Distributed databases store and maintain data over many instances for the purposes of scale, locality, and reliability. They were developed in response to the web and mobile age, which vastly increased the usage, mobility, and uptime requirements of modern applications. Distributed databases are used in industries such as finance, telecommunications, gaming, and IoT, and by any organization that requires high availability, scalability, and reliability from its database systems.

Geographical distribution of data

With the ubiquity of laptops and mobile devices, as well as multiple office locations, users can be geographically distributed and on the move. To keep applications fast, they need local access to data. A globally distributed database with auto-routing can maintain transactional consistency while keeping latency low by routing each request to the closest cloud region.

For instance, think of a global bank (e.g., JPMorgan Chase) that operates in different countries. A banking application can access data quickly and efficiently from a local replica of a distributed database. The app also processes transactions via the nearest copy, resulting in the lowest latency for users while maintaining global consistency of the data.

In other words, an account balance always remains consistent because data is synced across all replicas of the distributed database.

Modern e-commerce platforms like Amazon, Etsy, and Shopify also use distributed databases to enhance user experience through faster data access. Data is read from the closest geolocated replica, keeping application load times short. A distributed database also helps these platforms manage their inventory across different regions.
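To make the idea of closest-region routing concrete, here is a minimal sketch in Python. It assumes the client (or a routing layer) already has a round-trip latency measurement for each region. The region names, the latency figures, and the `pick_closest_region` helper are all hypothetical and do not describe how any particular database implements routing.

```python
from typing import Dict


def pick_closest_region(latencies_ms: Dict[str, float]) -> str:
    """Return the region with the lowest measured round-trip latency."""
    if not latencies_ms:
        raise ValueError("no regions available")
    # min() over the dict keys, ordered by each region's latency value.
    return min(latencies_ms, key=lambda region: latencies_ms[region])


# Hypothetical measurements taken from a client located in Europe.
measured = {"us-east": 82.0, "eu-west": 14.5, "ap-northeast": 210.0}
closest = pick_closest_region(measured)
print(f"route reads to: {closest}")
```

A real system would refresh these measurements continuously and would also account for replica health, but the core decision is the same: send the request to whichever copy of the data is cheapest to reach.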

Scalability

Modern applications can quickly grow to thousands of users or more, and a traditional single-server database can only scale up so far. If you anticipate rapid growth in data volume or user load, a distributed database can scale out horizontally by adding more nodes to handle even the highest-volume workloads. This approach is often more accessible and cost-effective than vertically scaling a single server.

For example, as e-commerce platforms grow, they can add more nodes to their distributed databases, allowing them to scale horizontally. This is more cost-effective and flexible than upgrading a single server's capacity, especially when dealing with the unpredictable growth patterns typical of e-commerce.
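One common way to spread data across nodes so that adding a node is cheap is consistent hashing. The sketch below is illustrative only: this article does not say which partitioning scheme any given database uses, and the `HashRing` class, node names, and key names are hypothetical. It shows that when a fourth node joins, the only keys that change owner are the ones the new node takes over.

```python
import bisect
import hashlib
from typing import List, Tuple


def _hash(value: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """A minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # kept sorted by hash position
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node is placed on the ring many times to even out the load.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise ValueError("ring is empty")
        # First ring position at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"order-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # scale out by one node
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved to the new node")
```

With naive `hash(key) % node_count` placement, most keys would change owner when the node count changes. With a ring, the existing nodes keep everything the new node does not claim.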

High availability and data resilience

Data resilience and high availability are critical aspects of a distributed database. They ensure that applications remain operational even when parts of the database system fail, and that data is not lost when a single server is corrupted.

In a distributed database, data is replicated and stored across multiple nodes or locations. This replication means that if one node experiences a hardware failure, network issue, or any other type of outage, the system can continue functioning by redirecting requests to other nodes that hold copies of the same data. This failover process is typically automatic and quick, minimizing or even eliminating downtime, data loss, and the impact on end users.
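As a rough illustration of replication with failover, the following Python sketch copies every write to all healthy nodes and serves reads from the first healthy one. The `Node` and `ReplicatedStore` classes are hypothetical, the replication is synchronous for simplicity, and the sketch deliberately ignores how a node catches up after it rejoins. Real systems use consensus or log-shipping protocols for that.

```python
from typing import Dict, List, Optional


class Node:
    """One replica holding a full copy of the data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.data: Dict[str, str] = {}


class ReplicatedStore:
    """Writes go to every healthy node; reads fail over automatically."""

    def __init__(self, nodes: List[Node]) -> None:
        self.nodes = nodes

    def write(self, key: str, value: str) -> None:
        healthy = [n for n in self.nodes if n.up]
        if not healthy:
            raise RuntimeError("no replica available for write")
        for node in healthy:
            node.data[key] = value

    def read(self, key: str) -> Optional[str]:
        # Skip nodes that are down and answer from the next healthy copy.
        for node in self.nodes:
            if node.up:
                return node.data.get(key)
        raise RuntimeError("no replica available for read")


new_york, london, tokyo = Node("new-york"), Node("london"), Node("tokyo")
store = ReplicatedStore([new_york, london, tokyo])
store.write("balance:alice", "100")

new_york.up = False  # simulate an outage in one location
balance_after_outage = store.read("balance:alice")
print(balance_after_outage)
```

The caller never needs to know which node answered, which is what lets the application keep running through the outage.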

In the financial industry, services like online banking, trading platforms, and payment processing must be operational 24/7. Any downtime can result in financial loss for both customers and the institution, and can also damage the institution's reputation.

Take, for example, a global bank that uses a distributed database for its online banking services. The bank's database is distributed across data centers in New York, London, and Tokyo, key financial hubs that handle millions of daily transactions.

If the New York data center were to go offline due to a power outage, the distributed database system would automatically reroute all traffic intended for New York to the London and Tokyo data centers without manual intervention. Customers should not notice the switch: their online banking services continue to function normally, allowing them to make transactions, check balances, and carry out other banking activities without interruption.

Routing traffic for lower latency

Another advantage of distributed databases is that they excel in optimizing performance by rerouting traffic, effectively reducing latency. By strategically distributing data across multiple nodes, these databases dynamically redirect requests, ensuring quicker access and minimizing delays, thereby enhancing overall system responsiveness and user experience.

Consider a popular multiplayer online game that operates through distributed servers across the globe — for instance, servers located in the United States, Europe, and Asia. These servers handle user authentication, game state synchronization, matchmaking, and in-game purchases.

If the European server cluster encounters a problem — such as a software bug, DDoS attack, or hardware failure — the distributed database system would automatically reroute European players to servers in adjacent regions, such as the ones in Asia or the United States, depending on which has the capacity and lowest latency for the best user experience. The database's replication ensures that players' profiles, in-game items, and progress are consistently available across all servers, so the players' game experience remains uninterrupted when the failover occurs.
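The rerouting decision in this scenario can be sketched as a filter followed by a selection: discard regions that are unhealthy or full, then pick the lowest-latency one that remains. The `Region` fields, the session limits, and the latency numbers below are hypothetical values chosen for illustration, not figures from any real deployment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    name: str
    latency_ms: float      # latency as seen by the player being routed
    healthy: bool
    active_sessions: int
    max_sessions: int


def route(regions: List[Region]) -> Region:
    """Pick the lowest-latency region that is healthy and has spare capacity."""
    candidates = [
        r for r in regions
        if r.healthy and r.active_sessions < r.max_sessions
    ]
    if not candidates:
        raise RuntimeError("no healthy region with spare capacity")
    return min(candidates, key=lambda r: r.latency_ms)


# A European player: Europe is normally closest, but it is currently down.
regions = [
    Region("europe", latency_ms=20.0, healthy=False,
           active_sessions=0, max_sessions=50_000),
    Region("united-states", latency_ms=95.0, healthy=True,
           active_sessions=31_000, max_sessions=50_000),
    Region("asia", latency_ms=180.0, healthy=True,
           active_sessions=12_000, max_sessions=50_000),
]
target = route(regions)
print(f"rerouting player to: {target.name}")
```

Because the players' data is replicated to every region, the choice of target can be made purely on health, capacity, and latency.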

Distributed database challenges

While distributed databases offer many benefits, they are not without their challenges.

Routing: In a distributed database, routing is one of the key challenges. To work effectively, a distributed database has to provide efficient data access without the user needing to know where the data is physically located.
Data consistency: Another challenge that distributed databases need to overcome is data consistency. Typically, they must choose between eventual consistency, which keeps latency low but risks reading stale data, and strong consistency, which comes with longer latencies.
ACID support & features: It’s much harder to maintain speed and scale when multiple changes need to occur in a single transaction.
Cost: Functionality is often limited in order to maintain speed and scale. Features like reading your own writes are difficult to provide and so are often removed. Apps must then perform a read to fetch the latest value before each write in order to be accurate, which increases the number of operations and adds transactional latency (two trips to the database instead of one).
Provisioning: Most options require choosing cluster sizes and/or the number of read/write operations allowed. This can lead to problems with cost (over-provisioning), user experience (throttling during spikes), or complexity (professional admins must understand and monitor the infrastructure).
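The consistency trade-off described in the list above can be shown with a toy model. In this sketch, writes land on a primary copy and reach a replica only when `sync()` runs. That gap stands in for replication lag. The `LaggingCluster` class and its method names are hypothetical, and serving strong reads from the primary is a simplification: in a geo-distributed system that read would typically mean a longer network trip or extra coordination.

```python
from typing import Dict, List, Optional, Tuple


class LaggingCluster:
    """A primary plus one asynchronously updated replica."""

    def __init__(self) -> None:
        self.primary: Dict[str, int] = {}
        self.replica: Dict[str, int] = {}
        self._pending: List[Tuple[str, int]] = []  # writes not yet replicated

    def write(self, key: str, value: int) -> None:
        self.primary[key] = value
        self._pending.append((key, value))

    def sync(self) -> None:
        """Apply pending writes to the replica, in order."""
        for key, value in self._pending:
            self.replica[key] = value
        self._pending.clear()

    def read_eventual(self, key: str) -> Optional[int]:
        # Fast local read, but it may miss writes that have not replicated yet.
        return self.replica.get(key)

    def read_strong(self, key: str) -> Optional[int]:
        # Always sees the latest write, at the cost of going to the primary.
        return self.primary.get(key)


cluster = LaggingCluster()
cluster.write("stock:widget", 5)
cluster.sync()
cluster.write("stock:widget", 4)  # one unit sold, not yet replicated

stale = cluster.read_eventual("stock:widget")
fresh = cluster.read_strong("stock:widget")
print(f"eventual read: {stale}, strong read: {fresh}")
```

An application that decides whether an item is in stock based on the eventual read can act on an outdated value, which is the stale-data risk named in the list.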

Why choose Fauna as your distributed database

Choosing Fauna as your distributed database comes with a set of distinct advantages, especially when your application requires a flexible, serverless setup with strong consistency and easy scalability. Here are the ways in which Fauna resolves the above challenges and some reasons why you might choose Fauna for your distributed database needs:

Strong consistency: Fauna provides strict serializability, the highest level of transactional consistency, with ACID guarantees across replicas distributed over multiple geographic regions. Fauna does this with its proprietary distributed transactional engine, which is based on the Calvin paper, a research paper on distributed computing from Yale University. This ensures that all transactions are processed in a way that guarantees safety across distributed systems, making it a reliable choice for critical business operations.
Serverless: With Fauna, you don't need to provision or manage any servers, which simplifies operations and can lead to cost savings, as you pay only for what you use. The auto-routing function eliminates the requirement for code detection and routing at the application level. Additionally, Fauna is exposed as a REST API, which tackles complexities associated with routing and connection pooling. These features can be especially advantageous for startups and businesses with variable workloads.
Document-relational: Fauna distinguishes itself in the realm of distributed databases with its unique document-relational model. This innovative approach effectively merges the best features of document-based databases—namely, their flexible, JSON-like document handling—with the robust querying and relational capabilities, like ACID compliance over multiple operations and documents, traditionally found in SQL databases. This fusion offers developers an unparalleled level of versatility, enabling them to easily manage complex data relationships and execute sophisticated queries. Such a blend is particularly beneficial for modern, scalable applications, providing a database solution that is both adaptable to changing requirements and robust enough to handle complex data structures. Opting for Fauna means tapping into a database that is as dynamic and multifaceted as the applications it supports.
Global scalability and cost: Fauna is built for global distribution from the ground up. It's designed to scale across multiple cloud regions, which means you can deploy applications globally without worrying about the complexity of managing data replication and consistency. Additionally, Fauna’s indexing and billing structure minimize the number of copies and operations needed to accomplish tasks, which both speeds up transactions and lowers cost.

In summary, distributed databases are ideal for applications requiring high availability and data resilience, geographical data distribution, and scalability. They are well suited to use cases involving high-volume transactions, distributed edge computing, and geographically dispersed user bases, and they cater to sectors like finance, retail, gaming, IoT, and B2B SaaS. They offer resilience against failures and adaptability to growing data and user demands.

Fauna stands out as a preferred distributed database due to its key features, including a serverless model, out-of-the-box global distribution, and a focus on developer experience and productivity. We achieve these attributes through our core innovations, which encompass a document-relational model, a distributed transaction engine based on the Calvin architecture, sophisticated routing algorithms, and a database-as-an-API model. This makes Fauna particularly suitable for businesses seeking a flexible, efficient, and robust data management solution capable of easily scaling and adapting to evolving application needs.
