Can MongoDB Really Deliver ACID?
MongoDB has become a very popular database. To be quite honest, I hadn’t noticed MongoDB’s rise in popularity over the past few years, largely because I had been on the transactional side of data management, and MongoDB was clearly on the analytical/reporting side. MongoDB has historically been a NoSQL document store that favored availability over consistency on the CAP spectrum and could not do transactions. I had thought their inability to do transactions was probably for the best, given reports of MongoDB losing data and the fact that it was recently the target of a massive ransomware attack. Despite that, MongoDB has managed to attract many users over the last few years and seems to strongly appeal to some segments of the developer community.
Furthering my interest in transactional systems, I recently joined Fauna. Fauna offers what might be best described as a “NoSQL v2.0” database. FaunaDB was created by some very smart people out of Twitter (not that I’m biased), who set out to design the type of database they wished they had in order to properly handle the massive amounts of mission-critical data at Twitter. They wanted to create an operational, fully transactional database that could scale and perform like NoSQL systems, with the reliability and data correctness of traditional SQL/relational systems. Some of the core design principles for FaunaDB included distributed ACID transactions (with the strictest level of isolation), high security, multi-tenancy, a multi-model interface (relational, graph, etc.), horizontal scalability, high availability, temporality, and operational simplicity.
As I began reading more about the current offerings of NoSQL vendors, I was surprised to learn that MongoDB was planning to announce support for ACID transactions in their newest release. Fauna was designed from the ground up to do distributed transactions, while MongoDB added that capability to their “legacy” database. That really piqued my interest.
Conveniently, soon after I started at Fauna, MongoDB hosted their big user event, MongoDB World, in New York City. I was able to attend the keynote and a few sessions that described some of the details of MongoDB’s new transactional capability. During his keynote, Eliot Horowitz, MongoDB’s CTO and co-founder, formally announced that MongoDB v4.0 supports multi-document ACID transactions.
“That’s the first major point about transactions in MongoDB...they are incredibly familiar...they are just like transactions in any traditional relational database you’ve ever used. They’re not some weird variant on transactions, they’re not some cobbled-together thing to make it sound like we have transactions”, Eliot said.
So far, so good. It’s important to note that the term “transactions” carries specific implications for developers and database vendors: calling something a transaction implies conforming to the ACID properties (Atomicity, Consistency, Isolation, and Durability). But ACID comes in degrees, particularly for isolation, which ranges from weaker levels such as read committed up to serializable, so saying you are ACID is really just the starting point.
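To make the point about isolation levels concrete, here is a toy Python sketch (not taken from MongoDB or Fauna) of the classic “lost update” anomaly. Two transactions each read a balance and then write back that balance plus their own deposit. Run one after the other, both deposits land; interleaved with no isolation, one silently disappears. A serializable system must produce a result equivalent to some serial order, while weaker levels such as read committed can allow this interleaving. The schedule format and the function names `run_schedule` and `serial_outcomes` are hypothetical, chosen only for illustration.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

# A step is (transaction_id, action), where action is "read" or "write".
Step = Tuple[str, str]


def run_schedule(schedule: List[Step], deposits: Dict[str, int], start: int) -> int:
    """Execute a schedule with no isolation: each read sees whatever value
    is in the shared store at that moment, and each write stores the value
    the transaction read earlier plus its own deposit."""
    store = start
    local: Dict[str, int] = {}  # each transaction's privately read value
    for txn, action in schedule:
        if action == "read":
            local[txn] = store
        elif action == "write":
            store = local[txn] + deposits[txn]
        else:
            raise ValueError(f"unknown action: {action}")
    return store


def serial_outcomes(deposits: Dict[str, int], start: int) -> Set[int]:
    """Final balances reachable by running the transactions one at a time,
    in every possible order. A serializable result must be one of these."""
    outcomes: Set[int] = set()
    for order in permutations(deposits):
        schedule = [(txn, action) for txn in order for action in ("read", "write")]
        outcomes.add(run_schedule(schedule, deposits, start))
    return outcomes


deposits = {"T1": 50, "T2": 30}

# Both transactions read before either writes, so T2 overwrites T1's update.
interleaved = [("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")]

print(serial_outcomes(deposits, 100))            # {180}
print(run_schedule(interleaved, deposits, 100))  # 130: T1's deposit is lost
```

Comparing final states against serial orders is only a crude stand-in for a real serializability check, but it shows why “we support ACID transactions” says little until you know which isolation level is on offer.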
During the announcement, MongoDB executives did acknowledge a few restrictions in the first implementation of transactions. In v4.0, MongoDB’s transactions are limited to a single shard, though they announced that v4.2 will support transactions across multiple shards. Later, while exploring MongoDB’s documentation, I found another restriction: although MongoDB has supported multiple storage engines in the past, only WiredTiger (one of MongoDB’s storage engines) can be used if you want transactions.
When I think about transactions, I think about them in the context of distributed systems. Transactions, true transactions (not the odd variants), are very difficult to do correctly. Databases are constantly barraged by reads and writes from many users. Even on a single machine, managing everything that happens inside all the concurrently running transactions is difficult; doing it in a distributed database spanning a large number of nodes is a really tricky proposition. MongoDB seems to have realized this, which is why they are careful to limit the initial scope of transactions to single shards.
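To show why spanning nodes makes things harder, here is a minimal Python sketch of textbook two-phase commit (2PC), a generic technique for committing one transaction across several nodes. It is not a description of how MongoDB or FaunaDB coordinates transactions; the `Participant` class, its `healthy` flag, and `two_phase_commit` are hypothetical names for illustration. A transaction confined to a single shard needs none of this coordination, which is one plausible reason a single-shard restriction simplifies a first release.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Participant:
    """One node (for example, a shard) holding part of the data."""
    name: str
    healthy: bool = True  # hypothetical flag: can this node prepare right now?
    data: Dict[str, int] = field(default_factory=dict)
    staged: Dict[str, int] = field(default_factory=dict)

    def prepare(self, writes: Dict[str, int]) -> bool:
        """Phase 1: stage the writes and vote yes, or vote no."""
        if not self.healthy:
            return False
        self.staged = dict(writes)
        return True

    def commit(self) -> None:
        """Phase 2 (success): make the staged writes visible."""
        self.data.update(self.staged)
        self.staged = {}

    def abort(self) -> None:
        """Phase 2 (failure): discard anything staged."""
        self.staged = {}


def two_phase_commit(plan: Dict[str, Dict[str, int]], nodes: Dict[str, Participant]) -> bool:
    """Coordinator: commit on every node only if every node votes yes.

    Deliberately omitted: durable logging, timeouts, coordinator failure
    between the phases, and locking against concurrent transactions.
    """
    participants = [(nodes[name], writes) for name, writes in plan.items()]
    votes = [node.prepare(writes) for node, writes in participants]
    if all(votes):
        for node, _ in participants:
            node.commit()
        return True
    for node, _ in participants:
        node.abort()
    return False


nodes = {"shard_a": Participant("shard_a"), "shard_b": Participant("shard_b")}
ok = two_phase_commit({"shard_a": {"alice": 50}, "shard_b": {"bob": 150}}, nodes)
print(ok, nodes["shard_a"].data, nodes["shard_b"].data)
# True {'alice': 50} {'bob': 150}

nodes["shard_b"].healthy = False  # one node can no longer prepare
ok = two_phase_commit({"shard_a": {"alice": 0}, "shard_b": {"bob": 200}}, nodes)
print(ok, nodes["shard_a"].data, nodes["shard_b"].data)
# False {'alice': 50} {'bob': 150}
```

Even this happy-path version needs an extra round trip and a way to undo partial work; the omitted pieces (what happens if the coordinator crashes after some nodes have voted yes, for instance) are where distributed transactions get genuinely tricky.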
MongoDB bought themselves a bit of a head start in implementing transactions. They inherited some of the underlying technology that enables transactions through their acquisition of WiredTiger, which they added as their most recent storage engine. It makes sense, then, that the only way to support transactions in MongoDB was to require WiredTiger as the storage engine.
Eliot also shared his expectations for how transactions would be used: “My general belief is that most users, above 50%, are going to use transactions, but for a very small percentage of operations, less than 1% of applications.”
That is very interesting to me. If the CTO believes that MongoDB’s transactional capability will be used so little, why go through all the trouble of adding transactions?
During a breakout session I attended, I heard some rather puzzling things about how MongoDB uses clocks and consensus protocols in its new transaction implementation. I still need to confirm all of it against MongoDB’s documentation, but what I heard from those senior engineering leaders reinforced my belief that adding transactions to a reporting/analytics platform is extremely difficult and may drive you to do some bizarre things in your product’s core.
I’ll get into some of the details of those in my next blog post. In the meantime, for those of you using MongoDB, I would love to hear from you.
- What do you like about MongoDB?
- What don’t you like about it?
- Are you thinking of using MongoDB’s new transactional capability? Why or why not?
- Were there any other announcements out of MongoDB World 2018 that were of particular interest to you?
If you are looking for a database with the high performance and scalability of NoSQL but you aren’t ready to (or just can’t) give up data accuracy, consistency, and system reliability, make sure you check out Fauna. Our database was built from the ground up for mission-critical transactional applications. We know how difficult transactions are to do correctly, which is why we started with transactions; we didn’t try to bolt them onto an older system. FaunaDB offers strict ACID compliance, high performance, and global scalability, all on a highly reliable platform with powerful multi-model query capabilities, easy programming, and simple operations. And FaunaDB Cloud is perfect for modern serverless applications.
If you enjoyed this topic and want to work on systems and challenges just like this, Fauna is hiring!