CAP, ACID, What Can We Do?
Starting from the CAP theorem and ACID properties, this article discusses the trade-offs among consistency, availability, and transaction design in distributed…
This article uses the CAP theorem and ACID properties as a starting point to discuss the design of distributed (storage) systems.
When designing distributed systems, especially distributed storage systems, the first issue to consider is CAP: how to make trade-offs between C (Consistency) and A (Availability). While thinking about this, I found that although Consistency has many classifications, such as Linearizability, Sequential Consistency, and Causal Consistency, Availability does not have a corresponding classification. This runs counter to how we understand the CAP theorem: after reducing Consistency, what level of Availability can we obtain? Going one step further, after reducing Consistency, can we really improve Availability?
CAP and ACID
The main reasons we need to scale a single-machine system into a distributed system are:
Performance requirements exceed the limits that a single-machine system can provide
Availability requirements exceed the limits that a single-machine system can provide
For the former, we can use sharding to distribute the load across multiple single-machine systems. For the latter, we must consider the constraints of the CAP theorem in the design.
CAP
The first relatively formal statement and proof of the CAP theorem was published in paper [28]. The CAP theorem is discussed for a Single Data Object with multiple Replications, and its model is similar to Distributed Shared Memory. In CAP, P stands for Network Partition Tolerance. Because typical network conditions today generally cannot be considered reliable, systems built on such networks must assume that Partitions can occur. Therefore, in the CAP theorem, we cannot give up P and choose C and A; see [55] for a detailed discussion. C in the CAP theorem refers to Linearizability Consistency. Its specific meaning is described below in the section on Linearizability; for now, we can regard it as a very strong consistency guarantee. A in the CAP theorem refers to 100% Read & Write Availability. Informally, this means that a (read or write) request sent to any node at any time can receive a “correct” response without waiting for the result of communication with other nodes in the system.
Because Partitions do not occur frequently [11], C and A can both be achieved under such circumstances. Interestingly, Google’s continuous improvements to its network infrastructure have made the probability of problems caused by network communication lower than the probability of problems caused by bugs in its network environment. At that point, the network can even be considered reliable [17]. However, when a Partition occurs, we must make a trade-off between Linearizability Consistency and 100% Availability.
This leads to a key question: how do we determine whether a Network Partition is occurring? In practice, we usually use a message timeout mechanism: if communication between two nodes times out, even after some retries, we consider a Network Partition to be occurring. With this method, we find that Latency and Availability are the same issue. When Latency is low, nodes can communicate normally and provide Linearizability Consistency. When Latency is high, we consider a Network Partition to have occurred. We can either wait until Latency drops enough for communication to complete within the Threshold, meaning the Network Partition is resolved, thereby giving up 100% Availability while guaranteeing Linearizability Consistency; or we can give up communication between nodes and respond directly, thereby giving up the guarantee of Linearizability Consistency while providing 100% Availability. An increasing number of systems do not provide Linearizability Consistency even when P has not occurred, in order to provide stronger performance guarantees. In other words, when P occurs, we trade off between C and A; when P does not occur, we trade off between C and L (Latency). In this way, the CAP theorem is extended into the PACELC theorem (If there is a Partition, how does the system trade off Availability and Consistency; Else, when the system is running normally in the absence of partitions, how does the system trade off Latency and Consistency?), see [1].
Considering this further, the fundamental reason we need to trade off between C and A, or between C and L, is that nodes need to perform synchronous communication to guarantee C. In the extreme case, we can make the entire system avoid synchronous communication in order to achieve extreme A and L. Now consider a write request. After it is written to one node and a success acknowledgment is received, if that node suffers a permanent failure before performing asynchronous communication with other nodes, the content written by this request will be permanently lost. This shows that Consistency and Durability are similar to some extent [43].
From a broader perspective, in an unreliable distributed system, we need to make trade-offs between Safety properties and Liveness properties [29]. For example, Consistency is a Safety property, while Availability is a Liveness property. As another example, the FLP result [27] tells us that in the Consensus problem, if any node can be unreliable, it is impossible to guarantee both Safety and Liveness. Important Impossibility results in distributed systems include [15, 27, 28, 46].
The Paxos algorithm can tolerate failures of nodes in a minority set of the system. Intuitively, we think Paxos provides highly available service at the system level while also providing Linearizability Consistency. This seems to contradict the CAP theorem. Consider CAP’s definition of Availability, which requires that requests to any node receive an immediate (real-time) response. Suppose a network partition divides the system into a majority set and a minority set. For Paxos, although nodes in the majority set can still reply to requests correctly and immediately, nodes in the minority set cannot. CAP defines Availability this way for a reason, because when a network partition occurs, a client may not be able to access nodes in the majority set.
[37] proposes some criticisms and improvements of the CAP theorem. Consider the fact that Availability is an observed result (Metric) of a service rather than a property of a system, while Consistency and Partition are system models. These two cannot be unified, and CAP’s definition of Availability is not rigorous. In [16], Brewer (informally) proposes that Availability and Consistency in the CAP theorem are not binary discrete variables, but variables that vary continuously from 0% to 100%. This contradicts the formal description of CAP in [28]. This means we need to rethink the precise definition (formal description) of the CAP theorem. In [37], A in CAP is defined as an algorithm’s sensitivity to Latency, C is defined as the concurrent consistency model used by the algorithm, and P is defined as a sudden increase in Latency. In this way, A is no longer an observed result of a service, but an intrinsic property of an algorithm; P’s definition can also be integrated with A. Under this framework, eventual consistency models can also be modeled and reasoned about well, ultimately leading to the conclusion that replica algorithms with eventual consistency can still terminate when Partition occurs permanently. This matches our intuition that using eventual consistency protocols can improve Availability. The paper further summarizes the lower bounds (not sure whether they are infimums) of the time required to achieve three consistency models. Assuming the message propagation time is \(d\), see the table below:
| Consistency Level | Write latency | Read latency |
|---|---|---|
Linearizability | \(\mathcal{O}(d)\) | \(\mathcal{O}(d)\) |
Sequential Consistency | \(\mathcal{O}(d)\) | \(\mathcal{O}(1)\) |
Causal Consistency | \(\mathcal{O}(1)\) | \(\mathcal{O}(1)\) |
The read and write latencies of Sequential Consistency can be swapped.
ACID
The ACID properties are the properties that must be guaranteed when multiple transactions execute concurrently [33]:
Transaction Atomicity: the multiple events that make up a transaction either all succeed or all fail (all-or-nothing)
Database Consistency: before and after executing a transaction, the state of the database remains consistent at the application level
Isolation: concurrently executing transactions do not affect each other
Durability: events in committed transactions are not lost
Atomicity in ACID has a completely different meaning from Atomicity in the Concurrent Programming field, so to distinguish this, I call the A in ACID Transaction Atomicity. Similarly, Consistency in ACID is also completely different from Consistency in the Concurrent Programming field, but Isolation in ACID is somewhat related to Consistency in the Concurrent Programming field. I call Consistency in ACID Database Consistency here because A, I, and D in ACID are lower-level properties, while C is an upper-level application property; see Chapter 7 of [38] for details.
Differences Between the Two
The CAP theorem focuses more on object consistency and availability issues in a distributed shared-memory model. It is a problem from the traditional Concurrent Programming field under unreliable systems. Traditional Consistency classifications are summarized in [51]. Several common consistency models, from weak to strong, are:
Read-your-writes/Monotonic Reads/Monotonic Writes
PRAM (that is, FIFO, equivalent to the sum of the three above)
Causal (equivalent to PRAM plus Writes-follow-reads)
Sequential
Linearizability
The ACID properties concern the concurrent execution of multiple transactions in database systems, which is a traditional problem in the database field. Isolation in the ACID properties has the following common levels from weak to strong [12]:
Read Uncommitted
Read Committed
Cursor Stability
Repeatable Read
Snapshot Isolation
Serializable
Linearizability and Serializability
Linearizability is the ultimate goal of Consistency in Concurrent Programming, and Serializability is the ultimate goal of Isolation in database transactions. Both play important roles in distributed storage systems.
Linearizability
Linearizability is a constraint on the execution order of concurrent operations on a single object with multiple replicas. It can also be regarded as a constraint on the return values of individual operations given a concurrent operation history.
The original paper on Linearizability is [35], but personally I think Chapter 16 of [45] is easier to understand. In addition, Chapter 9 of [38] also gives an informal explanation of Linearizability.
The formal definition of Linearizability is as follows [35]:
A history \(H\) induces an irreflexive partial order \(<_{H}\) on operations:
A history \(H\) is linearizable if it can be extended (by appending zero or more response events) to some history \(H'\) such that:
\(complete(H')\) is equivalent to some legal sequential history \(S\)
\(<_{H} \subseteq <_{S}\)
Intuitively, the result of Linearizability execution is equivalent to executing these events (Event, understood here as read or write operations) one by one, non-concurrently, in real-time order.
Serializability
Serializability is a constraint on the concurrent execution of database transactions. A database transaction consists of multiple operations (or just one) involving multiple objects (or just one object). Serializability requires that the result of executing these transactions be equivalent to the result of executing these transactions one by one, non-concurrently. It is worth noting that Serializability does not constrain the execution order of these transactions.
For a detailed description of Serializability, see Chapter 2 of [13].
Differences Between the Two
Linearizability is mainly applied in a distributed shared-memory model, constraining what values are legal for individual operations on a single object to return. Linearizability is a constraint on Recency, requiring the execution order of concurrent operations in a history to reflect their Real-Time order. Here, Real-Time does not refer to Real-Time in the field of real-time systems, but to the concept of actual time with respect to a global clock.
Serializability is the Isolation requirement for concurrent execution of database transactions. It constrains the result of executing concurrent transactions to be equivalent to the result of executing these transactions “one after another,” but it does not constrain the execution order of these concurrent transactions. Database transactions usually involve multiple operations on multiple objects.
The combination of Serializability and Linearizability is called Strict Serializability or One-copy Serializability.
For a comparison of the two, see also [39].
What Can We Do?
After clarifying the CAP theorem and ACID, let us consider what can be achieved in distributed system design and how to make trade-offs:
Under the strong consistency constraints of traditional databases, how high can availability be, and how low can latency be?
Under the constraint of 100% Read Write Availability, how high can consistency be?
What exists between these two extremes?
For these questions, we always start with the simplest model: a replicated Key-Value Store. Then we consider how to add distributed transactions. Of course, for a distributed database, we should at least also have Secondary Indexes, Data Constraint Checks, and other features, but this article will not expand on them further.
Distributed Systems Under Strong Consistency Constraints
Our goal is to implement a Key-Value Store with Linearizability Consistency while also supporting distributed transactions at the Serializability Isolation Level.
How to Implement Linearizability Consistency
Our goal is to implement a Key-Value Store with Linearizability Consistency. Because Linearizability has the Local Property, we can further simplify the problem. First, we consider how to implement an Atomic Read/Write Register. More specifically, it should be an MRMW Atomic Register (Multi-Reader Multi-Writer Atomic Register).
Page 333 of [38] summarizes whether various Replication methods can implement Linearizability:
Single-leader replication (potentially linearizable)
Consensus algorithms (linearizable)
Multi-leader replication (not linearizable)
Leaderless replication (probably not linearizable)
Section 4 of Chapter 16 of [45] also summarizes how to implement Linearizability:
Atomicity Based on a Total Order Broadcast Abstraction
Atomicity of Read/Write Objects Based on Server Processes
Atomicity Based on a Server Process and Copy Invalidation
Atomicity Based on a Server Process and Copy Update
Overall, there are several approaches:
Total Order Broadcast
Quorum read/write
Write-invalidate & Read-through
Write-through
Total Order Broadcast guarantees that each replication receives totally ordered messages in the same order. Obviously, this matches the definition of Linearizability Consistency. Using a Consensus algorithm, such as Raft or ZAB, is equivalent to first ordering messages on the Leader node and then propagating both the messages and the order itself to all replications; in effect, it is equivalent to Total Order Broadcast. For the classification and implementation methods of Total Order Broadcast, see [23].
To implement Linearizability using a Quorum approach, a Repair must be performed before every Read/Write operation; see page 334 of [38], Chapter 4 of [20], and [4]. Note that only Read/Write operations can be implemented this way. Operations such as Compare-And-Set cannot be implemented in this form and must use a distributed Consensus algorithm; see Fig. 1 of [34] for details.
Read-through and Write-through approaches are mainly suitable for Cache scenarios. Using such methods in distributed storage scenarios carries a relatively high risk of data loss.
The cost of achieving Linearizability is the latency from the Client to the Leader plus the latency from the Leader to the slowest node in the majority set.
How to Implement Serializability Isolation Transactions
With a distributed Key-Value Store already in place, our next goal is to implement distributed transactions at the Serializability Isolation Level.
In single-machine database systems, the main methods for implementing transactions at the Serializability Isolation Level, ordered by Scalability from weak to strong, are:
True (single-threaded) serialized execution
Strict 2 Phase Locking
Serializable Snapshot Isolation Algorithm
Among them, serialized execution and Strict 2PL (Strict 2 Phase Locking) are pessimistic concurrency methods, while SSI (Serializable Snapshot Isolation) is an optimistic concurrency method. The basic idea of SSI is to use Snapshot Isolation, but before commit, check whether any known write operation conflicts with the current transaction’s read operations under Serializability semantics. If so, Abort; if not, Commit and Serializability can be guaranteed [21].
The main problem that must be solved to implement distributed transactions is Atomic Commitment, which is the problem of multiple nodes making an all-or-nothing decision to commit or abort a transaction. Atomic Commitment is divided into Blocking and Non-blocking forms. The Blocking Atomic Commitment problem can be transformed into the Consensus problem [31]. A typical implementation of Blocking Atomic Commitment is 2 Phase Commit, but I think the Paxos Commit Algorithm [31] should be used. The Non-blocking Atomic Commitment problem is strictly harder than the Consensus problem [32]. A typical implementation is 3PC (3 Phase Commit), but because 3PC cannot guarantee correctness when nodes fail, almost no one uses 3PC in production environments.
For a system with Linearizability Consistency, we do not need to worry about consistency among the multiple Replications of each Shard, so the implementation of transactions that do not cross Shards is the same as the implementation of transactions in a single-machine database. For transactions that cross Shards, one feasible solution is to use the Paxos Commit Algorithm to coordinate multiple Shards, with each Shard using Strict 2PL. Except for the need for an additional external component to record the relationships between transactions and locks across the entire system for deadlock detection and handling, this solution is basically the same as transaction processing in traditional databases. Implementing SSI requires solving an important problem: how to generate a consistent Snapshot across Shards. I do not have much experience in this area, and I hope to complete this part after reading [26, 36, 48, 49, 53] in the future (though it is also possible that this will remain an unfilled hole).
Problems with Strong Consistency Systems
Although Linearizability Consistency lets us use the system as if it had only one replica, the cost is not only that the algorithm is Network Latency Sensitive, but also that Scalability decreases. For a system with Linearizability Consistency, adding replicas cannot improve system performance. Linearizability Consistency is very expensive; even multicore CPUs do not use Linearizability Consistency [47]. Therefore, we should provide Linearizability Consistency only when necessary, for example by using an additional system such as Zookeeper.
Strict 2PL also has relatively low concurrency, which results in low transaction-processing performance. Although SSI has higher concurrency than Strict 2PL, the probability of Abort is also relatively high in high-concurrency scenarios.
Distributed Systems Under High Availability Constraints
From the conclusions summarized in [3, 37], if we want to implement a system that is 100% Read & Write Available (or, in other words, whose time complexity for read and write operations is not affected by network latency between nodes), the strongest consistency level it can support is Causal Consistency. In particular, considering the size of Version Vectors, it may not even support full Causal Consistency. On the other hand, from the conclusions in [8], support for distributed transactions can at most reach RC (Read Committed), MAV (Monotonic Atomic View), or P-CI (Predicate Cut Isolation).
I deliberately avoid the concept of Eventual Consistency here, because there is still no unified formal definition of Eventual Consistency. The disagreements mainly focus on two aspects: How Eventual and What Consistency. In [19, 51], the total order in Linearizability Consistency is extended to a partial order and decomposed into two dimensions, Visible and Arbitration, to describe multiple kinds of Consistency in a unified way. In particular, events that appear only in the Arbitration relation but not in the Visible relation are equivalent to being lost or overwritten.
Causal Consistency is highly valuable in many scenarios. For example, in a social network, A says that their child is missing (a1), then A posts that the child has been found (a2), and B comments that this is great news (b). If C can see only messages a1 and b at this point, a problem arises. From the perspective of Causal Relation, a1→a2→b. An Eventual Consistency system does not provide guarantees for Causal Relation, but Causal Consistency must guarantee this.
Even so, mainstream NoSQL systems currently implement only Eventual Consistency [22]. This is because implementing a general-purpose Causal Consistency mechanism is relatively expensive. For example, if a Client first reads a large amount of data and then performs a write, a general-purpose system cannot know which read operations this write causally depends on, so it must capture all read operations. Even if the user explicitly specifies which read operations a write depends on, the transitivity of the Causal Relation must also be considered, namely which read operations those read operations depend on, and so on.
How to Implement Causal Consistency
Single Object (or Per-Key) Causal Consistency is easy to implement. Technologies such as MVCC can store multiple versions of an object simultaneously. When writing, it is enough to specify which version the write depends on to implement Single Object Causal Consistency. Note that to maintain Session Level Guarantees, communication must occur with the same replica throughout the duration of the Session. Personally, I think a system should generally provide Single Object Causal Consistency, because this burden is not heavy.
Unfortunately, unlike Linearizability Consistency, Causal Consistency does not have the Locality property. Even if Single Object Causal Consistency is implemented, it does not make the entire system satisfy Causal Consistency.
At present, how to implement Causal Consistency is a hot academic topic [2, 6, 22, 24, 25, 40–42, 54]. On the one hand, this is because Causal Consistency is the strongest consistency level that can be supported while maintaining 100% Availability. On the other hand, it is because the guarantees provided by Eventual Consistency are simply too weak [10, 30].
How to Implement Highly Available Distributed Transactions
Although [8] points out that, while guaranteeing high availability, distributed transactions can at most support RC (Read Committed), MAV (Monotonic Atomic View), or P-CI (Predicate Cut Isolation), it does not point out what algorithms can be used to achieve this. In particular, [8] states that the reason they wrote the paper is that although many algorithms implement these distributed transactions, they do not support high availability; many systems claim to be highly available, but after using these algorithms that are not highly available, they are no longer highly available in the strict sense.
[18] proposes an implementation method for distributed transactions without a Coordinator, and [7] further discusses Coordination Avoidance. The Eiger system provides low-latency read-only transactions and write-only transactions respectively [41], but it still requires more careful examination to know whether it is highly available. The Anna system claims to implement Read Committed-level distributed transactions across Shards [54], but it does not disclose more technical details. Its authors have also published RAMP transactions [9], which provide low-latency Read Atomic Isolation distributed transactions. Overall, implementing highly available distributed transactions remains an open problem.
Other Topics
[7] provides a formal discussion of what kinds of constraint checks require Coordination. In general, constraint checks that do not require Coordination can only check constraints with the Locality property, not global constraints, such as non-negative counters, auto-increment primary keys, and so on.
One direction that still needs exploration is how to switch between different levels of Consistency and Isolation Level [50].
Recommended Readings
While writing this article, I read a large amount of related material. Here I provide a recommended reading order for the literature.
First, read Chapters 5, 7, and 9 of [38]. These chapters respectively introduce Replication methods, database transactions, Consistency and Consensus, and the relationships among them. This gives readers an initial understanding of the CAP theorem, database transactions, consistency models in Concurrent Programming, and the consensus problem in distributed systems.
Then I recommend reading [22] to gain an initial understanding of currently popular NoSQL systems.
For literature related to the CAP theorem, I recommend first reading [28], which gives informal and formal definitions and proofs of the commonly understood CAP theorem. Then I strongly recommend reading [37]. Its author is also the author of [38], and this paper proposes criticisms and improvements of existing work on the CAP theorem. These criticisms and improvements are highly enlightening and practical. After that, you can choose items of interest from the following list:
[16] is Brewer’s review and supplement to the CAP theorem 12 years after proposing it
[1] extends the CAP theorem into the PACELC theorem and incorporates Latency into the interpretation of Availability
[29] extends the CAP theorem into an Impossibility problem involving Safety and Liveness properties
[55] explains that P cannot be sacrificed in the CAP theorem (it should be part of the algorithm design model); this issue is also briefly touched on in [37]
[51] provides a very detailed formal summary of Consistency, but I strongly recommend finishing [19] before reading it, because its description method follows the approach in [19]; otherwise it can be difficult to understand
[52] provides a brief introduction and summary of Eventual Consistency
[30] provides a slightly deeper discussion of the Eventual aspect of Eventual Consistency
[19] is a very detailed and formal summary of Eventual Consistency
For Linearizability Consistency, I recommend reading Chapter 16 of [45]. Its original paper is [35], and some discussion of its performance can be found in [5]. For Eventual Consistency, I recommend reading [10, 14, 52]. For some formal descriptions, I recommend reading [19, 51].
For Serializability, I recommend reading Chapter 2 of [13].
For Consistency in Concurrent Programming, I recommend reading
For (single-machine) database transactions, I recommend reading Chapter 7 of [38]. For Serializable Snapshot Isolation, see [21, 44]. For new methods of implementing single-machine database transactions, see [53].
The consistency of distributed transactions is strongly related to the distributed consensus problem. I recommend reading Chapter 9 of [38] and [31] by Lamport. For Non-blocking Atomic Commitment, I recommend reading [32].
References
[1] Abadi, D. 2012. Consistency Tradeoffs in Modern Distributed Database System Design: CAP is Only Part of the Story. Computer. 45, 2 (2012), 37–42. DOI:https://doi.org/10.1109/MC.2012.33.
[2] Akkoorath, D.D. et al. 2016. Cure: Strong Semantics Meets High Availability and Low Latency. 2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS) (Jun. 2016), 405–414.
[3] Attiya, H. et al. 2017. Limitations of Highly-Available Eventually-Consistent Data Stores. IEEE Transactions on Parallel and Distributed Systems. 28, 1 (2017), 141–155. DOI:https://doi.org/10.1109/TPDS.2016.2556669.
[4] Attiya, H. et al. 1995. Sharing memory robustly in message-passing systems. Journal of the ACM. 42, 1 (1995), 124–142. DOI:https://doi.org/10.1145/200836.200869.
[5] Attiya, H. and Welch, J.L. 1994. Sequential consistency versus linearizability. ACM Transactions on Computer Systems. 12, 2 (1994), 91–122. DOI:https://doi.org/10.1145/176575.176576.
[6] Bailis, P. et al. 2013. Bolt-on causal consistency. Proceedings of the 2013 international conference on Management of data - SIGMOD ’13. (2013), 761. DOI:https://doi.org/10.1145/2463676.2465279.
[7] Bailis, P. et al. 2015. Coordination Avoidance in Database Systems. Pvldb. 8, 4 (2015), 185–196. DOI:https://doi.org/1402.2237v2.
[8] Bailis, P. et al. 2013. Highly Available Transactions: Virtues and Limitations. Proceedings of the VLDB Endowment. 7, 3 (2013), 181–192. DOI:https://doi.org/10.14778/2732232.2732237.
[9] Bailis, P. et al. 2014. Scalable atomic visibility with RAMP transactions. Proceedings of the 2014 ACM SIGMOD international conference on Management of data - SIGMOD ’14. (2014), 27–38. DOI:https://doi.org/10.1145/2588555.2588562.
[10] Bailis, P. and Ghodsi, A. 2013. Eventual consistency today. Communications of the ACM. 56, 5 (2013), 55. DOI:https://doi.org/10.1145/2447976.2447992.
[11] Bailis, P. and Kingsbury, K. 2014. The Network is Reliable. Queue. 12, 7 (2014), 20:20-20:32. DOI:https://doi.org/10.1145/2639988.2639988.
[12] Berenson, H. et al. 1995. A critique of ANSI SQL isolation levels. ACM SIGMOD Record. 24, 2 (1995), 1–10. DOI:https://doi.org/10.1145/568271.223785.
[13] Bernstein, P.A. et al. 1987. Concurrency Control and Recovery in Database Systems. Addison-Wesley Pub. Co.
[14] Bernstein, P.A. and Das, S. 2013. Rethinking eventual consistency. Proceedings of the 2013 international conference on Management of data - SIGMOD ’13. (2013), 923. DOI:https://doi.org/10.1145/2463676.2465339.
[15] Borowsky, E. and Gafni, E. 1993. Generalized FLP impossibility result for t-resilient asynchronous computations. Proceedings of the twenty-fifth annual ACM symposium on Theory of computing - STOC ’93. 5, (1993), 91–100. DOI:https://doi.org/10.1145/167088.167119.
[16] Brewer, E. 2012. CAP twelve years later: How the “rules” have changed. Computer. 45, 2 (2012), 23–29. DOI:https://doi.org/10.1109/MC.2012.37.
[17] Brewer, E. 2017. Spanner, TrueTime & The CAP Theorem. White Papers. 2015, 4/4/2015 (2017), 1–7.
[18] Burckhardt, S. et al. 2012. Eventually Consistent Transactions. Proceedings of the 22n European Symposium on Programming (ESOP). Springer. 67–86.
[19] Burckhardt, S. 2014. Principles of Eventual Consistency. Foundations and Trends® in Programming Languages. 1, 1–2 (2014), 1–150. DOI:https://doi.org/10.1561/2500000011.
[20] Cachin, C. et al. 2011. Introduction to reliable and secure distributed programming. Springer.
[21] Cahill, M.J. et al. 2009. Serializable isolation for snapshot databases. ACM Transactions on Database Systems. 34, 4 (Dec. 2009), 1–42. DOI:https://doi.org/10.1145/1620585.1620587.
[22] Davoudian, A. et al. 2018. A Survey on NoSQL Stores. ACM Computing Surveys. 51, 2 (Apr. 2018), 1–43. DOI:https://doi.org/10.1145/3158661.
[23] Défago, X. et al. 2004. Total order broadcast and multicast algorithms: Taxonomy and Survey. ACM Computing Surveys. 36, 4 (Dec. 2004), 372–421. DOI:https://doi.org/10.1145/1041680.1041682.
[24] Didona, D. et al. 2017. Okapi: Causally Consistent Geo-Replication Made Faster, Cheaper and More Available. (Feb. 2017).
[25] Du, J. et al. 2014. GentleRain : Cheap and Scalable Causal Consistency with Physical Clocks. SOCC ’14 Proceedings of the ACM Symposium on Cloud Computing. (2014), 1–13. DOI:https://doi.org/10.1145/2670979.2670983.
[26] Dutta, P. et al. 2005. How fast can eventual synchrony lead to consensus? Proceedings of the International Conference on Dependable Systems and Networks. March (2005), 22–27. DOI:https://doi.org/10.1109/DSN.2005.54.
[27] Fischer, M.J. et al. 1983. Impossibility of distributed consensus with one faulty process. Proceedings of the 2nd ACM SIGACT-SIGMOD symposium on Principles of database systems - PODS ’83 (New York, New York, USA, Apr. 1983), 1–7.
[28] Gilbert, S. and Lynch, N. 2002. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. ACM SIGACT News. 33, 2 (2002), 51. DOI:https://doi.org/10.1145/564585.564601.
[29] Gilbert, S. and Lynch, N. 2012. Perspectives on the CAP Theorem. Computer. 45, 2 (2012), 30–36. DOI:https://doi.org/10.1109/MC.2011.389.
[30] Golab, W. et al. 2014. Eventually consistent: not what you were expecting? Communications of the ACM. 57, 3 (2014), 38–44. DOI:https://doi.org/10.1145/ 2576794.
[31] Gray, J. and Lamport, L. 2004. Consensus on Transaction Commit. 1, April 2004 (2004). DOI:https://doi.org/10.1145/1132863.1132867.
[32] Guerraoui, R. 1995. Revisiting the relationship between non-blocking atomic commitment and consensus. Distributed Algorithms (1995), 87–100.
[33] Haerder, T. and Reuter, A. 1983. Principles of transaction-oriented database recovery. ACM Computing Surveys. 15, 4 (1983), 287–317. DOI:https://doi.org/10.1145/289.291.
[34] Herlihy, M. 1991. Wait-free synchronization. ACM Transactions on Programming Languages and Systems. 13, 1 (1991), 124–149. DOI:https://doi.org/10.1145/114005.102808.
[35] Herlihy, M.P. and Wing, J.M. 1990. Linearizability: a correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems. 12, 3 (Jul. 1990), 463–492. DOI:https://doi.org/10.1145/78969.78972.
[36] Jung, H. et al. 2011. Serializable Snapshot Isolation for Replicated Databases in High-Update Scenarios. PVLDB. 4, 11 (2011), 783–794.
[37] Kleppmann, M. 2015. A Critique of the CAP Theorem. (2015).
[38] Kleppmann, M. 2017. Designing data-intensive applications. O’Reilly Media, Inc.
[39] Linearizability versus Serializability: 2014. http://www.bailis.org/blog/linearizability-versus-serializability/. Accessed: 2018-05-08.
[40] Lloyd, W. et al. 2011. Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS. Proceedings of the Symposium on Operating Systems Principles. (2011), 1–16. DOI:https://doi.org/10.1145/2043556.2043593.
[41] Lloyd, W. et al. 2013. Stronger Semantics for Low-Latency Geo-Replicated Storage. Proceedings of the Symposium on Networked Systems Design and Implementation. April (2013), 313–328. DOI:https://doi.org/10.1145/2602649.2610533.
[42] Mehdi, S.A. et al. 2017. I Can’t Believe It’s Not Causal! Scalable Causal Consistency with No Slowdown Cascades. 14th {USENIX} Symposium on Networked Systems Design and Implementation, {NSDI} 2017, Boston, MA, USA, March 27-29, 2017. (2017), 453–468.
[43] On Consistency and Durability: http://www.bailis.org/blog/on-consistency-and-durability/. Accessed: 2018-05-08.
[44] Ports, D.R.K. and Grittner, K. 2012. Serializable snapshot isolation in PostgreSQL. Proceedings of the VLDB Endowment. 5, 12 (Aug. 2012), 1850–1861. DOI:https://doi.org/10.14778/2367502.2367523.
[45] Raynal, M. 2013. Distributed Algorithms for Message-Passing Systems. Springer, Berlin, Heidelberg.
[46] Saks, M. and Zaharoglou, F. 1993. Wait-free k-set agreement is impossible. Proceedings of the twenty-fifth annual ACM symposium on Theory of computing - STOC ’93. (1993), 101–110. DOI:https://doi.org/10.1145/167088.167122.
[47] Sewell, P. et al. 2010. X86-TSO: A Rigorous and Usable Programmer’s Model for x86 Multiprocessors. Communications of the ACM. 53, 7 (Jul. 2010), 89. DOI:https://doi.org/10.1145/1785414.1785443.
[48] Shao, J. et al. 2016. Read Consistency in Distributed Database Based on DMVCC. 2016 IEEE 23rd International Conference on High Performance Computing (HiPC). (Dec. 2016), 142–151. DOI:https://doi.org/10.1109/HiPC.2016.11.
[49] Sovran, Y. et al. 2011. Transactional storage for geo-replicated systems. Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles - SOSP ’11. (2011), 385. DOI:https://doi.org/10.1145/2043556.2043592.
[50] Tripathi, A. and Thirunavukarasu, B.D. 2015. A transaction model for management of replicated data with multiple consistency levels. 2015 IEEE International Conference on Big Data (Big Data) (Oct. 2015), 470–477.
[51] Viotti, P. and Vukolić, M. 2016. Consistency in Non-Transactional Distributed Storage Systems. ACM Computing Surveys. 49, 1 (Jun. 2016), 1–34. DOI:https://doi.org/10.1145/2926965.
[52] Vogels, W. 2008. Eventually Consistent. Queue. 6, 6 (2008), 14. DOI:https://doi.org/10.1145/1466443.1466448.
[53] Wang, T. et al. 2016. Efficiently making (almost) any concurrency control mechanism serializable. The VLDB Journal. 26, 4 (May 2016), 537–562. DOI:https://doi.org/10.1007/s00778-017-0463-8.
[54] Wu, C. et al. 2018. Anna : A KVS For Any Scale. 34th IEEE International Conference on Data Engineering. (2018).
[55] You Can’t Sacrifice Partition Tolerance: 2010. https://codahale.com/you-cant-sacrifice-partition-tolerance/. Accessed: 2018-04-17.
Afterword
It is clear that the later parts of this article are rather scattered and did not really achieve the effect I had in mind. This is because I do not have a very clear understanding of how to implement a Causal Consistency storage system. My original intention in writing this article was to clarify the relationship between the CAP theorem and ACID properties and to find a clear line of thought for designing distributed storage systems. While researching, I found that the description of the CAP theorem is not clear, and people have many disputes about it. The ACID properties themselves are also not very clear, and extending them into a distributed environment introduces new problems. Some new methods for implementing database transactions also do not map cleanly to the ACID properties. Non-blocking distributed systems are also an emerging field. Although there is currently some theoretical preparation, there is still no clear path for algorithms and implementations.
For distributed storage systems, the first consideration should be the trade-off between performance and consistency. The order of consideration should be as follows:
Applications that emphasize consistency should consider supporting Linearizability Consistency, but they must also consider whether they can accept the accompanying decreases in Latency and Scalability. Applications that require high performance or high availability should consider using Eventual Consistency, because the theory related to Causal Consistency is still not mature enough. However, when needed, highly available implementations of Causal Consistency can also be explored. For transaction support, if Linearizability Consistency is not implemented, one can consider supporting only single-machine transactions, or multi-machine read-only and write-only transactions.