CAP, ACID, What Can We Do?

This article uses the CAP theorem and ACID properties as a starting point to discuss the design of distributed (storage) systems.

When designing distributed systems, especially distributed storage systems, the first issue to consider is CAP: how to make trade-offs between C (Consistency) and A (Availability). While thinking about this, I found that although Consistency has many classifications, such as Linearizability, Sequential Consistency, and Causal Consistency, Availability does not have a corresponding classification. This runs counter to how we understand the CAP theorem: after reducing Consistency, what level of Availability can we obtain? Going one step further, after reducing Consistency, can we really improve Availability?

CAP and ACID

The main reasons we need to scale a single-machine system into a distributed system are:

Performance requirements exceed the limits that a single-machine system can provide
Availability requirements exceed the limits that a single-machine system can provide

For the former, we can use sharding to distribute the load across multiple single-machine systems. For the latter, we must consider the constraints of the CAP theorem in the design.

CAP

The first relatively formal statement and proof of the CAP theorem was published in paper [28]. The CAP theorem is discussed for a Single Data Object with multiple Replications, and its model is similar to Distributed Shared Memory. In CAP, P stands for Network Partition Tolerance. Because typical network conditions today generally cannot be considered reliable, systems built on such networks must assume that Partitions can occur. Therefore, in the CAP theorem, we cannot give up P and choose C and A; see [55] for a detailed discussion. C in the CAP theorem refers to Linearizability Consistency. Its specific meaning is described below in the section on Linearizability; for now, we can regard it as a very strong consistency guarantee. A in the CAP theorem refers to 100% Read & Write Availability. Informally, this means that a (read or write) request sent to any node at any time can receive a “correct” response without waiting for the result of communication with other nodes in the system.

Because Partitions do not occur frequently [11], C and A can both be achieved under such circumstances. Interestingly, Google’s continuous improvements to its network infrastructure have made the probability of problems caused by network communication lower than the probability of problems caused by bugs in its network environment. At that point, the network can even be considered reliable [17]. However, when a Partition occurs, we must make a trade-off between Linearizability Consistency and 100% Availability.

This leads to a key question: how do we determine whether a Network Partition is occurring? In practice, we usually use a message timeout mechanism: if communication between two nodes times out, even after some retries, we consider a Network Partition to be occurring. With this method, we find that Latency and Availability are the same issue. When Latency is low, nodes can communicate normally and provide Linearizability Consistency. When Latency is high, we consider a Network Partition to have occurred. We can either wait until Latency drops enough for communication to complete within the Threshold, meaning the Network Partition is resolved, thereby giving up 100% Availability while guaranteeing Linearizability Consistency; or we can give up communication between nodes and respond directly, thereby giving up the guarantee of Linearizability Consistency while providing 100% Availability. An increasing number of systems do not provide Linearizability Consistency even when P has not occurred, in order to provide stronger performance guarantees. In other words, when P occurs, we trade off between C and A; when P does not occur, we trade off between C and L (Latency). In this way, the CAP theorem is extended into the PACELC theorem (If there is a Partition, how does the system trade off Availability and Consistency; Else, when the system is running normally in the absence of partitions, how does the system trade off Latency and Consistency?), see [1].

Considering this further, the fundamental reason we need to trade off between C and A, or between C and L, is that nodes need to perform synchronous communication to guarantee C. In the extreme case, we can make the entire system avoid synchronous communication in order to achieve extreme A and L. Now consider a write request. After it is written to one node and a success acknowledgment is received, if that node suffers a permanent failure before performing asynchronous communication with other nodes, the content written by this request will be permanently lost. This shows that Consistency and Durability are similar to some extent [43].

From a broader perspective, in an unreliable distributed system, we need to make trade-offs between Safety properties and Liveness properties [29]. For example, Consistency is a Safety property, while Availability is a Liveness property. As another example, the FLP result [27] tells us that in the Consensus problem, if any node can be unreliable, it is impossible to guarantee both Safety and Liveness. Important Impossibility results in distributed systems include [15, 27, 28, 46].

The Paxos algorithm can tolerate failures of nodes in a minority set of the system. Intuitively, we think Paxos provides highly available service at the system level while also providing Linearizability Consistency. This seems to contradict the CAP theorem. Consider CAP’s definition of Availability, which requires that requests to any node receive an immediate (real-time) response. Suppose a network partition divides the system into a majority set and a minority set. For Paxos, although nodes in the majority set can still reply to requests correctly and immediately, nodes in the minority set cannot. CAP defines Availability this way for a reason, because when a network partition occurs, a client may not be able to access nodes in the majority set.

[37] proposes some criticisms and improvements of the CAP theorem. Consider the fact that Availability is an observed result (Metric) of a service rather than a property of a system, while Consistency and Partition are system models. These two cannot be unified, and CAP’s definition of Availability is not rigorous. In [16], Brewer (informally) proposes that Availability and Consistency in the CAP theorem are not binary discrete variables, but variables that vary continuously from 0% to 100%. This contradicts the formal description of CAP in [28]. This means we need to rethink the precise definition (formal description) of the CAP theorem. In [37], A in CAP is defined as an algorithm’s sensitivity to Latency, C is defined as the concurrent consistency model used by the algorithm, and P is defined as a sudden increase in Latency. In this way, A is no longer an observed result of a service, but an intrinsic property of an algorithm; P’s definition can also be integrated with A. Under this framework, eventual consistency models can also be modeled and reasoned about well, ultimately leading to the conclusion that replica algorithms with eventual consistency can still terminate when Partition occurs permanently. This matches our intuition that using eventual consistency protocols can improve Availability. The paper further summarizes the lower bounds (not sure whether they are infimums) of the time required to achieve three consistency models. Assuming the message propagation time is \(d\), see the table below:

Consistency Level	Write latency	Read latency
Linearizability	\(\mathcal{O}(d)\)	\(\mathcal{O}(d)\)
Sequential Consistency	\(\mathcal{O}(d)\)	\(\mathcal{O}(1)\)
Causal Consistency	\(\mathcal{O}(1)\)	\(\mathcal{O}(1)\)

Consistency Level

Write latency

Read latency

Linearizability

\(\mathcal{O}(d)\)

Sequential Consistency

\(\mathcal{O}(d)\)

\(\mathcal{O}(1)\)

Causal Consistency

\(\mathcal{O}(1)\)

The read and write latencies of Sequential Consistency can be swapped.

ACID

The ACID properties are the properties that must be guaranteed when multiple transactions execute concurrently [33]:

Transaction Atomicity: the multiple events that make up a transaction either all succeed or all fail (all-or-nothing)
Database Consistency: before and after executing a transaction, the state of the database remains consistent at the application level
Isolation: concurrently executing transactions do not affect each other
Durability: events in committed transactions are not lost

Atomicity in ACID has a completely different meaning from Atomicity in the Concurrent Programming field, so to distinguish this, I call the A in ACID Transaction Atomicity. Similarly, Consistency in ACID is also completely different from Consistency in the Concurrent Programming field, but Isolation in ACID is somewhat related to Consistency in the Concurrent Programming field. I call Consistency in ACID Database Consistency here because A, I, and D in ACID are lower-level properties, while C is an upper-level application property; see Chapter 7 of [38] for details.

Differences Between the Two

The CAP theorem focuses more on object consistency and availability issues in a distributed shared-memory model. It is a problem from the traditional Concurrent Programming field under unreliable systems. Traditional Consistency classifications are summarized in [51]. Several common consistency models, from weak to strong, are:

Read-your-writes/Monotonic Reads/Monotonic Writes
PRAM (that is, FIFO, equivalent to the sum of the three above)
Causal (equivalent to PRAM plus Writes-follow-reads)
Sequential
Linearizability

The ACID properties concern the concurrent execution of multiple transactions in database systems, which is a traditional problem in the database field. Isolation in the ACID properties has the following common levels from weak to strong [12]:

Read Uncommitted
Read Committed
Cursor Stability
Repeatable Read
Snapshot Isolation
Serializable

Linearizability and Serializability

Linearizability is the ultimate goal of Consistency in Concurrent Programming, and Serializability is the ultimate goal of Isolation in database transactions. Both play important roles in distributed storage systems.

Linearizability

Linearizability is a constraint on the execution order of concurrent operations on a single object with multiple replicas. It can also be regarded as a constraint on the return values of individual operations given a concurrent operation history.

The original paper on Linearizability is [35], but personally I think Chapter 16 of [45] is easier to understand. In addition, Chapter 9 of [38] also gives an informal explanation of Linearizability.

The formal definition of Linearizability is as follows [35]:

A history \(H\) induces an irreflexive partial order \(<_{H}\) on operations:

\[e_{0} <_{H}e_{1}\ \text{if}\ res(e_{0})\ \text{precedes}\ inv(e_{1})\ \text{in}\ H\]

A history \(H\) is linearizable if it can be extended (by appending zero or more response events) to some history \(H'\) such that:

\(complete(H')\) is equivalent to some legal sequential history \(S\)
\(<_{H} \subseteq <_{S}\)

Intuitively, the result of Linearizability execution is equivalent to executing these events (Event, understood here as read or write operations) one by one, non-concurrently, in real-time order.

Serializability

Serializability is a constraint on the concurrent execution of database transactions. A database transaction consists of multiple operations (or just one) involving multiple objects (or just one object). Serializability requires that the result of executing these transactions be equivalent to the result of executing these transactions one by one, non-concurrently. It is worth noting that Serializability does not constrain the execution order of these transactions.

For a detailed description of Serializability, see Chapter 2 of [13].

Differences Between the Two

Linearizability is mainly applied in a distributed shared-memory model, constraining what values are legal for individual operations on a single object to return. Linearizability is a constraint on Recency, requiring the execution order of concurrent operations in a history to reflect their Real-Time order. Here, Real-Time does not refer to Real-Time in the field of real-time systems, but to the concept of actual time with respect to a global clock.

Serializability is the Isolation requirement for concurrent execution of database transactions. It constrains the result of executing concurrent transactions to be equivalent to the result of executing these transactions “one after another,” but it does not constrain the execution order of these concurrent transactions. Database transactions usually involve multiple operations on multiple objects.

The combination of Serializability and Linearizability is called Strict Serializability or One-copy Serializability.

For a comparison of the two, see also [39].

What Can We Do?

After clarifying the CAP theorem and ACID, let us consider what can be achieved in distributed system design and how to make trade-offs:

Under the strong consistency constraints of traditional databases, how high can availability be, and how low can latency be?
Under the constraint of 100% Read Write Availability, how high can consistency be?
What exists between these two extremes?

For these questions, we always start with the simplest model: a replicated Key-Value Store. Then we consider how to add distributed transactions. Of course, for a distributed database, we should at least also have Secondary Indexes, Data Constraint Checks, and other features, but this article will not expand on them further.

Distributed Systems Under Strong Consistency Constraints

Our goal is to implement a Key-Value Store with Linearizability Consistency while also supporting distributed transactions at the Serializability Isolation Level.

How to Implement Linearizability Consistency

Our goal is to implement a Key-Value Store with Linearizability Consistency. Because Linearizability has the Local Property, we can further simplify the problem. First, we consider how to implement an Atomic Read/Write Register. More specifically, it should be an MRMW Atomic Register (Multi-Reader Multi-Writer Atomic Register).

Page 333 of [38] summarizes whether various Replication methods can implement Linearizability:

Single-leader replication (potentially linearizable)
Consensus algorithms (linearizable)
Multi-leader replication (not linearizable)
Leaderless replication (probably not linearizable)

Section 4 of Chapter 16 of [45] also summarizes how to implement Linearizability:

Atomicity Based on a Total Order Broadcast Abstraction
Atomicity of Read/Write Objects Based on Server Processes
- Atomicity Based on a Server Process and Copy Invalidation
- Atomicity Based on a Server Process and Copy Update

Overall, there are several approaches:

Total Order Broadcast
Quorum read/write
Write-invalidate & Read-through
Write-through

Total Order Broadcast guarantees that each replication receives totally ordered messages in the same order. Obviously, this matches the definition of Linearizability Consistency. Using a Consensus algorithm, such as Raft or ZAB, is equivalent to first ordering messages on the Leader node and then propagating both the messages and the order itself to all replications; in effect, it is equivalent to Total Order Broadcast. For the classification and implementation methods of Total Order Broadcast, see [23].

To implement Linearizability using a Quorum approach, a Repair must be performed before every Read/Write operation; see page 334 of [38], Chapter 4 of [20], and [4]. Note that only Read/Write operations can be implemented this way. Operations such as Compare-And-Set cannot be implemented in this form and must use a distributed Consensus algorithm; see Fig. 1 of [34] for details.

Read-through and Write-through approaches are mainly suitable for Cache scenarios. Using such methods in distributed storage scenarios carries a relatively high risk of data loss.

The cost of achieving Linearizability is the latency from the Client to the Leader plus the latency from the Leader to the slowest node in the majority set.

How to Implement Serializability Isolation Transactions

With a distributed Key-Value Store already in place, our next goal is to implement distributed transactions at the Serializability Isolation Level.

In single-machine database systems, the main methods for implementing transactions at the Serializability Isolation Level, ordered by Scalability from weak to strong, are:

True (single-threaded) serialized execution
Strict 2 Phase Locking
Serializable Snapshot Isolation Algorithm

Among them, serialized execution and Strict 2PL (Strict 2 Phase Locking) are pessimistic concurrency methods, while SSI (Serializable Snapshot Isolation) is an optimistic concurrency method. The basic idea of SSI is to use Snapshot Isolation, but before commit, check whether any known write operation conflicts with the current transaction’s read operations under Serializability semantics. If so, Abort; if not, Commit and Serializability can be guaranteed [21].

The main problem that must be solved to implement distributed transactions is Atomic Commitment, which is the problem of multiple nodes making an all-or-nothing decision to commit or abort a transaction. Atomic Commitment is divided into Blocking and Non-blocking forms. The Blocking Atomic Commitment problem can be transformed into the Consensus problem [31]. A typical implementation of Blocking Atomic Commitment is 2 Phase Commit, but I think the Paxos Commit Algorithm [31] should be used. The Non-blocking Atomic Commitment problem is strictly harder than the Consensus problem [32]. A typical implementation is 3PC (3 Phase Commit), but because 3PC cannot guarantee correctness when nodes fail, almost no one uses 3PC in production environments.

For a system with Linearizability Consistency, we do not need to worry about consistency among the multiple Replications of each Shard, so the implementation of transactions that do not cross Shards is the same as the implementation of transactions in a single-machine database. For transactions that cross Shards, one feasible solution is to use the Paxos Commit Algorithm to coordinate multiple Shards, with each Shard using Strict 2PL. Except for the need for an additional external component to record the relationships between transactions and locks across the entire system for deadlock detection and handling, this solution is basically the same as transaction processing in traditional databases. Implementing SSI requires solving an important problem: how to generate a consistent Snapshot across Shards. I do not have much experience in this area, and I hope to complete this part after reading [26, 36, 48, 49, 53] in the future (though it is also possible that this will remain an unfilled hole).

Problems with Strong Consistency Systems

Although Linearizability Consistency lets us use the system as if it had only one replica, the cost is not only that the algorithm is Network Latency Sensitive, but also that Scalability decreases. For a system with Linearizability Consistency, adding replicas cannot improve system performance. Linearizability Consistency is very expensive; even multicore CPUs do not use Linearizability Consistency [47]. Therefore, we should provide Linearizability Consistency only when necessary, for example by using an additional system such as Zookeeper.

Strict 2PL also has relatively low concurrency, which results in low transaction-processing performance. Although SSI has higher concurrency than Strict 2PL, the probability of Abort is also relatively high in high-concurrency scenarios.

Distributed Systems Under High Availability Constraints

From the conclusions summarized in [3, 37], if we want to implement a system that is 100% Read & Write Available (or, in other words, whose time complexity for read and write operations is not affected by network latency between nodes), the strongest consistency level it can support is Causal Consistency. In particular, considering the size of Version Vectors, it may not even support full Causal Consistency. On the other hand, from the conclusions in [8], support for distributed transactions can at most reach RC (Read Committed), MAV (Monotonic Atomic View), or P-CI (Predicate Cut Isolation).

I deliberately avoid the concept of Eventual Consistency here, because there is still no unified formal definition of Eventual Consistency. The disagreements mainly focus on two aspects: How Eventual and What Consistency. In [19, 51], the total order in Linearizability Consistency is extended to a partial order and decomposed into two dimensions, Visible and Arbitration, to describe multiple kinds of Consistency in a unified way. In particular, events that appear only in the Arbitration relation but not in the Visible relation are equivalent to being lost or overwritten.

Causal Consistency is highly valuable in many scenarios. For example, in a social network, A says that their child is missing (a1), then A posts that the child has been found (a2), and B comments that this is great news (b). If C can see only messages a1 and b at this point, a problem arises. From the perspective of Causal Relation, a1→a2→b. An Eventual Consistency system does not provide guarantees for Causal Relation, but Causal Consistency must guarantee this.

Even so, mainstream NoSQL systems currently implement only Eventual Consistency [22]. This is because implementing a general-purpose Causal Consistency mechanism is relatively expensive. For example, if a Client first reads a large amount of data and then performs a write, a general-purpose system cannot know which read operations this write causally depends on, so it must capture all read operations. Even if the user explicitly specifies which read operations a write depends on, the transitivity of the Causal Relation must also be considered, namely which read operations those read operations depend on, and so on.

How to Implement Causal Consistency

Single Object (or Per-Key) Causal Consistency is easy to implement. Technologies such as MVCC can store multiple versions of an object simultaneously. When writing, it is enough to specify which version the write depends on to implement Single Object Causal Consistency. Note that to maintain Session Level Guarantees, communication must occur with the same replica throughout the duration of the Session. Personally, I think a system should generally provide Single Object Causal Consistency, because this burden is not heavy.

Unfortunately, unlike Linearizability Consistency, Causal Consistency does not have the Locality property. Even if Single Object Causal Consistency is implemented, it does not make the entire system satisfy Causal Consistency.

At present, how to implement Causal Consistency is a hot academic topic [2, 6, 22, 24, 25, 40–42, 54]. On the one hand, this is because Causal Consistency is the strongest consistency level that can be supported while maintaining 100% Availability. On the other hand, it is because the guarantees provided by Eventual Consistency are simply too weak [10, 30].

How to Implement Highly Available Distributed Transactions

Although [8] points out that, while guaranteeing high availability, distributed transactions can at most support RC (Read Committed), MAV (Monotonic Atomic View), or P-CI (Predicate Cut Isolation), it does not point out what algorithms can be used to achieve this. In particular, [8] states that the reason they wrote the paper is that although many algorithms implement these distributed transactions, they do not support high availability; many systems claim to be highly available, but after using these algorithms that are not highly available, they are no longer highly available in the strict sense.

[18] proposes an implementation method for distributed transactions without a Coordinator, and [7] further discusses Coordination Avoidance. The Eiger system provides low-latency read-only transactions and write-only transactions respectively [41], but it still requires more careful examination to know whether it is highly available. The Anna system claims to implement Read Committed-level distributed transactions across Shards [54], but it does not disclose more technical details. Its authors have also published RAMP transactions [9], which provide low-latency Read Atomic Isolation distributed transactions. Overall, implementing highly available distributed transactions remains an open problem.

References

[1] Abadi, D. 2012. Consistency Tradeoffs in Modern Distributed Database System Design: CAP is Only Part of the Story. Computer. 45, 2 (2012), 37–42. DOI:https://doi.org/10.1109/MC.2012.33.

[2] Akkoorath, D.D. et al. 2016. Cure: Strong Semantics Meets High Availability and Low Latency. 2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS) (Jun. 2016), 405–414.

[3] Attiya, H. et al. 2017. Limitations of Highly-Available Eventually-Consistent Data Stores. IEEE Transactions on Parallel and Distributed Systems. 28, 1 (2017), 141–155. DOI:https://doi.org/10.1109/TPDS.2016.2556669.

[4] Attiya, H. et al. 1995. Sharing memory robustly in message-passing systems. Journal of the ACM. 42, 1 (1995), 124–142. DOI:https://doi.org/10.1145/200836.200869.

[5] Attiya, H. and Welch, J.L. 1994. Sequential consistency versus linearizability. ACM Transactions on Computer Systems. 12, 2 (1994), 91–122. DOI:https://doi.org/10.1145/176575.176576.

[6] Bailis, P. et al. 2013. Bolt-on causal consistency. Proceedings of the 2013 international conference on Management of data - SIGMOD ’13. (2013), 761. DOI:https://doi.org/10.1145/2463676.2465279.

[7] Bailis, P. et al. 2015. Coordination Avoidance in Database Systems. Pvldb. 8, 4 (2015), 185–196. DOI:https://doi.org/1402.2237v2.

[8] Bailis, P. et al. 2013. Highly Available Transactions: Virtues and Limitations. Proceedings of the VLDB Endowment. 7, 3 (2013), 181–192. DOI:https://doi.org/10.14778/2732232.2732237.

[9] Bailis, P. et al. 2014. Scalable atomic visibility with RAMP transactions. Proceedings of the 2014 ACM SIGMOD international conference on Management of data - SIGMOD ’14. (2014), 27–38. DOI:https://doi.org/10.1145/2588555.2588562.

[10] Bailis, P. and Ghodsi, A. 2013. Eventual consistency today. Communications of the ACM. 56, 5 (2013), 55. DOI:https://doi.org/10.1145/2447976.2447992.

[11] Bailis, P. and Kingsbury, K. 2014. The Network is Reliable. Queue. 12, 7 (2014), 20:20-20:32. DOI:https://doi.org/10.1145/2639988.2639988.

[12] Berenson, H. et al. 1995. A critique of ANSI SQL isolation levels. ACM SIGMOD Record. 24, 2 (1995), 1–10. DOI:https://doi.org/10.1145/568271.223785.

[13] Bernstein, P.A. et al. 1987. Concurrency Control and Recovery in Database Systems. Addison-Wesley Pub. Co.

[14] Bernstein, P.A. and Das, S. 2013. Rethinking eventual consistency. Proceedings of the 2013 international conference on Management of data - SIGMOD ’13. (2013), 923. DOI:https://doi.org/10.1145/2463676.2465339.

[15] Borowsky, E. and Gafni, E. 1993. Generalized FLP impossibility result for t-resilient asynchronous computations. Proceedings of the twenty-fifth annual ACM symposium on Theory of computing - STOC ’93. 5, (1993), 91–100. DOI:https://doi.org/10.1145/167088.167119.

[16] Brewer, E. 2012. CAP twelve years later: How the “rules” have changed. Computer. 45, 2 (2012), 23–29. DOI:https://doi.org/10.1109/MC.2012.37.

[17] Brewer, E. 2017. Spanner, TrueTime & The CAP Theorem. White Papers. 2015, 4/4/2015 (2017), 1–7.

[18] Burckhardt, S. et al. 2012. Eventually Consistent Transactions. Proceedings of the 22n European Symposium on Programming (ESOP). Springer. 67–86.

[19] Burckhardt, S. 2014. Principles of Eventual Consistency. Foundations and Trends® in Programming Languages. 1, 1–2 (2014), 1–150. DOI:https://doi.org/10.1561/2500000011.

[20] Cachin, C. et al. 2011. Introduction to reliable and secure distributed programming. Springer.

[21] Cahill, M.J. et al. 2009. Serializable isolation for snapshot databases. ACM Transactions on Database Systems. 34, 4 (Dec. 2009), 1–42. DOI:https://doi.org/10.1145/1620585.1620587.

[22] Davoudian, A. et al. 2018. A Survey on NoSQL Stores. ACM Computing Surveys. 51, 2 (Apr. 2018), 1–43. DOI:https://doi.org/10.1145/3158661.

[23] Défago, X. et al. 2004. Total order broadcast and multicast algorithms: Taxonomy and Survey. ACM Computing Surveys. 36, 4 (Dec. 2004), 372–421. DOI:https://doi.org/10.1145/1041680.1041682.

[24] Didona, D. et al. 2017. Okapi: Causally Consistent Geo-Replication Made Faster, Cheaper and More Available. (Feb. 2017).

[25] Du, J. et al. 2014. GentleRain : Cheap and Scalable Causal Consistency with Physical Clocks. SOCC ’14 Proceedings of the ACM Symposium on Cloud Computing. (2014), 1–13. DOI:https://doi.org/10.1145/2670979.2670983.

[26] Dutta, P. et al. 2005. How fast can eventual synchrony lead to consensus? Proceedings of the International Conference on Dependable Systems and Networks. March (2005), 22–27. DOI:https://doi.org/10.1109/DSN.2005.54.

[27] Fischer, M.J. et al. 1983. Impossibility of distributed consensus with one faulty process. Proceedings of the 2nd ACM SIGACT-SIGMOD symposium on Principles of database systems - PODS ’83 (New York, New York, USA, Apr. 1983), 1–7.

[28] Gilbert, S. and Lynch, N. 2002. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. ACM SIGACT News. 33, 2 (2002), 51. DOI:https://doi.org/10.1145/564585.564601.

[29] Gilbert, S. and Lynch, N. 2012. Perspectives on the CAP Theorem. Computer. 45, 2 (2012), 30–36. DOI:https://doi.org/10.1109/MC.2011.389.

[30] Golab, W. et al. 2014. Eventually consistent: not what you were expecting? Communications of the ACM. 57, 3 (2014), 38–44. DOI:https://doi.org/10.1145/ 2576794.

[31] Gray, J. and Lamport, L. 2004. Consensus on Transaction Commit. 1, April 2004 (2004). DOI:https://doi.org/10.1145/1132863.1132867.

[32] Guerraoui, R. 1995. Revisiting the relationship between non-blocking atomic commitment and consensus. Distributed Algorithms (1995), 87–100.

[33] Haerder, T. and Reuter, A. 1983. Principles of transaction-oriented database recovery. ACM Computing Surveys. 15, 4 (1983), 287–317. DOI:https://doi.org/10.1145/289.291.

[34] Herlihy, M. 1991. Wait-free synchronization. ACM Transactions on Programming Languages and Systems. 13, 1 (1991), 124–149. DOI:https://doi.org/10.1145/114005.102808.

[35] Herlihy, M.P. and Wing, J.M. 1990. Linearizability: a correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems. 12, 3 (Jul. 1990), 463–492. DOI:https://doi.org/10.1145/78969.78972.

[36] Jung, H. et al. 2011. Serializable Snapshot Isolation for Replicated Databases in High-Update Scenarios. PVLDB. 4, 11 (2011), 783–794.

[37] Kleppmann, M. 2015. A Critique of the CAP Theorem. (2015).

[38] Kleppmann, M. 2017. Designing data-intensive applications. O’Reilly Media, Inc.

[39] Linearizability versus Serializability: 2014. http://www.bailis.org/blog/linearizability-versus-serializability/. Accessed: 2018-05-08.

[40] Lloyd, W. et al. 2011. Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS. Proceedings of the Symposium on Operating Systems Principles. (2011), 1–16. DOI:https://doi.org/10.1145/2043556.2043593.

[41] Lloyd, W. et al. 2013. Stronger Semantics for Low-Latency Geo-Replicated Storage. Proceedings of the Symposium on Networked Systems Design and Implementation. April (2013), 313–328. DOI:https://doi.org/10.1145/2602649.2610533.

[42] Mehdi, S.A. et al. 2017. I Can’t Believe It’s Not Causal! Scalable Causal Consistency with No Slowdown Cascades. 14th {USENIX} Symposium on Networked Systems Design and Implementation, {NSDI} 2017, Boston, MA, USA, March 27-29, 2017. (2017), 453–468.

[43] On Consistency and Durability: http://www.bailis.org/blog/on-consistency-and-durability/. Accessed: 2018-05-08.

[44] Ports, D.R.K. and Grittner, K. 2012. Serializable snapshot isolation in PostgreSQL. Proceedings of the VLDB Endowment. 5, 12 (Aug. 2012), 1850–1861. DOI:https://doi.org/10.14778/2367502.2367523.

[45] Raynal, M. 2013. Distributed Algorithms for Message-Passing Systems. Springer, Berlin, Heidelberg.

[46] Saks, M. and Zaharoglou, F. 1993. Wait-free k-set agreement is impossible. Proceedings of the twenty-fifth annual ACM symposium on Theory of computing - STOC ’93. (1993), 101–110. DOI:https://doi.org/10.1145/167088.167122.

[47] Sewell, P. et al. 2010. X86-TSO: A Rigorous and Usable Programmer’s Model for x86 Multiprocessors. Communications of the ACM. 53, 7 (Jul. 2010), 89. DOI:https://doi.org/10.1145/1785414.1785443.

[48] Shao, J. et al. 2016. Read Consistency in Distributed Database Based on DMVCC. 2016 IEEE 23rd International Conference on High Performance Computing (HiPC). (Dec. 2016), 142–151. DOI:https://doi.org/10.1109/HiPC.2016.11.

[49] Sovran, Y. et al. 2011. Transactional storage for geo-replicated systems. Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles - SOSP ’11. (2011), 385. DOI:https://doi.org/10.1145/2043556.2043592.

[50] Tripathi, A. and Thirunavukarasu, B.D. 2015. A transaction model for management of replicated data with multiple consistency levels. 2015 IEEE International Conference on Big Data (Big Data) (Oct. 2015), 470–477.

[51] Viotti, P. and Vukolić, M. 2016. Consistency in Non-Transactional Distributed Storage Systems. ACM Computing Surveys. 49, 1 (Jun. 2016), 1–34. DOI:https://doi.org/10.1145/2926965.

[52] Vogels, W. 2008. Eventually Consistent. Queue. 6, 6 (2008), 14. DOI:https://doi.org/10.1145/1466443.1466448.

[53] Wang, T. et al. 2016. Efficiently making (almost) any concurrency control mechanism serializable. The VLDB Journal. 26, 4 (May 2016), 537–562. DOI:https://doi.org/10.1007/s00778-017-0463-8.

[54] Wu, C. et al. 2018. Anna : A KVS For Any Scale. 34th IEEE International Conference on Data Engineering. (2018).

[55] You Can’t Sacrifice Partition Tolerance: 2010. https://codahale.com/you-cant-sacrifice-partition-tolerance/. Accessed: 2018-04-17.

Afterword

It is clear that the later parts of this article are rather scattered and did not really achieve the effect I had in mind. This is because I do not have a very clear understanding of how to implement a Causal Consistency storage system. My original intention in writing this article was to clarify the relationship between the CAP theorem and ACID properties and to find a clear line of thought for designing distributed storage systems. While researching, I found that the description of the CAP theorem is not clear, and people have many disputes about it. The ACID properties themselves are also not very clear, and extending them into a distributed environment introduces new problems. Some new methods for implementing database transactions also do not map cleanly to the ACID properties. Non-blocking distributed systems are also an emerging field. Although there is currently some theoretical preparation, there is still no clear path for algorithms and implementations.

For distributed storage systems, the first consideration should be the trade-off between performance and consistency. The order of consideration should be as follows:

Transaction Consistency

Applications that emphasize consistency should consider supporting Linearizability Consistency, but they must also consider whether they can accept the accompanying decreases in Latency and Scalability. Applications that require high performance or high availability should consider using Eventual Consistency, because the theory related to Causal Consistency is still not mature enough. However, when needed, highly available implementations of Causal Consistency can also be explored. For transaction support, if Linearizability Consistency is not implemented, one can consider supporting only single-machine transactions, or multi-machine read-only and write-only transactions.

CAP and ACID

CAP

ACID

Differences Between the Two

Linearizability and Serializability

Linearizability

Serializability

Differences Between the Two

What Can We Do?

Distributed Systems Under Strong Consistency Constraints

How to Implement Linearizability Consistency

How to Implement Serializability Isolation Transactions

Problems with Strong Consistency Systems

Distributed Systems Under High Availability Constraints

How to Implement Causal Consistency

How to Implement Highly Available Distributed Transactions

Other Topics

Recommended Readings

References

Afterword