CAP: Don't settle for eventual consistency

fanf2 | 89 points

> CP systems can be made to be highly available in practice.

Yes, don't, if you happen to have Google-level infrastructure, complete with private fiber connections and GPS-synchronized clocks in your data centers. Otherwise, sometimes you don't have a choice. I am not being sarcastic here: as I understand it, Google offers Spanner as an API in GCP, so you could technically do that by paying for it.

https://cloud.google.com/spanner/

Also, in general it is important to keep in mind that keeping consistency in a large distributed system is fighting against the laws of physics. There is nothing wrong with fighting them, and it sure is fun to do, but it takes effort and money. Often you can architect your solution to work with eventual consistency. There are CRDTs and other things that can help there. Eventually consistent systems have also been around for a while, so it is not totally new and uncharted territory.
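
To make the CRDT mention concrete, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest CRDTs. The class and method names are my own for illustration, not taken from any particular library, and it assumes increments are non-negative.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter: one slot per replica, merged with element-wise max."""
    replica_id: str
    counts: dict = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # Assumption: amount is non-negative. A replica only bumps its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum of every replica's slot.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge regardless of merge order or repeated merges.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


# Two replicas accept writes independently, then exchange state.
a, b = GCounter("a"), GCounter("b")
a.increment()
a.increment()
b.increment(3)
a.merge(b)
b.merge(a)
b.merge(a)  # merging the same state again changes nothing
```

After the merges both replicas report the same total without any coordination at write time, which is the property that makes this style of data type a fit for eventually consistent systems.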

The bottom line is: consider the underlying assumptions. What is true for Google may not be, and probably isn't, true for your use case.

rdtsc | 7 years ago

Nope. Eventual Consistency seems to be the best policy.

The constant "c" isn't changing (or if it is, it's not by much). That means there's a definite minimum amount of time for a DB to become consistent, enough to cover transit around the world. When the 3 DB machines are in the same rack, sure, signals can approach 11.8 inches per nanosecond, but the latencies are still there.
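
A rough back-of-the-envelope check of those numbers, as a sketch. The 20,000 km figure (about half the Earth's circumference) and the 2/3-of-c figure for light in fiber are my own approximations, not measurements of any real network.

```python
C_M_PER_S = 299_792_458   # speed of light in vacuum, metres per second
INCH_M = 0.0254           # one inch in metres

# Light covers roughly 11.8 inches per nanosecond.
inches_per_ns = C_M_PER_S / INCH_M / 1e9


def one_way_ms(distance_km: float, fraction_of_c: float = 1.0) -> float:
    """Minimum one-way signal time in milliseconds over distance_km."""
    return distance_km * 1000 / (C_M_PER_S * fraction_of_c) * 1000


HALF_EARTH_KM = 20_000    # assumption: roughly the farthest two points on Earth

vacuum_ms = one_way_ms(HALF_EARTH_KM)         # about 67 ms
fiber_ms = one_way_ms(HALF_EARTH_KM, 2 / 3)   # about 100 ms, assuming ~2/3 c in fiber
round_trip_fiber_ms = 2 * fiber_ms            # about 200 ms before any processing
```

So even with perfect hardware, a single round trip to the far side of the planet costs on the order of a couple hundred milliseconds, and any protocol that needs several round trips multiplies that.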

The larger and more global (and soon, interstellar) the system, the bigger these c-related lags become. And it is glaringly obvious to me that "eventual consistency" is the big solution here.

The only case where it doesn't seem to be good is transactional stuff like banking and investments. Those cases seem to require CP and hardware like atomic clocks to verify every copy of the DB is on the same page. Or perhaps an internal blockchain would be more appropriate, since it enforces consensus. But I digress on that one.

kefka | 7 years ago

It seems like this takes the same position CockroachDB did a couple of days ago in "The Limits of the CAP Theorem"[1].

Do both of these really boil down to misunderstanding the "A" in CAP? Until the first time I actually used an AP system, I thought "A" was talking in the sense of "five-9's highly available". I'm far from an expert on this, so please correct me if I get this wrong, but here's my take-away from these two articles:

Since CP systems can be made "highly available" (in the five-9's sense of the term), I think what they are both saying is the only real benefit of AP is low latency.

CP still depends on some coordination, and despite tricks with locks and consensus algorithms, it's often still limited (even for reads) by the highest latency between nodes.

AP, on the other hand, can respond to both read and writes much faster because it doesn't need to care about the consistency -- the trade-off is the application must handle eventual consistency.
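
To put rough numbers on that latency difference, here is a toy model (not how any particular database works). It assumes requests go to all replicas in parallel and the operation completes once enough acknowledgements arrive; the round-trip times are made up.

```python
def ack_latency_ms(replica_rtts_ms, acks_needed):
    """Time until the `acks_needed` fastest replicas have replied,
    assuming all requests are sent in parallel."""
    if not 1 <= acks_needed <= len(replica_rtts_ms):
        raise ValueError("acks_needed must be between 1 and the replica count")
    return sorted(replica_rtts_ms)[acks_needed - 1]


# Hypothetical round-trip times from one client: same rack, another
# continent, and the far side of the world.
rtts_ms = [1, 70, 140]
majority = len(rtts_ms) // 2 + 1

ap_style = ack_latency_ms(rtts_ms, 1)                 # nearest replica only: 1 ms
cp_majority = ack_latency_ms(rtts_ms, majority)       # majority quorum: 70 ms
cp_all = ack_latency_ms(rtts_ms, len(rtts_ms))        # every replica: 140 ms
```

In this model a majority quorum is bound by the slowest member of the fastest majority rather than by the single slowest node; only a wait-for-everyone scheme pays the highest latency. Either way the gap to answering from the nearest replica is large.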

AP can also handle certain types of network partitions that "highly-available" CP systems can't (when DB nodes can't all talk to each other, but are still able to talk to client(s)) but in practice that type of failure almost never happens (at least not when your DBs all live in datacenters), so it's not a good reason to choose AP over CP.

Also, not all CP systems can be made "highly available" (in the sense of five-9's), so it's not always an apples-to-apples comparison, and that I think causes a lot of confusion.

(Again: Not an expert. Please correct me if I got anything wrong here.)

[1] https://news.ycombinator.com/item?id=14646063

gregmac | 7 years ago

Why does this read to me as an advertisement for cloud spanner?

justinsaccount | 7 years ago

I think the point of this post is: You can use https://cloud.google.com/spanner/ to work with a CP model (typically much less of a headache than AP), while sacrificing so little availability that it's essentially CAP.

The big concern I'd have (assuming using a Google-hosted database was practical) is the SLA. Unless I'm misreading https://cloud.google.com/spanner/sla, it seems like the SLA is...not very strong. Given how they discuss availability elsewhere, it seems like they're totally unwilling to put their money anywhere close to where their mouth is.

That said, it does seem like going with Spanner to have the ease (and power) of consistency while also having reliability and scalability would be something to consider in a whole lot of situations. (Though I'd be reluctant to jump on it this early.)

mnarayan01 | 7 years ago

(Unmediated) Human personal perception has always been CP. But then we humans are akin to mobile agents that visit co-local (thus non-partitionable) data spaces and perceive a coherent 'classical' (as in Physics) world that is equally available.

But AP is (by necessity) baked into the story of collective human perception. Collectively, we are more akin to static clusters that communicate information about data spaces.

("Your full deposit will be available to you the following business day. Your book balance is x. Your available balance is y < x." Dealing with AP is a day to day common experience as old as organized hills.)

I think the root of the surprising fact that programmers find it difficult to reason about AP data spaces (given the fact that that is how we humans achieve 'civilization' and 'culture') is due to the personal perspective nature of iterative programming. We see this again when we consider the quite related issue of correct understanding of memory models.

eternalban | 7 years ago

Transactions are an answer for atomicity, not consistency. It's possible to have transactions and not be consistent. Hopefully this is just the mistake of a non-technical product manager and not the engineering team that works on Cloud Spanner.

urethrafranklin | 7 years ago

Sounds nice in theory, but I would like to see a few years of history from production systems using Spanner before concluding that nothing goes wrong with this approach.

StreamBright | 7 years ago

Sigh, quoting from an earlier comment I made here: (https://news.ycombinator.com/item?id=14648745)

It is true that if you assume your client app is not important that a CP system is the right choice. And I would also say this /was/ true up till about 2004 when Gmail was released. But it definitely stopped being true in 2007 when the iPhone was released and you started having installed apps.

Since then, users have slowly grown to expect both mobile apps and SPAs to work regardless of whether the servers work, regardless of load balancers, regardless of connectivity.

If you look at the market trends, things are increasingly going in this direction. From self-driving cars, to IoT devices, to drone delivery, to even traditionally server-dependent productivity tools like gDocs and others - people need to get work done even if the internet to your server doesn't exist.

Will banking applications still need mostly server-dependent behavior? Yes. Is CP still important? Yes. But it is biased to say that CP systems are better. Choose the right tool for the right job. CP systems are definitely the right choice for a strongly consistent database, but they aren't the right choice for everything. My database is an AP system, but it should not be used for many apps out there. Neither of these is "better"; they are just tradeoffs you have to decide upon.

marknadal | 7 years ago