database distributed-computing database-partitioning cap

Understanding the "P" in "CAP" with DDBMSes

In the CAP theorem, the "P" (partition tolerance) component essentially states that the system keeps working despite physical network partitions.

I guess the "C" (Consistency) and "A" (Availability) make perfect sense to me in a DDBMS context: with consistency, all clients must have a consistent view of the data regardless of which DB node is serving them. And with availability, all clients must be able to obtain a response from some DB node for reads/writes (i.e., all nodes are never down at the same time).

But for some reason, I'm choking on the partitioning piece of CAP, and what its significance is, especially with respect to DDBMSes.

With a distributed database, you by definition have multiple (clustered) nodes. Depending on network & systems architecture, physical devices, etc., you're going to have performance issues when replicating or communicating (semi-joins, etc.) between nodes anyway. So is the "P" in CAP simply some way of speeding up performance in a DDBMS, performance that would otherwise be hindered without the P-guarantee?

Also, how does the "P" relate to a single node (non-clustered) DB? I feel like it's completely irrelevant in that context. Thanks in advance!

Solution

The CAP theorem says that a distributed system cannot provide total correctness (the "C", consistency), total availability, and partition tolerance all at the same time during a failure. Correctness means that data read from any node does not conflict with the value held at any other node. Availability means that every healthy node can serve clients. Partition tolerance means that the system still functions when it is split into subsets that cannot communicate with each other.

Say you have 3 machines. One of them is unable to contact the others, or in other words, the cluster is split into 2 partitions. If the system can handle this scenario, then it is partition tolerant. However, you must either give up total correctness or total availability:

Drop correctness: All nodes remain up, but the split off node and the remaining cluster nodes may contain conflicting data, sometimes known as split brain.

Drop availability: One of the partitions goes offline. This protects data integrity, since any successful read will not have a conflicting value anywhere else.
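
To make the two options concrete, here is a minimal Python sketch of a toy three-node cluster. Everything in it is an assumption made for illustration rather than how any particular database works: the Cluster class, the "AP" and "CP" mode names, the split helper, and the rule that only a side holding a strict majority keeps serving requests in "CP" mode.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when a node refuses a request to protect consistency."""


@dataclass
class Node:
    name: str
    data: dict = field(default_factory=dict)


class Cluster:
    """Toy cluster (hypothetical): 'AP' drops correctness, 'CP' drops availability."""

    def __init__(self, names, mode):
        if mode not in ("AP", "CP"):
            raise ValueError("mode must be 'AP' or 'CP'")
        self.mode = mode
        self.nodes = {n: Node(n) for n in names}
        # Initially every node can reach every other node: a single group.
        self.partitions = [set(names)]

    def split(self, *groups):
        """Simulate a network partition: nodes only reach peers in their group."""
        self.partitions = [set(g) for g in groups]

    def _reachable(self, name):
        return next(p for p in self.partitions if name in p)

    def _check_quorum(self, name):
        peers = self._reachable(name)
        # CP mode: a side without a strict majority stops serving requests.
        if self.mode == "CP" and 2 * len(peers) <= len(self.nodes):
            raise Unavailable(f"{name} cannot reach a majority")
        return peers

    def write(self, name, key, value):
        # Replicate the write to every node the contacted node can reach.
        for peer in self._check_quorum(name):
            self.nodes[peer].data[key] = value

    def read(self, name, key):
        self._check_quorum(name)
        return self.nodes[name].data.get(key)


# Drop correctness: both sides stay up and diverge (split brain).
ap = Cluster(["a", "b", "c"], mode="AP")
ap.write("a", "x", 1)
ap.split({"a", "b"}, {"c"})
ap.write("a", "x", 2)
ap.write("c", "x", 3)
print(ap.read("a", "x"), ap.read("c", "x"))  # prints: 2 3

# Drop availability: the minority side refuses to answer.
cp = Cluster(["a", "b", "c"], mode="CP")
cp.write("a", "x", 1)
cp.split({"a", "b"}, {"c"})
cp.write("a", "x", 2)
try:
    cp.read("c", "x")
except Unavailable as exc:
    print("refused:", exc)  # prints: refused: c cannot reach a majority
```

In the "AP" run, nodes a and c return different values for the same key, which is the conflict the application has to resolve once the partition heals. In the "CP" run, node c is perfectly healthy but refuses requests, so every read that does succeed returns the same value.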

From a database system perspective, this means you must choose a strategy for dealing with failure. With a database that can't handle partition failures, the behavior is undefined if any node goes down. A database that sacrifices correctness during failures will force the application to deal with consistency issues when the failure is resolved, but more nodes can remain available. A database that gives up availability will allow the application logic to assume that the data is always consistent, but some otherwise healthy nodes will be inaccessible during the failure.