How we used CAP theorem in our system design?

You get to choose only two

What is CAP theorem?

Among Consistency, Availability, Partition-tolerance
— “pick any two”

  • CP: Consistent and Partition tolerant system
    (let-go: Availability)
  • AP: Available and Partition tolerant system
    (let-go: Consistency)
  • AC: Available and Consistent system
    (let-go: Partition tolerance)

“You can’t control what is not in your limits”

In other words, the CAP theorem states that in the presence of a network partition, one has to choose between Consistency and Availability.

Consider a simple system as shown below, where writes are through master and reads are from slave.

Simple client-server system

Let’s say we have a network partition(can be at any point in the system),

Network Partitioned System

Now, let’s say one of the write operations failed due to the network partition,

(writes are through master and reads from slave)

Now we can do either of the two things,

Availability vs Consistency

Which is right?
There is no right or wrong, choosing what suites best for the use case is the definitive resolution.

Who makes the decision?
If the answer is, “the development team” then maybe it’s not the right thing to do. Most times it is the business to take call. Either of the approaches should/can be chosen depending on the following factors,

  • What is the cost of achieving Consistency or how much would it cost in the case of Inconsistency?
  • How important is it to achieve high Availability or is it ok to throw Errors?
  • How much Latency is tolerated?
    (high latency approaching is equal to no availability )
  • How Complex can a solution get?

Is there an alternative?

PACELC theorem

If there is Partition,
how does the system trade-off
between Availability and Consistency
Else
how does the system trade-off
between Latency and Consistency

Note: As mentioned before high Latency can be termed as no Availability

What we desire?

High Availability and high Consistency.

But we know we can’t have both Availability(low Latency) and Consistency when there is network Partition 😟

So what did we do?
We thought, how about having high Availability(low Latency) over Consistency for the time being and getting Eventually Consistent?

Getting Eventually Consistent

How did we achieve Eventual Consistency?

  • Async response to the user for time consuming API calls.
  • Doing all book keeping work in background.
  • Reaching consensus in db cluster and cache, while not keeping the user await.

So at the day end, it was win-win for Development and Business teams👏

Conclusion

  • It’s important to consider CAP theorem in system design.
  • All Point of failures in the whole System must be looked upon.
  • Technologies like Casandra, Zoo Keeper, Kafka can be leveraged in achieving Eventual Consistency at data level.
  • Choosing between AP and CP is not easy. In most cases its a Business call.

The End!

Note: Please feel free to express your thoughts. Corrections, feedback, claps 👏 are highly encouraged 😝

I Code. I Paint. I Ride