Distributed storage and the CAP Theorem

Categories: Featured

In this brave new world, distributed processing and distributed storage are the big topics. Maybe you haven’t seen them at your organization yet, but they are out there, and they are coming close to you.

One important idea around this kind of distributed stuff, more specifically distributed storage, is the CAP Theorem[1]. I will not cover all the considerations about it, but at the end of the post you will find the original paper that proves the theorem.

First things first: what does CAP stand for? CAP is short for Consistency, Availability and Partition Tolerance.

The main idea is that a distributed storage system should provide three things. Consistency means that the data is the same in all parts of the system. Availability means that the system can always answer: for every request it receives, it sends back a response. Finally, partition tolerance means that the system keeps working even if part of it stops (because of a network failure, a node failure, etc.).

Now that you know what CAP means, let’s see what the CAP Theorem says: you cannot have a system with Consistency, Availability and Partition Tolerance at the same time; the most you can have is a system with two of the three attributes.

So you can have a distributed storage system that is CA, CP or AP. The next natural question is: why?

I will leave the proof to the paper[2] (which does it the correct way), but in a nutshell: imagine that a system with two nodes receives a write request. Two things can happen:

  1. The data arrives and is stored in all nodes, without any problem;
  2. The data arrives and is stored in node 1, but node 1 cannot communicate with node 2, so the newest data is not there.

Based on that, in a full CAP system item 2 could never happen, because one of three things would follow: either your system will not be consistent after the call, or it will not be available during the call, or it will simply not work when there is a failure in the network communication.
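The two-node thought experiment above can be sketched as a toy simulation. Everything here (the `Node` class, the `link_up` flag) is hypothetical, purely for illustration; real replication protocols are far more involved.

```python
# A toy two-node store illustrating the two cases described above.
# Names (Node, link_up, write) are made up for this sketch.

class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}

node1, node2 = Node("node1"), Node("node2")
link_up = True  # whether node1 can reach node2


def write(key, value):
    """Store on node1 and try to replicate to node2."""
    node1.data[key] = value
    if link_up:
        node2.data[key] = value   # case 1: both nodes agree
        return "consistent"
    return "inconsistent"         # case 2: node2 keeps stale data


write("x", 1)
assert node1.data == node2.data   # case 1: no problem

link_up = False                   # network failure between the nodes
write("x", 2)
print(node1.data["x"], node2.data["x"])  # 2 1 -> the nodes now disagree
```

At the moment the link is down, the sketch has to pick: accept the write and diverge (what it does here), or refuse it. That forced choice is exactly the theorem.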

So let’s understand what the three possible configurations mean:

  • Consistent and Available (CA): the system guarantees that all data received will be consistent after a request, and as long as the system is running, all requests will be completed. But as soon as a communication problem happens between the nodes it will simply stop, because it cannot survive without communication between the nodes;
  • Consistent and Partition Tolerant (CP): there is a small difference between CP and CA, although they may look the same. A CP system will be consistent for every request it answers, and it will stay alive when there is a communication failure between the nodes. But it will not be available all the time: when there is a communication failure between the nodes, it may simply not answer some of the requests. In practice it can look like CA, but it is slightly different; suppose the system still accepts reads while the communication is down: it is not simply stopped, it still serves some calls.
  • Available and Partition Tolerant (AP): finally, an AP system will be available as long as at least one node is still alive, and it will tolerate partitions. But you may end up reading old data or overwriting some data. Suppose you write to node 1 while there is no communication and someone else reads from node 2: that read will return old data.
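The different read behaviors of CP and AP during a partition can be sketched in a few lines. This is a hypothetical toy (the function names and the `partitioned` flag are mine, not from any real system): node 2 missed the latest write, and each strategy handles a read against it differently.

```python
# Toy reads during a partition: CP refuses to answer, AP answers with
# possibly stale data. Purely illustrative, not a real implementation.

store = {"node1": {"x": 2}, "node2": {"x": 1}}  # node2 missed the last write
partitioned = True  # the nodes cannot talk to each other


def read_cp(node, key):
    """CP choice: during a partition, reject rather than risk stale data."""
    if partitioned:
        raise RuntimeError("unavailable during partition")
    return store[node][key]


def read_ap(node, key):
    """AP choice: always answer, possibly with old data."""
    return store[node][key]


print(read_ap("node2", "x"))  # 1 -> the stale read described above
try:
    read_cp("node2", "x")
except RuntimeError as err:
    print(err)                # unavailable during partition
```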

In a simplified way, this is what the CAP Theorem means. The second good question is: why do I care? Well, most of the time you don’t need to implement this yourself, but with the rise of distributed storage it is really important to understand it: most of these systems are classified according to this theorem.

I really recommend looking at the references if you are interested in the proof of the theorem. And if you are interested in how distributed storage could be implemented, I would recommend looking at the open source projects[3]. They are spending a lot of time on this, trying solutions that achieve something like: “most of the data, most of the time”.

[1] Brewer, Eric A. Towards Robust Distributed Systems. (http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf)

[2] Gilbert, S.; Lynch, N. Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services.

[3] Example of an open source project that implements AP with eventual consistency: Riak (http://wiki.basho.com/).

As we are talking about a new era, let’s listen to Era – Ameno. Enjoy.
