Sometimes, we have to find a way to work with the unreliable. How?
Last time, we looked at why consensus algorithms are needed (link). This time, we will examine Leslie Lamport’s Paxos.
What is Paxos? Paxos is a family of distributed algorithms used to reach consensus. In other words, the various versions of Paxos allow a group of independent nodes to agree on a single result (i.e., consensus). A network is in consensus when a majority of its nodes agree on an outcome. The Paxos that we will examine address Fail-Stop failures and not Byzantine-Fault failures. Fail-Stop errors are when unhealthy nodes stop transmitting. Byzantine-Faults also account for unhealthy nodes sending incorrect information. Paxos means this version of Paxos going forward.
Paxos assumes a few things. It expects the consensus network to have 2m + 1 nodes. Furthermore, the nodes must be capable of peer-to-peer communications with each other. Additionally, each node is assumed to have a way to persist data (e.g., saving data to a hard drive). Paxos ensures that there will be a complete record of agreements (i.e., transactions) on multiple nodes. Therefore, error recovery is instantaneous.
A network is in consensus when a majority of its nodes agree on an outcome.
Paxos defines three roles: proposer, acceptor, and learner. Proposers kick off new Paxos rounds and put forward proposals. A Paxos consensus round is a single instance of Paxos running to achieve consensus on a single proposition. Acceptors participate in the consensus resolution. Learners are not part of the consensus process, but they still learn of the agreed-upon result. In other words, proposers put up values, acceptors accept the proposals, and learners find out about the accepted values. A node can be any combination of the three roles.
We will illustrate Paxos with a simple example of deciding on a color shirt to wear. We have three nodes in our consensus network (i.e., nodes 1-3). Each node is an acceptor and listener. Additionally, node 1 is also a proposer.
The proposer kicks things off by declaring the start of a Paxos round. It sends a Prepare message with the round number to all acceptors. The round number must be unique and higher than any round number that proceeds it. So, Node 1 sends the Prepare messages with round number 42.1. 42 is the actual round number, and the appended 1 is the ID of the proposer. The ID ensures that the round number is unique. It is needed if there are multiple proposers in the network.
Upon receiving the Prepare message, each acceptor checks to see if the attached round number is the highest one it has seen. If it is, the acceptor replies with a Promise message. It affixes the round number to the Promise message. The acceptor is promising to ignore all messages with a smaller round number. After most acceptors have replied, the proposer advances a value to be accepted. It sends Accept messages to all acceptors, pinning the current round number and the proposal. In our case, node 1 proposes Green.
Paxos defines three roles: proposer, acceptor, and learner.
Each acceptor receives the Accept message and checks whether the attached round number is still the highest round number it has seen. If it is, the acceptor accepts the proposed value. It notes (e.g., writes to disk) that it agreed to Green for round 42.1. It replies with an Accepted message containing the round number and accepted value. It sends the reply to the proposer and all other peers. When a majority of acceptors have accepted the proposal, the round ends. The network has reached a consensus.
Paxos is simple enough when everything works. But what happens when problems occur? For example, what happens if the proposer stops working? Well, that depends on when it stopped working. If the proposer fails before sending the Accept messages, the round stops progressing. After a while, another proposer (e.g., Node 2) starts a new consensus round.
Now, what if the proposer fails while sending out the Accept messages? In our example, node 1 fails after sending the Accept message to node 3. Node 3 accepts the proposal, but Node 2 does not since it never got the Accept message. After a while, Node 2 makes itself a proposer and starts a new round. Node 3 acknowledges Node 2's Prepare message and responds with a special Promise message. That message has the current round number and last accepted transaction (i.e., the preceding round number and accepted value). Node 2 gets that message and sees that a previous transaction was attached. So, it proposes the same thing for its round.
OK, but what if an acceptor fails instead of a proposer? In those cases, the Paxos round is unaffected as long as most acceptors work.
We only looked at a single Paxos round with only one proposer proposing per round. In reality, a consensus network will run many consensus rounds. Additionally, more than one proposer may initiate in the same cycle. For such scenarios, we might use Multi-Paxos to address the additional complexities of such a system. Multi-Paxos is beyond the scope of this post, however.