Chord: Finding a Needle in a Haystack

Chuong Ngo
Technical Consultant

How do you find a needle in a haystack? Let's keep it simple. Why complicate things?

Peer-to-peer (P2P) computing distributes tasks or workloads across multiple machines. It allows many computers to pool their resources to do something that one computer may not be able to do alone. In other words, a P2P network of commonplace computers can rival supercomputers. Napster popularized P2P in 1999. More recent applications of the architecture include SETI@Home and Bitcoin.

Bring A Magnet

P2P applications need to break up tasks, workloads, and resources into smaller pieces and distribute them to their participant machines (i.e., peer nodes). That means that we have to retrieve those pieces at some point. Efficiently locating the node that has what you are looking for is a fundamental problem of P2P systems. In other words, indexing is hard. Nodes continually joining and leaving the network (i.e., churn) makes it even harder.

Napster was a P2P file-sharing application that allowed users to share MP3 files. It was not a fully distributed system, however. Peer nodes held the MP3 files, but a centralized server indexed the files. So, there was a single point of failure. In 2001, the Ninth Court of Appeals shut down the centralized servers, and the entire network stopped functioning. Following Napster’s shutdown, there was much research to find a better way.

Simple And Scalable

Chord is a simple, scalable lookup protocol for dynamic P2P networks. Conceptually, it builds a virtual, global hash table with message chaining. It arranges the peer nodes in a ring. Each node keeps track of the resource keys that match their node ID or for which they are the successor. For example, if we have a small P2P network with 6 nodes, Node 2 is responsible for all keys larger than 1 and less than or equal to 2.

Node 2 owns keys 1-2, including 2.

Each node has a routing table (i.e., finger table) of keys and associated nodes. When looking for a particular resource, a node checks its routing table for the resource’s key. If they find it, they contact that node to get it. Job done. If the key is not in its routing table, the node asks another node if they know where the key is. That node references its routing table and sends back the appropriate node ID. If it also doesn’t know, that node asks another node. A chain of responses starts when the correct node is found. So, each node only has to know about a subset of what is out there.

Not So Fast

A naive implementation of Chord can have nodes always contact their immediate successor (i.e., the node to the clockwise position). So, most of the ring may be traversed when searching for a key. In other words, the operation takes O(N) time. We can do better. Each node’s routing table entry will contain the successor node s such that:

s = successor( n + 2i - 1 )

So, each node has more nodes it can ask when looking for a key. We also know that the node tracking a key is the successor for that key. So, we don’t have to ask nodes blindly. If a node doesn’t have the right node in its routing table, it asks the node closest to it that it does have. So, the operation now takes O(log(n)) time. More simply, it is a binary search.

Node 1 is looking for Key 4.2.

Node 1, in the above example, is looking for Key 4.2 (which is tracked by Node 5). It tries to find the appropriate node in its routing table. It fails. So it messages the closest node that it knows about (Node 4) and asks for the key. Node 4 has Node 5 in its routing table, so it asks for the key from that node and returns it.

In An Ever-changing World

Finding a key seems simple enough, but what about network changes? For example, what happens if a new node joins the network? A joining node bootstraps itself with an arbitrary, existing node that it got beforehand. The new node asks the existing node for its predecessor. Finding the predecessor is like finding a key, except that no key is sought. Then, the new node takes ownership of the keys for which it is the successor.

Node 4 wants to join the network. It bootstraps with the help of Node 6.

Eventually, the new node's successor is located, and the node updates its successor pointer appropriately. When that successor node learns about the new node, it updates its predecessor pointer. All nodes in the network will periodically run a stabilization protocol where they ask their successor nodes for the successor’s predecessor and the nodes update their routing tables appropriately. So, after some time, the old predecessor of the new node’s successor will reach out to that successor and ask for its predecessor. The old predecessor will learn about the new node and update its successor pointer. It then reaches out to the new node. The new node will update its predecessor pointer to point to the old predecessor node. All nodes will update their routing table entries periodically, as well.

Node 4’s successor is found. All nodes update their pointers as necessary. Node 4 takes ownership of all appropriate keys.

In the example network, Node 4 wants to join the network. It bootstraps with Node 6. Node 4 asks Node 6 for its immediate successor. After a few hops, Node 6 identifies Node 5 as Node 4’s immediate successor. Node 5 updates its predecessor pointer to point to Node 4, and Node 4 points its successor pointer towards Node 5. Node 4 takes ownership of all keys owned by Node 5 for which Node 4 is the successor (keys 3-4, including 4).

After some time, Node 3, Node 5’s old predecessor will learn of the existence of Node 4 when it asks Node 5 for its predecessor. Node 3 updates its successor pointer to point to Node 4 and then reaches out to Node 4. Node 4 updates its predecessor pointer to point to Node 3. Over time, all peer nodes will update their routing tables. The network has reached stability again.

Dealing with a node leaving is simpler. When a node exits the network, its successor takes ownership of the orphaned keys. A successor list facilitates this. Each node maintains a list of its r nearest successors. If a node’s successor fails to respond to queries within a certain amount of time, the node replaces it with the next closest successor that is still responsive.

In our example network, Node 4 suddenly disconnects. When Node 3 queries Node 4 and gets no response, Node 3 replace Node 4 with Node 5.

Node 4 leaves the network, and Node 5 takes over the orphaned keys.

In The End

Chord was an early solution to locating resources in a P2P network without a centralized server. While it works and can scale, Chord’s performance in high-churn environments is lacking, and its standard routing algorithm goes in only one direction. It also does not account for latency. Next time, we’ll examine an alternative protocol, Kademlia.

Banner image credit to

Related Posts