Understanding Chain Replication

July 28, 2017 Tech 本文总阅读量次

I learned the idea of chain replication from hibari,

Hibari is a production-ready, distributed, ordered key-value, big data store. Hibari uses chain replication for strong consistency, high-availability, and durability. Hibari has excellent performance especially for read and large value operations.

The term “strong consistency” indeed caught my attention as I already know a few key-value storage services with only eventually consistency, e.g., openstack swift. I read its doc to find out the key tech sitting in the core is called “chain replication”. I did some investigation about this concept which actually back to very early days in 2004 in a OSDI paper.

The idea is actually very easy to understand. The service maintains a set of chains. Each chain is a sequence of servers, where one server is called the head, and one is called the tail; all servers in between are middle servers. The figure in the very beginning shows such an example with two middle servers. Each write request is directed to the head server, and the update is pipelined from the head server to the tail server though the chain. Read requests are directed to only tail servers. What a client can read from the chain is definitely replicated across all servers belonging to the chain, and therefore, strong consistency is guaranteed.

Though the idea sounds straightforward, there are few practical issues. First of all, the traffic load at tail servers is higher than other servers, since they handle both write and read traffics. A load balancing aware chain organization algorithm is needed to balance the load across all servers. For instance, one server may be middle server of one chain and meanwhile tail server of another chain (see Fig. 3 in the Hibari paper). Another problem is failure handling. There should be a way of detecting failed servers, which turns out to be non-trivial in such distributed world. There are also plenty of issues about recovering from failures, replication, and migration. In conclusion, this “simple” idea comes with a bunch of tough issues.

There are only few open source projects based on chain replication, such as Hibari and CorfuDB. One fundamental reason may be the cost paid for strong consistency is too high. One killer application for object storage is handling highly massive objects such as user data in social network companies. However, the chain can never cross data centers in order for low latency. The idea of using chained servers is not really new. HDFS also use a pipeline to optimize data transfer latency while achieving strong consistency. Therefore, if the number of files is not a issue, storing them directly on HDFS might be a reasonable choice, given the advantage of naive integration with other Hadoop components.