Reading the New Apache HBase MOB Compaction Policy

If you want to learn more about MOB (Medium Object) storage, you may refer to this issue. In short, HBase was originally designed to store mainly small objects (<100 KB). Medium objects are those from 100 KB to 10 MB.

Recently, a blog post introduced the new compaction policy for MOB files. The problem with the initial approach is repeated compaction. For instance, suppose the goal is to compact the objects created in one calendar day into one big file. The compaction process starts after the first hour: the objects created in the first hour are compacted into a temporary file. Then the objects created in the second hour and the temporary file from the first hour are compacted into a new temporary file…

In this way, all objects created in one day are eventually compacted into one file. However, the objects from the first hour are rewritten in every subsequent round, wasting IO. The new method is based on partitions. For instance, we may first compact the objects of each hour of the day into one temporary file per hour, which is the first stage. Then the hourly temporary files are compacted into the final file, which is the second stage. In this example each object is rewritten only twice, which saves a lot of IO in comparison with the initial approach. Actually, this improvement is quite straightforward.
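
To get a feeling for the difference, here is a rough back-of-the-envelope sketch that counts how much data each approach writes. It is not HBase code. The function names and the "one unit of data per hour" workload are my own assumptions, and deletes and compression are ignored.

```python
def incremental_bytes_written(hourly_sizes):
    """Initial approach: every hour, merge the running file with the new hour."""
    written = 0
    running = 0  # size of the current intermediate file
    for size in hourly_sizes:
        running += size     # new intermediate file = old one + this hour's objects
        written += running  # the whole new file has to be written out
    return written


def partitioned_bytes_written(hourly_sizes):
    """Partitioned approach: one file per hour, then one merge into the daily file."""
    stage_one = sum(hourly_sizes)  # each hour's objects are written once
    stage_two = sum(hourly_sizes)  # every byte is written once more into the final file
    return stage_one + stage_two


sizes = [1] * 24  # hypothetical workload: 24 hours, one unit of data per hour
print(incremental_bytes_written(sizes))   # 300
print(partitioned_bytes_written(sizes))   # 48
```

With this made-up workload the initial approach writes 1 + 2 + ... + 24 = 300 units, while the partitioned approach writes 24 + 24 = 48 units.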

What I found really insightful is the idea of partitioning compaction by creation time. Note that the creation time of an object never changes during its lifetime. Suppose a set of objects is compacted into a big file that contains, say, the objects created between 2017-08-23 and 2017-08-24. After a while, some objects in that set may be deleted (with a tombstone in HBase), or replaced by metadata pointing to a newer version. How can we remove these objects physically? The answer is easy. We search for all objects created between 2017-08-23 and 2017-08-24, which should return a subset of the original set of objects. We then copy the remaining objects into a new big file and delete the old big file. There are two other essential points for the cleanup process described above to work: (1) the metadata should map 1:1 to the objects. In other words, there should be no more than one metadata entry pointing to the same object. (2) the creation time and the pointer to the file should always be updated atomically.
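
The cleanup can be sketched in a few lines. This is only a toy in-memory model under my own assumptions, not the HBase implementation: the `Meta` class, the `rewrite_partition` function, and the dictionaries standing in for the metadata table and the big files are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Meta:
    created: datetime  # creation time, never changes after the object is written
    file: str          # name of the big file that holds the object
    key: str           # position of the object inside that file


def rewrite_partition(metadata, files, old_file, new_file, start, end):
    """Copy still-referenced objects created in [start, end) into new_file, then drop old_file."""
    # Thanks to the 1:1 mapping, anything in old_file that no metadata entry
    # points to is garbage and is simply not copied.
    live = {
        row: m for row, m in metadata.items()
        if start <= m.created < end and m.file == old_file
    }
    files[new_file] = {m.key: files[old_file][m.key] for m in live.values()}
    for row, m in live.items():
        # In a real store this pointer swap would have to be atomic.
        metadata[row] = Meta(m.created, new_file, m.key)
    del files[old_file]


files = {"big-0": {"a": b"A", "b": b"B", "c": b"C"}}
metadata = {
    "row1": Meta(datetime(2017, 8, 23, 1), "big-0", "a"),
    "row3": Meta(datetime(2017, 8, 23, 9), "big-0", "c"),
}  # the row that used to point at "b" has been deleted

rewrite_partition(metadata, files, "big-0", "big-1",
                  datetime(2017, 8, 23), datetime(2017, 8, 24))
print(sorted(files["big-1"]))  # ['a', 'c']
```

The deleted object "b" never makes it into the new file, and since the creation times are untouched, the same range query finds the same partition the next time the cleanup runs.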