Reading chapter 13.5, “Arranging data on disk”, in the book “DATABASE SYSTEM: IMPLEMENTATION” made me wonder: how should data be arranged on an SSD (solid-state drive)? This is an old question, so after some searching on Google I found several very good explanations.
An Overview of Pages, Blocks and FTLs in a Solid-State Drive (SSD)
The two articles above describe how an SSD works differently from an HDD. Some key points to take away are:
- The minimum read/write unit of an SSD is a page. A block is made up of a set of pages.
- A dirty page (one that already holds data) cannot be overwritten until it has been erased.
- The minimum erase unit of an SSD is a block.
- Each block can endure only a finite number of program/erase cycles (P/E cycles).
Within an SSD, data can only be erased a whole block at a time. Garbage collection has to run to reclaim pages that have been logically deleted (e.g., because of an update): the still-valid data in blocks containing deleted pages is packed together and rewritten to an empty block, after which the old block can be erased. As a result, a piece of data may be rewritten over and over again, which is known as the write amplification problem. It also means that data is constantly moving inside the drive, which is quite different from data stored on an HDD.
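To make the page/block rules and garbage collection more concrete, here is a minimal sketch of a toy flash translation layer in Python. It is not how any real SSD controller works: the class `ToySSD`, its methods, the greedy "most invalid pages" victim choice, and the two blocks of spare capacity are all assumptions I made for illustration. It only shows how out-of-place updates plus block-level erases lead to extra flash writes.

```python
FREE, VALID, INVALID = 0, 1, 2

class ToySSD:
    """Toy FTL: pages are programmed once, erased only by whole block."""

    def __init__(self, num_blocks=8, pages_per_block=4):
        assert num_blocks >= 3
        self.num_blocks, self.ppb = num_blocks, pages_per_block
        # Two blocks are kept as spare capacity (assumed over-provisioning).
        self.logical_pages = (num_blocks - 2) * pages_per_block
        self.state = [[FREE] * self.ppb for _ in range(num_blocks)]
        self.owner = [[None] * self.ppb for _ in range(num_blocks)]
        self.mapping = {}                  # logical page -> (block, page)
        self.erase_count = [0] * num_blocks
        self.host_writes = 0               # writes requested by the host
        self.flash_writes = 0              # pages actually programmed
        self.active, self.cursor = 0, 0    # block/page receiving writes

    def _place(self, lpn):
        """Program the next free page of the active block with lpn."""
        b, p = self.active, self.cursor
        self.state[b][p], self.owner[b][p] = VALID, lpn
        self.mapping[lpn] = (b, p)
        self.cursor += 1
        self.flash_writes += 1

    def _open_new_block(self):
        free = [b for b in range(self.num_blocks)
                if all(s == FREE for s in self.state[b])]
        if len(free) > 1:
            self.active, self.cursor = free[0], 0
        else:
            # Last free block: use it as the destination for GC.
            self._garbage_collect(free[0])

    def _garbage_collect(self, dest):
        # Greedy policy: reclaim the block with the most invalid pages.
        candidates = [b for b in range(self.num_blocks) if b != dest]
        victim = max(candidates, key=lambda b: self.state[b].count(INVALID))
        if self.state[victim].count(INVALID) == 0:
            raise RuntimeError("no reclaimable pages")
        self.active, self.cursor = dest, 0
        for p in range(self.ppb):
            if self.state[victim][p] == VALID:
                self._place(self.owner[victim][p])   # relocation write
        self.state[victim] = [FREE] * self.ppb       # block erase
        self.owner[victim] = [None] * self.ppb
        self.erase_count[victim] += 1

    def write(self, lpn):
        if not 0 <= lpn < self.logical_pages:
            raise ValueError("logical page out of range")
        self.host_writes += 1
        old = self.mapping.pop(lpn, None)
        if old is not None:
            # An update never overwrites in place; the old copy is
            # only marked invalid and reclaimed later by GC.
            self.state[old[0]][old[1]] = INVALID
        if self.cursor == self.ppb:
            self._open_new_block()
        self._place(lpn)

    def write_amplification(self):
        return self.flash_writes / self.host_writes if self.host_writes else 0.0


if __name__ == "__main__":
    ssd = ToySSD(num_blocks=4, pages_per_block=2)
    for lpn in [0, 1, 2, 3, 0, 2, 0]:
        ssd.write(lpn)
    print(ssd.host_writes, ssd.flash_writes, ssd.erase_count)
    print(f"write amplification: {ssd.write_amplification():.2f}")
```

In this tiny run the seventh host write finds only one free block left, so garbage collection has to copy one still-valid page before it can erase a block. That extra copy is the write amplification: 8 flash writes for 7 host writes. The per-block `erase_count` is what a real drive would try to keep balanced, because of the limited P/E cycles.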
Tables and indexes vs. HDD and SSD
The article above discusses strategies for storing table data and indexes on HDD vs. SSD. The charts show the results clearly, and the discussion in the comments is also worth reading.
Finally, I found a very interesting series of blog posts, “Coding for SSDs”. The author built a key-value store optimized for SSDs, and the posts contain quite a lot of insights.
In conclusion, SSDs today outperform HDDs in almost every respect except price per bit. However, I expect that in the near future the price will drop low enough for SSDs to replace most HDDs. SSDs are almost drop-in replacements for HDDs, but to get the best performance out of them, developers do need to pay attention to the data access characteristics of SSDs.