SILT: A Memory-Efficient, High-Performance Key-Value Store
Hyeontaek Lim, Bin Fan, David G. Andersen, Michael Kaminsky, in Proceedings of the 23rd Symposium on Operating Systems Principles (SOSP ’11), October 23-26, 2011, Cascais, Portugal.
In this paper, we have an interesting hybrid key-value store; part of the store is in DRAM, while the rest of it is stored in flash storage. The authors have focused on optimizing the performance of the store across the specific media chosen, which leads to some interesting choices and observations.
SILT (Small Index Large Table) is a memory-efficient, highperformance
key-value store system based on flash storage that
scales to serve billions of key-value items on a single node. It requires
only 0.7 bytes of DRAM per entry and retrieves key/value
pairs using on average 1.01 flash reads each. SILT combines new
algorithmic and systems techniques to balance the use of memory,
storage, and computation. Our contributions include: (1) the design
of three basic key-value stores each with a different emphasis on
memory-efficiency and write-friendliness; (2) synthesis of the basic
key-value stores to build a SILT key-value store system; and (3) an
analytical model for tuning system parameters carefully to meet the
needs of different workloads. SILT requires one to two orders of
magnitude less memory to provide comparable throughput to current
high-performance key-value systems on a commodity desktop
system with flash storage.
Thus, to achieve optimal performance they construct a hybrid ensemble of key-value stores that are optimized to particular underlying media.
They start early with their performance observations, and continue this pace throughout the paper. Thus, they point out that they have a small memory requirement (0.7 bytes per key value) and only a single read from flash required:
This fits well with their observation that memory efficiency is critically important for scalable key-value stores. Their system saturated the capabilities of their hardware with 46,000 lookups per second, though this doesn’t guarantee that it will do so on newer, faster hardware (and we will get to several current KV stores.
- LogStore – uses a CPU-efficient cuckoo-hash based only upon a tag derived from the full key (20 bytes). The benefit of this approach is that it avoids accessing the flash memory if the key is not present. It is probabilistic, in that a key might not be present in Flash with some low probability.
- HashStore – uses a memory-efficient storage model for their cuckoo flash table. These stores are immutable and are stored in flash.
- SortedStore – They use an entropy-coded trie, which provides a very space efficient indexing mechanism: the average according to the authors is 0.4 bytes per key. Given that keys are 160 bits (20 bytes), which means the index is 64 bits (8 bytes). The authors provide us with an impressive list of specific optimizations they have applied to sorting data, packing indexes, intelligent (non-pointer) based addressing,
They also provide us with pretty figures to graphically lay out their data structures. First, Figure 3, which shows the in-memory cuckoo hash table:
When we move onto HashStore, they show us how the structure of the data is moved from insertion order to hash sorted order as the data is moved from memory to flash.
Of course, one of the important aspects of a key-value store is the ability to look up the data quickly. Thus, they also show us how they sort data using a prefix tree (Trie). This is a key aspect of their SortedStore:
The in-memory Trie, permits them to rapidly find the corresponding data in Flash. This is one of their key motivations for very efficient encoding of the Trie: it ensures they can store it in memory. Once they can no longer store the indices in memory the lookup system will become I/O bound (to the Flash, in this case).
Figure 6 then shows how they actually encode the length of the tree in each direction, and how it is entropy encoded:There is some serious thought that has gone into their layout. They do admit, however, that they are not providing crash tolerance: there will be data that was stored in DRAM that cannot be recovered after a crash. They do not discuss the use of NVM for achieving this. Instead, they suggest a synchronous write mechanism in which the write blocks until the log data has been written to the flash-based log.
One of the interesting aspects of their performance evaluation is the recognition that they have quite a few parameters that must be chosen. Thus, they spend more than a page of the paper just discussing the relative merits of the trade-offs involved in selecting various parameters. They consider write amplification (the need to write more than just the data being preserved), read amplification (performing reads only to discover the data that you seek is not present), memory overhead (how much DRAM is used), and update rate versus flash life time. This latter point is an interesting one to keep in mind when reading about non-volatile memory: it wears out. There is a certain amount of degradation that occurs in the underlying physical medium each time it is erased and re-used. The specific details are dependent upon the characteristics of the medium, but it is an issue to keep in mind.
The evaluation of SILT is certainly interesting. They choose not to compare to any other KV system and instead focus on the performance of their system using a variety of fairly standard loads (YCSB) as well as simple get/put key operations as micro-benchmarks. They end up claiming they they can lookup almost 45k keys per second in the slowest of their hybrid stores. When combined, they indicate they can achieve almost 58k get operations per second on a single core. As they move to multiple cores they see some further improvement (180k get operations per second for data that is already in memory).
SILT offers some interesting observations, particularly about highly efficient use of memory and novel indexing schemes.