MRAMFS: A compressing file system for non-volatile RAM
Nathan K. Edel, Deepa Tuteja, Ethan L. Miller, and Scott A. Brandt in Proceedings of the 12th IEEE/ACM International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2004), Volendam, Netherlands, October 2004.
This paper allows me to provide both a file systems paper and look at an interesting approach to byte-addressable non-volatile memory (NVM).
We have developed a prototype in-memory file system which utilizes data compression on inodes, and which has preliminary support for compression of file blocks. Our file system, mramfs, is also based on data structures tuned for storage efficiency in non-volatile memory.
One of the interesting aspects of NVM is that it has characteristics of storage (persistence) and memory (byte-addressability). Storage people are used to having vast amounts of time to do things: it is quite difficult, though not impossible, to do anything computationally with data that will be an important factor when it is combined with the overhead of I/O latency to disk drives. In-memory algorithms worry about optimal cache line usage and efficient usage of the processor, but they don’t need to worry about what happens when the power goes off.
Bringing these two things together requires re-thinking things. NVM isn’t as fast as DRAM. Storage people aren’t used to worrying about CPU cache effects on data resilience.
So mramfs looks at this from a very file systems centric perspective: how do we exploit this nifty new memory to build a new kind of RAM disk: it’s still RAM but now it’s persistent. NVRAM is slower than DIMM and hence it makes sense to compress it to increase the effective data transfer rate (though it is not clear if that really will be the case.)
I didn’t find a strong motivation for compression, though I can see the viability of it now, in a world in which we want to pack as much as we can into a 64 byte cache line. The authors point out that one of the previous systems (Conquest) settled on a 53 byte inode size. The authors studied existing systems and found they could actually compress down to 20 bytes (or less) for a single inode. They achieved this using a combination of gamma compression and compressing common file patterns (mode, uid, and gid). Another reason for this approach is they did not wish to burden their file system with a computationally expensive compression scheme.
In Figure 1 (from the paper) the authors provide a graphic description of their data structures. This depicts a fairly traditional UNIX style file system, with an inode table, name space (directories), references from directory entries to the inodes. Inodes then point to control structures that eventually map to the actual data blocks.
The actual memory is managed by the file system from a single chunk of non-volatile memory; the memory is virtually addressed and the paper points out that they don’t actually care how that mapping is achieved.
Multiple inodes are allocated together in inode blocks with each block consisting of 16 (variable length) inodes. The minimum size of a block is 256 bytes. inodes are rewritten in place whenever possible, which can lead to slack space. If an inode doesn’t fit within its existing space, the entire block is reconstructed and then written to a new block. Aftewards, the block pointer is changed to point to the new block. Then the old block is freed.
One thing that is missing from this is much reasoning about crash consistency, which surprised me.
The authors have an extensive evaluation section, comparing to ext2fs, ramfs, and jffs2 (all over RAM disk). Their test was a create/unlink micro-benchmark, thus optimizing the meta-data insertion/deletion case. They then questioned their entire testing mechanism by pointing out that the time was also comparable to what they achieved using tmpfs building the openssl package from source. Their final evaluation was done without the compression code enabled (“[U]nfortuantely, the data compression code is not yet reliable enough to complete significant runs of Postmark or of large builds…”). They said they were getting about 20-25% of the speed without compression.
Despite this finding, their conclusion was “We have shown that both metadata and file data blocks are highly compressisble with little increase in code complexity. By using tuned compression techniques, we can save more than 60% of the inode space required by previous NVRAM file systems, and with little impact on performance.”
My take-away? This was an early implementation of a file system on NVM. It demonstrates one of the risks of thinking too much in file systems terms. We’ll definitely have to do better.