Home » Posts tagged 'fuse'

Tag Archives: fuse

Subscribe to Blog via Email

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

Join 204 other subscribers
May 2024
S M T W T F S
 1234
567891011
12131415161718
19202122232425
262728293031  

ZUFS

After one of my earlier posts on FUSE file system performance, someone mentioned this project to me – the Zero copy Userspace File System project (ZUFS) which appears to be a NetApp sponsored project.

Sometimes Zero is best
Sometimes Zero is best.

There have been a variety of talks about this project, including the Linux Plumber’s Conference (which was held next door to me – I can see the venue from my window as I write this), as well as the SNIA Persistent Memory Summit in 2018. The NetApp repositories on Github.com contain both a file system reflector (zufs-zuf), which appears to be similar to the FUSE kernel driver, as well as the user mode server (zufs-zus) which handles dispatching the kernel level requests to the user mode file system implementations.

Their concern appears to be eliminating the copy of any data between kernel and user mode, which makes sense given their objective of supporting persistent memory, such as the new Intel Optane DC Persistent Memory that has recently become commercially available.

Persistent memory benefits from a direct access model, in which traditional file data caching is eschewed in favor of direct access. Thus, data is read or written directly from the underlying persistent memory, rather than copied from a buffer cache.

There are a few persistent memory file systems, including UCSD’s NOVA file system, though usually they were developed using emulation of persistent memory. In such systems, there is no benefit to copying the data from persistent memory into DRAM and back; indeed, it is a significant performance impediment.

What is not currently present in the NetApp repository is an implementation of a user mode persistent file system (they have a dummy file system implementation, which appears to be the base from which one could build a real file system). This definitely presents an interesting alternative to using traditional FUSE.

Fuze vs ZUFS
FUSE vs ZUFS Performance (from NetApp SNIA presentation)

I have not had an opportunity to play with this new system yet, but it certainly does seem to be intriguing – and the performance graph from the SNIA presentation is rather compelling, given the massive improvement in scalable performance.

There sure are quite a few alternatives to traditional FUSE to consider…

Extension Framework for File Systems in User space

Extension Framework for File Systems in User space, Ashish Bijlani and Umakishore Ramachandran, USENIX Annual Technical Conference, 2019.

Useful Extensions

The idea of improving FUSE performance has become a common theme. This paper, which will be presented this week at USENIX ATC 2019 in Renton, WA, is one more to explore how we can improve FUSE performance.

One bit of feedback I received from the last FUSE performance paper I reviewed (last week) suggested that people do want to build file systems in user space for a variety of reasons, not the least of which is because they want to move that complexity out of the kernel environment. Thus, the argument is that the reason people build kernel file systems is because of performance. While I remain unconvinced that this is not the only impediment to a broader adoption of FUSE file systems, I will save that for a future discussion.

The approach the authors take this time does seem to try and bridge the gap: they’re proposal is to add kernel extensions that permit user mode file systems developers to add small modular components to the file system to optimize performance critical aspects. They address the increased security considerations inherent in allowing “kernel extensions” by sandboxing those extensions into an “in-kernel Virtual Machine (VM) runtime that safely executes the extensions”.

Their description of FUSE is quite a bit different than what I got from the FUSE performance paper at FAST 2018 – this paper describes FUSE as a “simple interposition layer”; the earlier description made it sound more complex than that. They do point out that FUSE file systems in production are becoming more common and point to Gluster, Ceph, and even Android’s SD card file system. For network file systems the overhead of FUSE is unlikely to have a material impact all but the most performance sensitive environments because the overhead of the network likely dominates. Similarly, SD card media is typically slow so once again the rate-limiting overhead is likely not the FUSE library and driver.

In addition to proposing an extension model, the authors also point out that there are a class of “unneeded” operations that are difficult to omit because the level of control offered by FUSE presently is not sufficiently fine grained enough; the authors propose enhancing FUSE to address these issues as well.

They set forth an interesting set of design considerations:

  • Compatibility – their observation is that the extension model must be something that works with existing file systems without requiring redesign or extensive coding.
  • Extensibility – the features offered by ExtFuse must allow adding specific features in a clean, minimalistic fashion, so that a FUSE file system developer can pick the specific features needed for their use case.
  • Safe and Performant – these are competing goals; the primary purpose of their work is to improve performance but they cannot do so at the expense of sacrificing security.
  • Correctness – they point out the challenge of having two operational paths (the “fast” path and the “slow” path, where the latter corresponds to the legacy path)
(Figure 1 from Paper)

The authors’ provide a graphical description of the architecture of their system in Figure 1 of the paper, which I have reproduced here. It shows the fact there are dual paths: the traditional FUSE path, as well as their accelerated path.

They move on to describe the extensions they implemented to demonstrate the range of functionality with their extension model:

  • Meta-data caching – the idea is that VFS itself cannot do effective caching due to the nature of its interface; the tighter interface between the extension and the user mode file system make this more practical.
  • I/O stacking – the concept here is that data may have multiple processing layers, such as logging, or union file systems. By permitting the extension to handle this, the overhead is minimized; indeed, this reminded me of the Scout Operating Systems work, which focuses on constructing optimized pipelines for such work.

Their evaluation focuses on a handful of critical operations: getattr, setattr, getxattr, and read/write. They looked at a mix of optimization models: the use of a smart attribute cache is clearly a win based upon their performance analysis. FUSE remains slower than a native file system in many scenarios however (e.g., they use EXT4 as a benchmark comparison) though the performance seems to be much closer than we’ve seen in prior work.

They also ported multiple different file systems to their extension library: StackFS, BindFS, Android’s sdcard file system, MergerFS, and LoggedFS. None of them required even 1,000 lines of new code for the kernel extensions. While the authors do discuss some of the observed performance improvements for those file systems, they do not provide us with general benchmark comparisons.

Overall, this is an interesting paper, which combines a number of ideas together into an intriguing package. It will be interesting to see if this gains traction in the FUSE community.

To FUSE or Not to FUSE: Performance of User-Space File Systems

To FUSE or Not to FUSE: Performance of User-Space File Systems
Bharath Kumar Reddy Vangoor, Vasily Tarasov, and Erez Zadok,
in The 15th USENIX Conference on File and Storage Technologies (FAST ’17),
February 27 – March 2, 2017, Santa Clara, CA, USA.

Previously, I discussed some of the rationale behind FUSE and a basic introduction to why we use it. This paper is actually a detailed analysis of the performance of FUSE. I have spent a fair bit of time reading – and re-reading – this paper, in order to understand what some of the bottlenecks are within FUSE itself. One area I have previously explored are possible ways of improving its performance; indeed, this is one of my current projects.

Figure 1 from original Vangoor Paper

Figure 1 in the paper is actually a basic block diagram providing an interface model for how FUSE fits into the file systems layer. FUSE consists of:

  • The kernel mode FUSE driver; on Linux this is part of the kernel; and
  • The user mode FUSE library. This handles interactions between the FUSE file system driver and the user mode file system process (referred to as a “daemon” or “background process”) in the diagram.
  • The user mode FUSE file system itself – this is the code that implements the FUSE library interface (one of them, since there are two different interfaces).

Applications can then access this FUSE file system without knowing any details of the implementation.

FUSE kernel driver queuing model.

In Figure 2 from the paper, the authors show the internal structure of how the FUSE kernel driver manages various internal data structures that handle events between the user mode file system and the kernel mode file system support. This includes the messages between the kernel and user mode application (requests and their corresponding replies) as well as synchronous and asynchronous file system operations and the cache invalidation mechanisms (“forgets”).

Caching is an essential part of the system because it allows the system to quickly respond to repetitive events. The downside to caches are that they can become invalid because the underlying state changes. Thus, there is a mechanism for invalidating the cache itself.

The author describes the model within the FUSE library and the two interfaces it exposes: the low level API, which provides greater control to the user mode file system, at the cost of more state management – specifically the mapping of a path name to the inode (index node).

The authors describe how they constructed a new FUSE file system, StackFS, for evaluating the behavior of the system. StackFS sits on top of an existing file system and attempts to minimize the amount of mapping it performs, since the goal of the authors is to evaluate the performance of FUSE itself.

Table 3 from the paper

The authors summarize their findings in Table 3, using both a hard disk and an SSD, the two most common types of storage media used on modern computer systems.

These results were both surprising and interesting to me because some of them surprised me. The performance bottlenecks were not for I/O as much as they were for meta-data operations. Random write performance (which is what we see with databases, for example) is not ideal, but their optimizations did a good job of addressing this, bringing the I/O overhead of the FUSE model down substantially relative to the native file system.

Bottom line: the challenge in improving FUSE performance now moves squarely into the arena of meta-data operations. Creating and deleting files is quite expensive in FUSE.

The authors conclude by pointing out that there is further room for improvement; they suggest some potential future directions. I have been looking at ways to improve this as well and I will discuss those in a future post.

FUSE: File Systems in User Space

File systems are notoriously difficult to implement: of all the pieces that appear in an operating system, they have the highest quality bar and are often called upon more than almost any other part of the operating system; virtual memory management may be called upon more.

Of course, the fact that modern operating systems tend to make the boundaries of file systems and virtual memory a bit fuzzy doesn’t really diminish their difficulty.

So, what makes building a file system challenging?

  • Persistence – file systems do things “for keeps”. If you build an application program, you can quit when things go bad. The user can just restart from scratch. A crashing application might leave its own files damaged and require recovery. When a file system gets it wrong (particularly a physical media file system) it can wipe out all the files.
  • Multi-threading – modern operating systems are heavily multi-threaded and often performing I/O. Frequently, that I/O is going through the file system. A multi-threaded application needs to worry about its own data. A file system needs to worry about its own data that’s being accessed by numerous threads across numerous applications.
  • Security – modern operating systems are written with the idea of multi-tenancy in mind. We can use lots of different isolation techniques to help mitigate some of these problems but file systems are fundamentally a layer that revolves around sharing data. I could (and have and probably will again) argue that sharing data might not always be what we want to do. Remember the CAP file system? One of its interesting features was that it did attempt to hide data and only expose it via capabilities.
  • Performance – file systems are extremely performance sensitive. Hard disks are slow, with high latencies and modest bandwidth. Of course, SSDs have fixed some of that, with lower latencies and higher bandwidth. They do introduce their own issues that file systems have had to adapt to handle.
  • Features – each feature you add in a file system tends to create an interference pattern with other features. For example, NTFS implements a file caching scheme for applications called oplocks (opportunistic locks). It also implements byte range locks. The two were not compatible. Then a new set of oplocks were added and they became compatible. The old oplocks are supported, the new oplocks are supported. The good news is that very few applications use byte range locks. It’s not the only example, it’s just one example.

In my experience, the file systems development cycle starts out with a bold new design: we’re going to fix all of the ills of prior file systems and/or implement bold, new features. We’ll be faster and more capable. We’ve studied the work that’s already been done and we know that we can do it better. Then you build your bold new design and begin to construct your file system. Eventually, you get to a point where it starts to work. Each new feature you add has side-effects that ripple through the code base. You begin to evaluate your file system and you realize it is slow. So you go through some cycles of iterative tuning. These introduce performance at the cost of complexity.

You find out about failure cases you hadn’t really understood before; maybe you read about them. You experience failures that you’d never heard of before – only to realize when you’re reading some other paper that they’re describing the same problem. That’s happened to me – the other paper said “we had these weird deadlocks, so we just increased the number of buffers we used and it went away.” We actually built a robust reservation scheme to ensure we’d never hit that deadlock.

Deadlocks in file systems are just part of life. You wanted performance so you added fine-grained data structure locking. Then you realized you had special cases where you had re-entrant calls. You find that you can have a thread moving file a to b at the same time that some other thread is moving file b to a. How do you lock that properly? In the past, I’ve built entire mechanisms for tracking and enforcing lock hierarchies, dealing with re-entrant calls, and ensuring we don’t deadlock.

So you performance tune, find and fix bugs, increase your parallelism, and continue your relentless march to victory. You learn about reference counting bugs (the ABA problem, for example, which I’ve seen in practice when trying to decrement and delete the reference count). You create interesting solutions that allow parallelism in all but the cases where it really matters.

You get old and grey in the process. You learn how to look at a damaged meta-data structure on disk and in your head theorize how that might happen, then you go look at the code and see if you can find that path.

All you wanted to do was build a file system. Moving a file system into user space is a very micro-kernel like thing to do; I’ve worked on micro-kernels where the file system was in user space. If it crashes, it doesn’t bring the machine down. The only threads it has to really worry about are those it creates. Maybe it isn’t as fast. Maybe it isn’t as challenging to build.

File Systems in User Space (FUSE) is a framework in which a kernel component interacts with an application program – the user-mode file system – and presents it to applications so that it looks much like a file system.

FUSE doesn’t fix all the challenges of building file systems, but it does address some of them. Security is addressed by restricting the file system process and the applications to belonging to the same security entity (“user”). The tools available are often easier for the debugging process; testing in user space is simpler due to the availability of test harnesses (kernel file systems can be run for testing purposes in user space as well, as I’ve done it before. Most of the file system logic isn’t tied to any specific operating mode.)

FUSE is a wildly popular interface. The last time I looked on Github.com the number of FUSE file systems numbered in the hundreds. At one point I was working on cataloguing them, more out of curiosity than anything else. Indeed, that might make a good write-up for some future post

Applications transparently use a FUSE file system because the file system supports the standard file systems interfaces. To the applications, it really does just look like another file system, mounted somewhere in the name space of the operating system itself, such as mounted on a directory in Linux or UNIX. Windows uses mount points and symbolic links to make file systems visible to applications. In either case, applications are oblivious to the fact that these file systems are implemented in user space.

The biggest advantage of this model is that building a new file system via FUSE is simpler. It isn’t going to win any performance records, it won’t be bootable, and its usage model is much more limited than a “general purpose” file system, but often that’s all that is required. For example, there are multiple implementations of a file system on top of Amazon’s Simple Storage Service (S3) – mostly because it provides a simple-to-use interface that works with existing tools. It certainly is not a performance-oriented approach, but often performance is not actually that important for a specialized service.

I will discuss some of the issues with fuse, and some potential solutions, in a future post.

https://github.com/libfuse/libfuse
https://github.com/osxfuse
https://github.com/billziss-gh/winfsp