Monthly Archives: July 2019


360° Semantic File System: Augmented Directory Navigation for Nonhierarchical Retrieval of Files

360° Semantic File System: Augmented Directory Navigation for Nonhierarchical Retrieval of Files, Syed Rahman Mashwani and Shah Kusro, in IEEE Access, January 29, 2019.

360° Perspective

This paper makes some interesting observations that resonate with my own research observations, though I will end up arguing (in a future blog post) that they don’t go far enough. But they do a good job of laying out the problem and why some solutions do not work. One of the common themes I have heard when discussing my own work is an insistence there really isn’t a problem, though usually a longer conversation ends up with us agreeing things could be done better.

The abstract is a bit long, but clearly describes the essence of the paper:

The organization of files in any desktop computer has been an issue since their inception. The file systems that are available today organize files in a strict hierarchy that facilitates their retrieval either through navigation, clicking directories and sub-directories, in a tree-like structure or by searching (which allows for finding of the desired files using a search tool). Research studies show that the users rarely (4% – 15%) use the latter approach, thus leaving navigation as the main mechanism for retrieving files.
However, navigation does not allow a user to retrieve files nonhierarchically, which makes it limited in terms of time, human effort, and cognitive overload. To mitigate this issue, several semantic file systems (SFSs) have been periodically proposed that have made the nonhierarchical navigation of files possible by exploiting some basic semantics but no more than that. None of these systems consider aspects such as time, location, file movement, content similarity, and territory together with learning from user file retrieval behaviors in identifying the desired file and accessing it in less time and with minimum human and cognitive efforts.
Moreover, most of the available SFSs replace the existing file system metaphor, which is normally not acceptable to users. To mitigate these issues, this research paper proposes 360 SFS that exploits the SFS ontology to capture all the possible relevant file metadata and learns from user browsing behaviors to semantically retrieve the desired files both easily and timely. Based on user studies, the evaluation results show that the proposed 360 SFS outperforms the existing traditional directory navigation and recently open files.

Paper Abstract

Of course, the problem existed before the appearance of the desktop computer: the original UNIX contained the permuted index server, which suggests to me that even in 1973 people were struggling to find things. What I do find interesting is the observation that people really do not like search – I recently described this insightful (to me at least) observation. Here it is once again, complete with references (in the paper) to prior work demonstrating this point. Indeed, it suggests to me one alternative explanation as to why the Google Desktop Search project was ultimately cancelled – not because it was “no longer necessary” but rather because it wasn’t useful to most computer users.

Another thing that I have observed, repeatedly, when working with students, is that there seems to be a natural aversion to searching for answers. Thus, students will post on class forums (such as Piazza, with which I am most familiar) asking questions, even if the question has already been asked and answered. Searching for the answer does not seem to come naturally to such students. Indeed, I often find myself using search engines to find the answer and giving the results back to the person who asked the original question. I have wondered about this “laziness” in the past and with this new insight I wonder if it is just because people prefer to navigate – and being told where to go is certainly one form of navigation.

Interestingly, like the Graph File System paper I described previously, these authors also argue that we cannot abandon the hierarchical view of files. I am not convinced of this, but I can understand the appeal of starting from it as a basic premise. We have been doing some work recently on a graph visualization model for the file system, more as a prototype, but it is surprisingly functional and is encouraging us to look at alternative visualizations of file system data for navigation. In other words, thinking of the problem as a search problem ultimately seems to be the wrong path – yet that is the point of things like semantic file systems, to improve search.

The paper has an extensive review of prior work, much of which I’ve also previously described, though there were a few systems I had not previously seen. Table 1 of the paper has a comparison of features across the various file systems. The authors distinguish themselves from prior work by focusing on providing enhanced functionality, using auxiliary directories, in which they display related content. They focus on:

  • Temporal characteristics – they focus on when files are being used, not merely frequency. This is an idea we’ve been exploring as well.
  • Geographical location – this is intriguing; identifying where the user was when they accessed a given file.
  • File movement – when files are reorganized and moved around.
  • File access patterns – they cluster files as related based upon the temporal proximity of their access; another idea we’ve been exploring. I found it insightful they describe this as a “relationship” though they do not explore a broader range of relationships.
  • Content similarity – files that are identical or substantially similar can be associated together; this is another technique that we’ve been actively investigating.
  • Manual tagging
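
To make the "file access patterns" idea concrete, here is a minimal sketch of clustering files by the temporal proximity of their accesses. It is not the paper's algorithm: the function name `cluster_by_time`, the five-minute `max_gap` threshold, and the sample events are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def cluster_by_time(events: List[Tuple[float, str]],
                    max_gap: float = 300.0) -> List[List[str]]:
    """Group file paths whose accesses occur within max_gap seconds of the
    previous access. The threshold is an illustrative assumption."""
    clusters: List[List[str]] = []
    last_time = None
    for timestamp, path in sorted(events):
        # A long quiet period starts a new cluster of "related" files.
        if last_time is None or timestamp - last_time > max_gap:
            clusters.append([])
        if path not in clusters[-1]:
            clusters[-1].append(path)
        last_time = timestamp
    return clusters


# Hypothetical access log: (seconds since start, file name).
events = [(0, "report.docx"), (40, "data.csv"), (90, "chart.png"),
          (5000, "notes.txt"), (5100, "todo.md")]
print(cluster_by_time(events))
```

A real system would presumably weight these clusters by recency and combine them with the other signals in the list (location, movement, content similarity) rather than relying on a single fixed gap.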

They describe the file system they implement, which is essentially a layered file system in which they add two virtual directories: NOW and TAGS. They also describe the interface for adding this information, which I found cumbersome, but it is in keeping with their goal of not deviating from the existing hierarchical interface. They also permit the creation of custom virtual directories, though that is only briefly mentioned in the paper.

One of the problems they highlight, which resonated with me because we’ve been discussing the same problem, is how much information to display to users – in essence, when confronted with too many options, users quickly become overwhelmed.

Their evaluation focuses on the amount of time it took users to locate their files when using their enhanced file system model, and they make the case that their system works well for their study group. One limitation of their study group is that it consists of experienced computer users, but this is reflective of their environment.

One interesting comment was that while they used Linux for their evaluation, Windows would have been a better platform because of its broader usage within their organization. I have wondered how much the use of Linux tends to create a bias in an evaluation of this type, since most people are using Windows or Apple computers. Would the results be similar?

The authors do point to their open source implementation of their file system. I have not yet evaluated it, but it is definitely something on my (all too long) list of things to do.

ZUFS

After one of my earlier posts on FUSE file system performance, someone mentioned this project to me – the Zero copy Userspace File System project (ZUFS) which appears to be a NetApp sponsored project.

Sometimes Zero is best.

There have been a variety of talks about this project, including at the Linux Plumbers Conference (which was held next door to me – I can see the venue from my window as I write this), as well as the SNIA Persistent Memory Summit in 2018. The NetApp repositories on Github.com contain both a file system reflector (zufs-zuf), which appears to be similar to the FUSE kernel driver, as well as the user mode server (zufs-zus) which handles dispatching the kernel level requests to the user mode file system implementations.

Their concern appears to be eliminating the copy of any data between kernel and user mode, which makes sense given their objective of supporting persistent memory, such as the new Intel Optane DC Persistent Memory that has recently become commercially available.

Persistent memory benefits from a direct access model, in which traditional file data caching is eschewed in favor of direct access. Thus, data is read or written directly from the underlying persistent memory, rather than copied from a buffer cache.
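
The difference between the two access models can be sketched with ordinary `mmap`. This is only an illustration, not ZUFS code: on a regular file the mapping still goes through the page cache, and it is only on a DAX-capable file system backed by persistent memory that a mapping like this would reach the media directly. The helper names `copy_read`, `mapped_update`, and `demo` are hypothetical.

```python
import mmap
import os
import tempfile


def copy_read(path: str, offset: int, length: int) -> bytes:
    """Traditional model: read() copies data into a fresh buffer."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def mapped_update(path: str, offset: int, data: bytes) -> None:
    """Direct-access style: update bytes in place through a memory mapping.
    On an ordinary file this is still cached; it only illustrates the
    load/store programming model, not persistent memory behaviour."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as view:
            view[offset:offset + len(data)] = data
            view.flush()  # ask the OS to write the mapped range back


def demo() -> bytes:
    # Scratch file standing in for a persistent memory region (assumption).
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(4096))
        path = tmp.name
    try:
        mapped_update(path, 8, b"hello")
        return copy_read(path, 8, 5)
    finally:
        os.remove(path)


print(demo())
```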

There are a few persistent memory file systems, including UCSD’s NOVA file system, though usually they were developed using emulation of persistent memory. In such systems, there is no benefit to copying the data from persistent memory into DRAM and back; indeed, it is a significant performance impediment.

What is not currently present in the NetApp repository is an implementation of a user mode persistent file system (they have a dummy file system implementation, which appears to be the base from which one could build a real file system). This definitely presents an interesting alternative to using traditional FUSE.

FUSE vs ZUFS Performance (from NetApp SNIA presentation)

I have not had an opportunity to play with this new system yet, but it certainly does seem to be intriguing – and the performance graph from the SNIA presentation is rather compelling, given the massive improvement in scalable performance.

There sure are quite a few alternatives to traditional FUSE to consider…

A Comparison of Two Network-Based File Servers

A Comparison of Two Network-Based File Servers
James G. Mitchell and Jeremy Dion, in Communications of the ACM, April 1982, Volume 25, Number 4.

Pair of File Servers

I previously described the Cambridge File Server (CFS).  In this 1981 SOSP paper the inner details of it and the Xerox Distributed File System (XDFS) are compared.  This paper provides an interesting insight into the inner workings of these file servers.

Of course, the scale and scope of a file server in 1982 was vastly smaller than the scale and scope of file servers today.  In 1982 the disk drives used for their file servers were as large as 300MB.

SD Cards

This stands in stark contrast to the sheer size of modern SD cards; I think of them as slow, but compared to the disk drives of that era they are quite a bit faster, not to mention smaller.  I suspect the authors of this paper might be rather surprised at how the scale has changed, yet many of the basic considerations they were making back in the early 1980s are still important today.

  • Access Control (Security) – CFS was, of course, a capability based system. XDFS was an identity based system; most systems today are identity based systems, though we find aspects of both in use.
  • Storage Management – the interesting challenge here is how to ensure that storage is not wasted. The naive model is to shift responsibility for proper cleanup to the clients. Of course, the reality is that this is not a good model; even in the simple case of a client that crashes, it is unlikely the client will robustly ensure that space is reclaimed in such circumstances. CFS handles this using a graph file system and performing garbage collection in which an unreachable node is deemed subject to reclamation. XDFS uses the more naive model, but mitigates this by providing a directory service that can handle proper cleanup for clients – thus clients can “do it right” with minimal fuss, but are not constrained to do so.
  • Data Consistency – the authors point to the need to have some form of transactional update model. They observe that both CFS and XDFS offer atomic transactions; this represents the strong semantic end of the design spectrum for network file servers and we will observe that one of the most successful designs (Sun’s NFS) went to a much weaker end of the design spectrum. Some of this likely reflects the database background of the authors.
  • Network Protocols – I enjoyed this section, since this is very early networking, with CFS using the predecessor of token ring and XDFS using the 3Mb/s version of Ethernet. They discuss the issues inherent in the network communications: flow and error control (so message exchange and exception/error handling) and how the two respective systems handle them.
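
The storage management approach described above for CFS, where anything unreachable in the graph is subject to reclamation, can be sketched as a simple reachability pass. This is a toy illustration rather than the CFS implementation; the node names and the `edges` dictionary are invented for the example.

```python
from typing import Dict, Iterable, Set


def reachable(root: str, edges: Dict[str, Iterable[str]]) -> Set[str]:
    """Return every node that can be reached from root by following edges."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(edges.get(node, ()))
    return seen


def collect_garbage(root: str, edges: Dict[str, Iterable[str]]) -> Set[str]:
    """Nodes that exist but are unreachable from the root can be reclaimed."""
    return set(edges) - reachable(root, edges)


# Hypothetical object graph: indexes point at files or at other indexes.
edges = {
    "root": ["index_a"],
    "index_a": ["file_1", "file_2"],
    "file_1": [],
    "file_2": [],
    "orphan_index": ["file_3"],  # e.g. left behind by a crashed client
    "file_3": [],
}
print(sorted(collect_garbage("root", edges)))
```

The appeal of this model is visible even in the toy version: a client that crashes before cleaning up simply leaves unreachable nodes, and the server reclaims them without any cooperation from the client.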

The authors also compare details of the implementation:

  • They describe a scheme in CFS in which small files use a direct block, and larger files use indirect blocks (blocks of pointers to direct blocks). This means that small files are faster to access. It is similar to the model that we see in other (later) file systems. XDFS, by contrast, uses a B-tree to track the allocation of blocks to files, and a bitmap to record free/used space information.
  • They discuss redundancy, with an eye towards handling (partial) disk failures. Like any physical device, the disk drives of that era did wear out and fail.
  • They discuss their transaction log and how each system guaranteed consistency: they both use shadow pages, but their implementation of them is different. Ultimately, they both have similar issues, and similar impact. Shadow pages are a technique that we still use.
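
The direct versus indirect block scheme from the first bullet can be sketched as follows. The layout here (a dictionary standing in for the disk, four pointers per indirect block, the `inode` dictionaries) is an assumption made for illustration and is not the actual CFS on-disk format.

```python
POINTERS_PER_BLOCK = 4  # tiny value so the example is easy to follow


def locate_block(inode: dict, disk: dict, file_block: int):
    """Return (disk address of the data block, extra disk reads needed).

    Small files point straight at their single data block; larger files
    go through a block of pointers first, costing one extra read."""
    if inode["kind"] == "direct":
        if file_block != 0:
            raise IndexError("a direct file has exactly one block")
        return inode["block"], 0
    pointer_block, slot = divmod(file_block, POINTERS_PER_BLOCK)
    pointers = disk[inode["indirect"][pointer_block]]  # the extra read
    return pointers[slot], 1


# Hypothetical disk contents: address -> data or list of pointers.
disk = {10: b"small file data", 20: [30, 31, 32, 33], 21: [34, 35]}
small = {"kind": "direct", "block": 10}
large = {"kind": "indirect", "indirect": [20, 21]}

print(locate_block(small, disk, 0))
print(locate_block(large, disk, 5))
```

The second element of each result is the point of the bullet above: the small file needs no additional lookup, while every access to the large file pays for reading a pointer block first (unless it is cached).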

The evaluation is interesting: it is not so much a measure of performance but rather insights into the strengths and weaknesses of each approach. For XDFS they note that their transaction support has been successful and it permits database transactions (in essence, XDFS becomes a form of simple database service). They point to the lack of support for both normal and special files; from their description a special file is one with guaranteed write semantics. They also observe that ownership of files is easily lost, which in turn leads to inefficient storage utilization. They observe that it is not clear whether the B-tree is a win or a loss for XDFS.

For CFS they point to the performance requirements as being a strength, though it sounds more like a design constraint that forced the CFS developers to make “hard choices” to optimize for performance. Similarly, they observe that the directed graph model of CFS is successful and capabilities are simple to implement. Interestingly, they also point to the index as well as string of names and access rights as being a success point. They also point to the fact that CFS generalizes well (“[t]wo quite different filing systems built in this way coexist on the CFS storage.”) They also point to automatic garbage collection as being a net win for CFS, though they also point out that CFS uses a reference count in addition to the garbage collection model. They list the CFS limitation of transactions to a single file or index as being one of its shortcomings and point to real-world experience porting other operating systems to use CFS as an indicator of the cost of this limitation. The specific limitation they point to is this: “… since file directories are implemented as an index with an associated file, it is currently impossible to update both structures in a single transaction.” They conclude by arguing that XDFS has a better data layout, arguing that XDFS’s strategy of page allocation and intention logging is ultimately better than CFS’s cylinder maps: “… the redundancy function of cylinder maps does not seem to be as successful as those of page allocation and intentions logging; the program to reconstruct a corrupted block is not trivial.”

Ensuring correct recovery in a transactional system is certainly challenging in my experience, so I can understand the authors’ concerns about simplicity and scalability.
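
Since shadow pages come up in both systems, a minimal sketch of the general technique may help. This is the textbook idea only, not how CFS or XDFS implement it; the class `ShadowPagedStore` and its methods are hypothetical, and a single Python reference assignment stands in for the atomic update of an on-disk root pointer.

```python
class ShadowPagedStore:
    """Toy shadow paging: writes go to fresh pages, and a transaction only
    becomes visible when its page table replaces the current one."""

    def __init__(self, pages):
        self._pages = dict(enumerate(pages))              # physical id -> data
        self._table = {i: i for i in range(len(pages))}   # logical -> physical
        self._next = len(pages)

    def read(self, logical):
        return self._pages[self._table[logical]]

    def begin(self):
        # Each transaction works on a private copy of the page table.
        return dict(self._table)

    def write(self, txn_table, logical, data):
        physical = self._next
        self._next += 1
        self._pages[physical] = data      # never overwrite a live page
        txn_table[logical] = physical

    def commit(self, txn_table):
        # Stand-in for atomically switching the root pointer on disk.
        self._table = txn_table


store = ShadowPagedStore(["a", "b"])
txn = store.begin()
store.write(txn, 0, "A2")
print(store.read(0))   # the old page is still what readers see
store.commit(txn)
print(store.read(0))   # the new page is visible after commit
```

Recovery is simple in this model because an interrupted transaction never touched the live pages; abandoning its private page table is enough to abort it. Reclaiming the orphaned shadow pages is the part a real system still has to get right.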

Overall, it is an interesting read, as I can see many of the issues described here as being file systems issues; many of the techniques they describe in this paper show up in subsequent file systems. The distinction between file system and file server also becomes more clearly separated in future work.

Extension Framework for File Systems in User space

Extension Framework for File Systems in User space, Ashish Bijlani and Umakishore Ramachandran, USENIX Annual Technical Conference, 2019.

Useful Extensions

The idea of improving FUSE performance has become a common theme. This paper, which will be presented this week at USENIX ATC 2019 in Renton, WA, is one more to explore how we can improve FUSE performance.

One bit of feedback I received from the last FUSE performance paper I reviewed (last week) suggested that people do want to build file systems in user space for a variety of reasons, not the least of which is because they want to move that complexity out of the kernel environment. Thus, the argument is that the reason people build kernel file systems is because of performance. While I remain unconvinced that this is the only impediment to a broader adoption of FUSE file systems, I will save that for a future discussion.

The approach the authors take this time does seem to try and bridge the gap: their proposal is to add kernel extensions that permit user mode file systems developers to add small modular components to the file system to optimize performance critical aspects. They address the increased security considerations inherent in allowing “kernel extensions” by sandboxing those extensions into an “in-kernel Virtual Machine (VM) runtime that safely executes the extensions”.

Their description of FUSE is quite a bit different than what I got from the FUSE performance paper at FAST 2018 – this paper describes FUSE as a “simple interposition layer”; the earlier description made it sound more complex than that. They do point out that FUSE file systems in production are becoming more common and point to Gluster, Ceph, and even Android’s SD card file system. For network file systems the overhead of FUSE is unlikely to have a material impact in all but the most performance sensitive environments because the overhead of the network likely dominates. Similarly, SD card media is typically slow so once again the rate-limiting overhead is likely not the FUSE library and driver.

In addition to proposing an extension model, the authors also point out that there is a class of “unneeded” operations that are difficult to omit because the level of control offered by FUSE presently is not sufficiently fine grained; the authors propose enhancing FUSE to address these issues as well.

They set forth an interesting set of design considerations:

  • Compatibility – their observation is that the extension model must be something that works with existing file systems without requiring redesign or extensive coding.
  • Extensibility – the features offered by ExtFuse must allow adding specific features in a clean, minimalistic fashion, so that a FUSE file system developer can pick the specific features needed for their use case.
  • Safe and Performant – these are competing goals; the primary purpose of their work is to improve performance but they cannot do so at the expense of sacrificing security.
  • Correctness – they point out the challenge of having two operational paths (the “fast” path and the “slow” path, where the latter corresponds to the legacy path).
(Figure 1 from Paper)

The authors provide a graphical description of the architecture of their system in Figure 1 of the paper, which I have reproduced here. It shows that there are dual paths: the traditional FUSE path, as well as their accelerated path.

They move on to describe the extensions they implemented to demonstrate the range of functionality with their extension model:

  • Meta-data caching – the idea is that VFS itself cannot do effective caching due to the nature of its interface; the tighter interface between the extension and the user mode file system makes this more practical.
  • I/O stacking – the concept here is that data may have multiple processing layers, such as logging, or union file systems. By permitting the extension to handle this, the overhead is minimized; indeed, this reminded me of the Scout Operating Systems work, which focuses on constructing optimized pipelines for such work.
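
The meta-data caching extension, and the correctness concern about keeping the fast and slow paths in agreement, can be illustrated with a small sketch. This is plain Python standing in for logic that, in the paper, runs as a sandboxed in-kernel extension; the class `AttrCache`, its counters, and the `backing` dictionary are all assumptions for illustration.

```python
class AttrCache:
    """Toy attribute cache: hits are served on the 'fast' path, misses fall
    through to the 'slow' path (standing in for a round trip to the user
    mode file system)."""

    def __init__(self, backing: dict):
        self._backing = backing   # path -> attribute dict (the slow path)
        self._cache = {}
        self.slow_calls = 0

    def getattr(self, path: str) -> dict:
        if path in self._cache:
            return self._cache[path]          # fast path
        self.slow_calls += 1                  # slow path
        attrs = dict(self._backing[path])
        self._cache[path] = attrs
        return attrs

    def setattr(self, path: str, **changes) -> None:
        self._backing[path].update(changes)
        # Invalidate so the two paths can never disagree about this file.
        self._cache.pop(path, None)


fs = AttrCache({"/a.txt": {"size": 10, "mode": 0o644}})
fs.getattr("/a.txt")
fs.getattr("/a.txt")
print(fs.slow_calls)              # the second lookup was served from cache
fs.setattr("/a.txt", size=20)
print(fs.getattr("/a.txt")["size"])
```

The invalidation in `setattr` is the interesting line: with two operational paths, every operation that changes state has to keep the cache honest, which is exactly the correctness challenge the authors call out.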

Their evaluation focuses on a handful of critical operations: getattr, setattr, getxattr, and read/write. They looked at a mix of optimization models: the use of a smart attribute cache is clearly a win based upon their performance analysis. FUSE remains slower than a native file system in many scenarios however (e.g., they use EXT4 as a benchmark comparison) though the performance seems to be much closer than we’ve seen in prior work.

They also ported multiple different file systems to their extension library: StackFS, BindFS, Android’s sdcard file system, MergerFS, and LoggedFS. None of them required even 1,000 lines of new code for the kernel extensions. While the authors do discuss some of the observed performance improvements for those file systems, they do not provide us with general benchmark comparisons.

Overall, this is an interesting paper, which combines a number of ideas together into an intriguing package. It will be interesting to see if this gains traction in the FUSE community.

Direct-FUSE: Removing the Middleman for High-Performance FUSE File System Support

Direct-FUSE: Removing the Middleman for High-Performance FUSE File System Support, Yue Zhu, Teng Wang, Kathryn Mohror, Adam Moody, Kento Sato, Muhib Khan, and Weikuan Yu, in Proceedings of the 8th International Workshop on Runtime Operating Systems for Supercomputers, page 6, 2018.

Modern Fuse, circuit breakers instead of actual fuses.

There are quite a few papers that discuss the performance of the FUSE model. I already discussed a recent paper that explored the performance of FUSE on Linux and that paper observed that I/O performance for FUSE is reasonably good due to the optimization work that has been done to minimize the data copy overhead that can occur with a naive implementation.

What I do find surprising is the emphasis on FUSE performance; this leads me to think that people look to user mode file systems as something viable for implementing production file systems. Of course, one motivation for this is that building a FUSE file system is generally simpler than implementing an in-kernel file system. Some of this is environmental – the kernel is a harsh development environment, in which the smallest bugs lead to the system crashing.

Of course, virtual machine technologies have done quite a lot to minimize this overhead, as the “machine” that crashes is now more like an application. If you are developing code for the UNIX, Linux, or Windows kernel you are likely to be developing using C, the most commonly used systems language these days. It is possible to bravely branch out and use other languages, but then you inherit other interesting restrictions and frequently find that you are developing the tools as much as you are developing the file system.

Thus, one benefit of the user space file systems model is that you can use other development tools – FUSE file system implementations use a much larger range of programming languages than is normally found in kernel file systems. The FUSE model also permits fairly rapid development of a prototypical file system.

Today’s paper touches on these traditional issues and points out that sometimes what you need isn’t a general-purpose file system but rather something that is specifically crafted to solve the problem at hand. For the HPC community, performance is an important driver for the specialized file systems of choice. The authors use an optimized library, libsysio, that provides a POSIX-like interface which intercepts I/O operations to a remote file system – in essence, a sort of automated mechanism for turning I/O calls into something reminiscent of RPC.

The emphasis of the authors is in eliminating the overhead of system calls. Their approach is certainly focused: this solution works for a single application that requires high performance operations.

They start off by evaluating the cost overhead of using FUSE. Because their emphasis is on I/O, that is what they evaluate. Thus, unlike the earlier FUSE analysis, which indicated that meta-data operations were the most significant bottleneck, this work concludes there is still substantial impact on I/O performance as well.

They take an existing library from Sandia Labs, libsysio. I found multiple different versions of this library available on the Internet and was interested to find that it has been integrated into other file systems, including Lustre, with which I have some familiarity from past work. The authors don’t discuss whether their approach is better than using other HPC file systems, focusing instead on improving the performance of their specific use case.

One interesting design consideration for Direct-FUSE is they seek to support multiple FUSE file systems from a single application, using the same high performance communications approach. This is not usually an issue for applications with pure FUSE file systems because to the application the FUSE file system appears to be functionally equivalent to every other file system. This is, however, an issue that can arise when incorporating multiple I/O library based models into a single application; something they address in Direct-FUSE.

They describe their implementation model for supporting multiple distinct file systems, differentiating between file systems via a prefix matching model, and then forwarding name based requests as appropriate. File handle based operations work by using an indirection table for encapsulating the additional state needed to determine which file system should be used to satisfy requests against the particular file handle.
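
The two mechanisms described here, prefix matching for name based requests and an indirection table for file handle based requests, can be sketched as below. This is not Direct-FUSE code; `MountRouter`, `DictBackend`, and the mount prefixes are hypothetical names used only to show the shape of the idea.

```python
class DictBackend:
    """Stand-in for one user space file system (assumption for the demo)."""

    def __init__(self, files: dict):
        self._files = files
        self._open = []

    def open(self, path: str) -> int:
        if path not in self._files:
            raise FileNotFoundError(path)
        self._open.append(path)
        return len(self._open) - 1      # backend-local handle

    def read(self, handle: int) -> bytes:
        return self._files[self._open[handle]]


class MountRouter:
    """Routes paths by longest matching prefix, and handles via a table."""

    def __init__(self):
        self._mounts = {}    # prefix -> backend
        self._handles = {}   # global handle -> (backend, local handle)
        self._next_fd = 3

    def mount(self, prefix: str, backend) -> None:
        self._mounts[prefix.rstrip("/")] = backend

    def _resolve(self, path: str):
        matches = [p for p in self._mounts
                   if path == p or path.startswith(p + "/")]
        if not matches:
            raise FileNotFoundError(path)
        best = max(matches, key=len)     # longest prefix wins
        return self._mounts[best], path[len(best):] or "/"

    def open(self, path: str) -> int:
        backend, inner = self._resolve(path)
        fd = self._next_fd
        self._next_fd += 1
        self._handles[fd] = (backend, backend.open(inner))
        return fd

    def read(self, fd: int) -> bytes:
        backend, local = self._handles[fd]   # indirection table lookup
        return backend.read(local)


router = MountRouter()
router.mount("/fast", DictBackend({"/a.txt": b"one"}))
router.mount("/fast/scratch", DictBackend({"/a.txt": b"two"}))
fd1 = router.open("/fast/a.txt")
fd2 = router.open("/fast/scratch/a.txt")
print(router.read(fd1), router.read(fd2))
```

Note that both backends hand out the same local handle value here; the indirection table is what lets the application hold two distinct handles without ever knowing which file system is behind each one.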

Much of the paper focuses on the evaluation of their solution. In keeping with their focus on raw I/O performance, the evaluation is all about bandwidth at various I/O sizes. Their results indicate that they are able to achieve performance that is comparable to similar native file systems (they use ext4 and tmpfs implementations for these benchmarks).

They also compare their performance in the distributed file systems arena using FusionFS, an existing FUSE file system. They show comparable performance for read I/O bandwidth (including scalability to multiple nodes) as well as improved write I/O bandwidth.

They then evaluate the context switch difference between the two solutions (FUSE and Direct-FUSE) and observe that they have eliminated the context switch overhead.

Bottom line, they have found a way to improve performance over traditional FUSE file systems. They do not compare to other HPC oriented file systems (e.g., Lustre) and thus it is difficult for me to tell if this is a viable contender for larger scale distributed file systems work. Nevertheless, they do point out the impact of the context switch costs inherent in the traditional FUSE model.

I am left asking myself “is the goal to make FUSE performance close enough to native kernel file systems that it makes sense to simply implement in FUSE?” Since they only focus on I/O bandwidth, I am not sure if they will achieve this goal for broader benchmarks.