Category Archives: Name Spaces

April 2024
S	M	T	W	T	F	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30

Better Finding: Combine Semantic and Associative Context with Indaleko

January 11, 2022

Last month I presented my thesis proposal to my PhD committee. My proposal doesn’t mean that I am done, rather it means that I have more clearly identified what I intend to make the focus of my final research.

It has certainly taken longer to get to this point than I had anticipated. Part of the challenge is that there is quite a lot of work that has been done previously around search and semantic context. Very recent work by Daniela Vianna relates to the use of “personal digital traces” to augment search. It was Dr. Vianna’s work that provided a solid theoretical basis for my own proposed work.

Our computer systems collect quite an array of information, not only about us but also about the environment in which we work.

In 1945 Vannevar Bush described the challenges to humans of finding things in a codified system of records. His observations continue to be insightful more than 75 years later:

Our ineptitude in getting at the record is largely caused by the artificiality of systems of indexing. When data of any sort are placed in storage, they are filed alphabetically or numerically, and information is found (when it is) by tracing it down from subclass to subclass. It can be in only one place, unless duplicates are used; one has to have rules as to which path will locate it, and the rules are cumbersome. Having found one item, moreover, one has to emerge from the system and re-enter on a new path.

The human mind does not work that way. It operates by association. With one item in its grasp, it snaps instantly to the next that is suggested by the association of thoughts, in accordance with some intricate web of trails carried by the cells of the brain. It has other characteristics, of course; trails that are not frequently followed are prone to fade, items are not fully permanent, memory is transitory. Yet the speed of action, the intricacy of trails, the detail of mental pictures, is awe-inspiring beyond all else in nature.

I find myself returning to Bush’s observations. Those observations have led me to ask if it is possible for us to build systems that get us closer to this ideal?

My thesis is that collecting, storing, and disseminating information about the environment in which digital objects are being used provides us with new context that enables better finding.

So, my proposal is about how to collect, store, and disseminate this type of external contextual information. I envision combining this with existing data sources and indexing mechanisms to allow capturing activity context in which digital objects are used by humans. A systems level service that can do this will then enable a broad range of applications to exploit this information to reconstruct context that is helpful to human users. Over my next several blog posts I will describe some ideas that I have with what I envision being possible with this new service.

The title of my proposal is: Indaleko: Using System Activity Context to Improve Finding. One of the key ideas from this is the idea that we can collect information the computer might not find particularly relevant but the human user will. This could be something as simple as the ambient noise in the user’s background (“what music are you listening to?” or “Is your dog barking in the background”) or environmental events (“it is raining”) or even personal events (“my heart rate was elevated” or “I just bought a new yoga mat”). Humans associate things together – not in the same way, nor the same specific elements – using a variety of contextual mechanisms. My objective is to enable capturing data that we can then use to replicate this “associative thinking” that helps humans.

Ultimately, such a system will help human users find connections between objects. My focus is on storage because that is my background: in essence, I am interested in how the computer can extend human memory without losing the amazing flexibility of that memory to connect seemingly unrelated “things” together.

In my next several posts I will explore potential uses for Indaleko.

Laundry Baskets: The New File System Namespace Model

September 28, 2021

A large pile of laundry in a laundry basket, with a cat sleeping on the top. — The “Laundry Basket” model for storage.

While I’ve been quiet about what I’ve been doing research-wise, I have been making forward progress. Only recently have ideas been converging towards a concrete thesis and the corresponding research questions that I need to explore as part of verifying my thesis.

I received an interesting article today showing that my research is far more relevant than I’d considered: “FILE NOT FOUND“. The article describes that the predominant organizational scheme for “Gen Z” students is the “Laundry Basket” in which all of their files are placed. This is coming as a surprise to people who have been trained in the ways of the hierarchical folder metaphor.

While going through older work, I have found it is intriguing that early researchers did not see the hierarchical design as being the pinnacle of design; rather they saw it as a stop-gap measure on the way to richer models. Researchers have explored richer models. Jeff Mogul, now at Google Research, did his PhD thesis around various ideas about improving file organization. Eno Thereska, now at Amazon, wrote an intriguing paper while at Microsoft Research entitled “Beyond file systems: understanding the nature of places where people store their data” in which he and his team pointed out that cloud storage was creating a tension between file systems and cloud storage. The article from the Verge that prompted me to write this post logically makes sense in the context of what Thereska was saying back in 2014.

The challenge is to figure out what comes instead. Two summers ago I was fortunate enough to have a very talented young intern working with me for a couple months and during that time one of the interesting things he built was a tool that viewed files as a graph rather than a tree. The focus was always at the center, but then it would be surrounded by related files. Pick one of those files and it became the central focus, with a breadcrumb trail showing how you got there but also showing other related files.

The relationships we used were fairly simple and extracted from existing file meta-data. What was actually quite fascinating about it though was that we constructed it to tie two disjoint storage locations (his local laptop and his Google Drive) together into a single namespace. It was really an electrifying demonstration and I have been working to figure out how to enable that more fully – what we had was a mock-up, with static information, but the visualization aspects of “navigating” through files was quite powerful.

I have been writing my thesis proposal, and as part of that I’ve been working through and identifying key work that has already been done. Of course my goal is to build on top of this prior work and while I have identified ways of doing this, I also see that to be truly effective it should use as much of the prior work as possible. The idea of not having directories is a surprisingly powerful one. What I hadn’t heard previously was the idea of considering it to be a “laundry basket” yet the metaphor is quite apt. Thus, the question is how to enable building tools to find the specific thing you want from the basket as quickly as possible.

For example, the author of the Verge article observed: “More broadly, directory structure connotes physical placement — the idea that a file stored on a computer is located somewhere on that computer, in a specific and discrete location.” Here is what I recently wrote in an early draft of my thesis proposal: “This work proposes to develop a model to separate naming from location, which enables the construction of dynamic cross-silo human usable name-spaces and show how that model extends the utility of computer storage to better meet the needs of human users.”

Naming tied to location is broken, at least for human users. Oh, sure, we need to keep track of where something is stored to actually retrieve the contents, but there is absolutely no reason that we need to embed that within the name we use to find that file. One reason for this is that we often choose the location due to external factors. For example, we might use cloud storage for sharing specific content with others. People that work with large data sets often use storage locations that are tuned to the needs of that particular data set. There is, however, no reason why you should store the Excel spreadsheet or Python notebook that you used to analyze that data in the same location. Right now, with hierarchical names, you need to do so in order to put them into the “right directory” with each other.

That’s just broken.

However, it’s also broken to expect human users to do the grunt work here. The reason Gen Z is using a “laundry basket” is because it doesn’t require any effort on their part to put something into that basket. The work then becomes when they need to find a particular item.

This isn’t a new idea. Vannevar Bush described this idea in 1945:

“Consider a future device for individual use, which is a sort of
mechanized private file and library. It needs a name, and, to coin
one at random, ”memex” will do. A memex is a device in which
an individual stores all his books, records, and communications,
and which is mechanized so that it may be consulted with exceeding
speed and flexibility. It is an enlarged intimate supplement to his
memory.”

He also did a good job of explaining why indexing (the basis of hierarchical file systems) was broken:

“Our ineptitude in getting at the record is largely caused by the artificiality of systems of indexing. When data of any sort are placed in storage, they are filed alphabetically or numerically, and information is found (when it is) by tracing it down from subclass to subclass. It can be in only one place, unless duplicates are used; one has to have rules as to which path will locate it, and the rules are cumbersome. Having found one item, moreover, one has to emerge from the system and re-enter on a new path.

“The human mind does not work that way. It operates by association. With one item in its grasp, it snaps instantly to the next that is suggested by the association of thoughts, in accordance with some intricate web of trails carried by the cells of the brain. It has other characteristics, of course; trails that are not frequently followed are prone to fade, items are not fully permanent, memory is transitory. Yet the speed of action, the intricacy of trails, the detail of mental
pictures, is awe-inspiring beyond all else in nature.”

We knew it was broken in 1945. What we’ve been doing since then is using what we’ve been given and making it work as best we can. We knew it was broken. Seltzer wrote “Hierarchical File Systems Are Dead” back in 2009. Yet, that’s what computers still serve up as our primary interface.

The question then is what the right primary interface is. Of course, while I find that interesting I work with computer systems and I am equally concerned about how we can build in better support, using the vast amount of data that we have in modern computer systems, to construct better tools for navigating through the laundry basket to find the correct thing.

How I think that should be done will have to wait for another post, since that’s the point of my thesis proposal.

Storage Systems and I/O: Organizing, Storing, and Accessing Data for Scientific Discovery

June 25, 2019

Storage Systems and I/O: Organizing, Storing, and Accessing Data for Scientific Discovery, Robert Ross, Lee Ward, Philip Carns, Gary Grider, Scott Klasky, Quincey Koziol, Glenn K. Lockwood, Kahthryn Mohror, Bradley Settlemyer, and Matthew Wolf, United States: N. p., 2019. Web. doi:10.2172/1491994.

Google brought this to my attention recently because I’ve set triggers for people citing prior works that I’ve found interesting or useful. There are a number of things that caught my eye with this technical report that apply to my own current research focus. In addition, I received a suggestion that I pick a specific field or discipline and look at how to solve problems in that specific community as an aid to providing focus and motivation for my own work.

Data center Image (supercomputer). — Supercomputer Cluster

This report is a wealth of interesting information and suggestions for areas in which additional work will be useful. The report itself is quite long – 134 pages – so I am really only going to discuss the sections I found useful for framing my own research interests.

The report itself is essentially capturing the discussion at a US DOE workshop. Based upon my reading of the report, it appears that the workshop was structured, with a presentation of information and seeded with topics expected to elicit discussion. Much of the interesting information for me was in Section 4.2 “Metadata, Name Spaces, and Provenance”.

The report starts by defining what they mean by metadata:

Metadata, in this context, refers generally to the information about data. It may include traditional user-visible file system metadata (e.g., file names, permissions, and access times), internal storage system constructs (e.g., data layout information), and extended metadata in support of features such as provenance or user-defined attributes. Metadata access is often characterized by small, latency-bound operations that present a significant challenge for SSIO systems that are optimized for large, bandwidth-intensive transfers. Other challenging aspects of metadata management are the interdependencies among metadata items, consistency requirements of the information about the data, and volume and diversity of metadata workloads.
4.2.1 Metadata (p. 39)

I was particularly interested in seeing their observation that relationships were one of the things that they found missing from existing systems. While somewhat self-serving, this is the observation that has led me into my current research direction. The observation about how meta-data I/O behavior is a mismatch for the bandwidth-intensive nature of accessing the data was a useful insight: there is a real mismatch between the needs of accessing meta-data versus accessing the data itself, particularly for HPC style workloads.

Not explicitly mentioned here is the challenge of constructing a system in which meta-data is an inherent part of the file itself, despite these very different characteristics. The authors do point out other challenges, such as the difficulty in constructing efficient indexing, which suffers from scalability issues:

While most HPC file systems support some notion of extended attributes for files [Braam2002, Welch2008, Weil2006], this type of support is insufficient to capture the desired requirements to establish relationships between distributed datasets, files, and databases; attribute additional complex metadata based on provenance information; and support the mining and analysis of data. Some research systems provide explicit support for searching the file system name space based on attributes [Aviles-Gonzalez2014, Leung2009], but most of these systems rely on effective indexing, which has its own scalability and data-consistency challenges [Chou2011].
4.2.1 Metadata – “State of the Art” (p. 39)

In other words: the performance requirements of accessing meta-data versus data are quite different; the obvious solution is to provide separate storage tiers or services for satisfying these needs. The disadvantage to this is that when we start separating meta-data from data, we create consistency problems and classic solutions to these consistency problems in turn create challenges in scalability. In other words, we are faced with classic distributed systems problems, in which we must trade off consistency versus performance. That is the CAP Theorem in a nutshell.

Another important point in looking at this technical report is that it emphasizes the vast size of datasets in the scientific and HPC communities. These issues of scale are exacerbated in these environments because their needs are extraordinary, with vast data sets, large compute clusters, geographical diversity, and performance demands.

The need for solutions is an important one in this space, as these are definitely pain points. Further, the needs of reproducibility in science are important – the report expressly mentions the DOE policy now requires a data management plan. The emphasis is clearly on reproducibility and data sharing. It seems fairly clear (to me, at least) that having better data sharing can only benefit scientific work.

The seeding discussion for the Metadata section of the report raises some excellent points that, again, help buttress my own arguments and hopefully will be useful in shaping the forward direction:

A number of nontraditional use cases for the metadata management system have emerged as key to DOE missions. These include multiple views of the metadata to support, for example, different views at different levels of the name space hierarchy and different views for different users’ purposes; user-defined metadata; provenance of the metadata; and the ability to define relationships between metadata from different experiments (e.g., to support the provenance use case).
As the collection of metadata expands, it is important to ensure that all metadata associated with a dataset remains with the data. Metadata storage at different storage tiers, storage and recovery of metadata from archive, and the transfer of datasets to different storage systems are all important use cases to consider.
4.2.1 Metadata – “Seeding Workshop Discussion” (p. 40)

The idea of multiple views is important. It is something we’ve been exploring recently as we consider how to look at data just using current information available to us, something I should describe in another post.

So what do I pick out here as being important considerations? Different views, user-defined metadata, provenance, and defining relationships. Their requirement that metadata be associated with the underlying dataset. As I noted previously, this becomes more challenging when you consider that metadata operations have very different interface and performance characteristics than data access.

I have not really been looking at the issues related to metadata management in tiered storage systems, but clearly I have to do so if I want to address the concerns of the HPC community.

The attendees agreed that the added complexity in storage hierarchies presents challenges for locating users’ data. A primary reason is that the community does not yet have efficient mechanisms for representing and querying the metadata of users’ data in a storage hierarchy. Given the bottlenecks that already exist in metadata operations for simple parallel file systems, there is a strong research need to explore how to efficiently support metadata in hierarchical storage systems. A promising direction of research could be to allow users to tag and name their data to facilitate locating the data in the future. The appropriate tagging and naming schemes need investigation and could include information about the data contents to facilitate locating particular datasets, as well as to communicate I/O requirements for the data (e.g., data lifetime or resilience).
Section 4.4.5 Hierarchy and Data Management

How do we efficiently support medatadata in hierarchical storage systems? Implicit in this question is the assumption that there are peculiar challenges for doing this. The report does delve into some of the challenges:

Hashing can be useful in sharding data for distribution and load balancing, but this does not capture locality – the fact that various files are actually related to one another. We have been considering file clustering; I suspect that cluster sharding might be a useful mechanisms for providing load balancing.
Metadata generation consists of both automatically collected information (e.g., timestamps, sizes, and name) as well as manually generated information (e.g., tags). The report argues that manual generation is not a particularly effective approach and suggests automatically capturing workflow and provenance information is important. As I was reading this, I was wondering if we might be able to apply inheritance to metadata, in a way that is similar to taint tracking systems.

The report has a short, but useful discussion of namespaces. This includes the traditional POSIX hierarchical name space as well as object oriented name spaces. They point to views as a well-understood approach to the problem from the database community. I would point out the hierarchical approach is one possible view. The report is arguing that their needs would be best met by having multiple views.

The existing work generally is hierarchical and focused on file systems. A number of researchers, however, have argued that such hierarchical namespaces impose inherent limitations on concurrency and usability. Eliminating these limitations with object storage systems or higher-level systems could be the fundamental breakthrough needed to scale namespaces to million-way concurrency and to enable new and more productive interaction modalities.
4.2.2 Namespaces “Seeding Workshop Discussion”

There is a dynamic tension here, between search and navigation. I find myself returning to this issue repeatedly lately and this section reminds me that this is, in fact, an important challenge. Navigation becomes less useful when the namespace becomes large and poorly organized; humans then turn to search. Views become alternative representations of the namespace that humans can use to navigate. They can filter out data that is not useful, which simplifies the task of finding relevant data. We apply views already: we hide “hidden files” or directories beginning with a special character (e.g., “.” in UNIX derived systems). The source code control system git will ignore files (filter them from its view) via a .gitignore file. Thus, we are already applying a primitive, limited form of filtering to create the actual view we show.

This report goes on further. It considers some really interesting issues within this area:

Storage aware systems for maintaining provenance data.
The scaling issues inherent in collecting more provenance data; or what do we do when managing the metadata becomes a huge issue itself?
Cross-system considerations. This doesn’t require HPC data – I have commented more than once that when humans are looking for something, they don’t want to restrict it to the current storage device. Data flows across devices and storage systems; we need to be able to capture these relationships. “[T]here is no formal way to construct, capture, and manage this type of data in an interoperable manner.”
External meta-data. We need to remember that the context in which the data is collected or consumed is an important aspect of the data itself. Thus, the tools used, the systems, etc. might be factors. I would argue that a storage system can’t reasonably be expected to capture these, but it certainly should be able to store this metadata.

The discussion for this section is equally interesting, because it reflects the thoughts of practitioners and thus their own struggles with the current system:

Attendees mentioned that tracking provenance is a well-explored aspect of many other fields (art history, digital library science, etc.) and that effort should be made to apply the best practices from those fields to our challenges, rather than reinventing them. Attendees extensively discussed the high value of provenance in science reproducibility, error detection and correction in stored data, software fault detection, and I/O performance improvement of current and future systems.
Attendees also discussed the need for research into how much provenance information to store, for how long, in what level of detail, and how to ensure that the provenance information was immutable and trustworthy. The value of using provenance beyond strictly validating science data itself was brought up; attendees pointed out that provenance information can be used to train new staff as well as help to retain and propagate institutional knowledge of data gathering processes and procedures.
4.2.4 Discussion Themes “Provenance”

A generally useful observation: look to how other fields have approached common problems, to see if there are insights from those fields that we can use to address them here. I found the vast reach of the discussion here interesting – the idea that such a system can be used to “… retain and propagate institutional knowledge…”

Finally, I’m going to capture the areas in which the report indicates participants at the workshop reached consensus. I’ll paraphrase, rather than quote:

Scalable metadata storage – a key point for me here was decoupling meta-data from the data itself. That, despite the seeding suggestion that we keep meta-data associated with the file.
Improve namespace query and display capabilities – make them dynamic, programmable, and extensible.
Better provenance information – the emphasis was on reproducibility of results, but they wanted to ensure that this could embed domain specific features, so that such systems can be useful beyond just reproducibility.

I’ve really only touched on a small part of this report’s total content; there is quite a bit of other, useful, insight within the report. I will be mulling over these issues

What is a File System?

June 6, 2019 / 2 Comments on What is a File System?

I don’t know if I’ve discussed this previously, but if so, feel free to skip it. I had a meeting with one of my supervisors this morning and he observed that something I pointed out “would make a great HotOS paper…” I’m pretty much off the hook on that one, since there won’t be another HotOS for almost two years.

Neutron star collision — Copyright 2016 Skyworks Digital, Inc.

I said “we conflate name spaces and storage in file systems”. Or something equivalent. Here’s my real question: why do we mingle storage with indexing? I realize that indexing needs to be able to ultimately get us what we’re looking for, but traditionally we smash these together and call them a “file system”.

The Google File System paper certainly annoys me at some level each time I read it. But why does it disturb me? Because what they build isn’t a file system, it’s a key-value store. Yet over the years I have come back to it, repeatedly, and observed that there is an important insight in it that really does affect the way we build file systems: applications do not need hierarchical name spaces. What they need is a way to get the object they want.

Indeed, Amazon’s S3 service really is the same thing: an object store in which the “hierarchical name” is nothing more than an arbitrary key. I don’t know how it is organized internally, but the documentation is quite clear that the “hierarchical name space” is really just the way the key is encoded.

An Amazon S3 bucket has no directory hierarchy such as you would find in a typical computer file system. You can, however, create a logical hierarchy by using object key names that imply a folder structure. For example, instead of naming an object sample.jpg, you can name it photos/2006/February/sample.jpg.
GetObject

For me, this is a useful insight because at one point I considered inverting the traditional directory hierarchy, so that rather than have a directory point to it’s children, I would have each file maintain a list of the directories to which it points. Then “enumerating a directory” becomes “finding all the files with a specific attribute”. One motivation for considering this approach was to avoid the scaling issues inherent in large directories (which led me to an interesting looking PhD thesis Scale and Concurrency of Massive File System Directories – more for the reading list!) This is an important concern in a world in which I’m suggesting adding multiple different indices.

But back to the more fundamental question: why do we conflate them? Recall that the original thinking around file systems mirrored the physical nature of existing physical filing systems (I discussed that somewhat back when I talked about my Eurosys presentation) rather than the more free-flowing linkage present back in the Memex model. In a world where the physical copy is located by using its naming information, this does make sense.

Consider how a modern library is organized (yes, some people still collect books!) The location of the object (book) is important, but the categories of the object are equally important – it’s properties. A card catalog could be organized by topic, and would then give you information allowing you to find the work itself.

Applications don’t need hierarchical name spaces – those are something that humans use for managing things. Applications need persistent location information. By using a mutable location key with application we create a situation in which things break just by reorganizing them.

S3 does not fix this issue, since “moving a file” is equivalent to “changing its key”. In that way, it misses my point here. However, I suspect that those keys don’t actually change very often. Plus, S3 has other useful concepts, such as versioning.

What you do lose here is the ability to “check directory-wise security”. That will likely bother some people, but for the vast majority of situations, it really doesn’t matter.

Typical file systems actually do use key-value stores, although local file systems do it with simple keys (e.g., “i-node numbers”). These keys turn out to be useful: NFS looks up files using that i-node number, which is embedded inside the file handle. We did the same thing in AFS, going so far as to add an “open by i-node number” system call so the server could be implemented in user mode. Windows does the same thing with CDFS and NTFS, and I’ve implemented file systems that support “open by ID” for both file and object IDs.

Thus, my point: we merge the name space and the storage management together because “that’s the way we’ve always done it”. The POSIX interface, which codified how things were done in UNIX, embeds this further as there is no open by i-node number and the security model requires a path-wise walk.

Ah, but the failings of POSIX are something that I’ll save for another post.

Visualization

May 30, 2019

Last post I discussed relationships. But relationships really are not enough. Another key to this puzzle is visualization. In other words, how do we present the information to users so that it is useful.

But first, let me step back and point to a larger problem: information overload. If we present users with a list of 100,000 options, they won’t be able to necessarily find what they’re actually seeking. For example, one of the challenges of using an Internet search engine is that they can return millions of “answers” to my basic query. In fact, I just typed in “What is the meaning of life?” to a search engine and it responded back with 1.1 billion possible answers, all in under a second. They are a marvel at backing up the dump truck and offering to inundate me with answers. When is the last time you went through even a handful of these, let alone a large percentage of them?

I suspect that if I ask the HCI folks they will be able to tell me what works for ordinary mortals, but I assure you that computers are capable of returning more information than one can possibly ever process – many years ago I received a bug report about a file system directory that contained over 700,000 files and did not display with the file system kit I’d constructed and the company sold. I’d known I made decisions about resources when I did it; but we lifted that restriction and supported much larger directories than that. I’m quite sure that no mere human would look at such a directory in any meaningful sense. Maybe it was a collection of log files, in which case the names likely embed semantic information about the files themselves. Indeed, in Burrito the authors noted that scientists often embed the schema of their data within the file names. I know that I have done the same and when I’m looking for specific data I am often scripting code to sift through the pile of data to find the subset that is useful to me. The point remains the same: huge listings of files within a directory don’t work for humans.

One potential area for considering navigation is faceted search, a technique for making vast quantities of data searchable. Indeed, this fits well with the graph file system idea because what we expect to find in our graphs are clusters of related files. The graph is likely to be a sparse graph because most files are unlikely to share common features with one another. Thus, it suggests that at least one model for this data visualization problem is going to be rolling up data into these clusters, with an iterative approach to breaking it down and displaying it further. Of course, we might be able to do that within the confines of the existing hierarchical structure, which would be great for retrofitting this into the vast array of existing applications; still, my hope is that we can also provide novel new variations (or rather someone more clever than I am at these things will do so). The challenge is to build something that enables this means I need to have some level of understanding as to what kinds of information are needed to do so.

For example, I was wondering about first order approximations – those that mimic an existing hierarchical file system. Such a file system would present one or more views of the data. Maybe we have a Time view, and the time view then shows you all the files in time order. But if you have 1,000,000 files, that is going to be a mighty big list. One option might be to divide it up into ever smaller chunks of time. Eventually, we’d get to a point where you could see a few dozen files in some sort of time order. Of course, strict sharding of time might not make sense either: why should two files that are separated in time by small intervals end up in separate locations.

Time Slider

One possible option would be to consider a time slider that controls the view. This is something that we can find in other temporal data tools. Thus, creating a time slider might help make visualization easier to perform – in some ways this is similar to the Windows timeline feature that has recently been added into Windows 10. I suspect the file system doesn’t do much to facilitate this. From my own use of this feature it has some interesting limitations, not the least of which is that if you want full functionality they want to export data “to the cloud” for further analysis. That sounds like whatever they are doing is data intensive, which in turn suggests to me that the file system is not doing anything to facilitate this. If the answer really is “sorry, but you have to spend lots of computational cycles to mine this data” then my research is unlikely to be fruitful. I push forward because I don’t think this really is the case – the database community has done marvellous things that permit a vast treasure of relationships be mined.

If we combine this with a faceted search area, I can envision a filtered timeline model – in essence, this seems to be what the new Windows feature is doing, albeit by filtering the things they have deemed to be of interest. I suspect they will extend this capability over time, but a time ordered view is just one of the possible relationships I consider to be important for consideration. I don’t know what the relationships will be, but I do think that having a pre-defined list of those relationships will be self-limiting. Perhaps it will be enough. I proceed on the basis that I expect it will not be enough.

One possible visualization I’ve been considering – and part of the motivation for me deciding I need to start building a file system – is that this sort of namespace might be achievable on our existing hierarchical model if I just add “an extra level of indirection”. Suppose we construct a file system that, instead of having the existing file name has a directory of the same name. In turn, that directory then contains relationships associated with other objects. One of those other objects could be the actual file (so we can still access it), but we could also have a “temporal” directory that would then display a list of files that were created, modified, or accessed around the same timeframe. We could store information about what web sites were visited around times that the file was in use. We could keep track of the music that was playing around the same time. We could point to files that were similar to that specific file. Such a visualization could be easily achieved and compatible with existing tools. Rather than being an endpoint, this is more an intermediate staging area – a way of mocking up the concepts and ideas, and to see what works for people and what does not work.

Thus the desire for a file system. I think we can construct a static namespace by using an existing file system and symbolic links (so you can eventually get back to the real files) by mining existing data sources. But eventually, we are going to want dynamic support here. We can stub that out with FUSE (for example) but in my experience (and based upon the literature) FUSE is slow for meta-data operations, which is really all I will be doing. Building file systems is hard work, but it is something I’ve been doing for years, so that aspect doesn’t really scare me. Visualization on the other hand is an area in which I don’t have a lot of experience.

I’m certainly open to ideas…

Graph File Systems

May 9, 2019

We submitted a paper to HotOS 2019 in which we (unsuccessfully) made the argument that file systems as hierarchies is hobbling our ability to enhance the usability of file systems.

One of the blind reviews pointed to a pair of papers, one of which I’ve already reviewed (I’ll review the other, but I didn’t consider it to be the same thing, except vaguely in name as it ends up being a semantic tagging system). This paper actually pre-dates the work I submitted to Eurosys and profoundly influenced that work.

Five pages isn’t really much space to explore this area. Further, it was about a week before the deadline that I found out HotOS, while an ACM workshop now, uses an older template for its format, with wider margins and larger text, so the five page draft version we had became 6.5 pages! After surgery, it was back down to five pages but missing some useful discussions.

After submission of the HotOS paper, someone pointed me to a Stack Overflow article describing a 1958 paper (An Information Filing and Retrieval System for the Engineering and Management Records of a Large-Scale Computer Development Project) that may be the earliest record of hierarchical file structure (Figure 1).

Figure 1: ERMA Diagram mapping file folders to hierarcy

This is certainly not “new knowledge” as it has been extensively discussed in prior work – hierarchical structure fits the model in which physical filing was actually done.

This becomes clear by the time we get to Multics (Figure 2). We now have a model of directories and files organized in a strict hierarchical fashion.

Figure 2: 1965 Daley, et. al. Multics I/O Diagram (redrawn)

In my experience, when one presents a model and then finds it necessary to “hack” the model to be usable, it suggests that maybe the model is wrong – or at least not optimal. In the same paper the authors observe that they found it useful to augment hierarchy with links. But the introduction of links converts their hierarchy into a directed acyclic graph. Similar, yet not the same.

Figure 3: 1965 Daley, et. al. Multics I/O Diagram *with links* (redrawn)

In all fairness to the Multics folks, this was a reasonable option at that point. They had substantial limitations that would make graph processing impractical at that point (indeed, there are some who are likely to question whether or not graph processing at this level is practical now).

Simplified graph file system model — Figure 4: Simplified Graph Model

So what is it I envision? In Figure 4 I’ve started with a simplified graph model. In the model I’m envisioning (please keep in mind, this is a work in progress and quite likely to change) is that we have a clear separation between the name space (which is the graph) and the storage manager (which deals with figuring out how to deal with data).

One important benefit to come out of the rejection was identification of the QMDS paper – it helps establish why hierarchy isn’t good, even if the solution they put forward has limitations. For me, this is a blessing in disguise because I’ve had to spend so much time justifying why there is even a research question here that pointing to prior work (which wrestled with the same issues and made many similar arguments) allows me to focus future work more on the solution.

The graph model makes sense to me because it generalizes the hierarchical tree (a minimally connected graph) and existing relationships, including links. We are much more familiar with graphs now than we have been in the past: Facebook and LinkedIn are at their heart relationship graphs. Computer memories are much larger than in 1965, as are storage capacities. During the Eurosys Doctoral Workshop someone asked me about the overhead of such a system and I made the bold statement that I would be willing to spend 10% of my storage space if it dramatically improved my ability to find things. Surprisingly, that seemed to mollify the person asking.

It is the capture of relationships that distinguishes this approach from the more classic tagging approach. A tag represents an extension of some property of what a file is, not how it relates to other files. We’ve actually had tagging systems for a very long time – when I worked on Episode we explicitly decided to add “property lists” as a form of extended attribute; not quite as general as streams in NTFS, but a similar idea (as I understand it, they chose to do something similar in ReFS – they support alternate data streams, but they are limited to 128KB. Episode had a 64KB limitation for property lists.)

Why aren’t tags enough? Because they associate information with the specific file (or directory). What they fail to capture is relationships across file system objects. Why do we want relationships?

Up to this point I’ve been arguing that we want relationships because they provide us with the ability to find things. One of the very intriguing take-aways from The Ubiquitous Digital File paper is the observation that people prefer navigation to search. That’s a pretty profound observation when viewed against 30 years of research into tagging systems. Apple’s Spotlight and Microsoft’s search focus on improving search ability.

I’m pretty old-school here. When I am looking for something I often resort to searching for it by name from the command line and once I find it I navigate to the containing directory. I had not really considered that for me navigation is my primary mechanism and I use search as a secondary mechanism.

One of the most common uses of graphs by “real people” are maps. I’ve known this and I have considered visualizations of data as being a map between data elements. What I had not really considered is that we navigate maps all the time. If our data is organized in a graph fashion, we could consider navigating it much like we might navigate a map, or walk through relationship graphs such as Facebook or LinkedIn.

The foundation of this research direction is the relationship graph. Thus, the next phase of my work is really to explore what a reasonable representation of the namespace in this system would look like. More to discuss and consider in a future post!

QMDS: A File System Metadata Management Service Supporting a Graph Data Model-based Query Language

May 2, 2019

QMDS: A File System Metadata Management Service Supporting a Graph Data Model-based Query Language

Sasha Ames, Maya B. Gokhale, and Carlos Maltzahn, International Journal of Parallel, Emergent and Distributed Systems, Volume 28, Number 2, pp. 159-183, 2013.

This paper came to my attention via feedback from an anonymous reviewer, observing that our idea of constructing a graph file system had “already been done”. It never ceases to amaze me that, despite how much time I have spent combing the literature, there seem to be things I miss. In this particular case, I have to agree with the reviewer that the basic idea we proposed really had been done before, though it seems as if the design space has not been exhausted and this paper actually will save me considerable time because up to this point it’s been a challenge to even explain why this kind of file system is useful.

Indeed, my read of this paper really suggests that these authors also struggled with similar objections because they spend considerable time justifying the need for their work: the introduction is fairly long because it explains the underlying problem, and the prior work section also goes to great length to explain why prior work is inadequate to the job.

In this paper, we discuss an exploration of our approach to the problem: the use of a graph data model for representing file system user-defined metadata and a query language for retrieval. The purpose of this approach is to provide management of user-defined file metadata along with data under a single file system interface, delivering a common service across applications. Applications would be able to offload their metadata management needs to the service, alleviating the need for their own solution. This arrangement would benefit applications by reducing their code complexity, by virtue of not having their own custom metadata management components. A second benefit is improved opportunities for interoperability among separate applications.

The model we have been discussing, and trying to present, is one in which we have a richer model for meta-data to capture not only attributes of files, but also relationships across files. Like these authors we reached the conclusion that a graph is likely a better representational model for data. This encompasses the hierarchical model that is a fundamental part of POSIX, while at the same time providing us with a robust platform on which to build additional functionality. Before I start explaining that, though, I should go through this paper because it has valid results that I can use moving forward.

In Figure 1 from the paper, the authors describe the type of graph they are using: it has vertices (files) and edges (parent/child), with labels on the edges (attributes). I am not convinced this is the right graph model, but I will save that conversation for a future blog post.

In Figure 2, the authors delve into the structure of their graph in greater detail, as they compare their model to that of Resource Description Framework (RDF) triples that are used in several graph processing systems. Here we see a better description of their format (which is actually closer to what I’ll describe for my own work): “Our data model for file system metadata is a directed graph with attributes on nodes and edges, shown in Figure 1. Nodes in the graph can represent files, and this allows the system to manage relationships among files. We call our directed edges links, connecting parent and child nodes.”

They note that applications do not explicitly define schemas (something else I need to discuss in a future blog post) nor does their system require classes be defined. The authors argue this provides greater flexibility and indeed, the fact it does not force an application to be locked into a specific model.

“A heterogeneous approach to managing metadata gives all applications the
same tools to manage relationships.” It seems to me that this is one of the most compelling reasons on why this is a systems problem and not an application problem. If we insist applications implement this, most will not. Those that do will have no mechanism for interaction across applications. If files were truly isolated from one another, that would be fine, but in the real world, files do have relationships with other objects, whether it is other files, or external references (e.g., the “get me the e-mail from whence this file originated” example I described recently.) This helps with motivation, which I mentioned before has been an area of resistance I’ve received as well.

Figure 3 shows how the authors optimized their file system’s meta-data efficiency. This provides some interesting insight into the cases they expect to be common. I found their emphasis on navigation a useful one as well, particularly given the discussion of it recently.

The choice of optimization models certainly seems to be an important one, given that we can’t optimize for everything, and if we optimize for the wrong things we end up with something that looks much like brute-force search, which I can’t imagine is going to perform well.

In Figure 4 the authors turn their attention to their implementation model. They use the FUSE file systems interface to aid their implementation. This is interesting because one of the areas I’ve been exploring (sigh, yet another area to discuss in greater detail) is ways to more easily enhance the FUSE interface to enable exploring enhanced interfaces.

It seems that one downside to this approach is that it focuses on existing mechanisms for finding and accessing files, without providing a useful mechanism for exploiting enhanced search. Admittedly, the extended attribute interface does provide some mechanism for achieving this, but this is a useful paradigm for exploring how such a file system will work with existing applications – certainly an important aspect of constructing any file system that one expects will be useful.

Figure 5 was quite interesting to me, because it addresses one of the concerns I’ve seen in prior work based on relational databases (e.g., they’re often slow). I suspect that these results, with good times within QMDS relative to their evaluation relate to their optimization model for the queries they are executing.

One thing this doesn’t evaluate is how QDMS performs relative to other FUSE based file systems. The queries they do execute on QMDS seem to be targeted search queries and not necessarily well-correlated to actual usage as a file system.

There is no follow-up to this work, unfortunately, which makes it difficult to understand the general usefulness of QMDS. The upside to this is that it leaves considerable room for future work. It does provide a strong case for exploring this approach more thoroughly and I have already suggested that better evaluation seems justified under the circumstances.

I’ll discuss more of these issues as I turn my attention to describing my own work in future posts.

The Cap File System

February 12, 2018

The Cap Filing System
R. M. Needham and A.D. Birrell, Symposium on Operating Systems Principles, 1977, Association for Computing Machinery.

I’ve fallen behind this past ten days, working towards a deadline. My own conference paper is now submitted and I’m working on recovering. What that means is this week is going to be a busy one as I work on catching up. Adding to the fun, FAST is going on this week (February 13-16, 2008). I hope to add some bonus discussions on the current work being presented there.

Let’s get to discussing this paper.

CAP is a capabilities based system. The capability systems are a parallel idea that has been explored from time to time versus the access control system for Multics/UNIX inspired operating systems. The latest contender in this space would be Fuschia, an experimental operating system that Google is developing (though it is an open source project under a mixture of licenses). Interestingly, there are reports it runs on the Pixelbook in addition to the previous hardware it had supported.

At any rate, the idea behind a capability is to use identifiers that encapsulate the access rights within the capability itself. Thus, it is a name and the virtue of having the name is that it means you have the access inherent in that name.

These capabilities all represent to a program the right to have or do something: the preservation of information from one run of a program to another (a universal operating system requirement) is thus seen by the CAP programmer as the preservation of a capability rather than of an object itself.

The paper describes how capabilities are managed, which interacts with the “filing system” of course, since it has to store persistent information. They describe the “system internal name” (SIN), which is combined with a disk address to map memory segments to actual storage. The capability (in a “directory”) then becomes a repository of these persistent objects. This also creates a reference to the disk block that is in use, ensuring those storage regions are not reused.

One interesting characteristic of their system is that they do not guarantee free disk space will be recycled quickly (“[T]here is no guarantee that inaccessible disk space will be relinquished at the earliest possible moment.”) Indeed they note that space reclamation is only done when the system reboots and “[T]he filing system is not designed to run forever.”

They discuss how CAP differs from prior work (e.g., OS/360) where only the name matters; there is no concept of a directory for use as part of the capability system. The directory actually provides them an additional level of control as well, and they use directory capabilities as well as segment capabilities. Directories may be persistent (so they have a SIN and disk block location) or ephemeral (so they disappear when the program exits) – a sort of “built in” temporary storage concept for ephemeral memory management.

Sharing objects is now trivial – you simply tell someone the correct name; that embeds the capability within it. They do not describe the mechanisms used for transfer (“[B]y mechanisms which do not concern us in detail here.”)

They also describe how this name space is similar to other systems (UNIX and CAL-TSS) but different:

Access is associated with the name which in turn references information in the directory. It is not an attribute of the file itself.
The name space need not be a “strict hierarchy”. This means that portions could become disconnected, or even be private to a single application.
Their use of directories behaves similar to the model of “current directory” (presumably in UNIX) even though CAP expressly does not have a concept of current directory.
Directories are not even directed acyclic graphs!

The paper describes how capabilities work, since they are a fine-grained control mechanism. They explain that the holder of an existing capability (a program) may generate a more restrictive capability to provide to another program. Since capabilities apply to both individual files as well as directories, it is possible for a program to transfer a set of capabilities by creating a new directory and storing the relevant capabilities to the target program.

The names themselves can be quite ugly, as they incorporate capabilities within them. They describe some conventions they employ in terms of name management and directory placement, but point out these are conventions and not a hard requirement of the system.

CAP certainly presents a rather different model of a file system than we see in other systems. The idea of disconnected name spaces that are only visible to particular programs is an intriguing one. They use the example of the password database, which requires the program have the password file capability.

They discuss security. The directory manager is a separate module that programs use to interact with the directories to which they have access. To invoke the directory manager, the caller must have the ENTER capability.

I find this to be exactly the type of thought provoking paper I had hoped to find as I comb through these old papers. The idea that a file system name space need not be connected, that it could be private to a particular program or set of programs, and embedding access rights (“capabilities”) into the name will give me plenty to think about.

If you would like to know more about CAP there is a paper about it in the prior Symposium on Operating Systems Principles: The Cambridge CAP Computer and its Operating System. It is not too surprising that this is available from Microsoft Research, as they also built a capability based operating system (or two): Singularity and Midori.

A Principle for Resilient Sharing of Distributed Resources

January 30, 2018

A Principle for Resilient Sharing of Distributed Resources
Peter A. Alsberg and John D. Day, In Proceedings of the 2nd international conference on Software engineering, pp. 562-570. IEEE Computer Society Press, 1976.

Today I turn my attention to a paper that begins to explore the issues surrounding distributed systems. This paper sets forth some basic principles that should be considered as part of distributed systems that are worth capturing and pointing out.

They state that a “resilient server” must have four attributes:

It must be able to detect and recover from some finite number of errors.
It must be reliable enough that a user doesn’t need to worry about the possibility of service failure.
Failure just beyond the finite limit are not catastrophic.
One user’s abuse should not have material impact on any other user.

These all seem somewhat reasonable, though I must admit that I found (3) a bit surprising, as it is not uncommon for some classes of failures to cause catastrophic failure. For example, when using erasure coding it is common for some number of failures to lead to catastrophic failure. Upon reflection, I realized that one could achieve this goal by simply setting the finite limit a bit lower, though I suspect this is not entirely what the original author had in mind.

Still, the general concept of resiliency is a good one to capture and consider. The authors point out some of the challenges inherent in this over a network, notably latency. “[A] major consideration when choosing a resource sharing strategy is to reduce, as much as possible, the number of message delays required to effect the sharing of resources.”

In other words, keep the messages to a minimum, try not to make them synchronous. Asynchronous messaging systems will turn out to be rather complex, however, and sometimes there are just operations that require synchronous behavior.

Of course, there has to be a file system tie-in here (however tenuous) and there is! Under examples they list “Network Virtual File Systems which provide directory services and data access services…” Thus, it is clear that the authors are considering network file systems as part of their work.

In 1976 the authors indicate that the cost to send a message is on the order of 100ms, while the cost to acquire a lock on the CPU is 100 microseconds to 1 millisecond. While things are faster now, there is still a large difference between these two paths on modern systems. Thus, we will still be dealing with issues on the same scale.

The bulk of the paper then walks through the description of their model for providing resilient network services – an application host, sends a request to a primary server host; that server host then communicates with a secondary server host. That secondary server host can send backup requests to a tertiary server, and so forth. The secondary host confirms with the primary host, and ultimately it is the secondary host that confirms the operation with the application host.

They cover a variety of important concepts, such as the idea that a host may need to record state about the operations, that operations cannot be applied until the other servers have received their messages, etc. This is, in essence, an early consensus protocol. While not efficient, the authors have laid groundwork in helping us think about how we construct such services.

I have included Figure 3 from the paper above. It shows the message flow through the system. The paper also discusses how to handle a number of failure situations and how messages flowing through the system keep track of which nodes are current.

It also touches on one of the most complex issues in distributed systems: network partition. Intriguingly, the authors do not assert that one partition must remain alive as they describe the decision being related to the specific requirements of the environment. Thus, in their model it would be possible to end up with partitions that continue forward but can no longer be easily re-joined after the network partition is resolved. Of course, they don’t require that both sides of a partition progress, though they do not discuss ways to handle this, either.

They assert that the primary/secondary/backup model is optimal in terms of the number of messages that it sends and in how it ensures the messages are strictly ordered. Then they briefly describe a few other possible arrangements that have multiple primaries and/or secondaries. Their final conclusion is that their original strategy is at least as good as the alternatives though this is not demonstrated in any formal way.

Now that they have their replication system, they view how it will work for the network virtual file system. They conclude that the highest levels of the hierarchy need to be stored on all nodes (which makes them the most expensive to maintain). They partition the names space below that and record location information within the nodes of the name space where the storage has been split out across hosts. Thus, we see a description of a global name space.

Their system is simple, but they identify a number of the important issues. Their suggestion about sharding the file systems name space is one that we shall see again in the future.

They have laid the groundwork for thinking about how to build distributed file systems.

The UNIX Time-Sharing Operating System

January 18, 2018

The UNIX Time-Sharing Operating System
Dennis M. Ritchie and Ken Thompson, Bell Labs, Communications of the ACM, July 1974, Volume 17, Number 7, pp. 365-375.

This paper describes Version 3 of UNIX. It was Version 6 that became the basis of the Berkeley Software Distribution (BSD) version of UNIX. The only other operating system in CS history to date that has had so much impact on operating systems development is MULTICS, and UNIX is a direct descendant. Developed by several Bell Labs researchers that had been involved in MULTICS, its goal was to try and build a smaller operating system that retained what they viewed as the key benefits of MULTICS.

Much of UNIX Version 3 was written in the C programming language, itself derived from BCPL, a language that had been used on the MULTICS project.

“The most important job of UNIX is to provide a file system.“

These words leave little doubt about the role of file systems in UNIX and the importance assigned to them. The paper then goes on to describe files, in similar terms to MULTICS: files, directories, and special files (devices). We see the hierarchical file system of MULTICS reflected back in the description of the system. They talk about file names being 14 characters or less, the formation of paths, and the iterative walk of names through the file system name space to find other directories, as well as the terminal file nodes.

“The directory structure is constrained to have the form of a rooted tree.“

This is what I am looking for – the why of hierarchical file systems. I found the answer here, unsurprising yet ironically disappointing:

“The reason for this is to simplify the writing of programs which visit subtrees of the directory structure, and more important, to avoid the separation of portions of the hierarchy.“

Not surprising, yet not precisely what I had expected. I had expected the reason to be for the simplicity of the operating system (though they do allude to this by discussing the difficulty of knowing when it is safe to delete a directory. They describe links to files, however, so their file system is not really a tree. Rather, it is more like a directed acyclic graph (DAG). Files do not have pointers back to their directories, but directories have pointers back to their parent. Thus we have the distinction. The namespace is a DAG. Files don’t really live in the name space directly, they are referenced from the namespace, but have a reference count.

Oddly, with that, I found what I came for, at least in terms of insight for my own research area. There is a certain satisfaction in being able to point to this seminal document and say “this is why we got to where we are now.”

But if I stopped at this point, I would be leaving out the bits I had not expected to find.

First, the mundane: they discuss removable file systems, the fact that this is in fact a collection of name spaces, combining persistent name spaces with one another using a non-persistent mechanism (mounting), There is a simple description of how the file system is itself implemented. They describe the i-number (now the inode number) as an index into the file table. Thus, a directory entry is where the name lives, it merely refers to the file using its i-number. These entries are then called i-nodes. The i-node contains information about the owner of the file, the protection bits governing the file, the location information for where logical data is physically stored on the medium, the size of the file, its timestamps, it’s attribute bits, and the number of directory entries referencing the given i-node.

Surprisingly, not all that different than it is now, 45 years later. Implementation details have changed, as we no longer limit files to 10MB in size.

They describe bufering, they describe sector sized I/O and how it is more efficient for a program to do sector-sized I/O operations.

Much of the paper has nothing to do with file systems. I leave that to the interested reader to explore beyond that.

There are two interesting tid-bits remaining:

They lost data once, on a hard disk that failed. The backup was 3 days old.
They considered the permuted index application as one of the “major programs available”.

The fact they considered the permuted index important at this early stage was an interesting insight to me. Clearly, the ability to “find our stuff” is one that’s been around since the dawn of time. Maybe this research direction of mine does make sense.

Category Archives: Name Spaces

Recent Posts

Recent Comments

Archives

Categories

Subscribe to Blog via Email