Home » Posts tagged 'Namespace'

Tag Archives: Namespace

Subscribe to Blog via Email

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

Join 204 other subscribers
May 2024
S M T W T F S
 1234
567891011
12131415161718
19202122232425
262728293031  

Brainiattic: Remember more with your own Metaverse enhanced brain attic

Connecting devices and human cognition

I recently described the idea of “activity context” and suggested that providing this new type of information about data (meta-data) to applications would permit improve important tasks such as finding. My examining committee challenged me to think about what I would do if my proposed service – Indaleko – already existed today.

This is the second idea that I decided to propose on my blog. My goal is to find how activity context can be used to provide enhanced functionality. My first idea was fairly mundane: how can we improve the “file browsing” experience in a fashion that focuses on content and similarity by combining prior work with the additional insight provided by activity context.

My initial motivation for this second idea was motivated by my mental image of a personal library but I note that there’s a more general model here: displaying digital objects as something familiar. When I recently described this library instantiation of my brain attic the person said “but I don’t think of digital objects as being big enough to be books.” To address this point: I agree, another person’s mental model for how they want to represent digital data in a virtual world need not match my model. That’s one of the benefits of virtual worlds – we can represent things in forms that are not constrained by what things must be in the real world.

In my recent post about file browsers I discussed Focus, an alternative “table top” browser for making data accessible. One reason I liked Focus is that the authors observed how hierarchical organization does not work in this interface. They also show how the interface is useful and thus it is a concrete argument as to at least one limitation of the hierarchical file/folder browser model. Another important aspect of the Focus work was their observation that a benefit of the table top interface is it permits different users to organize information in their own way. A benefit of a virtual “library” is that the same data can be presented to different users in ways that are comfortable to them.

Of course, the “Metaverse” is still an emerging set of ideas. In a recent article about Second Life Philip Rosedale points out that existing advertising driven models don’t work well. This begs the question – what does work well?

My idea is that by having a richer set of environmental information available, it will be easier to construct virtual models that we can use to find information. Vannevar Bush had Memex, his extended memory tool. This idea turns out to be surprisingly ancient in origin, from a time before printing when most information was remembered. I was discussing this with a fellow researcher and he suggested this is like Sherlock Holmes’ Mind Palace. This led me to the model of a “brain attic” and I realized that this is similar to my model of a “personal virtual library.”

The Sherlock Holmes article has a brilliant quotation from Maria Konnikova: “The key insight from the brain attic is that you’re only going to be able to remember something, and you can only really say you know it, if you can access it when you need it,”

This resonates with my goal of improving finding, because improving finding improves access when you need it.

Thus, I decided to call this mental model “Braniattic.” It is certainly more general than my original mental model of a “personal virtual library,” yet I am also permitted to have my mental model of my pertinent digital objects being projected as books. I could then ask my personal digital librarian to show me works related to specific musical bands, or particular weather. As our virtual worlds become more capable – more like the holodeck of Star Trek – I can envision having control of the ambient room temperature and even the production of familiar smells. While our smart thermostats are now capturing the ambient room temperature and humidity level and we can query online sources for external temperatures, we don’t actively use that information to inform our finding activities, despite the reality is that human brains do recall such things; “it was cold out,” “I was listening to Beethovan,” or “I was sick that day.”

Thus, having additional contextual information can be used at least to improve finding by enabling your “brain attic.” I suspect that, once activity context is available we will find additional ways to use it in constructing some of our personal metaverse environments.

Using Focus, Relationship, Breadcrumbs, and Trails for Success in Finding

As I mentioned in my last post, I am considering how to add activity context as a system service that can be useful in improving findings. Last month (December 2021) my examination committee asked me to consider a useful question: “If this service already existed what would you build using it?”

The challenge in answering this question was not finding examples, but rather finding examples that fit into the “this is a systems problem” box that I had been thinking about while framing my research proposal. It has now been a month and I realized at some point that I do not need to constrain myself to systems. From that, I was able to pull a number of examples that I had considered while writing my thesis proposal.

The first of this is likely what I would consider the closest to being “systems related.” This hearkens back to the original motivation for my research direction: I was taking Dr. David Joyner’s “Human-Computer Interaction” course at Georgia Tech and at one point he used the “file/folder” metaphor as an example of HCI. I had been wrestling with the problem of scope and finding and this simple presentation made it clear why we were not escaping the file/folder metaphor – it has been “good enough” for decades.

More recently, I have been working on figuring out better ways to encourage finding, and that is the original motivation for my thesis proposal. The key idea of “activity context” has potentially broader usage beyond building better search tools.

In my research I have learned that humans do not like to search unless they have no other option. Instead, they prefer to navigate. The research literature says that this is because searching creates more cognitive load for the human user than navigation does. I think of this as meaning that people prefer to be told where to go rather than being given a list of possible options.

Several years ago (pre-pandemic) Ashish Nair came and worked with us for nine weeks one summer. I worked with him to look at building tools to take existing file data across multiple distinct storage domains and present them based upon commonality. By clustering files according to both their meta-data and simply extracted semantic context, he was able to modify an existing graph data visualizer to permit browsing files based on those relationships, regardless of where they were actually stored. While simple, this demonstration has stuck with me.

Ashish Nair (Systopia Intern) worked with us to build an interesting file browser using a graph data visualizer.

Thus, pushed to think of ways in which I would use Indaleko, my proposed activity context system, it occurred to me that using activity context to cluster related objects would be a natural way to exploit this information. This is also something easy to achieve. Unlike some of my other ideas, this is a tool that can demonstrate an associative model because “walking a graph” is an easy to understand way to walk related information.

There is a small body of research that has looked at similar interfaces. One that stuck in my mind was called Focus. While the authors were thinking of tabletop interfaces, the basic paradigm they describe, where one starts with a “primary file” (the focus) and then shows similar files (driven by content and meta-data) along the edges. This is remarkably like Ashish’s demo.

The exciting thing about having activity context is that it provides interesting new ways of associating files together: independent of location and clustered together by commonality. Both the demo and Focus use existing file meta-data and content similarity, which is useful. With activity context added as well, there is further information that can be used to both refine similar associations as well as cluster along a greater number of axis.

Thus, I can show off the benefits of Indaleko‘s activity context support by using a Focus-style file browser.

Better Finding: Combine Semantic and Associative Context with Indaleko

Last month I presented my thesis proposal to my PhD committee. My proposal doesn’t mean that I am done, rather it means that I have more clearly identified what I intend to make the focus of my final research.

It has certainly taken longer to get to this point than I had anticipated. Part of the challenge is that there is quite a lot of work that has been done previously around search and semantic context. Very recent work by Daniela Vianna relates to the use of “personal digital traces” to augment search. It was Dr. Vianna’s work that provided a solid theoretical basis for my own proposed work.

Our computer systems collect quite an array of information, not only about us but also about the environment in which we work.

In 1945 Vannevar Bush described the challenges to humans of finding things in a codified system of records. His observations continue to be insightful more than 75 years later:

Our ineptitude in getting at the record is largely caused by the artificiality of systems of indexing. When data of any sort are placed in storage, they are filed alphabetically or numerically, and information is found (when it is) by tracing it down from subclass to subclass. It can be in only one place, unless duplicates are used; one has to have rules as to which path will locate it, and the rules are cumbersome. Having found one item, moreover, one has to emerge from the system and re-enter on a new path.

The human mind does not work that way. It operates by association. With one item in its grasp, it snaps instantly to the next that is suggested by the association of thoughts, in accordance with some intricate web of trails carried by the cells of the brain. It has other characteristics, of course; trails that are not frequently followed are prone to fade, items are not fully permanent, memory is transitory. Yet the speed of action, the intricacy of trails, the detail of mental pictures, is awe-inspiring beyond all else in nature.

I find myself returning to Bush’s observations. Those observations have led me to ask if it is possible for us to build systems that get us closer to this ideal?

My thesis is that collecting, storing, and disseminating information about the environment in which digital objects are being used provides us with new context that enables better finding.

So, my proposal is about how to collect, store, and disseminate this type of external contextual information. I envision combining this with existing data sources and indexing mechanisms to allow capturing activity context in which digital objects are used by humans. A systems level service that can do this will then enable a broad range of applications to exploit this information to reconstruct context that is helpful to human users. Over my next several blog posts I will describe some ideas that I have with what I envision being possible with this new service.

The title of my proposal is: Indaleko: Using System Activity Context to Improve Finding. One of the key ideas from this is the idea that we can collect information the computer might not find particularly relevant but the human user will. This could be something as simple as the ambient noise in the user’s background (“what music are you listening to?” or “Is your dog barking in the background”) or environmental events (“it is raining”) or even personal events (“my heart rate was elevated” or “I just bought a new yoga mat”). Humans associate things together – not in the same way, nor the same specific elements – using a variety of contextual mechanisms. My objective is to enable capturing data that we can then use to replicate this “associative thinking” that helps humans.

Ultimately, such a system will help human users find connections between objects. My focus is on storage because that is my background: in essence, I am interested in how the computer can extend human memory without losing the amazing flexibility of that memory to connect seemingly unrelated “things” together.

In my next several posts I will explore potential uses for Indaleko.

intricacy of trails, the detail of mental pictures, is awe-inspiring
beyond all else in nature.
This is as true in 2021 as it was in 1945. Thus, the question that mo-
tivates my research is: “Can we build systems that get us closer to that
ideal?”

Laundry Baskets: The New File System Namespace Model

A large pile of laundry in a laundry basket, with a cat sleeping on the top.
The “Laundry Basket” model for storage.

While I’ve been quiet about what I’ve been doing research-wise, I have been making forward progress. Only recently have ideas been converging towards a concrete thesis and the corresponding research questions that I need to explore as part of verifying my thesis.

I received an interesting article today showing that my research is far more relevant than I’d considered: “FILE NOT FOUND“. The article describes that the predominant organizational scheme for “Gen Z” students is the “Laundry Basket” in which all of their files are placed. This is coming as a surprise to people who have been trained in the ways of the hierarchical folder metaphor.

While going through older work, I have found it is intriguing that early researchers did not see the hierarchical design as being the pinnacle of design; rather they saw it as a stop-gap measure on the way to richer models. Researchers have explored richer models. Jeff Mogul, now at Google Research, did his PhD thesis around various ideas about improving file organization. Eno Thereska, now at Amazon, wrote an intriguing paper while at Microsoft Research entitled “Beyond file systems: understanding the nature of places where people store their data” in which he and his team pointed out that cloud storage was creating a tension between file systems and cloud storage. The article from the Verge that prompted me to write this post logically makes sense in the context of what Thereska was saying back in 2014.

The challenge is to figure out what comes instead. Two summers ago I was fortunate enough to have a very talented young intern working with me for a couple months and during that time one of the interesting things he built was a tool that viewed files as a graph rather than a tree. The focus was always at the center, but then it would be surrounded by related files. Pick one of those files and it became the central focus, with a breadcrumb trail showing how you got there but also showing other related files.

The relationships we used were fairly simple and extracted from existing file meta-data. What was actually quite fascinating about it though was that we constructed it to tie two disjoint storage locations (his local laptop and his Google Drive) together into a single namespace. It was really an electrifying demonstration and I have been working to figure out how to enable that more fully – what we had was a mock-up, with static information, but the visualization aspects of “navigating” through files was quite powerful.

I have been writing my thesis proposal, and as part of that I’ve been working through and identifying key work that has already been done. Of course my goal is to build on top of this prior work and while I have identified ways of doing this, I also see that to be truly effective it should use as much of the prior work as possible. The idea of not having directories is a surprisingly powerful one. What I hadn’t heard previously was the idea of considering it to be a “laundry basket” yet the metaphor is quite apt. Thus, the question is how to enable building tools to find the specific thing you want from the basket as quickly as possible.

For example, the author of the Verge article observed: “More broadly, directory structure connotes physical placement — the idea that a file stored on a computer is located somewhere on that computer, in a specific and discrete location.” Here is what I recently wrote in an early draft of my thesis proposal: “This work proposes to develop a model to separate naming from location, which enables the construction of dynamic cross-silo human usable name-spaces and show how that model extends the utility of computer storage to better meet the needs of human users.”

Naming tied to location is broken, at least for human users. Oh, sure, we need to keep track of where something is stored to actually retrieve the contents, but there is absolutely no reason that we need to embed that within the name we use to find that file. One reason for this is that we often choose the location due to external factors. For example, we might use cloud storage for sharing specific content with others. People that work with large data sets often use storage locations that are tuned to the needs of that particular data set. There is, however, no reason why you should store the Excel spreadsheet or Python notebook that you used to analyze that data in the same location. Right now, with hierarchical names, you need to do so in order to put them into the “right directory” with each other.

That’s just broken.

However, it’s also broken to expect human users to do the grunt work here. The reason Gen Z is using a “laundry basket” is because it doesn’t require any effort on their part to put something into that basket. The work then becomes when they need to find a particular item.

This isn’t a new idea. Vannevar Bush described this idea in 1945:

“Consider a future device for individual use, which is a sort of
mechanized private file and library. It needs a name, and, to coin
one at random, ”memex” will do. A memex is a device in which
an individual stores all his books, records, and communications,
and which is mechanized so that it may be consulted with exceeding
speed and flexibility. It is an enlarged intimate supplement to his
memory.”

He also did a good job of explaining why indexing (the basis of hierarchical file systems) was broken:

“Our ineptitude in getting at the record is largely caused by the artificiality of systems of indexing. When data of any sort are placed in storage, they are filed alphabetically or numerically, and information is found (when it is) by tracing it down from subclass to subclass. It can be in only one place, unless duplicates are used; one has to have rules as to which path will locate it, and the rules are cumbersome. Having found one item, moreover, one has to emerge from the system and re-enter on a new path.

“The human mind does not work that way. It operates by association. With one item in its grasp, it snaps instantly to the next that is suggested by the association of thoughts, in accordance with some intricate web of trails carried by the cells of the brain. It has other characteristics, of course; trails that are not frequently followed are prone to fade, items are not fully permanent, memory is transitory. Yet the speed of action, the intricacy of trails, the detail of mental
pictures, is awe-inspiring beyond all else in nature.”

We knew it was broken in 1945. What we’ve been doing since then is using what we’ve been given and making it work as best we can. We knew it was broken. Seltzer wrote “Hierarchical File Systems Are Dead” back in 2009. Yet, that’s what computers still serve up as our primary interface.

The question then is what the right primary interface is. Of course, while I find that interesting I work with computer systems and I am equally concerned about how we can build in better support, using the vast amount of data that we have in modern computer systems, to construct better tools for navigating through the laundry basket to find the correct thing.

How I think that should be done will have to wait for another post, since that’s the point of my thesis proposal.

What is a File System?

I don’t know if I’ve discussed this previously, but if so, feel free to skip it. I had a meeting with one of my supervisors this morning and he observed that something I pointed out “would make a great HotOS paper…” I’m pretty much off the hook on that one, since there won’t be another HotOS for almost two years.

Neutron star collision
Copyright 2016 Skyworks Digital, Inc.

I said “we conflate name spaces and storage in file systems”. Or something equivalent. Here’s my real question: why do we mingle storage with indexing? I realize that indexing needs to be able to ultimately get us what we’re looking for, but traditionally we smash these together and call them a “file system”.

The Google File System paper certainly annoys me at some level each time I read it. But why does it disturb me? Because what they build isn’t a file system, it’s a key-value store. Yet over the years I have come back to it, repeatedly, and observed that there is an important insight in it that really does affect the way we build file systems: applications do not need hierarchical name spaces. What they need is a way to get the object they want.

Indeed, Amazon’s S3 service really is the same thing: an object store in which the “hierarchical name” is nothing more than an arbitrary key. I don’t know how it is organized internally, but the documentation is quite clear that the “hierarchical name space” is really just the way the key is encoded.

An Amazon S3 bucket has no directory hierarchy such as you would find in a typical computer file system. You can, however, create a logical hierarchy by using object key names that imply a folder structure. For example, instead of naming an object sample.jpg, you can name it photos/2006/February/sample.jpg.

GetObject

For me, this is a useful insight because at one point I considered inverting the traditional directory hierarchy, so that rather than have a directory point to it’s children, I would have each file maintain a list of the directories to which it points. Then “enumerating a directory” becomes “finding all the files with a specific attribute”. One motivation for considering this approach was to avoid the scaling issues inherent in large directories (which led me to an interesting looking PhD thesis Scale and Concurrency of Massive File System Directories – more for the reading list!) This is an important concern in a world in which I’m suggesting adding multiple different indices.

But back to the more fundamental question: why do we conflate them? Recall that the original thinking around file systems mirrored the physical nature of existing physical filing systems (I discussed that somewhat back when I talked about my Eurosys presentation) rather than the more free-flowing linkage present back in the Memex model. In a world where the physical copy is located by using its naming information, this does make sense.

Consider how a modern library is organized (yes, some people still collect books!) The location of the object (book) is important, but the categories of the object are equally important – it’s properties. A card catalog could be organized by topic, and would then give you information allowing you to find the work itself.

Applications don’t need hierarchical name spaces – those are something that humans use for managing things. Applications need persistent location information. By using a mutable location key with application we create a situation in which things break just by reorganizing them.

S3 does not fix this issue, since “moving a file” is equivalent to “changing its key”. In that way, it misses my point here. However, I suspect that those keys don’t actually change very often. Plus, S3 has other useful concepts, such as versioning.

What you do lose here is the ability to “check directory-wise security”. That will likely bother some people, but for the vast majority of situations, it really doesn’t matter.

Typical file systems actually do use key-value stores, although local file systems do it with simple keys (e.g., “i-node numbers”). These keys turn out to be useful: NFS looks up files using that i-node number, which is embedded inside the file handle. We did the same thing in AFS, going so far as to add an “open by i-node number” system call so the server could be implemented in user mode. Windows does the same thing with CDFS and NTFS, and I’ve implemented file systems that support “open by ID” for both file and object IDs.

Thus, my point: we merge the name space and the storage management together because “that’s the way we’ve always done it”. The POSIX interface, which codified how things were done in UNIX, embeds this further as there is no open by i-node number and the security model requires a path-wise walk.

Ah, but the failings of POSIX are something that I’ll save for another post.