Home » Conferences Journals and Workshops (Page 2)

Category Archives: Conferences Journals and Workshops

The DEMOS File System

The DEMOS File System
Michael L. Powell, In Proceedings of the sixth ACM symposium on Operating systems principles, pp. 33-42.

This paper delves into the nitty gritty details of constructing physical file systems.  I was surprised that it had relatively few citations (61 according to Google Scholar when I checked) because, having read it, I would hand this paper to someone asking me “what are file systems?”  I suspect that the more frequently cited paper in this area will be “A Fast File System for UNIX,” which cites to this paper.

The target for DEMOS is the CRAY-1 supercomputer, at the time the fastest computer in the world.  As a matter of comparison, modern mobile devices have more computational power (and often more I/O bandwidth) than the CRAY-1 did.

DEMOS Figures 1 and 2The author discusses the design of a new file system for use with a custom operating system for the Los Alamos National Laboratory’s CRAY-1 computer system.  A key for this project was that it seeks to improve performance as much as possible.  After all, why build a super-computer if you then cripple it with features that don’t enhance its performance?

What I find delightful about this paper is that it describes the basic constituent parts of a file system, as well as strategies for optimizing performance.  It does so in a clear and understandable fashion.DEMOS Figures 3 and 4

DEMOS utilizes a UNIX-like hierarchical file system model.  It has directories and files. It does not have the link model from Multics so paths to files are unique in DEMOS.  Files are managed in units of blocks (4096 bytes) but I/O is specified as bytes (interestingly, they specify eight bit bytes as nine bit machines were still in use.)

The authors discuss file sizes.  To the best of my knowledge this is one of the earliest papers covering this common subject  (which is revisited periodically because workloads change and file sizes also change).  One of the common themes I have seen in other work is mirrored here: most files are small.  Figure 1 shows a CDF for file sizes.  We note that the majority of files in their system are small, with approximately 75% being less than 1KB; this is consistent with later work as well.  Their second figure (Figure 2) describes the proportion of transfer sizes and their source.   We see a spike in the 100, perhaps 256 or 512 being “natural block sizes” that applications would use.

demos figure 5They establish lofty performance requirements: “[T]he file system will have to support a bandwidth of 20-60 megabits/second or higher”. Our performance requirements today would be much higher, but this recognizes the reality that then (as now) the I/O bandwidth of storage is often the rate limiting factor.

DEMOS is paired with a centralized storage facility (“Common File System” or CFS) that is to provide the function of what we would now think of as a centralized file server.  While not yet implemented by the time of the paper, their plan was to introduce automatic file migration and staging.

The central bit of the paper then describes the constituent parts of the file system.  This maps rather well onto what I have seen in the typical file system: a “request interpreter” that handles requests from applications.  Even their description is appropriate: “parameter validation and request translation”; a “buffer manager” that handles the allocation of buffer cache space (often virtual cache these days); and a “disk driver” that handles low level data operations, such as filling or storing the contents of buffers.

Figures 3 and 4 capture their insight into the disk manager.  This dovetails with their discussion about efficiency of I/O, including observations about queue management (“shortest seek time first” order for requests, and then sub-sorted by “shortest latency time first”).  This is a clear “hat tip” to the impact that rotational latency and track seek time has on performance.

Speaking of performance, the authors discuss this.  It leads to their observations on improving I/O performance: “I/O operations out to proceed in parallel with computation”.  Their point is that serializing these things decreases overall performance.  Their second observation: “[T]he length of time an I/O operation takes should be reduced as much as possible.”  This seems logical and is one reason why they use their optimized strategy.

There is a section on “file system buffering” that touches on the tradeoffs between using memory for buffer caching versus other possible uses.  Thus, the authors evaluate how increased buffering impacts their CPU utilization – this is in keeping with their goal of parallelizing I/O and computation.  Their observation?  The greatest benefit comes from a small number of buffers, in their analysis eight buffers provides most of the benefit. Building on that Figures 6 and 7, they observe there is a clear limit to the benefit of further buffering.  These days we do not think too much about this because we tend to use virtual caches, so the amount of physical memory is really managed by the virtual memory management code, yet the observation would likely still apply.  There is a limit to the benefit of buffering.

The authors also point out that disk allocation is a challenging.  They employ allocation bit maps, cluster allocations, over-allocate, and even use simplistic predictive read-ahead.  They refer to these as “strategy” routines.

In general, this is a great introduction to the basic structure of a media file system.  There are plenty of details that will be refined in later work.

 

 

 

The Cap File System

The Cap Filing System
R. M. Needham and A.D. Birrell, Symposium on Operating Systems Principles, 1977, Association for Computing Machinery.

I’ve fallen behind this past ten days, working towards a deadline.  My own conference paper is now submitted and I’m working on recovering.  What that means is this week is going to be a busy one as I work on catching up.  Adding to the fun, FAST is going on this week (February 13-16, 2008).  I hope to add some bonus discussions on the current work being presented there.

Let’s get to discussing this paper.

CAP is a capabilities based system.  The capability systems are a parallel idea that has been explored from time to time versus the access control system for Multics/UNIX inspired operating systems.  The latest contender in this space would be Fuschia, an experimental operating system that Google is developing (though it is an open source project under a mixture of licenses).  Interestingly, there are reports it runs on the Pixelbook in addition to the previous hardware it had supported.

At any rate, the idea behind a capability is to use identifiers that encapsulate the access rights within the capability itself.  Thus, it is a name and the virtue of having the name is that it means you have the access inherent in that name.

These capabilities all represent to a program the right to have or do something: the preservation of information from one run of a program to another (a universal operating system requirement) is thus seen by the CAP programmer as the preservation of a capability rather than of an object itself.

The paper describes how capabilities are managed, which interacts with the “filing system” of course, since it has to store persistent information.  They describe the “system internal name” (SIN), which is combined with a disk address to map memory segments to actual storage.  The capability (in a “directory”) then becomes a repository of these persistent objects.  This also creates a reference to the disk block that is in use, ensuring those storage regions are not reused.

One interesting characteristic of their system is that they do not guarantee free disk space will be recycled quickly (“[T]here is no guarantee that inaccessible disk space will be relinquished at the earliest possible moment.”)  Indeed they note that space reclamation is only done when the system reboots and “[T]he filing system is not designed to run forever.”

They discuss how CAP differs from prior work (e.g., OS/360) where only the name matters; there is no concept of a directory for use as part of the capability system.  The directory actually provides them an additional level of control as well, and they use directory capabilities as well as segment capabilities.  Directories may be persistent (so they have a SIN and disk block location) or ephemeral (so they disappear when the program exits) – a sort of “built in” temporary storage concept for ephemeral memory management.

Sharing objects is now trivial – you simply tell someone the correct name; that embeds the capability within it.  They do not describe the mechanisms used for transfer (“[B]y mechanisms which do not concern us in detail here.”)

They also describe how this name space is similar to other systems (UNIX and CAL-TSS) but different:

  • Access is associated with the name which in turn references information in the directory.  It is not an attribute of the file itself.
  • The name space need not be a “strict hierarchy”.  This means that portions could become disconnected, or even be private to a single application.
  • Their use of directories behaves similar to the model of “current directory” (presumably in UNIX) even though CAP expressly does not have a concept of current directory.
  • Directories are not even directed acyclic graphs!

The paper describes how capabilities work, since they are a fine-grained control mechanism.  They explain that the holder of an existing capability (a program) may generate a more restrictive capability to provide to another program.  Since capabilities apply to both individual files as well as directories, it is possible for a program to transfer a set of capabilities by creating a new directory and storing the relevant capabilities to the target program.

The names themselves can be quite ugly, as they incorporate capabilities within them.  They describe some conventions they employ in terms of name management and directory placement, but point out these are conventions and not a hard requirement of the system.

CAP certainly presents a rather different model of a file system than we see in other systems.  The idea of disconnected name spaces that are only visible to particular programs is an intriguing one.  They use the example of the password database, which requires the program have the password file capability.

They discuss security.  The directory manager is a separate module that programs use to interact with the directories to which they have access.  To invoke the directory manager, the caller must have the ENTER capability.

I find this to be exactly the type of thought provoking paper I had hoped to find as I comb through these old papers.  The idea that a file system name space need not be connected, that it could be private to a particular program or set of programs, and embedding access rights (“capabilities”) into the name will give me plenty to think about.

If you would like to know more about CAP there is a paper about it in the prior Symposium on Operating Systems Principles: The Cambridge CAP Computer and its Operating System.  It is not too surprising that this is available from Microsoft Research, as they also built a capability based operating system (or two): Singularity and Midori.

 

Shanghai Bound!

I’m on my way to the the Symposium on Operating Systems Principles, which is a well-known operating systems conference.  This time it is in Shanghai, China so I’m making my way there.  I do not have the privilege of presenting anything there this trip, but hopefully I will be able to do so in the future – it really is just a small matter of coming up with some interesting research that is worthy.

This will be my first time in China.  Definitely looking forward to seeing some of the interesting talks that are scheduled throughout the conference!

Decision Made

The decision deadline was last weekend and I confirmed my decision, accepting one of the offers and declining the other.  I’m excited that I’ll be starting my new program officially near the end of summer.

I’m already starting to think about the fact that this journey is really about to begin.  I’d be lying if I said that I weren’t still in a bit of shock – euophoric shock, for sure, but still shock.  Despite everything that I had going against me, I managed to make this happen – and no amount of actually thinking about it makes me view it as quite a (wonderful) surprise.

For example, if I hadn’t been in the position to take the research assistant position with UBC, I suspect I’d have been less likely to have been offered admission there.  Of course, had I not been summarily dismissed from my job just a couple weeks earlier, i wouldn’t have been in a position to do so.   Pretty exciting, indeed!