Home » Operating Systems

Category Archives: Operating Systems

Subscribe to Blog via Email

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

Join 532 other subscribers

December 2018
S M T W T F S
« May    
 1
2345678
9101112131415
16171819202122
23242526272829
3031  

Moving to the now

I’ve been gone for a bit; real life keeping me busy.  Part of that has been exploring the current edge of my technology space.  So I’m going to take a break from the historical paper review (but it will come back) and instead start talking about new storage technologies and the interesting things that we can do with them.

Specifically, I have been looking at the brave new world of “non-volatile dual inline memory modules”.  What this means is that we now have persistent memory that is directly accessed via processor primitives.  That means a load or a store instruction can be used to access this kind of memory.  We have actually been talking about persistent memory since the dawn of time.  Many of the early operating systems papers think of disk storage and memory as being different forms of memory.

The big change here was when computers moved away from magnetic core memory into solid state memory.  Intel’s first product was dynamic RAM (though it was invented by IBM).  DRAM was cheaper and faster than core – and much faster than magnetic storage.  Thus the abstraction of memory and disk being part of the same continuum we began to think of them as being distinct.  DRAM was transient; disks and tapes were persistent (note that magnetic core was persistent until someone read it, at which point it lost its state and had to be rewritten – even across power cycles).

Disk storage was often used to contain data from memory, such as for paging or segmentation.  Over the years the operating systems community learned many ways to make disks seem fast: caching, read-ahead, asynchronous I/O, reordering operations, etc.

Computers – and memory – were able to experience vast increases in speed.  Disk drives did improve, but only modestly in comparison.

Memory technologies have continued to evolve.  Persistent memory options increased as well; flash memory is one of the most common today and is used inside solid state drives and NVMe disks, both of which are vastly faster than rotating media disks.

Another memory technology that has been discussed for more than 20 years has been persistent RAM.  One way this has been done in “real products” has been to simply add a backup power source: some sort of battery.  Then if the power is lost, the data contents of volatile memory (DRAM) can be written to persistent storage (e.g., Flash).  This approach has been a stop-gap on the way to the actual persistent memory solution that has been promised for at least a decade: persistent memory that acts like DRAM.  Intel is now starting to ship their new Optane memory products. Samsung has announced their new Z-NAND products.  Other technologies remain “in development” as well.

Why does this matter?  Storage is suddenly getting faster.  We went from 10 millisecond access times to 0.2 millisecond access times (HDDs to SSDs).  Now we are looking at going from 0.2 milliseconds to 200 nanoseconds – six orders of magnitude faster.  This sort of change is profound.  We’ve been talking about it for many years, trying to reason about it.  It is now materializing.  Over the next several years the other promise of NVM is going to materialize: it supports higher density than DRAM.  It is slower than DRAM; where DRAM access times are on the order of 50-100 nanoseconds, NVM access times are on the order of 125-800 nanoseconds (writes being slower than reads).

File systems that have been optimized for working on hard disks or SSDs don’t necessarily make sense on NVM.

Thus, the area I’ve been looking at: expanding my own understanding of NVM.  Since this is still related to my own exploration of file systems, I’ll use this as my soapbox for exploring the space.

Let’s see where this journey goes.  And I promise, I’ll come back to the old file systems papers after this diversion.

 

 

The DEMOS File System

The DEMOS File System
Michael L. Powell, In Proceedings of the sixth ACM symposium on Operating systems principles, pp. 33-42.

This paper delves into the nitty gritty details of constructing physical file systems.  I was surprised that it had relatively few citations (61 according to Google Scholar when I checked) because, having read it, I would hand this paper to someone asking me “what are file systems?”  I suspect that the more frequently cited paper in this area will be “A Fast File System for UNIX,” which cites to this paper.

The target for DEMOS is the CRAY-1 supercomputer, at the time the fastest computer in the world.  As a matter of comparison, modern mobile devices have more computational power (and often more I/O bandwidth) than the CRAY-1 did.

DEMOS Figures 1 and 2The author discusses the design of a new file system for use with a custom operating system for the Los Alamos National Laboratory’s CRAY-1 computer system.  A key for this project was that it seeks to improve performance as much as possible.  After all, why build a super-computer if you then cripple it with features that don’t enhance its performance?

What I find delightful about this paper is that it describes the basic constituent parts of a file system, as well as strategies for optimizing performance.  It does so in a clear and understandable fashion.DEMOS Figures 3 and 4

DEMOS utilizes a UNIX-like hierarchical file system model.  It has directories and files. It does not have the link model from Multics so paths to files are unique in DEMOS.  Files are managed in units of blocks (4096 bytes) but I/O is specified as bytes (interestingly, they specify eight bit bytes as nine bit machines were still in use.)

The authors discuss file sizes.  To the best of my knowledge this is one of the earliest papers covering this common subject  (which is revisited periodically because workloads change and file sizes also change).  One of the common themes I have seen in other work is mirrored here: most files are small.  Figure 1 shows a CDF for file sizes.  We note that the majority of files in their system are small, with approximately 75% being less than 1KB; this is consistent with later work as well.  Their second figure (Figure 2) describes the proportion of transfer sizes and their source.   We see a spike in the 100, perhaps 256 or 512 being “natural block sizes” that applications would use.

demos figure 5They establish lofty performance requirements: “[T]he file system will have to support a bandwidth of 20-60 megabits/second or higher”. Our performance requirements today would be much higher, but this recognizes the reality that then (as now) the I/O bandwidth of storage is often the rate limiting factor.

DEMOS is paired with a centralized storage facility (“Common File System” or CFS) that is to provide the function of what we would now think of as a centralized file server.  While not yet implemented by the time of the paper, their plan was to introduce automatic file migration and staging.

The central bit of the paper then describes the constituent parts of the file system.  This maps rather well onto what I have seen in the typical file system: a “request interpreter” that handles requests from applications.  Even their description is appropriate: “parameter validation and request translation”; a “buffer manager” that handles the allocation of buffer cache space (often virtual cache these days); and a “disk driver” that handles low level data operations, such as filling or storing the contents of buffers.

Figures 3 and 4 capture their insight into the disk manager.  This dovetails with their discussion about efficiency of I/O, including observations about queue management (“shortest seek time first” order for requests, and then sub-sorted by “shortest latency time first”).  This is a clear “hat tip” to the impact that rotational latency and track seek time has on performance.

Speaking of performance, the authors discuss this.  It leads to their observations on improving I/O performance: “I/O operations out to proceed in parallel with computation”.  Their point is that serializing these things decreases overall performance.  Their second observation: “[T]he length of time an I/O operation takes should be reduced as much as possible.”  This seems logical and is one reason why they use their optimized strategy.

There is a section on “file system buffering” that touches on the tradeoffs between using memory for buffer caching versus other possible uses.  Thus, the authors evaluate how increased buffering impacts their CPU utilization – this is in keeping with their goal of parallelizing I/O and computation.  Their observation?  The greatest benefit comes from a small number of buffers, in their analysis eight buffers provides most of the benefit. Building on that Figures 6 and 7, they observe there is a clear limit to the benefit of further buffering.  These days we do not think too much about this because we tend to use virtual caches, so the amount of physical memory is really managed by the virtual memory management code, yet the observation would likely still apply.  There is a limit to the benefit of buffering.

The authors also point out that disk allocation is a challenging.  They employ allocation bit maps, cluster allocations, over-allocate, and even use simplistic predictive read-ahead.  They refer to these as “strategy” routines.

In general, this is a great introduction to the basic structure of a media file system.  There are plenty of details that will be refined in later work.

 

 

 

An Experimental Implementation of the Kernel/Domain Architecture

An Experimental Implementation of the Kernel/Domain Architecture
Michael J. Spier, Thomas N. Hastings, David N. Cutler,  ACM SIGOPS Operating Systems Review, vol. 7, no. 4, pp. 8-21. ACM, 1973.

While looking for file systems papers, one of the approaches that I used was to walk through the successive years of conference proceedings from the Symposium on Operating Systems Principles (SOSP).  This biennial conference has been going on for more than 50 years at this point and contains a wealth of interesting and influential papers from the operating systems domain.  It is common to find file systems papers at these conferences.

This particular gem comes from the fourth such symposium (1973).  While it discusses various aspects of operating systems, which makes it quite interesting anyway, but it also discusses file systems within the context of this experimental operating system that may prove insightful.

This paragraph caught my eye:

On our proposed architecture with its many monitor-like domains we envisioned the possibility of supporting the concurrent existence of similar purpose redundant supervisory services (e.g., 3 file systems, 7 file nomenclature hierarchies, 4 login responders, etc.) which, for all practical intents and purposes, would provide the external appearance of concurrent operating systems of disparate natures.

This is the first time I have seen a reference to the concept of having multiple file systems simultaneously as an explicit part of the operating system model.  It also describes this intriguing concept of “concurrent operating systems of disparate natures”.  One of the authors (David N. Cutler) goes on to work on the construction of several different operating systems including what is now Windows today – and it’s architecture is one in which it was designed to provide precisely this type of concurrent disparate operating system environments.  Windows also supports multiple dynamically loaded file systems which now does not look so novel, as essentially all modern operating systems do.

Indeed, what they go on to describe is the “domain machine” in which the monolithic operating system has been divided up into distinct components, which constitute distinct supervisory domains.  One of these domains represents the small set of functionality necessary to manage the actual hardware,   The goal was to keep it small and bug free.

Complicating this was the author’s observation that the system must be re-entrant in order to avoid deadlock.  This was a deliberate goal and they even go so far as to point out this behavior when involving the file system – the kernel may need to invoke the file system to handle demand loading of a domain, but the file system must invoke the kernel in order to access the disk where the data is stored.

Another of their observations intrigued me: they separate the “traditional file system” into a storage layer (“disk space manager” as they call it) and a “name space manager”.  They admit the possibility that this name space might be hierarchical.  They also explicitly note they could support “differently flavored file systems”.

In scheduling, they touch on the fact that I/O bound processes (e.g., the file system) benefit from being given real-time scheduling priority on traditional operating systems; in this domain system they do not need to do so.

This paper definitely surprised me, as I see the germs of ideas here that will be clearly realized in later systems work.  Some of these ideas clearly survive and emerge later, including those that affect file systems.

As we will see later, this logical divide of file systems actually naturally develops along the storage line, though the name space is not so clearly delineated.  Perhaps it should be.

While not directly germane to file systems, Dave Cutler goes on to work on RSX-11M and VMS (at Digital Equipment Corporation), as well as Windows NT (at Microsoft).  The latter became the basis of the Windows OS beginning with Windows XP.

 

The UNIX Time-Sharing Operating System

The UNIX Time-Sharing Operating System
Dennis M. Ritchie and Ken Thompson, Bell Labs, Communications of the ACM, July 1974, Volume 17, Number 7, pp. 365-375.

This paper describes Version 3 of UNIX.  It was Version 6 that became the basis of the Berkeley Software Distribution (BSD) version of UNIX.  The only other operating system in CS history to date that has had so much impact on operating systems development is MULTICS, and UNIX is a direct descendant.  Developed by several Bell Labs researchers that had been involved in MULTICS, its goal was to try and build a smaller operating system that retained what they viewed as the key benefits of MULTICS.

Much of UNIX Version 3 was written in the C programming language, itself derived from BCPL, a language that had been used on the MULTICS project.

The most important job of UNIX is to provide a file system.

These words leave little doubt about the role of file systems in UNIX and the importance assigned to them.  The paper then goes on to describe files, in similar terms to MULTICS: files, directories, and special files (devices).  We see the hierarchical file system of MULTICS reflected back in the description of the system.  They talk about file names being 14 characters or less, the formation of paths, and the iterative walk of names through the file system name space to find other directories, as well as the terminal file nodes.

The directory structure is constrained to have the form of a rooted tree.

This is what I am looking for – the why of hierarchical file systems.  I found the answer here, unsurprising yet ironically disappointing:

The reason for this is to simplify the writing of programs which visit subtrees of the directory structure, and more important, to avoid the separation of portions of the hierarchy.

Not surprising, yet not precisely what I had expected.  I had expected the reason to be for the simplicity of the operating system (though they do allude to this by discussing the difficulty of knowing when it is safe to delete a directory.  They describe links to files, however, so their file system is not really a tree.  Rather, it is more like a directed acyclic graph (DAG).  Files do not have pointers back to their directories, but directories have pointers back to their parent.  Thus we have the distinction.  The namespace is a DAG.  Files don’t really live in the name space directly, they are referenced from the namespace, but have a reference count.

Oddly, with that, I found what I came for, at least in terms of insight for my own research area.  There is a certain satisfaction in being able to point to this seminal document and say “this is why we got to where we are now.”

But if I stopped at this point, I would be leaving out the bits I had not expected to find.

First, the mundane: they discuss removable file systems, the fact that this is in fact a collection of name spaces, combining persistent name spaces with one another using a non-persistent mechanism (mounting),  There is a simple description of how the file system is itself implemented.  They describe the i-number (now the inode number) as an index into the file table.  Thus, a directory entry is where the name lives, it merely refers to the file using its i-number.  These entries are then called i-nodes.  The i-node contains information about the owner of the file, the protection bits governing the file, the location information for where logical data is physically stored on the medium, the size of the file, its timestamps, it’s attribute bits, and the number of directory entries referencing the given i-node.

Surprisingly, not all that different than it is now, 45 years later.  Implementation details have changed, as we no longer limit files to 10MB in size.

They describe bufering, they describe sector sized I/O and how it is more efficient for a program to do sector-sized I/O operations.

Much of the paper has nothing to do with file systems.  I leave that to the interested reader to explore beyond that.

There are two interesting tid-bits remaining:

  • They lost data once, on a hard disk that failed.  The backup was 3 days old.
  • They considered the permuted index application as one of the “major programs available”.

The fact they considered the permuted index important at this early stage was an interesting insight to me.  Clearly, the ability to “find our stuff” is one that’s been around since the dawn of time.  Maybe this research direction of mine does make sense.

MULTICS I/O

Feiertag, Richard J., and Elliott I. Organick. “The Multics input/output system.” In Proceedings of the third ACM symposium on Operating systems principles, pp. 35-41. ACM, 1971.

In this paper we return to MULTICS but this time the focus is on the I/O system.  Much of the content is not directly related to the file system, but it certainly does touch on familiar concepts including the file system.

By this time MULTICS had been released and Bell Labs had dropped out of the project (to build the “castrated” version of MULTICS that became UNICS and ultimately UNIX.)  Thus, this is more a reflection of the system that has been built (versus the earlier paper, where it is a prospective description of the system).

So what is this I/O System?  The description is of a system with a flexible interface, from a highly general (and easy-to-use) interface, to a highly specific (and challenging-to-use) interface to devices.  The idea seems to be that devices have a common interface – the general level interface – but may have specific features that require device-specific knowledge to exploit – the specialized interface.

There are some interesting insights in this paper, again, that reflect the way we do things now rather closely.  For example, the MULTICS I/O Model puts the file systems as a separate service, but then shows they interact with an I/O system component.  Thus, applications can be seen to directly interact with the file system, via its own API, or they can do so through the more general Device I/O Module (DIM).

This abstraction of devices is one we see in many subsequent operating systems (and all modern ones).  File systems gained their idiosyncratic behavior model fairly early, then, where they play closely, but not subserviently, to the I/O subsystem.

Indeed, much of this paper relates to how I/O from files may be redirected for other purposes, such as for the input to an application instead of a more common interface device, such as a teletype (tty) machine.  Of course, MULTICS had already indicated that files were treated logically as memory and their contents could be pulled in “as needed” (demand paging).

Multics IO System Figure 2

This paper also describes the interface to the file system, albeit in general terms.  It describes the idea that logical data flows (streams as they call it) can be separated from the actual data storage mechanism used beneath them.  It is this mechanism that permits them to redirect the tty.

This paper thus introduces some surprising concepts around I/O: a semi-abstract interface, the idea that file systems were a consumer of that interface (and in theory a provider of the interface).  The ability to extend the OS for new device types because of this generalized interface.  After reading it, I can see its influence on both Linux (via UNIX) and Windows (presumably via RSX-11M).

 

TENEX

Tenex, a Paged Time Sharing System for the PDP-10
Communications of the ACM, March 1972, Volume 15, Number 3
Daniel G. Bobrow, Jerry D. Burchfiel, Daniel L. Murphy, and Raymond S. Tomlinson, Bolt Beranek and Newman Inc.

TENEX is a new time sharing system implemented on a DEC PDP-10 augmented by special paging hardware developed at BBN. This report specifies a set of goals which are important for any time sharing system. It describes how the TENEX design and implementation achieve these goals. These include specifications for a powerful multiprocess large memory virtual machine, intimate terminal interaction, comprehensive uniform file and I/O capabilities, and clean flexible system structure. Although the implementation described here required some compromise to achieve a system operational within six months of hardware checkout, TENEX has met its major goals and provided reliable service at several sites and through the ARPA network.

Storage organization and management in TENEX
Daniel L. Murphy
AFIPS ’72 (Fall, part I) Proceedings of the December 5-7, 1972, fall joint computer conference, part I
Pages 23-32, Anaheim, California — December 05 – 07, 1972

The first of these two papers discusses TENEX; much of the paper is not about file systems, but there are about a page and a half about the TENEX file system.  The second paper goes into greater detail about storage – including the file system – for TENEX.  I have picked these papers for several reasons:

  • They demonstrate the impact of the MULTICS work on the systems that follow (certainly beyond the obvious UNIX work).
  • They introduce the concept of virtual integration with the file system
  • They introduce the concept of copy-on-write
  • They show the fundamental drive to maintain backwards compatibility
  • They introduce the concept of a suffix (or extension) as a means of identifying the purpose of a file
  • They delve into the details of open file state management

A TENEX file has a compound name structure:

A powerful and versatile directory and file naming facility is provided in which a particular file is identified by a fixed-depth path which includes device, directory name, file name, extension, and version.

The fixed-depth path is a limitation the TENEX developers chose to implement for backwards compatibility with existing PDP-10 programs, an early example of how application compatibility is often a critical concern for operating systems development.  The authors do note they are considering expanding upon this to make it arbitrary depth – a feature of MULTICS.

Both papers also discuss the Job concept, the idea of a set of related processes.  The implication is that processes within a single job can share resources, thus providing more of that “balance between sharing and isolation” that operating systems have to handle.  When a file is opened successfully, a Job File Number is created in a table.  That encapsulates the information about how to find the given file and instead uses an index value – in other words, a file descriptor or file handle.  “Once the initial association of JFN and file has been established, the JFN is used for all ensuing operations on the file, including sequential reading and writing, opening, closing, etc.

TENEX then allows random access to the file by combining the JFN with an index identifying the desired element.  The authors point out that this is more flexible than previous systems in which the file was not random access.

TENEX File to Process MappingThis becomes flexible when describing the page map for a given process.  The Process Map points from an entry in the virtual address space to a corresponding JFN and index (offset).  Thus, the contents of that page can be retrieved on demand from the underlying file system.

None of this should look particularly surprising to anyone familiar with modern operating systems, of course.  This just happens to be part of the path to get to where we are today.  The papers actually go into greater detail about the details here, including access control, but that isn’t germane to my file systems focus.

Since the file path names identify files over the domain of all jobs in the system, it is evident that our naming and mapping procedures readily provide a means for sharing storage. Using the appropriate path names (including legality checks), processes in two or more different jobs can identify the same file, and each can obtain a JFN for it. Nothing in the mapping procedures specified above requires that either process be aware of the other’s access, and so each process constructs an identifier and places it in its process map (Figure 4).

In other words, the contents of regions of a file can be shared across processes.  This is, in fact, transparent to the processes.

Sharing at this level would be particularly important because of the limited address space and desire to share code – the papers discuss this, and point out the benefits of this form of sharing.

This leads to their inclusion of copy-on-write.  “One other important TENEX feature which facilitates sharing is a type of page access called copy-on-write. To our knowledge, this facility was first developed and used on the BBN-LISP system for the
XDS-9407.”  Thus, while not original to TENEX, this is a logical extension beyond what MULTICS had described.  Copy-on-write is a mainstay of both modern operating systems and some file systems.

Interestingly, TENEX seems to implement a rudimentary page cache as well:

To implement the file sequential monitor calls (e.g., byte-in, byte-out) the monitor maintains a number of “window” pages in a separate map invisible to the user process. For each file with sequential operations in progress, the monitor maps the file page which is to receive or provide the next byte. Each call from the user causes one or more bytes to be loaded from or stored into this page, and a count updated to determine if a new page should be mapped. Movement through the file is accomplished by mapping successive pages, and the sequential access module does not have to be aware of the physical device on which the page resides nor interface with I/O driver modules to read or write it. This modularity is very satisfying from an operating system design point of view.

Thus, byte level access to block level devices is managed via this window page mechanism.  The files are not strictly memory mapped, though, so this is more like a buffer cache than a page cache.

They also use the file system to implement inter-process communications – a form of file-backed shared memory.

Page management is tightly tied to this implementation as well, though the description involves what we would likely consider the memory management unit and page fault handling logic as well as the page to file/offset mapping necessary to provide the system’s demand paging.

Two other interesting aspects of their file systems model includes a pair of extra mapping layers: one for mapping from logical storage address to physical storage location, and the other mapping from multiple distinct page references to a single storage block.

The underlying rationale here is that this permits relocating the storage to different locations, typically from higher speed storage (when warm/hot) to slower speed storage (when cold).

This mechanism doesn’t involve changing the actual description of the storage and instead moves to a logical storage addressing model.  It was interesting to me to see this level of indirection added in such an early system, but clearly the mismatch in speeds between various types of storage dictated the importance of this scheme.  Once again, it is interesting to see how little the problems we face have actually changed.

The data sharing model also uses an extra level of indirection.  I’m familiar with this model from my own work in Windows, where shared memory is indirectly mapped in a similar fashion.  That this mechanism was around in the early 1970s is once again a reminder of how little operating systems have fundamentally changed.

There are many aspects of this paper that I have glossed over, in no small part because they don’t really apply to modern systems – we don’t have to worry about drum memories, for example, no more than we need to worry about punch card readers.  These two papers, however, clearly lay out a deeper realization of the file system than I have seen in prior work.

TENEX differed from MULTICS in a number of ways and the two systems remained competitors for many years.  TENEX ultimately would become TOPS-20 and in turn be supported by Digital Equipment Corporation.  It was an important part of the early (pre-VAX) ARPANET and survived for many years as a viable system.

If you would like to read more about this, I’d recommend Dan Murphy’s Origins and Development of TOPS-20 post.  It provides further fascinating background on how TENEX evolved and how systems evolved.  I leave you with the final words from that post:

Although this book is about DEC’s 36-bit architecture, it is clear now that hardware CPU architectures are of declining importance in shaping software. For a long time, instruction set architectures drove the creation of new software. A new architecture would be introduced, and new software systems would be written for it. The 36-bit architecture was large in comparison to most other systems which permitted interactive use. It was also lower in cost than most other systems of that size. These factors were important in the creation of the kind of software for which the architecture is known.

Now, new architectures are coming along with increasing frequency, but they are simply “slid in” under the software. The software systems are far too large to be rewritten each time, and a system which cannot adapt to new architectures will eventually suffer declining interest and loss of competitive hardware platforms. TOPS-20 didn’t pass that test, although it did contribute a number of ideas to the technology of interactive systems. How far these ideas ultimately spread is a story yet to be told.

There is considerable insight in this for me, particularly the admonishment “software systems are far too large to be rewritten each time” as it resonates with (one of) my own research directions.

 

 

MULTICS

A General-Purpose File System For Secondary Storage
R. C. Daley and P.G. Neumann
Published in the Proceedings of the American Federation of Information Processing Societies 1965, Fall Joint Computer Conference, vol. 27, pp. 213-229.

This is the seminal paper discussing how file systems were envisioned within the MULTICS operating system. While you can still run MULTICS, it is a curiosity at this point.  However, virtually all operating systems we now use today descended from MULTICS and thus, its design profoundly influenced their development.

This paper is a delightfully easy read, written at the dawn of this new field of multi-programming. Prior to this time computers were essentially single user.  The introduction of the idea of sharing a computer with other users was nascent.  Thus, the experts working in the field at the time had to begin thinking about things like organization, security, and sharing.

Indeed, a common tension in operating systems literature in general is between isolation and sharing.  Isolation is great from a security perspective, but is inefficient.  Each user of the system often uses the same programs, for example, but we do not want to keep a distinct copy of the same thing for each user as that would be wasteful.   This profoundly impacts the file systems work because the file system is the point of persistence, the level at which shared resources become manifest.

But I’m jumping ahead at this point.  Let’s start with the simple question: What is a file system?  As we will find during this journey, its meaning and usage is far richer than one might think upon first reading.  While this paper is not the first paper to discuss storage and file systems, it is a good example of the state of the art in 1965.

This paper offers us some useful definitions:

file is simply an ordered sequence of elements, where an element could be a machine word, a character, or a bit, depending upon the implementation.

That seems to be a rather broad definition, but it is a reasonable place for us to start.  This does not impose structure on the content itself, which proves to be one reason why this abstraction ultimately turns out to be a powerful one.  “At the level of the file system, a file is formatless.

This paper also establishes the name space abstractions as well:

As far as a particular user is concerned, a file has one name, and that name is symbolic.  (Symbolic names may be arbitrarily long, and may have syntax of their own.  For example, they may consist of several parts, some of which are relevant to the nature of the file, e.g., ALPHA FAP DEBUG.)

This again paints a rather broad abstraction.  The name has meaning to the user, but otherwise is just symbolic data for the file system.  The paper goes on to define the now classic name space specific abstraction:

directory is a special file which is maintained by the file system, and which contains a list of entries.  To a user, an entry appears to be a file and is accessed in terms of its symbolic entry name, which is the user’s file name.  An entry name need be unique only within the directory in which it occurs.

 

Thus, this paper lays out the quintessential aspect of modern file system name spaces: they are hierarchically organized.  The paper describes this in greater detail and refers to links and branches.

The authors describe how users might work in different parts of the hierarchical name space.  They observe that this then creates a situation in which sharing of files might be an issue, and thus they introduce the concept of links to resolve this.


link in this context is an entry in a directory that refers to an existing file within the file system.  Thus, we see the genesis of links, though the paper does not clearly delineate symbolic links from hard links.  This does help motivate why these features show up in UNIX a number of years later, however.

The paper goes into greater detail about how they envision this hierarchical name space functioning, including traversal, working directory, and links.

From there the authors then turn their attention to a problem inherent in having a single shared file system name space with data contents belonging to different users, namely managing access to the individual files.  Thus, they introduce access controls.  They note that a file system could default to either permissive or restrictive access within this model.  From this point they incorporate the access control list, the access mode for a given file, and the concept of access attributes.  Of the five attributes listed, four of them are familiar: read, execute, write, and append access have analogs in modern file systems.  The fifth, trap is interesting, in that it defines an explicit exception mechanism for access control that requires external validation for access – an interesting generality that is not typically present in modern file systems.

They also describe file sharing, introducing the concept of explicit operations to lock access to a given file, or to unlock the file.  They suggest that a locked file would require the user provide a designated key to permit accessing the file; I have seen this approach in some file systems, actually, though it is fairly uncommon.

The paper describes access control at length with no real surprises otherwise, other than perhaps the fact that many of these features disappeared from later operating systems, only to be resurrected and added many years later.

There is quite a bit in the paper about backup and restore processing.  Much of the detail here is interesting historically but does not really add much to my exploration of file systems.  If you are looking for more information about the history of backup, I do encourage you to read those sections.  Having done magnetic tape backups in the past, I’m content with leaving them be.

There is one observation I will point out from this section, however.  The authors actually discuss a recurring theme in file systems – the fact that storage itself ends up being a multi-level media management challenge.

In most cases a user does not need to know how or where a file is stored by the file system.  A user’s primary concern is that the file be readily available to him when he needs it.  In general, only the file system knows on which device a file resides.

The file system is designed to accommodate any configuration of secondary storage devices.  These devices may cover a wide range of speeds and capacities.  All considerations of speed and efficiency of storage devices are left to the file system.  Thus all user programs and all other system programs are independent of the particular configuration of secondary storage.

They go on to describe migration of data from hot to cold storage as needed and point out these are functions of the file system.  I found this an interesting insight since even today we routinely deal with these sorts of situations, such as the Strata paper from SOSP 2017 (“Strata: a Cross Media File System“).

The remainder of the paper focuses on how the file system is implemented in MULTICS

I must admit, I found this section of the paper both detailed and yet fascinating because of the broad sweeping nature in which the authors lay down fundamental ideas that we see in modern operating systems.  Some of it is not really in the scope of file systems (“segment management” which appears to equate to shared executables, such as programs and shared libraries in modern systems,) the concept of demand segment loading (which presages demand paging,) the concept of file system search, mechanisms for managing file systems, memory recycling (reminiscent of page reclamation in modern virtual memory systems based upon its description,) device management, and I/O queueing.  They finish up by describing their “multilevel storage management module”, backup system, and utility functions.  The latter has (now) amusing functions like “file to cards” and “tape to file”.

So these give us asynchronous operation, paging, backup, hierarchical storage management, security, file sharing, and directory sharing.  Most of these concepts survive to this day.  Indeed, what I found most surprising after reading this paper is how few of these ideas disappeared: traps and file system search are the two that spring to my mind.

Thus the lesson of today’s paper: in 1965 the MULTICS team more or less laid down a model for how file systems were to work in virtually all modern operating systems.   The subsequent work will provide us with greater insight into the details, but the basic shape of our file systems has not strayed far from this early vision.

 

US Patent 9,830,329 Issued

US Patent 9,830,329 issued November 28, 2017.

It is about a technique for utilizing a pre-existing tunneling mechanism to invoke remote functionality. The original inspiration for this was the need to communicate from a component on a client system to utilize functionality available on the server.

The first case for this was when I wanted to obtain meta-data that was maintained on the server, but not available to me on the client. By exploiting a pass-through mechanism in the Windows SMB client, I could send a query to the server, have a component on the server get the answer, and respond back to my software on the client. This turns out to be useful in several cases – and doing it through the existing SMB infrastructure allowed a clean, functional solution.

Windows and “reserved names”

I’ve fallen into the habit of answering some questions on Quora and those which involve answering questions about Windows file systems seem to garner the most interest and the most disagreement.

Yesterday I answered a question entitled Why can’t I save a folder name “con” in Windows?  I decided to respond to this because the other answers that I read said that Windows reserved this name.   My response demonstrated how you could easily do this with bash running in WSL.  Some comments seemed to think that using bash was cheating, so I sat down and wrote a simple program to create folders with reserved names.  It isn’t bound to WSL and executes as a standard console program (so it is executed from the command line).  I’ve published the source code on github.

It’s been a few months since I’ve posted anything about Windows file systems and I realized that I should start doing so again.  So, that little application program inspired me.

The difficulty that I have with assertions that “Windows doesn’t do that” or “Windows doesn’t allow that” is that they often conflate the limitations of applications, libraries, or subsystems as being limitations of Windows.  Windows NT was developed to be a general purpose host operating system.  The expectation was that it would have subsystems which implemented specific execution environments. While this includes Win32 (the subsystem most often used by applications) it has never been the only subsystem available.  Windows NT 3.1 shipped with Win32, NTVDM, OS/2, POSIX, and native subsystems (though native is more “not bound to a subsystem”).  I had also heard rumors of a UNIX System V Release 4 subsystem as well, but to the best of my knowledge it was never released and I never saw it.

Window dropped support for OS/2 in Windows XP.  The POSIX implementation changed as it became more like a library that called Win32 and native as needed.  What remained of POSIX was dropped in Windows 8.1.  The Windows Subsystem for Linux (WSL) has been added in Windows 10. It is not based upon the old POSIX subsystem, but it certainly exploits functionality that Windows includes because of the POSIX subsystem’s requirement.

Win32 does not have a concept of fork, but Windows includes all the support necessary to create a duplicate address space, either via NtCreateProcess or NtCreateUserProcess (you can see an example of the latter, including protected processes here).  This is used to implement this functionality.

NTFS has always supported the full range of POSIX compliant names.  The downside to using this is that it means Win32 application (like Explorer) can have problems accessing them.  That’s not a limitation of Windows it is a limitation of the application.

I’ve heard several people tell me that “Windows” must include everything installed, not just the operating system.  I disagree with this characterization, but I’ll address the why of that in a future post.

Bottom line: Windows itself does not have restricted names based upon MS-DOS.  Win32 applications usually do, however.  I would also note that Explorer cannot handle long path names – somewhere within the code is a hard coded buffer that is smaller than the OS level maximum path length (which happens to be 32767 16-bit UNICODE characters).  NTFS limits each name component (directory or file) to 256 bytes, but ReFS limits the component name size to 32767 (and this has inspired me in another area I’ll talk about at some point as well).  Other file systems may have their own limits, too.

That’s enough for now.  I’ll find more interesting things to discuss when it comes to Windows file systems, though.