Feiertag, Richard J., and Elliott I. Organick. “The Multics input/output system.” In Proceedings of the third ACM symposium on Operating systems principles, pp. 35-41. ACM, 1971.
In this paper we return to MULTICS but this time the focus is on the I/O system. Much of the content is not directly related to the file system, but it certainly does touch on familiar concepts including the file system.
By this time MULTICS had been released and Bell Labs had dropped out of the project (to build the “castrated” version of MULTICS that became UNICS and ultimately UNIX.) Thus, this is more a reflection of the system that has been built (versus the earlier paper, where it is a prospective description of the system).
So what is this I/O System? The description is of a system with a flexible interface, from a highly general (and easy-to-use) interface, to a highly specific (and challenging-to-use) interface to devices. The idea seems to be that devices have a common interface – the general level interface – but may have specific features that require device-specific knowledge to exploit – the specialized interface.
There are some interesting insights in this paper, again, that reflect the way we do things now rather closely. For example, the MULTICS I/O Model puts the file systems as a separate service, but then shows they interact with an I/O system component. Thus, applications can be seen to directly interact with the file system, via its own API, or they can do so through the more general Device I/O Module (DIM).
This abstraction of devices is one we see in many subsequent operating systems (and all modern ones). File systems gained their idiosyncratic behavior model fairly early, then, where they play closely, but not subserviently, to the I/O subsystem.
Indeed, much of this paper relates to how I/O from files may be redirected for other purposes, such as for the input to an application instead of a more common interface device, such as a teletype (tty) machine. Of course, MULTICS had already indicated that files were treated logically as memory and their contents could be pulled in “as needed” (demand paging).
This paper also describes the interface to the file system, albeit in general terms. It describes the idea that logical data flows (streams as they call it) can be separated from the actual data storage mechanism used beneath them. It is this mechanism that permits them to redirect the tty.
This paper thus introduces some surprising concepts around I/O: a semi-abstract interface, the idea that file systems were a consumer of that interface (and in theory a provider of the interface). The ability to extend the OS for new device types because of this generalized interface. After reading it, I can see its influence on both Linux (via UNIX) and Windows (presumably via RSX-11M).
A General-Purpose File System For Secondary Storage
R. C. Daley and P.G. Neumann
Published in the Proceedings of the American Federation of Information Processing Societies 1965, Fall Joint Computer Conference, vol. 27, pp. 213-229.
This is the seminal paper discussing how file systems were envisioned within the MULTICS operating system. While you can still run MULTICS, it is a curiosity at this point. However, virtually all operating systems we now use today descended from MULTICS and thus, its design profoundly influenced their development.
This paper is a delightfully easy read, written at the dawn of this new field of multi-programming. Prior to this time computers were essentially single user. The introduction of the idea of sharing a computer with other users was nascent. Thus, the experts working in the field at the time had to begin thinking about things like organization, security, and sharing.
Indeed, a common tension in operating systems literature in general is between isolation and sharing. Isolation is great from a security perspective, but is inefficient. Each user of the system often uses the same programs, for example, but we do not want to keep a distinct copy of the same thing for each user as that would be wasteful. This profoundly impacts the file systems work because the file system is the point of persistence, the level at which shared resources become manifest.
But I’m jumping ahead at this point. Let’s start with the simple question: What is a file system? As we will find during this journey, its meaning and usage is far richer than one might think upon first reading. While this paper is not the first paper to discuss storage and file systems, it is a good example of the state of the art in 1965.
This paper offers us some useful definitions:
A file is simply an ordered sequence of elements, where an element could be a machine word, a character, or a bit, depending upon the implementation.
That seems to be a rather broad definition, but it is a reasonable place for us to start. This does not impose structure on the content itself, which proves to be one reason why this abstraction ultimately turns out to be a powerful one. “At the level of the file system, a file is formatless.”
This paper also establishes the name space abstractions as well:
As far as a particular user is concerned, a file has one name, and that name is symbolic. (Symbolic names may be arbitrarily long, and may have syntax of their own. For example, they may consist of several parts, some of which are relevant to the nature of the file, e.g., ALPHA FAP DEBUG.)
This again paints a rather broad abstraction. The name has meaning to the user, but otherwise is just symbolic data for the file system. The paper goes on to define the now classic name space specific abstraction:
A directory is a special file which is maintained by the file system, and which contains a list of entries. To a user, an entry appears to be a file and is accessed in terms of its symbolic entry name, which is the user’s file name. An entry name need be unique only within the directory in which it occurs.
Thus, this paper lays out the quintessential aspect of modern file system name spaces: they are hierarchically organized. The paper describes this in greater detail and refers to links and branches.
The authors describe how users might work in different parts of the hierarchical name space. They observe that this then creates a situation in which sharing of files might be an issue, and thus they introduce the concept of links to resolve this.
A link in this context is an entry in a directory that refers to an existing file within the file system. Thus, we see the genesis of links, though the paper does not clearly delineate symbolic links from hard links. This does help motivate why these features show up in UNIX a number of years later, however.
The paper goes into greater detail about how they envision this hierarchical name space functioning, including traversal, working directory, and links.
From there the authors then turn their attention to a problem inherent in having a single shared file system name space with data contents belonging to different users, namely managing access to the individual files. Thus, they introduce access controls. They note that a file system could default to either permissive or restrictive access within this model. From this point they incorporate the access control list, the access mode for a given file, and the concept of access attributes. Of the five attributes listed, four of them are familiar: read, execute, write, and append access have analogs in modern file systems. The fifth, trap is interesting, in that it defines an explicit exception mechanism for access control that requires external validation for access – an interesting generality that is not typically present in modern file systems.
They also describe file sharing, introducing the concept of explicit operations to lock access to a given file, or to unlock the file. They suggest that a locked file would require the user provide a designated key to permit accessing the file; I have seen this approach in some file systems, actually, though it is fairly uncommon.
The paper describes access control at length with no real surprises otherwise, other than perhaps the fact that many of these features disappeared from later operating systems, only to be resurrected and added many years later.
There is quite a bit in the paper about backup and restore processing. Much of the detail here is interesting historically but does not really add much to my exploration of file systems. If you are looking for more information about the history of backup, I do encourage you to read those sections. Having done magnetic tape backups in the past, I’m content with leaving them be.
There is one observation I will point out from this section, however. The authors actually discuss a recurring theme in file systems – the fact that storage itself ends up being a multi-level media management challenge.
In most cases a user does not need to know how or where a file is stored by the file system. A user’s primary concern is that the file be readily available to him when he needs it. In general, only the file system knows on which device a file resides.
The file system is designed to accommodate any configuration of secondary storage devices. These devices may cover a wide range of speeds and capacities. All considerations of speed and efficiency of storage devices are left to the file system. Thus all user programs and all other system programs are independent of the particular configuration of secondary storage.
They go on to describe migration of data from hot to cold storage as needed and point out these are functions of the file system. I found this an interesting insight since even today we routinely deal with these sorts of situations, such as the Strata paper from SOSP 2017 (“Strata: a Cross Media File System“).
The remainder of the paper focuses on how the file system is implemented in MULTICS
I must admit, I found this section of the paper both detailed and yet fascinating because of the broad sweeping nature in which the authors lay down fundamental ideas that we see in modern operating systems. Some of it is not really in the scope of file systems (“segment management” which appears to equate to shared executables, such as programs and shared libraries in modern systems,) the concept of demand segment loading (which presages demand paging,) the concept of file system search, mechanisms for managing file systems, memory recycling (reminiscent of page reclamation in modern virtual memory systems based upon its description,) device management, and I/O queueing. They finish up by describing their “multilevel storage management module”, backup system, and utility functions. The latter has (now) amusing functions like “file to cards” and “tape to file”.
So these give us asynchronous operation, paging, backup, hierarchical storage management, security, file sharing, and directory sharing. Most of these concepts survive to this day. Indeed, what I found most surprising after reading this paper is how few of these ideas disappeared: traps and file system search are the two that spring to my mind.
Thus the lesson of today’s paper: in 1965 the MULTICS team more or less laid down a model for how file systems were to work in virtually all modern operating systems. The subsequent work will provide us with greater insight into the details, but the basic shape of our file systems has not strayed far from this early vision.