A General-Purpose File System For Secondary Storage
R. C. Daley and P.G. Neumann
Published in the Proceedings of the American Federation of Information Processing Societies 1965, Fall Joint Computer Conference, vol. 27, pp. 213-229.
This is the seminal paper discussing how file systems were envisioned within the MULTICS operating system. While you can still run MULTICS, it is a curiosity at this point. However, virtually all operating systems we now use today descended from MULTICS and thus, its design profoundly influenced their development.
This paper is a delightfully easy read, written at the dawn of this new field of multi-programming. Prior to this time computers were essentially single user. The introduction of the idea of sharing a computer with other users was nascent. Thus, the experts working in the field at the time had to begin thinking about things like organization, security, and sharing.
Indeed, a common tension in operating systems literature in general is between isolation and sharing. Isolation is great from a security perspective, but is inefficient. Each user of the system often uses the same programs, for example, but we do not want to keep a distinct copy of the same thing for each user as that would be wasteful. This profoundly impacts the file systems work because the file system is the point of persistence, the level at which shared resources become manifest.
But I’m jumping ahead at this point. Let’s start with the simple question: What is a file system? As we will find during this journey, its meaning and usage is far richer than one might think upon first reading. While this paper is not the first paper to discuss storage and file systems, it is a good example of the state of the art in 1965.
This paper offers us some useful definitions:
A file is simply an ordered sequence of elements, where an element could be a machine word, a character, or a bit, depending upon the implementation.
That seems to be a rather broad definition, but it is a reasonable place for us to start. This does not impose structure on the content itself, which proves to be one reason why this abstraction ultimately turns out to be a powerful one. “At the level of the file system, a file is formatless.”
This paper also establishes the name space abstractions as well:
As far as a particular user is concerned, a file has one name, and that name is symbolic. (Symbolic names may be arbitrarily long, and may have syntax of their own. For example, they may consist of several parts, some of which are relevant to the nature of the file, e.g., ALPHA FAP DEBUG.)
This again paints a rather broad abstraction. The name has meaning to the user, but otherwise is just symbolic data for the file system. The paper goes on to define the now classic name space specific abstraction:
A directory is a special file which is maintained by the file system, and which contains a list of entries. To a user, an entry appears to be a file and is accessed in terms of its symbolic entry name, which is the user’s file name. An entry name need be unique only within the directory in which it occurs.
Thus, this paper lays out the quintessential aspect of modern file system name spaces: they are hierarchically organized. The paper describes this in greater detail and refers to links and branches.
The authors describe how users might work in different parts of the hierarchical name space. They observe that this then creates a situation in which sharing of files might be an issue, and thus they introduce the concept of links to resolve this.
A link in this context is an entry in a directory that refers to an existing file within the file system. Thus, we see the genesis of links, though the paper does not clearly delineate symbolic links from hard links. This does help motivate why these features show up in UNIX a number of years later, however.
The paper goes into greater detail about how they envision this hierarchical name space functioning, including traversal, working directory, and links.
From there the authors then turn their attention to a problem inherent in having a single shared file system name space with data contents belonging to different users, namely managing access to the individual files. Thus, they introduce access controls. They note that a file system could default to either permissive or restrictive access within this model. From this point they incorporate the access control list, the access mode for a given file, and the concept of access attributes. Of the five attributes listed, four of them are familiar: read, execute, write, and append access have analogs in modern file systems. The fifth, trap is interesting, in that it defines an explicit exception mechanism for access control that requires external validation for access – an interesting generality that is not typically present in modern file systems.
They also describe file sharing, introducing the concept of explicit operations to lock access to a given file, or to unlock the file. They suggest that a locked file would require the user provide a designated key to permit accessing the file; I have seen this approach in some file systems, actually, though it is fairly uncommon.
The paper describes access control at length with no real surprises otherwise, other than perhaps the fact that many of these features disappeared from later operating systems, only to be resurrected and added many years later.
There is quite a bit in the paper about backup and restore processing. Much of the detail here is interesting historically but does not really add much to my exploration of file systems. If you are looking for more information about the history of backup, I do encourage you to read those sections. Having done magnetic tape backups in the past, I’m content with leaving them be.
There is one observation I will point out from this section, however. The authors actually discuss a recurring theme in file systems – the fact that storage itself ends up being a multi-level media management challenge.
In most cases a user does not need to know how or where a file is stored by the file system. A user’s primary concern is that the file be readily available to him when he needs it. In general, only the file system knows on which device a file resides.
The file system is designed to accommodate any configuration of secondary storage devices. These devices may cover a wide range of speeds and capacities. All considerations of speed and efficiency of storage devices are left to the file system. Thus all user programs and all other system programs are independent of the particular configuration of secondary storage.
They go on to describe migration of data from hot to cold storage as needed and point out these are functions of the file system. I found this an interesting insight since even today we routinely deal with these sorts of situations, such as the Strata paper from SOSP 2017 (“Strata: a Cross Media File System“).
The remainder of the paper focuses on how the file system is implemented in MULTICS
I must admit, I found this section of the paper both detailed and yet fascinating because of the broad sweeping nature in which the authors lay down fundamental ideas that we see in modern operating systems. Some of it is not really in the scope of file systems (“segment management” which appears to equate to shared executables, such as programs and shared libraries in modern systems,) the concept of demand segment loading (which presages demand paging,) the concept of file system search, mechanisms for managing file systems, memory recycling (reminiscent of page reclamation in modern virtual memory systems based upon its description,) device management, and I/O queueing. They finish up by describing their “multilevel storage management module”, backup system, and utility functions. The latter has (now) amusing functions like “file to cards” and “tape to file”.
So these give us asynchronous operation, paging, backup, hierarchical storage management, security, file sharing, and directory sharing. Most of these concepts survive to this day. Indeed, what I found most surprising after reading this paper is how few of these ideas disappeared: traps and file system search are the two that spring to my mind.
Thus the lesson of today’s paper: in 1965 the MULTICS team more or less laid down a model for how file systems were to work in virtually all modern operating systems. The subsequent work will provide us with greater insight into the details, but the basic shape of our file systems has not strayed far from this early vision.