Home » Operating Systems
Category Archives: Operating Systems
The UNIX Time-Sharing Operating System
Dennis M. Ritchie and Ken Thompson, Bell Labs, Communications of the ACM, July 1974, Volume 17, Number 7, pp. 365-375.
This paper describes Version 3 of UNIX. It was Version 6 that became the basis of the Berkeley Software Distribution (BSD) version of UNIX. The only other operating system in CS history to date that has had so much impact on operating systems development is MULTICS, and UNIX is a direct descendant. Developed by several Bell Labs researchers that had been involved in MULTICS, its goal was to try and build a smaller operating system that retained what they viewed as the key benefits of MULTICS.
Much of UNIX Version 3 was written in the C programming language, itself derived from BCPL, a language that had been used on the MULTICS project.
“The most important job of UNIX is to provide a file system.“
These words leave little doubt about the role of file systems in UNIX and the importance assigned to them. The paper then goes on to describe files, in similar terms to MULTICS: files, directories, and special files (devices). We see the hierarchical file system of MULTICS reflected back in the description of the system. They talk about file names being 14 characters or less, the formation of paths, and the iterative walk of names through the file system name space to find other directories, as well as the terminal file nodes.
“The directory structure is constrained to have the form of a rooted tree.“
This is what I am looking for – the why of hierarchical file systems. I found the answer here, unsurprising yet ironically disappointing:
“The reason for this is to simplify the writing of programs which visit subtrees of the directory structure, and more important, to avoid the separation of portions of the hierarchy.“
Not surprising, yet not precisely what I had expected. I had expected the reason to be for the simplicity of the operating system (though they do allude to this by discussing the difficulty of knowing when it is safe to delete a directory. They describe links to files, however, so their file system is not really a tree. Rather, it is more like a directed acyclic graph (DAG). Files do not have pointers back to their directories, but directories have pointers back to their parent. Thus we have the distinction. The namespace is a DAG. Files don’t really live in the name space directly, they are referenced from the namespace, but have a reference count.
Oddly, with that, I found what I came for, at least in terms of insight for my own research area. There is a certain satisfaction in being able to point to this seminal document and say “this is why we got to where we are now.”
But if I stopped at this point, I would be leaving out the bits I had not expected to find.
First, the mundane: they discuss removable file systems, the fact that this is in fact a collection of name spaces, combining persistent name spaces with one another using a non-persistent mechanism (mounting), There is a simple description of how the file system is itself implemented. They describe the i-number (now the inode number) as an index into the file table. Thus, a directory entry is where the name lives, it merely refers to the file using its i-number. These entries are then called i-nodes. The i-node contains information about the owner of the file, the protection bits governing the file, the location information for where logical data is physically stored on the medium, the size of the file, its timestamps, it’s attribute bits, and the number of directory entries referencing the given i-node.
Surprisingly, not all that different than it is now, 45 years later. Implementation details have changed, as we no longer limit files to 10MB in size.
They describe bufering, they describe sector sized I/O and how it is more efficient for a program to do sector-sized I/O operations.
Much of the paper has nothing to do with file systems. I leave that to the interested reader to explore beyond that.
There are two interesting tid-bits remaining:
- They lost data once, on a hard disk that failed. The backup was 3 days old.
- They considered the permuted index application as one of the “major programs available”.
The fact they considered the permuted index important at this early stage was an interesting insight to me. Clearly, the ability to “find our stuff” is one that’s been around since the dawn of time. Maybe this research direction of mine does make sense.
Feiertag, Richard J., and Elliott I. Organick. “The Multics input/output system.” In Proceedings of the third ACM symposium on Operating systems principles, pp. 35-41. ACM, 1971.
In this paper we return to MULTICS but this time the focus is on the I/O system. Much of the content is not directly related to the file system, but it certainly does touch on familiar concepts including the file system.
By this time MULTICS had been released and Bell Labs had dropped out of the project (to build the “castrated” version of MULTICS that became UNICS and ultimately UNIX.) Thus, this is more a reflection of the system that has been built (versus the earlier paper, where it is a prospective description of the system).
So what is this I/O System? The description is of a system with a flexible interface, from a highly general (and easy-to-use) interface, to a highly specific (and challenging-to-use) interface to devices. The idea seems to be that devices have a common interface – the general level interface – but may have specific features that require device-specific knowledge to exploit – the specialized interface.
There are some interesting insights in this paper, again, that reflect the way we do things now rather closely. For example, the MULTICS I/O Model puts the file systems as a separate service, but then shows they interact with an I/O system component. Thus, applications can be seen to directly interact with the file system, via its own API, or they can do so through the more general Device I/O Module (DIM).
This abstraction of devices is one we see in many subsequent operating systems (and all modern ones). File systems gained their idiosyncratic behavior model fairly early, then, where they play closely, but not subserviently, to the I/O subsystem.
Indeed, much of this paper relates to how I/O from files may be redirected for other purposes, such as for the input to an application instead of a more common interface device, such as a teletype (tty) machine. Of course, MULTICS had already indicated that files were treated logically as memory and their contents could be pulled in “as needed” (demand paging).
This paper also describes the interface to the file system, albeit in general terms. It describes the idea that logical data flows (streams as they call it) can be separated from the actual data storage mechanism used beneath them. It is this mechanism that permits them to redirect the tty.
This paper thus introduces some surprising concepts around I/O: a semi-abstract interface, the idea that file systems were a consumer of that interface (and in theory a provider of the interface). The ability to extend the OS for new device types because of this generalized interface. After reading it, I can see its influence on both Linux (via UNIX) and Windows (presumably via RSX-11M).
A General-Purpose File System For Secondary Storage
R. C. Daley and P.G. Neumann
Published in the Proceedings of the American Federation of Information Processing Societies 1965, Fall Joint Computer Conference, vol. 27, pp. 213-229.
This is the seminal paper discussing how file systems were envisioned within the MULTICS operating system. While you can still run MULTICS, it is a curiosity at this point. However, virtually all operating systems we now use today descended from MULTICS and thus, its design profoundly influenced their development.
This paper is a delightfully easy read, written at the dawn of this new field of multi-programming. Prior to this time computers were essentially single user. The introduction of the idea of sharing a computer with other users was nascent. Thus, the experts working in the field at the time had to begin thinking about things like organization, security, and sharing.
Indeed, a common tension in operating systems literature in general is between isolation and sharing. Isolation is great from a security perspective, but is inefficient. Each user of the system often uses the same programs, for example, but we do not want to keep a distinct copy of the same thing for each user as that would be wasteful. This profoundly impacts the file systems work because the file system is the point of persistence, the level at which shared resources become manifest.
But I’m jumping ahead at this point. Let’s start with the simple question: What is a file system? As we will find during this journey, its meaning and usage is far richer than one might think upon first reading. While this paper is not the first paper to discuss storage and file systems, it is a good example of the state of the art in 1965.
This paper offers us some useful definitions:
A file is simply an ordered sequence of elements, where an element could be a machine word, a character, or a bit, depending upon the implementation.
That seems to be a rather broad definition, but it is a reasonable place for us to start. This does not impose structure on the content itself, which proves to be one reason why this abstraction ultimately turns out to be a powerful one. “At the level of the file system, a file is formatless.”
This paper also establishes the name space abstractions as well:
As far as a particular user is concerned, a file has one name, and that name is symbolic. (Symbolic names may be arbitrarily long, and may have syntax of their own. For example, they may consist of several parts, some of which are relevant to the nature of the file, e.g., ALPHA FAP DEBUG.)
This again paints a rather broad abstraction. The name has meaning to the user, but otherwise is just symbolic data for the file system. The paper goes on to define the now classic name space specific abstraction:
A directory is a special file which is maintained by the file system, and which contains a list of entries. To a user, an entry appears to be a file and is accessed in terms of its symbolic entry name, which is the user’s file name. An entry name need be unique only within the directory in which it occurs.
Thus, this paper lays out the quintessential aspect of modern file system name spaces: they are hierarchically organized. The paper describes this in greater detail and refers to links and branches.
The authors describe how users might work in different parts of the hierarchical name space. They observe that this then creates a situation in which sharing of files might be an issue, and thus they introduce the concept of links to resolve this.
A link in this context is an entry in a directory that refers to an existing file within the file system. Thus, we see the genesis of links, though the paper does not clearly delineate symbolic links from hard links. This does help motivate why these features show up in UNIX a number of years later, however.
The paper goes into greater detail about how they envision this hierarchical name space functioning, including traversal, working directory, and links.
From there the authors then turn their attention to a problem inherent in having a single shared file system name space with data contents belonging to different users, namely managing access to the individual files. Thus, they introduce access controls. They note that a file system could default to either permissive or restrictive access within this model. From this point they incorporate the access control list, the access mode for a given file, and the concept of access attributes. Of the five attributes listed, four of them are familiar: read, execute, write, and append access have analogs in modern file systems. The fifth, trap is interesting, in that it defines an explicit exception mechanism for access control that requires external validation for access – an interesting generality that is not typically present in modern file systems.
They also describe file sharing, introducing the concept of explicit operations to lock access to a given file, or to unlock the file. They suggest that a locked file would require the user provide a designated key to permit accessing the file; I have seen this approach in some file systems, actually, though it is fairly uncommon.
The paper describes access control at length with no real surprises otherwise, other than perhaps the fact that many of these features disappeared from later operating systems, only to be resurrected and added many years later.
There is quite a bit in the paper about backup and restore processing. Much of the detail here is interesting historically but does not really add much to my exploration of file systems. If you are looking for more information about the history of backup, I do encourage you to read those sections. Having done magnetic tape backups in the past, I’m content with leaving them be.
There is one observation I will point out from this section, however. The authors actually discuss a recurring theme in file systems – the fact that storage itself ends up being a multi-level media management challenge.
In most cases a user does not need to know how or where a file is stored by the file system. A user’s primary concern is that the file be readily available to him when he needs it. In general, only the file system knows on which device a file resides.
The file system is designed to accommodate any configuration of secondary storage devices. These devices may cover a wide range of speeds and capacities. All considerations of speed and efficiency of storage devices are left to the file system. Thus all user programs and all other system programs are independent of the particular configuration of secondary storage.
They go on to describe migration of data from hot to cold storage as needed and point out these are functions of the file system. I found this an interesting insight since even today we routinely deal with these sorts of situations, such as the Strata paper from SOSP 2017 (“Strata: a Cross Media File System“).
The remainder of the paper focuses on how the file system is implemented in MULTICS
I must admit, I found this section of the paper both detailed and yet fascinating because of the broad sweeping nature in which the authors lay down fundamental ideas that we see in modern operating systems. Some of it is not really in the scope of file systems (“segment management” which appears to equate to shared executables, such as programs and shared libraries in modern systems,) the concept of demand segment loading (which presages demand paging,) the concept of file system search, mechanisms for managing file systems, memory recycling (reminiscent of page reclamation in modern virtual memory systems based upon its description,) device management, and I/O queueing. They finish up by describing their “multilevel storage management module”, backup system, and utility functions. The latter has (now) amusing functions like “file to cards” and “tape to file”.
So these give us asynchronous operation, paging, backup, hierarchical storage management, security, file sharing, and directory sharing. Most of these concepts survive to this day. Indeed, what I found most surprising after reading this paper is how few of these ideas disappeared: traps and file system search are the two that spring to my mind.
Thus the lesson of today’s paper: in 1965 the MULTICS team more or less laid down a model for how file systems were to work in virtually all modern operating systems. The subsequent work will provide us with greater insight into the details, but the basic shape of our file systems has not strayed far from this early vision.
US Patent 9,830,329 issued November 28, 2017.
It is about a technique for utilizing a pre-existing tunneling mechanism to invoke remote functionality. The original inspiration for this was the need to communicate from a component on a client system to utilize functionality available on the server.
The first case for this was when I wanted to obtain meta-data that was maintained on the server, but not available to me on the client. By exploiting a pass-through mechanism in the Windows SMB client, I could send a query to the server, have a component on the server get the answer, and respond back to my software on the client. This turns out to be useful in several cases – and doing it through the existing SMB infrastructure allowed a clean, functional solution.
I’ve fallen into the habit of answering some questions on Quora and those which involve answering questions about Windows file systems seem to garner the most interest and the most disagreement.
Yesterday I answered a question entitled Why can’t I save a folder name “con” in Windows? I decided to respond to this because the other answers that I read said that Windows reserved this name. My response demonstrated how you could easily do this with bash running in WSL. Some comments seemed to think that using bash was cheating, so I sat down and wrote a simple program to create folders with reserved names. It isn’t bound to WSL and executes as a standard console program (so it is executed from the command line). I’ve published the source code on github.
It’s been a few months since I’ve posted anything about Windows file systems and I realized that I should start doing so again. So, that little application program inspired me.
The difficulty that I have with assertions that “Windows doesn’t do that” or “Windows doesn’t allow that” is that they often conflate the limitations of applications, libraries, or subsystems as being limitations of Windows. Windows NT was developed to be a general purpose host operating system. The expectation was that it would have subsystems which implemented specific execution environments. While this includes Win32 (the subsystem most often used by applications) it has never been the only subsystem available. Windows NT 3.1 shipped with Win32, NTVDM, OS/2, POSIX, and native subsystems (though native is more “not bound to a subsystem”). I had also heard rumors of a UNIX System V Release 4 subsystem as well, but to the best of my knowledge it was never released and I never saw it.
Window dropped support for OS/2 in Windows XP. The POSIX implementation changed as it became more like a library that called Win32 and native as needed. What remained of POSIX was dropped in Windows 8.1. The Windows Subsystem for Linux (WSL) has been added in Windows 10. It is not based upon the old POSIX subsystem, but it certainly exploits functionality that Windows includes because of the POSIX subsystem’s requirement.
Win32 does not have a concept of fork, but Windows includes all the support necessary to create a duplicate address space, either via NtCreateProcess or NtCreateUserProcess (you can see an example of the latter, including protected processes here). This is used to implement this functionality.
NTFS has always supported the full range of POSIX compliant names. The downside to using this is that it means Win32 application (like Explorer) can have problems accessing them. That’s not a limitation of Windows it is a limitation of the application.
I’ve heard several people tell me that “Windows” must include everything installed, not just the operating system. I disagree with this characterization, but I’ll address the why of that in a future post.
Bottom line: Windows itself does not have restricted names based upon MS-DOS. Win32 applications usually do, however. I would also note that Explorer cannot handle long path names – somewhere within the code is a hard coded buffer that is smaller than the OS level maximum path length (which happens to be 32767 16-bit UNICODE characters). NTFS limits each name component (directory or file) to 256 bytes, but ReFS limits the component name size to 32767 (and this has inspired me in another area I’ll talk about at some point as well). Other file systems may have their own limits, too.
That’s enough for now. I’ll find more interesting things to discuss when it comes to Windows file systems, though.