The UNIX Time-Sharing Operating System
Dennis M. Ritchie and Ken Thompson, Bell Labs, Communications of the ACM, July 1974, Volume 17, Number 7, pp. 365-375.
This paper describes Version 3 of UNIX. It was Version 6 that became the basis of the Berkeley Software Distribution (BSD) version of UNIX. The only other operating system in CS history to date that has had so much impact on operating systems development is MULTICS, and UNIX is its direct descendant. UNIX was developed by several Bell Labs researchers who had been involved in MULTICS; their goal was to build a smaller operating system that retained what they viewed as the key benefits of MULTICS.
Much of UNIX Version 3 was written in the C programming language, itself derived from BCPL, a language that had been used on the MULTICS project.
“The most important job of UNIX is to provide a file system.”
These words leave little doubt about the role of file systems in UNIX and the importance assigned to them. The paper then goes on to describe files in terms similar to MULTICS: ordinary files, directories, and special files (devices). We see the hierarchical file system of MULTICS reflected back in the description of the system. They talk about file names being 14 characters or fewer, the formation of paths, and the iterative walk of names through the file system name space to find other directories, as well as the terminal file nodes.
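The iterative walk is easy to picture with a toy model. The sketch below is not the UNIX implementation: it assumes a name space made of nested Python dictionaries, and the names `ROOT`, `MAX_NAME`, and `resolve` are invented for illustration. It only shows the idea of consuming a path one component at a time, starting from the root or from a current directory.

```python
# Toy name space: a directory is a dict mapping a name to either another
# directory (dict) or a file's contents (str). Purely illustrative.
ROOT = {
    "usr": {
        "ken": {"notes": "hello"},
        "dmr": {},
    },
    "dev": {"tty": "<special file>"},
}

MAX_NAME = 14  # name-length limit mentioned in the paper


def resolve(path, root=ROOT, cwd=None):
    """Walk `path` one component at a time and return the node it names."""
    # A leading "/" starts the search at the root; otherwise at `cwd`.
    node = root if path.startswith("/") or cwd is None else cwd
    for name in path.split("/"):
        if name == "":
            continue  # skip empty components from leading or doubled slashes
        if len(name) > MAX_NAME:
            # Assumption: this sketch simply rejects over-long names.
            raise ValueError(f"name too long: {name!r}")
        if not isinstance(node, dict):
            raise NotADirectoryError(name)  # tried to walk through a file
        if name not in node:
            raise FileNotFoundError(name)
        node = node[name]
    return node
```

Here `resolve("/usr/ken/notes")` walks `usr`, then `ken`, then `notes`, while `resolve("ken/notes", cwd=ROOT["usr"])` reaches the same file from a current directory.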
“The directory structure is constrained to have the form of a rooted tree.”
This is what I was looking for: the why of hierarchical file systems. I found the answer here, unsurprising yet ironically disappointing:
“The reason for this is to simplify the writing of programs which visit subtrees of the directory structure, and more important, to avoid the separation of portions of the hierarchy.”
Not surprising, yet not precisely what I had expected. I had expected the reason to be the simplicity of the operating system (though they do allude to this by discussing the difficulty of knowing when it is safe to delete a directory). They describe links to files, however, so their file system is not really a tree. Rather, it is more like a directed acyclic graph (DAG). Files do not have pointers back to their directories, but each directory has a pointer back to its parent. Hence the distinction: the name space is a DAG. Files do not really live in the name space directly; they are referenced from the name space, and each carries a reference count.
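The same behavior can still be observed on a present-day POSIX system. The following is a small sketch using Python's standard `os.link` and `os.stat`; the function name `link_demo` and the file names are made up for the example, and it assumes the temporary directory sits on a file system that supports hard links.

```python
import os
import tempfile


def link_demo():
    """Give one file two names, then remove one of them."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "report")
        alias = os.path.join(d, "report-link")

        with open(original, "w") as f:
            f.write("same bytes\n")

        os.link(original, alias)  # a second directory entry for the same file

        a, b = os.stat(original), os.stat(alias)
        same_file = a.st_ino == b.st_ino  # both names lead to one file
        links_after_link = a.st_nlink     # reference count with two names

        os.remove(original)               # drop one name; the file survives
        links_after_remove = os.stat(alias).st_nlink
        with open(alias) as f:
            surviving = f.read()

    return same_file, links_after_link, links_after_remove, surviving
```

Neither name is "the" file: removing `report` only lowers the count, and the data stays reachable through `report-link`.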
Oddly, with that, I found what I came for, at least in terms of insight for my own research area. There is a certain satisfaction in being able to point to this seminal document and say “this is why we got to where we are now.”
But if I stopped at this point, I would be leaving out the bits I had not expected to find.
First, the mundane: they discuss removable file systems, which in effect make the system a collection of name spaces, combining persistent name spaces with one another through a non-persistent mechanism (mounting). There is also a simple description of how the file system itself is implemented. They describe the i-number (now the inode number) as an index into the file table. Thus a directory entry is where the name lives; it merely refers to the file by its i-number. The entries in that table are called i-nodes. The i-node contains information about the owner of the file, the protection bits governing the file, the location of the file's data on the medium, the size of the file, its timestamps, its attribute bits, and the number of directory entries referencing the given i-node.
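To make the split between names and files concrete, here is a minimal in-memory sketch. It is a teaching model under my own assumptions, not the on-disk layout from the paper: `Inode`, `ToyFS`, and their field names are hypothetical, directories are plain dictionaries, and timestamps and attribute bits are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    owner: str
    mode: int                                   # protection bits
    size: int = 0
    blocks: list = field(default_factory=list)  # where the data lives
    nlink: int = 0                              # directory entries pointing here


class ToyFS:
    def __init__(self):
        self.ilist = []  # the file table; an i-number is an index into it
        self.root = {}   # a directory: name -> i-number

    def create(self, directory, name, owner, mode=0o644):
        """Allocate a new i-node and give it its first name."""
        if name in directory:
            raise FileExistsError(name)
        self.ilist.append(Inode(owner=owner, mode=mode))
        inum = len(self.ilist) - 1
        self.link(directory, name, inum)
        return inum

    def link(self, directory, name, inum):
        """Add another name for an existing i-node."""
        if name in directory:
            raise FileExistsError(name)
        directory[name] = inum
        self.ilist[inum].nlink += 1

    def unlink(self, directory, name):
        """Remove a name; free the i-node once no names remain."""
        inum = directory.pop(name)
        inode = self.ilist[inum]
        inode.nlink -= 1
        if inode.nlink == 0:
            self.ilist[inum] = None  # slot could now be reused
```

The name appears only in the directory dictionary; everything else about the file hangs off the table entry, which is why two names can share one i-node.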
Surprisingly, this is not all that different from how it works now, 45 years later. Implementation details have changed; for one, we no longer limit files to 10MB in size.
They describe buffering and sector-sized I/O, and explain why it is more efficient for a program to do its I/O in sector-sized operations.
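One way to see the cost is simply to count calls. The sketch below reads the same file twice through `os.read`, once a block at a time and once a byte at a time. The 512-byte `BLOCK` value and the helper names `count_reads` and `demo` are assumptions made for this example; it counts calls into the system rather than measuring time.

```python
import os
import tempfile

BLOCK = 512  # assumed block size for this sketch


def count_reads(path, chunk):
    """Read the whole file `chunk` bytes at a time; return (calls, bytes)."""
    fd = os.open(path, os.O_RDONLY)
    calls = total = 0
    try:
        while True:
            data = os.read(fd, chunk)
            if not data:
                break  # end of file
            calls += 1
            total += len(data)
    finally:
        os.close(fd)
    return calls, total


def demo():
    """Compare block-sized reads with byte-sized reads on a 4-block file."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"x" * (4 * BLOCK))
    try:
        return count_reads(f.name, BLOCK), count_reads(f.name, 1)
    finally:
        os.remove(f.name)
```

Both runs deliver the same bytes, but the byte-at-a-time version crosses into the system once per byte, which is the overhead that block-sized requests avoid.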
Much of the paper has nothing to do with file systems. I leave the rest for the interested reader to explore.
There are two interesting tidbits remaining:
- They lost data once, on a hard disk that failed. The backup was 3 days old.
- They considered the permuted index application as one of the “major programs available”.
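For anyone who has not met one, a permuted index (also called a keyword-in-context index) lists every significant word of every title alphabetically, along with the words around it. The following is a rough sketch of the idea only, not a reconstruction of the UNIX program; the stop-word list and the function name `permuted_index` are my own assumptions.

```python
STOP_WORDS = {"a", "an", "the", "of", "and", "to", "in"}  # assumed list


def permuted_index(titles, stop_words=STOP_WORDS):
    """Return (keyword, left context, right context) rows sorted by keyword."""
    rows = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in stop_words:
                continue  # do not index filler words
            left = " ".join(words[:i])   # text before the keyword
            right = " ".join(words[i:])  # keyword and the text after it
            rows.append((word.lower(), left, right))
    return sorted(rows)


if __name__ == "__main__":
    for key, left, right in permuted_index(["The UNIX Time-Sharing System"]):
        print(f"{left:>25}  {right}")
```

Each title shows up once per keyword, so a reader can find it by any word they happen to remember.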
The fact they considered the permuted index important at this early stage was an interesting insight to me. Clearly, the ability to “find our stuff” is one that’s been around since the dawn of time. Maybe this research direction of mine does make sense.