Home » File Systems » As We May Think

As We May Think

As We May Think
Vannevar Bush, The Atlantic, July 1945.

I saw this covered by Adrian Colyer recently and unabashedly decided I needed to cover it as well, not because I thought there was anything wrong with his coverage but rather because this speaks well to my own research interests.  As Colyer points out the concept of trails is something that seems to have been lost.

Professionally our methods of transmitting and reviewing the results of research are generations old and by now are totally inadequate for their purpose. If the aggregate time spent in writing scholarly works and in reading them could be evaluated, the ratio between these amounts of time might well be startling. Those who conscientiously attempt to keep abreast of current thought, even in restricted fields, by close and continuous reading might well shy away from an examination calculated to show how much of the previous month’s efforts could be produced on call. Mendel’s concept of the laws of genetics was lost to the world for a generation because his publication did not reach the few who were capable of grasping and extending it; and this sort of catastrophe is undoubtedly being repeated all about us, as truly significant attainments become lost in the mass of the inconsequential.

Bottom line? We can’t find things. I recently described the Polyvalues paper from SOSP 1979. I commented on the fact that there seemed to be useful insights here that just disappeared.

The real heart of the matter of selection, however, goes deeper than a lag in the adoption of mechanisms by libraries, or a lack of development of devices for their use. Our ineptitude in getting at the record is largely caused by the artificiality of systems of indexing. When data of any sort are placed in storage, they are filed alphabetically or numerically, and information is found (when it is) by tracing it down from subclass to subclass. It can be in only one place, unless duplicates are used; one has to have rules as to which path will locate it, and the rules are cumbersome. Having found one item, moreover, one has to emerge from the system and re-enter on a new path.

This was one of those moments when I realized how deeply embedded our concept of hierarchical organization really is.  It isn’t embedded in the operating systems of the 1960s.  It was inherited from the more fundamental constraints of paper indexing.  Indeed, since reading this article it has given me further insight into how deeply entrenched we are with hierarchical organization.

The first idea, however, to be drawn from the analogy concerns selection. Selection by association, rather than indexing, may yet be mechanized. 

“The analogy” here relates to the associative mechanisms inherent in how humans recall information, which is described in the prior paragraph.  The author describes “trails”.  This evocative idea is similar to the modern field of data provenance.  In data provenance, the focus is often on reproducibility, not on finding things, yet there are intriguing similarities.  I won’t explore this area further yet, but it seems to be intriguing.  Perhaps it will open up some new perspectives to explore.

All this is conventional, except for the projection forward of present-day mechanisms and gadgetry. It affords an immediate step, however, to associative indexing, the basic idea of which is a provision whereby any item may be caused at will to select immediately and automatically another. This is the essential feature of the memex. The process of tying two items together is the important thing.

The memex is his hypothetical device for capturing and finding this information.  At this point the author describes how to build such a system (more or less) using the technology of the day.  The terminology is interesting, yet also quite telling that a 73 year old paper could describe modern systems so well.

At some point reading through this it occurred to me that in some ways we have built a system similar to what he describes: the internet.  What it doesn’t do is construct the personalized model of inter-relationships between the various components of the system.

Wholly new forms of encyclopedias will appear, ready made with a mesh of associative trails running through them, ready to be dropped into the memex and there amplified.

Doesn’t this sound like a web browser?

For me, the key take-away here is encouraging: my own goal of looking for alternatives to hierarchical file systems is not as crazy as it might first seem.  It certainly does not mimic the way in which we tend to organize data, though I have also had people point out to me that nothing prevents us from constructing a higher level layer that can be used to achieve the same goal.  Indeed, that has been done before and I will get to that at a later point.