Time, Clocks, and the Ordering of Events in a Distributed System
Leslie Lamport, Communications of the ACM, July 1978, pp. 558-565.
I had not originally intended to cover this paper, but as I started trying to describe Implementing Atomic Actions on Decentralized Data, I realized that I needed to include it in order to better explain that work. This is one of the papers for which Leslie Lamport was awarded the Turing Award. In it, he is wrestling with the effects of relativity in computer systems.
What does this mean? In essence, as soon as we have two or more distinct computer systems communicating with one another over a network, we must account for the fact that the order of events may now vary for each individual node. Indeed, as it turns out, this effect is not restricted to distributed systems. It can be seen in single computer systems with multiple processors as well, in the study of consistency models (another area where Lamport was significantly involved).
Networks tend to exacerbate these issues because the latencies involved are large enough to make the problems apparent. This profoundly impacts distributed systems (including distributed file systems, a notable and classic example), and we continue to wrestle with its manifestations 40 years after this paper.
The paper describes a system (Figure 1) in which three components, labeled Process P, Process Q, and Process R, send messages to one another. Individual messages are labeled and numbered in sequence, and the time between them is shown. It is worth noting that the diagram captures both when each message was sent and when it was received.
This demonstrates that the observed order of events does not necessarily match what an external observer would see. Thus, for example, Process R receives messages in a different order than Process Q sent them. This behavior might seem surprising at first, but it is in fact a common occurrence, arising from queuing behavior, intermediate network components, scheduling delays, and so on.
He introduces a nomenclature for describing the “happened before” relationship, establishing an ordering of events within the system. He uses the → symbol to denote this relationship: if a comes before b, he writes “a → b”. The relation is transitive: if a → b and b → c, then a → c. Finally, he defines two events a and b as concurrent if there is no ordering between them. This applies to events within a single process as well as between processes. He notes in this discussion that sending a message must have happened before receiving that message.
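The relation can be pictured as reachability in a graph of events: a → b holds exactly when b is reachable from a by following local successors and message edges. A minimal sketch, assuming hypothetical event names; the `EventGraph` class and its methods are my own illustration, not from the paper:

```python
from collections import defaultdict

class EventGraph:
    """Toy model of Lamport's happened-before relation."""

    def __init__(self):
        self.edges = defaultdict(set)  # event -> events that directly follow it

    def add(self, a, b):
        """Record a -> b: a is a local predecessor of b, or a send whose receipt is b."""
        self.edges[a].add(b)

    def happened_before(self, a, b):
        """True if a -> b under the transitive closure (depth-first search)."""
        stack, seen = [a], set()
        while stack:
            event = stack.pop()
            for nxt in self.edges[event]:
                if nxt == b:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    def concurrent(self, a, b):
        """Concurrent means neither a -> b nor b -> a."""
        return not self.happened_before(a, b) and not self.happened_before(b, a)
```

The `concurrent` check makes the point of the definition concrete: two events with no path between them in either direction simply have no order, and the system is free to treat them either way.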
While this might sound simple, it helps us begin to reason about these events, since we can pay attention to the events that do have ordering constraints and ignore those that do not. In many cases, we won’t care about the order of two operations because the net effect is the same, regardless of the order. What we do want to do, however, is ensure that we understand ordering for those operations where it does matter.
Figure 2 then takes the same chart, but this time he adds the concept of a clock tick. The dashed lines represent the ticks of a clock; the ticks occur between events. He then defines a clock function C that must respect the → relationship: if a → b, then C(a) < C(b). That is, the clock values are ordered consistently with the events. He observes that this definition covers ordering within a single process as well as across processes: recall that a → b holds for any message sent between processes, where a is the send event and b is the receive event, and thus C(a) < C(b) in that case as well. He calls this the “Clock Condition”.
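A logical clock satisfying the Clock Condition can be maintained with two simple rules: tick before every local event (including sends), and on receipt jump past the timestamp carried by the message. A minimal sketch, with method names of my own choosing:

```python
class LamportClock:
    """Logical clock satisfying the Clock Condition: if a -> b then C(a) < C(b)."""

    def __init__(self):
        self.time = 0

    def local_event(self):
        # Rule 1: increment before each local event.
        self.time += 1
        return self.time

    def send(self):
        # Sending is a local event; the returned value is the
        # timestamp attached to the outgoing message.
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # Rule 2: advance past the sender's timestamp so that
        # C(send) < C(receive) always holds.
        self.time = max(self.time, msg_time) + 1
        return self.time
```

The `max(...) + 1` in `receive` is the whole trick: it forces the receiver's clock ahead of the sender's, which is exactly what the message-passing case of the Clock Condition requires.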
He then goes one step further: he flattens out the clock ticks, yielding Figure 3. Now the clock ticks are uniform, and each event falls within one clock-tick interval. Since dependent operations remain separated, we now have a model we can use to order events globally.
This was the point he was trying to make: “Being able to totally order the events can be very useful in implementing a distributed system.” This is a classic understatement and remains the focus of much distributed systems research and implementation to this day: how do you ensure a definitive order of events? We will see this when looking at later work.
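The step from the partial order to a total order is small: break ties between equal Lamport timestamps with any fixed ordering of the processes, such as their identifiers. A sketch under that assumption, with hypothetical event tuples:

```python
def total_order_key(timestamp, process_id):
    """Order events first by Lamport timestamp, then break ties
    with an arbitrary but fixed process ordering (here, the id)."""
    return (timestamp, process_id)

# Hypothetical events as (timestamp, process_id) pairs.
events = [(2, "Q"), (1, "P"), (3, "R"), (2, "P")]
ordered = sorted(events, key=lambda e: total_order_key(*e))
```

Any tie-breaking rule works as long as every node uses the same one; the resulting total order is consistent with the happened-before partial order, since causally related events already differ in timestamp.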
At this point he focuses on defining his system more rigorously, which allows him to observe that it constitutes a “distributed algorithm”. He describes the replicated state machine that is executed by all the nodes within the distributed system. There is no central authority in this model forcing the ordering.
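The replicated state machine idea can be sketched as each node holding the same pending-command queue, ordered by (timestamp, process id), and applying commands only once it knows no earlier-timestamped command can still arrive. The class and the `stable_time` parameter below are my own simplification of the paper's scheme, which derives that knowledge from having heard from every other process:

```python
import heapq

class Replica:
    """Toy replicated state machine: every replica applies the same
    commands in the same total (timestamp, process_id) order."""

    def __init__(self):
        self.pending = []  # min-heap of (timestamp, process_id, command)
        self.log = []      # commands applied so far, in order

    def deliver(self, timestamp, process_id, command):
        heapq.heappush(self.pending, (timestamp, process_id, command))

    def apply_stable(self, stable_time):
        # Apply every command timestamped at or below stable_time, i.e.
        # those that can no longer be preceded by a late-arriving message.
        while self.pending and self.pending[0][0] <= stable_time:
            _, _, cmd = heapq.heappop(self.pending)
            self.log.append(cmd)
```

Because every replica sorts by the same key and waits for the same stability point, all replicas apply commands in an identical order with no coordinator involved, even if the messages arrive in different orders at each node.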
This is an important point not to miss: in Lamport’s distributed system there is no requirement for a centralized service. He does not insist on a total ordering of all events in the system; instead he settles for a partial ordering, one in which he doesn’t worry about events that aren’t sensitive to their order. This is a powerful technique: by giving up total determinism, we gain performance through parallelism.
From this article we get the Lamport Clock, which is not so much a clock as a monotonically increasing value that represents a relative ordering. Note that a hardware clock will satisfy this requirement as well.