Home » Posts tagged 'Distributed Systems'

Tag Archives: Distributed Systems

Subscribe to Blog via Email

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

Join 210 other subscribers
January 2025
S M T W T F S
 1234
567891011
12131415161718
19202122232425
262728293031  

Reflections on Teaching Distributed Systems

During January through April 2023 I taught CPSC 416 at UBC. It was the first time I had taught this course and it would not have been possible without the assistance I received from others, notably Ada Gavrilovska and Ivan Bestchastnikh both of whom allowed me to use their materials in creating my own course. Of course, I reordered things, and adapted them to fit the class.

Teaching a course for the first time is always an illuminating experience. I’ve designed and taught classes on systems topics, including elements of DCE/DFS, the distributed file system on which I worked back when I was a twenty-something software developer, classes on Windows driver development (device drivers, file systems, and file system filter drivers,) and Windows kernel debugging. I even explored utilizing online mechanisms for providing a non-linear educational approach to core OS concepts (processes, threads, scheduling, and synchronization) as part of my MSCS work at Georgia Tech. Thus, I have enjoyed engaging in education and looking for ways to do better for much of my life. I have always found insights each time I teach a new course. CPSC 416 was no exception to this.

I have never taken a distributed systems class. I learned about distributed systems organically. In my senior year at the University of Chicago I worked for the nascent Computer Science department as part of the facilities team and part of that work included networking. I remember soldering connectors together for Ethernet connections of the time – vastly different than the RJ-45 connections we use now, but the same technology we use today. After graduation I took a job at Stanford working with David Cheriton, who ran the “Distributed Systems Group.” The V operating system, which his group developed, is what I used on my desktop, and I built a number of network components as part of my work for him, including a network protocol (VMTP) which I implemented on a BSD 4.2 UNIX based system, diskless bootstrap drivers, and even an IP-multicast version of a multi-player Mazewar variant.

From Stanford I went to Transarc, a CMU-research inspired start-up company that had two very different product directions: one was an online transaction processing system, and the other was a commercialization of the AFS distributed file system. I also worked on the successor (the DCE/DFS project I mentioned earlier.)

Thus, my background in distributed systems was building distributed systems, often from the perspective of not knowing what I was doing but being surrounded by smart people that helped me figure it out. In some ways, that was my objective in teaching CPSC 416: paying that hard work forward.

One observation now: things that you learn in your twenties become “assumed knowledge” quite easily in your fifties. Thus, I just assumed that everyone knew about how databases maintain consistency in the face of failures. This turns out not to be true. So, one of my first lessons here was that I need to explain this up front in order for many of the things I say afterwards make sense. What is the point about replicating a log (what database people usually refer to as a journal) if you don’t understand that a log is the basic mechanism we use to restore consistency in a database. Lesson 1: teach people about databases and recoverability.

The second observation stems from my first oversight: building transactionally safe recoverable systems is hard. You’d think I’d know that, since I built a transactionally safe recoverable system back in the late 1980s and early 1990s as part of my work on Episode, the local physical file system that we used to support some of the nifty features of DCE/DFS. Episode, in some form, continues to be used in production today (file systems have unnaturally long lives if they get any serious adoption.) I would be quite surprised if the underlying transactional system were significantly different than it was thirty years ago when I worked on it. Lesson 2: teach people about building transactionally safe databases. Related to this is explaining key-value stores explicitly. They are a key part of the programming assignments and understanding why we use them helps. They typically form the basis of databases and file systems. Indeed, file systems are typically key-value stores with a name space built on top of them. The keys are limited (integers representing an entry in a potentially sparse table of objects) and the values are mutable (which complicates implementation and correctness).

The third observation is not an original one, as I have learned in conversations with other educators. There is a fundamental mis-alignment between the objectives of students (which is essentially grade maximization) and my objectives (which is “learn useful stuff that will help you throughout your career.”) This isn’t a big surprise – after all, I have published work in plagiarism reduction – but trying to find ways to fix it is challenging. I had not expected the insane amount of pressure students seem to feel to maximize their grades. I did try to mitigate this somewhat by offering extra credit opportunities, though in the end that seemed to create stress for many to exploit those. Why people who are getting grades in the 90%+ range are “afraid of failing” is beyond me. Lesson 3: extra credit creates more stress than it alleviates. I don’t think this is entirely the case but I’ll be more cautious about using it in the future. Still, I don’t want people to worry that they are going to fail so my thought is to provide an incentive to participate that mitigates the likelihood of them failing.

My fourth observation is that I had never read the various papers about distributed consensus side by side before. Doing so was an eye-opening experience. What I learned is that in many cases the complications in those papers relate to: (1) optimizations; and (2) recovery. Thus, next time I teach this class I want to spend more time walking through the baseline protocol and then pointing out optimizations and handling recovery. One example of this was when I had a student point out that the Paxos Made Moderately Complex paper (PMMC) states that during the leader election phase the voting party sends along a list of their accepted but not committed proposals (from the previous leadership). This is not part of the protocol. It is an optimization that makes recovery faster and more efficient, but you can’t rely upon it to maintain correctness. Now that I understand this point of confusions better, I think I can walk people through it and distinguish this. Doing so will help people understand the underlying protocol better and then the optimizations we use to ensure it works correctly. Lesson 4: walk through the papers more carefully, explaining the base protocol and then pointing out that the primary difference is in optimizations and recovery mechanisms.

My fifth observation is that students focus too much on code and not enough on understanding. Distributed systems is an area in which one must think through failure cases, identify how you will handle them, and what you assume is going to be true (your “invariants”) throughout your code base. I did introduce some tools for doing this (modeling and TLA+ specifically) but I did not incorporate them into the actual assignments. I did have them write reports, but those were post-hoc reports. I would like to try making the design cycle a more prominent portion of this work, encouraging people to think about what they are building rather than trying to hack their way through it. One piece of feedback from several students was that my advice to “walk away and think through the project” was quite helpful. I’d like to make the structure of the course make that happen more naturally. I also think that by having explicit design milestones it would reduce stress by encouraging students to work on the projects before the deadline. Lesson 5: design is more important than code, but code helps students verify their design reflects good understanding. The challenge will be in finding the right balance between the two.

I have other, smaller observations as well that I won’t break out but I’ll capture here:

  • Extensions don’t really help.
  • Providing a flexible late policy can be helpful, but it often creates quite a lot of stress.
  • Some students abhor teams, some like them. I need to find a way to accommodate both learning styles in a way that is equitable.
  • Do as much as possible to simplify grading exams.
  • Make a conscious effort after each lesson to create questions for the exam. I provided individualized exams (drawn from a pool of questions) and I wish I’d had more questions from which to draw. What was particularly nice was being able to provide people with their own personalized exam’s answers right at the end of the exam. It also helped me identify some issues.

I do think a number of steps that I took in this course worked well. People (generally) liked the failure examples, they liked the responsiveness, they liked the material. It is easy to focus on just the negatives, but I want to make sure and acknowledge the positives because it is important to preserve those elements.

Finally, I have agreed to teach this course again in the fall (which for UBC means “Winter Term 1”) so I will have an opportunity to incorporate what I have learned into the next course offering. I’m sure I’ll have more insights after that class.

Logic and Lattices for Distributed Programming

Logic and Lattices for Distributed Programming
Neil Conway, William R. Marczak, Peter Alvaro, Joseph M. Hellerstein, and David Maier, in Symposium on Cloud Computing 2012 (SOCC ’12), October 14-17, 2012, San Jose, CA.

This is definitely a different direction than we’ve had in prior papers, though I do have an ulterior motive in presenting this particular paper – we will see it used later.

In recent years there has been interest in achieving application-level
consistency criteria without the latency and availability costs of
strongly consistent storage infrastructure. A standard technique is to
adopt a vocabulary of commutative operations; this avoids the risk
of inconsistency due to message reordering. Another approach was
recently captured by the CALM theorem, which proves that logically
monotonic programs are guaranteed to be eventually consistent. In
logic languages such as Bloom, CALM analysis can automatically
verify that programs achieve consistency without coordination.
In this paper we present BloomL, an extension to Bloom that
takes inspiration from both of these traditions. BloomL generalizes
Bloom to support lattices and extends the power of CALM analysis
to whole programs containing arbitrary lattices. We show how the
Bloom interpreter can be generalized to support ecient evaluation
of lattice-based code using well-known strategies from logic
programming. Finally, we use BloomL to develop several practical
distributed programs, including a key-value store similar to Amazon
Dynamo, and show how BloomL encourages the safe composition
of small, easy-to-analyze lattices into larger programs.

Notice they do mention key-value stores, so you have another hint on how I’ll be referring back to this work in a future post.

This tends more to the theoretical side of systems.  It is not a theory paper (there just isn’t enough formalism, let alone proofs!)   It has performance graphs, which you certainly expect from a systems paper, but not from a theory paper.

The driving factor behind this is the issue of distributed consistency.  At a high level, “distributed consistency” is concerned with ensuring that a group of communicating computers, with some temporal separation, agree on the outcome of operations even when things go wrong.  Perhaps the most famous example of distributed consistency is Paxos.  These days we refer to these as consensus protocols.  I generally describe there being several such: two-phase commit is certainly one of the older ones.  Quorum protocols are another (e.g., weighted voting, which I described previously).  Viewstamped Replication is another.  These days, the popular consensus protocols are
Raft and Blockchain.

Figure 8 (From Paper)

The paper starts by pointing out that monotonic consisency provides a valuable mechanism for reasoning about distributed consistency.  Prior work by the authors establishes that all monotonic programs are “invariant to message reordering and retry”, a property they call confluent. This matters for distributed systems because such a system only moves forward (the operations are durable.)

They point out some weaknesses in the prior definition and motivate improving it by explaining one such obvious case that does not fit within the model (a voting quorum in a distributed protocol.)

Hence, they introduce the lattice.  They do this within the context of their language (BloomL), which works on top of Ruby.  I will not dwell on the details.

The authors define a bounded semijoined lattice.  My reading of what they are saying is that in such a set, there is a unique element that happened first.  They define this formally as a set S, with an operator (“bottom” that I don’t seem to have in my font set) that defines a partial ordering.  There is a unique element ⊥ that represents the least element.

From this definition, they construct their model; the paper drops the “bounded semijoined” part of the definition and simply discusses lattices from that point forward, but it is this partial ordering property that imparts the key characteristics to their subsequent operations.

Why is this important?  Because it demonstrates that these lattices – which are going to turn out to be equivalent to key operations in distributed systems – have consistency guarantees that are desirable.

The authors then turn their attention to utilizing lattices for key-value stores.  They describe data structure versioning and vector clocks.  Vector clocks have a property they desire for lattices: they are partially ordered.  They combine this with a quorum voting protocol, to provide the distributed consensus for their system.

Figure 9 (from paper)
Figure 9 (from paper)

Figure 8 (from the paper) shows the general structure of their key-value store implementation, which is implemented in BloomL and Ruby.  Their sample usage for this is a shopping cart, which they graphically describe in Figure 9 (from the paper).

As one would expect in a distributed system, the key benefit here is that there is no centralized authority deciding on the order of things.  They point out that prior work argues shopping carts are non-monotonic and thus cannot be solved in a distributed systems setting.  The authors point out that using the lattice structure, they achieve a monotonic ordering, which permits them to implement it without a centralized decision maker; in fact the decision maker in this case is really the client itself, as it has all the information from all the servers sufficient to complete the operation.

While a shopping cart might not be the killer application for a distributed systems technology, this paper does describe a powerful tool for providing distributed consensus in a system that can be implemented in a modest amount of code; compared to Paxos, Raft, or Viewstamped Replication, that is a significant contribution.

It does not appear to have byzantine protection, however, so if you live in a hostile environment it might not be the right protocol.  Similarly, if you need stronger consistency guarantees, this might not be the best model either.  But for many applications slightly relaxed consistency guarantees are often more than adequate.

We will see how this can be applied in the future.