The crucial clue that allowed Holmes to solve the Silver Blaze mystery was the dog that didn’t bark: the apparent murderer and horse thief was not a stranger, but the trainer himself, the only one who could have gotten into the stables unnoticed. As I’ve mentioned, this semester Dan Licata and I are co-teaching a new course in functional programming for first-year students at Carnegie Mellon. Apart from parallelism, which I’ve already discussed, we are placing a strong emphasis on verification and proof as tools for the practicing programmer, chief among these being the infamous hellhounds of Computer Science, induction and recursion. And I am happy to say that, as in the Holmes story, these are the dogs that didn’t bark. The results from homework and an in-class exam make clear that the vast majority of the class of about 80 students have mastered these crucial techniques with a minimum of fuss and bother. In this post I’d like to explain why I think we’ve succeeded where, apparently, so many others have failed.

One of the pleasures of teaching at Carnegie Mellon is that the students are bright (among the brightest in the country according to the usual metrics), open-minded (they’re game for whatever I wish to throw at them), and hard-working (Carnegie’s motto is “My heart is in the work.”). I am sure that this is a big part of the story. Most probably our students could master almost anything, no matter how poorly we explain it! But it can’t be just that either. My colleagues at other top schools, both research universities and teaching colleges, public and private, American and international, have often spoken of their frustration and despair at teaching programming using recursion, let alone proof by induction, to beginning (or even advanced) students. Some even go so far as to say that it can’t be done, that while the brightest students “get it”, the less bright students never will, so we should just give up even trying. I’m not convinced. I can’t prove that I’m right, but I do have quite a few years’ experience at pulling this off, so it is at least conceivable that the problem lies not with the students, but with the teaching methods. So let me offer mine as an example, and perhaps others will be able to use them as well.

Let’s start with the punch line, just to pique your interest (or ire): the answer is type theory. The great idea of constructive type theory is that there is no distinction between programs and proofs: the code embodies the reasoning that justifies it, and the reasoning is just a form of code. This is the *propositions as types principle* (sometimes grandly called the “Curry-Howard isomorphism”, the small problem being that neither Curry nor Howard invented it, nor is it an isomorphism). Theorems are types (specifications), and proofs are programs. Programs/proofs are executed by applying simplification rules that reduce them to canonical form. It’s just generalized high school algebra, in which you plug in values for variables and simplify using equational reasoning, but scaled up to a rich collection of types and programs acting on them. In other words, type theory is a *language-based model of computation*, which supports reasoning directly about the code you write, not about its translation or interpretation in some other terms.

The significance of a language-based model for teaching cannot be overestimated. One of the classic pitfalls of teaching recursion is to try to explain it by referring to some mysterious entity called a “stack” that is *absolutely invisible* in the program you write. There’s a long song and dance about “frames” and “procedure calls” and “pushing” and “popping” things on this “stack” of which the professor speaks, and no one understands a damn thing. It all sounds very complicated, and impressive in a way, but if there’s one thing people learn, it’s to stay the hell away from it! And in doing so, students are robbed of one of the most powerful and useful programming tools available to them, the key to writing clean, reliable code (and, as I’ve previously argued, for parallelism as well). The alternative is to use Plotkin’s *structural operational semantics* to define the behavior of *all* language features, including recursion. To pull it off, you have to have a semantics for the language you’re using (something that’s in short supply for commercial languages), and you have to be facile in using it yourself.

What makes structural operational semantics so useful is that the execution model is *directly* defined by an *inductive definition* of a transition relation, $e \mapsto e'$, on programs. (I don’t have the space to explain it fully here, but you may consult *Practical Foundations for Programming Languages* for a full account.) The crucial idea is to use *inference rules* to give a structural definition of evaluation for programs themselves, not a translation into some underlying abstract machine code. For example, one rule for evaluation of addition expressions is

$$\frac{e_1 \mapsto e_1'}{e_1 + e_2 \mapsto e_1' + e_2},$$

which states that one step of evaluation of the sum $e_1 + e_2$ consists of a step of evaluation of $e_1$ to obtain $e_1'$, with the result being $e_1' + e_2$. Another rule states that

$$\frac{e_2 \mapsto e_2'}{m + e_2 \mapsto m + e_2'},$$

where $m$ is a number, and a third states that $m + n \mapsto p$, where $p$ is the sum of $m$ and $n$. With these (and similar) rules in place, one may give an execution trace in the form

$$e \mapsto e' \mapsto e'' \mapsto \cdots \mapsto v,$$

which makes explicit the steps of evaluation. Each step is justified by a “proof tree” consisting of a composition of rules. Importantly, the size of the expressions correlates directly with the space required by an implementation, allowing for a clean and clear explanation of space issues such as *tail recursion*.
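The rules transcribe directly into code as well. Here is a minimal sketch (my own illustration, not from the course materials; the names `exp`, `Num`, `Plus`, and `step` are assumptions) of a one-step evaluator in ML implementing the three rules for addition:

```sml
(* expressions are numbers or sums; a sketch, not the course's code *)
datatype exp = Num of int | Plus of exp * exp

(* step e = SOME e' when e takes one transition step, NONE when e is a value *)
fun step (Num _) = NONE
  | step (Plus (Num m, Num n)) = SOME (Num (m + n))       (* third rule  *)
  | step (Plus (Num m, e2)) =                              (* second rule *)
      (case step e2 of
           SOME e2' => SOME (Plus (Num m, e2'))
         | NONE => NONE)
  | step (Plus (e1, e2)) =                                 (* first rule  *)
      (case step e1 of
           SOME e1' => SOME (Plus (e1', e2))
         | NONE => NONE)
```

Iterating `step` until it returns `NONE` produces exactly the kind of execution trace shown above, one justified transition at a time.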

This is a good start for explaining the semantics of a recursive function, but the real beauty of type theory emerges when teaching how to reason about a recursively defined function. The lesson of type theory is that *induction and recursion are the same thing*: an inductive proof *is* a recursive function, and a recursive function *is* an inductive proof. This means that *there is really only one concept here*, and not two, and that the two go hand-in-hand. It’s not a matter of “eat your spinach, it’s good for you”, but rather of a conceptual unification of proving and programming. For our students thinking inductively and programming recursively go together naturally, the dynamic duo of functional programming.

The starting point is, once again, an inductive definition. For specificity and ease of exposition, let me give an inductive definition of the natural numbers:

$$\frac{}{\mathsf{zero}\ \mathsf{nat}} \qquad \frac{n\ \mathsf{nat}}{\mathsf{succ}(n)\ \mathsf{nat}}$$

The definition uses the *exact same* rule format that we used to define the semantics. Inductive definitions given by a set of rules become second nature. Moreover, they transcribe almost literally into ML:

datatype nat = Zero | Succ of nat

So here we have the primordial example of an inductively defined type. Many data structures, such as various forms of trees, fit naturally into the same format, a fact that we use repeatedly throughout the semester.

Defining a function on an inductive type consists of giving a series of *clauses*, one for each rule defining the elements of that type:

fun plus x y = case x of Zero => y | Succ x' => Succ (plus x' y)

Whenever we have an inductive type, we use the same format each time for functions defined over it. Crucially, we allow ourselves “recursive calls” corresponding exactly to the premises of the rule governing a particular clause. There is no need for any fuss about “pushing” things, nor any discussion of “sizes” of things, just a pattern of programming that is guided directly by the definition of the type on which we are working.
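To illustrate the same format on another inductive type (my own example, not from the post; the names `tree` and `size` are assumptions): binary trees are generated by two rules, and a function over them gets one clause per rule, with a recursive call for each premise of that rule:

```sml
(* a leaf, or a node with two subtrees: two rules, so two clauses below *)
datatype tree = Leaf | Node of tree * int * tree

(* one clause per rule; one recursive call per subtree premise *)
fun size Leaf = 0
  | size (Node (l, _, r)) = 1 + size l + size r
```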

And now comes the best part. If we want to *prove* something about a recursively defined function, we write a proof that has *exactly the same form*, one case for each rule in the inductive definition. And, just as we get to make a “recursive call” for each premise of each rule, so we also get to make a “recursive call” to the proof for each premise, that is, an appeal to the *inductive hypothesis*. The proof is, of course, a program that constructs a proof for every element of an inductive type by composing the inductive steps around a base case.
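As a small illustration (my own, using the `plus` function above): to prove that `plus x Zero` equals `x` for every value `x` of type `nat`, the proof has one case per rule of the type, with an appeal to the inductive hypothesis exactly where `plus` makes its recursive call:

```
Theorem: for every value x : nat, plus x Zero = x.

Proof: by structural induction on x.
  Case Zero:     plus Zero Zero ==> Zero              (first clause)
  Case Succ x':  plus (Succ x') Zero
                   ==> Succ (plus x' Zero)            (second clause)
                     = Succ x'                        (inductive hypothesis)
```

The shape of the proof is the shape of the code: base case for `Zero`, inductive step for `Succ`.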

Importantly, the students “get it”. On the midterm we ask them to prove (in an in-class setting) a small theorem about a piece of code, or to write a piece of code and prove a small theorem about it, and they are able to do this quite reliably. In other words, it works!

Perhaps the most important thing, though, is to convey the sense of *inevitability* and *naturality* about induction and recursion. Don’t talk about proof as a separate topic! Don’t operationalize recursion beyond the bounds of the language model. Simply do the natural thing, and it will all seem natural and obvious … and fun!

Moreover, it scales. Our high-water mark prior to the midterm is to develop and debug a continuation-based regular expression matcher that has a subtle bug in it. We state carefully the specification of the matcher, and use an inductive proof attempt to reveal the error. The proof does not go through! And the failure makes clear that there is a bug that can be fixed in a number of ways, repairing the proof along with the code. I will detail this another time, but the impatient reader may wish to consult my *JFP* paper entitled “Proof-directed debugging” for an earlier version of what we taught this semester.

Let me close with two remarks.

Matthias Felleisen at Northeastern has been advocating essentially this methodology for decades; his work has strongly influenced mine. His results are nothing short of *spectacular*, and with a much broader range of students than I ordinarily face here at Carnegie Mellon. So, no excuses: read his *How to Design Programs*, and follow it!

Finally, this article explains why Haskell is not suitable for my purposes: the principle of induction is *never* valid for a Haskell program! The problem is that Haskell is a *lazy language*, rather than a *language with laziness*. It forces on you the existence of “undefined” values of every type, invalidating the bedrock principle of proof by induction. Put in other terms suggested to me by John Launchbury, Haskell has a *paucity of types*, lacking even the bog standard type of natural numbers. Imagine!

(And please note the analogy with *dynamic languages*, as opposed to *languages with dynamic types*. It’s the same point, albeit in another guise, namely the paucity of types.)

I’m shocked and dismayed that CMU CS majors apparently have such a hard time understanding the stack model of execution. How do they get through the imperative programming part of the curriculum?

My point is that they *don’t* have trouble, because of the way we teach them, but I have heard many, many reports of difficulties with this. And I observe that one of the reasons is that the abstraction is not taught, rather a clumsy account of its implementation in terms of concepts that are not present in the language itself.

In response to “There’s a long song and dance about ‘frames’ and ‘procedure calls’ and ‘pushing’ and ‘popping’ things on this ‘stack’ of which the professor speaks, and no one understands a damn thing,” it’s worth noting that the exercises we did in Professor Pfenning’s class with “programs as data” and virtual machines actually made understanding this stuff surprisingly clear. On the other hand, I feel like it only helped us understand what recursion is and how it works, not how to use it to solve problems.

I’m curious if you guys considered teaching 15-150 (functional) BEFORE 15-122 (imperative). Was the issue that first semester freshmen generally lacked the math background provided in Concepts of Math, or were there other reasons?

It comes down to the distinction between languages and machines, discussed in my post by that name. It’s terrible that a language is so poorly constructed that it can only be explained by telling you how to compile it, and forcing you to think in terms of concepts that have no existence at the level of the language itself in order to “understand it”. Not only is it confusing and distracting, it provides you with zero methodological help in using the construct to get real work done. Compare our treatment of red-black trees to be presented next Tuesday with the one you learned in 122. I think it will be obvious which is better.

The staging of 122 “before” 150 is an accident of the roll-out plan; there is no causal dependency. Once things are established, students may start with 150 or 122; I would recommend 150 myself.

Thanks for the reply.

I had a chance to learn the material from the beginning of Structure and Interpretation of Computer Programs in my high school class (our teacher had taught it while he was a graduate student at UC Berkeley, and taught us that curriculum), and it made understanding recursion really, really easy when I moved on to Java and C, so I completely agree that it would be great to take 150 before 122.

However, I think about 80% of incoming CS freshman test out of 15-110 (my year was about 120 out of 150 if I recall correctly), which gives them the pre-requisite for 15-122 their first semester, but only a handful of them have Concepts (21-127) completed, so won’t roughly 75% of freshmen be forced to start 122? Have you considered making Concepts a co-requisite for 150 instead of a pre-requisite and starting with an introduction to basic proof techniques (or offering some sort of crash-course seminar)?

(It’s also worth adding that I feel Scheme was a pretty awful language to learn this stuff in compared to ML.)

Thanks for the post! It was very interesting and illuminating. (Not saying I completely buy your arguments yet, but….)

However, I don’t understand the distinction between a “lazy language, rather than a language with laziness”. Could you provide a brief explanation or a link?

(Also: I loved the Sherlock Holmes reference. But if recursion is the dog that didn’t bark, whom does that make the murderer?)

It seems to follow the pattern of the distinction between “dynamically typed language” and “language with dynamic types” from the last post – in a dynamically typed language you do not have the option to use static types even if you want to, while a statically typed language can have a type “dynamic” that provides runtime checks.

To make a similar distinction between Haskell and ML seems to be claiming that Haskell does not allow one to define strict data types, even if you want to. That seems to be a bit too strong. Any user-defined type includes lazily evaluating nonterminating expressions as a sort of undefined value, but using strict constructor fields (in standards since 1.3 in 1996) you can ensure this is the only incomplete value.

The type “data Nat = Zero | Succ !Nat” includes the same defined values as the equivalent ML type, plus one extra value for a lazily evaluated nonterminating expression. This adds an extra case to any proof for the possibility that an argument is undefined, but the other cases proceed inductively, similarly to trying to reason in ML about expressions of type nat, which either fail to terminate, or yield a value.

I think I missed the final point. Bottoms screw up formal reasoning whether the computation is strict or lazy. Even in a strict (Turing complete) language you can “prove” nonsense like forall a. a, and build whatever arbitrary inductive proof you’d like from that. Love the posts so far, and would be fascinated by a follow-up explaining why you find bottoms less formally problematic in a strict language.

I don’t find talk about “bottom” to be particularly helpful; in fact, it’s particularly confusing. The whole business of “bottom” is an artifact of denotational semantics, and not an essential concept of programming languages. However, for the sake of a reply, I will say this. In a lazy language divergence (“bottom”) is a VALUE, whereas in a strict language divergence is an EFFECT. That makes all the difference.

Hi Bob,

This is possibly a question which better fits a second class on FP than the first class, but what about eta-rules?

They don’t arise immediately from a view of equality as the congruence closure of the reduction relation, but you really need them to talk about things like optimizations and serious program transformations. For example, consider showing

`(map f) o (map g) == map (f o g)`

, or hoisting expressions out of recursive functions.

That equation is not valid ML, so of course I wouldn’t teach it. We do not spend much time on equational reasoning; it is not anywhere near as important as facility with proofs by induction to verify crucial invariants about code.

I figure you must have considered teaching in “pure ML” (ie, redoing the Basis so that side-effects are isolated by type). Since you haven’t done this, what’s the case against it?

Well, in Haskell you’d have to use coinduction rather than induction, since the recursive definitions of datatypes are understood as final coalgebras. Good luck.

You should be able to use induction, too, but you probably need a base case for a bottom element in your proofs to make things absolutely rigorous. That (if it works) breaks the symmetry between programming and proving, though, because you cannot define functions by specifying what happens for bottom separately.

Haskell datatypes are both initial algebras and final coalgebras in their relevant category, so they come equipped with *some* notion of induction (and coinduction, of course). But it’s different from ‘normal.’

This is neither here nor there. So what? I want induction. I want inductive types. I wouldn’t have them in Haskell.