April 12, 2011

In an earlier post I wrote about our approach to teaching students to reason inductively when programming recursively.  This generated some discussion, because teaching induction and recursion is notoriously difficult, to the point that many of my colleagues have all but given up hope of ever getting these ideas across to the majority of their students.  I’ve already explained our methods, and suggested some reasons why we seem to have succeeded where others have not.

We’re nearing the end of semester, and I must say that I am so proud of our students, I just have to brag a bit.  First, some background.  As I’ve mentioned, I am co-teaching a new course in functional programming with Dan Licata this semester as part of a thorough revamping of introductory CS that places a strong emphasis on reasoning about the correctness and efficiency of programs in both a sequential and parallel setting.  Dan and I have worked closely with an amazing team of teaching assistants in developing the new curriculum and course materials.  For the sake of continuity, Dan does all the lecturing, and I sit in the audience, sometimes kibbitzing a bit, but mostly just paying attention to the signals that students are sending and thinking about what we could do better next time.  Despite it being a moderately large class of about 83 students, we have managed to maintain an interactive atmosphere in which students freely ask, and respond, to questions, sometimes working in pairs for a few minutes on their own during the lecture.  This has worked very well for everyone, creating a conducive environment for learning to write beautiful code.

Today’s lecture was about persistent and ephemeral data structures.  One goal was simply to draw the distinction, and show how it is expressed in the interface of an abstract type; it is good to have some terminology available for comparing functional and imperative programming styles.  Another goal was to point out the importance of persistence for parallelism, since a parallel computation inherently requires that a data structure have “multiple futures”, rather than the “single future” provided by the ephemeral case.  (Put another way, the exclusive consideration of ephemeral representations in the usual data structures course taught worldwide looks more and more misguided as time goes on.  Indeed, our planned new approach to teaching data structures reverses the emphasis.  This is one reason why I consider object-oriented languages to be unsuitable for a modern introductory curriculum, conventional wisdom notwithstanding.)  A third goal was to re-acquaint the students with imperative programming so that they could make direct comparison with what they have been learning all semester, and recognize the limitations of imperative methods when it comes to parallelism and distribution.  Simply put, imperative methods are useful, if at all, in the “corner case” of sequential execution of programs acting on ephemeral data structures; in all other cases what you want are functional (transformational) methods.

But enough about context.  What made me feel so proud today was the sophistication displayed by our students during today’s interactive program development.  The exercise was to implement binary search tree delete in the persistent (transformational) style, and in the ephemeral (imperative) style in two different ways.  Dan set up the problem, and asked the students to help him write the code.  What impressed me so deeply was that as I listened to the murmur in the classroom, and to the proposals offered aloud, I could hear students saying things like

Well, inductively, we can assume that the problem has been solved for the left subtree, so to continue we need only ….

Outstanding!  (And remember, these are freshmen!)  Just as I had hoped, for these students thinking inductively comes naturally as the obvious way to think about a piece of recursive code!  They know instinctively how to formulate an inductive argument for the correctness of program, rather than resort to error-prone hand-waving about “going around and around the loop” that is all too typical at this level.  What makes this possible is that functional programs are inherently mathematical, allowing the students to concentrate on the ideas, rather than on the intricacies of coding in less expressive imperative languages.  I think that this gives them the confidence to tackle some quite difficult problems, such as the Barnes-Hut n-body algorithm or Jamboree game-search algorithm, for a one-week homework assignment.  Material that used to be complicated, or deferred to more advanced courses, can be done readily with beginners if you have the right framework in which to express the ideas.

As I walked back to my office, a group of students on the elevator with me groaned about the pain of having to do imperative programming again (most had learned the old ways last semester), but cheered me up by saying that doing so made a good case for why functional methods are better.  What more could one ask?

## The dog that didn’t bark

March 21, 2011

The crucial clue that allowed Holmes to solve the Silver Blaze mystery was the dog that didn’t bark: the apparent murderer and horse thief was not a stranger, but the trainer himself, the only one who could have gotten into the stables unnoticed.  As I’ve mentioned, this semester Dan Licata and I are co-teaching a new course in functional programming for first-year students at Carnegie Mellon.  Apart from parallelism, wbich I’ve already discussed, we are placing a strong emphasis on verification and proof as tools for the practicing programmer, chief among these being the infamous hellhounds of Computer Science, induction and recursion.  And I am happy to say that, as in the Holmes story, these are the dogs that didn’t bark.  The results from homework and an in-class exam make clear that the vast majority of the class of about 80 students have mastered these crucial techniques with a minimum of fuss and bother.  In this post I’d like to explain why I think we’ve succeeded where, apparently, so many others have failed.

One of the pleasures of teaching at Carnegie Mellon is that we the students are bright (among the brightest in the country according to the usual metrics), open-minded (they’re game for whatever I wish to throw at them), and hard-working (Carnegie’s motto is “My heart is in the work.”).  I am sure that this is a big part of the story.  Most probably our students could master almost anything, no matter how poorly we explain it!  But it can’t be just that either.  My colleagues at other top schools, both research universities and teaching colleges, public and private, American and international, have often spoken of their frustration and despair at teaching programming using recursion, let alone proof by induction, to beginning (or even advanced) students.  Some even go so far as to say that it can’t be done, that while the brightest students “get it”, the less bright students never will, so we should just give up even trying.  I’m not convinced.  I can’t prove that I’m right, but I do have quite a few years experience at pulling this off, so it is at least conceivable that the problem lies not with the students, but with the teaching methods.  So let me offer mine as an example, and perhaps others will be able to use them as well.

Let’s start with the punch line, just to pique your interest (or ire): the answer is type theory.  The great idea of constructive type theory is that there is no distinction between programs and proofs: the code embodies the reasoning the justifies it, and the reasoning is just a form of code.  This is the propositions as types principle (sometimes grandly called the “Curry-Howard isomorphism”, the small problem being that neither Curry nor Howard invented it, nor is it an isomorphism).  Theorems are types (specifications), and proofs are programs.  Programs/proofs are executed by evaluating them by applying simplification rules to reduce them to canonical form.  It’s just generalized high school algebra, in which you plug in values for variables and simplify using equational reasoning, but scaled up to a rich collection of types and programs acting on them.  In other words type theory is a language-based model of computation, which supports reasoning directly about the code you write, not its translation or interpretation in some other terms.

The significance of a language-based model for teaching cannot be overestimated.  One of the classic pitfalls of teaching recursion is to try to explain it by referring to some mysterious entity called a “stack” that is absolutely invisible in the program you write.  There’s a long song and dance about “frames” and “procedure calls” and “pushing” and “popping” things on this “stack” of which the professor speaks, and no one understands a damn thing.  It all sounds very complicated, and impressive in a way, but if there’s one thing people learn, it’s to stay the hell away from it!  And in doing so, students are robbed of one of the most powerful and useful programming tools available to them, the key to writing clean, reliable code (and, as I’ve previously argued, for parallelism as well).  The alternative is to use Plotkin’s structural operational semantics to define the behavior of all language features, including recursion.  To pull it off, you have to have a semantics for the language you’re using (something that’s in short supply for commercial languages), and you have to be facile in using it yourself.

What makes structural operational semantics so useful is that the execution model is directly defined by an inductive definition of a transition relation, $M\mapsto M'$, on programs.  (I don’t have the space to explain it fully here, but you may consult Practical Foundations for Programming Languages for a full account.)  The crucial idea is to use inference rules to give a structural definition of evaluation for programs themselves, not a translation into some underlying abstract machine code.  For example, one rule for evaluation of addition expressions is

$\displaystyle{M\mapsto M'\over M+N\mapsto M'+N}$,

which states that one step of evaluation of the sum $M+N$ consists of a step of evaluation of $M$ to obtain $M'$, with the result being $M'+N$.  Another rule states that

$\displaystyle{N\mapsto N'\over m+N\mapsto m+N'}$,

where $m$ is a number, and a third states that $\displaystyle{m+n\mapsto p}$, where $p$ is the sum of $m$ and $n$.  With these (and similar) rules in place, one may give an execution trace in the form

$M_1\mapsto M_2 \mapsto \cdots \mapsto M_k$,

which makes explicit the steps of evaluation.  Each step is justified by a “proof tree” consisting of a composition of rules.  Importantly, the size of the expressions $M_i$ correlates directly with the space required by an implementation, allowing for a clean and clear explanation of space issues such as tail recursion.

This is a good start for explaining the semantics of a recursive function, but the real beauty of type theory emerges when teaching how to reason about a recursively defined function.  The lesson of type theory is that induction and recursion are the same thing: an inductive proof is a recursive function, and a recursive function is an inductive proof.  This means that there is really only one concept here, and not two, and that the two go hand-in-hand.  It’s not a matter of “eat your spinach, it’s good for you”, but rather of a conceptual unification of proving and programming.  For our students thinking inductively and programming recursively go together naturally, the dynamic duo of functional programming.

The starting point is, once again, an inductive definition.  For specificity and ease of exposition, let me give an inductive definition of the natural numbers:

$\displaystyle{\strut\over\texttt{Zero}\,\textsf{nat}}$

$\displaystyle{m\,\textsf{nat}\over \texttt{Succ}(m)\,\textsf{nat}}$

The definition uses the exact same rule format that we used to define the semantics.  Inductive definitions given by a set of rules become second nature.  Moreover, they transcribe almost literally into ML:

datatype nat = Zero | Succ of nat

So here we have the primordial example of an inductively defined type.  Many data structures, such as various forms of trees, fit naturally into the same format, a fact that we use repeatedly throughout the semester.

Defining a function on an inductive type consists of giving a series of clauses, one for each rule defining the elements of that type:

fun plus x y = case x of  Zero => y | Succ x’ => Succ (plus x’ y)

Whenever we have an inductive type, we use the same format each time for functions defined over it.  Crucially, we allow ourselves “recursive calls” corresponding exactly to the premises of the rule governing a particular clause.  There is no need for any fuss about “pushing” things, nor any discussion of “sizes” of things, just a pattern of programming that is guided directly by the definition of the type on which we are working.

And now comes the best part.  If we want to prove something about a recursively defined function, we write a proof that has exactly the same form, one case for each rule in the inductive definition.  And, just as we get to make a “recursive call” for each premise of each rule, so we also get to make a “recursive call” to the proof for each premise, that is, an appeal to the inductive hypothesis.  The proof is, of course, a program that constructs a proof for every element of an inductive type by composing the inductive steps around a base case.

Importantly, the students “get it”.  On the midterm we ask them to prove (in an in-class setting) a small theorem about a piece of code, or to write a piece of code and prove a small theorem about it, and they are able to do this quite reliably.  In other words, it works!

Perhaps the most important thing, though, is to convey the sense of inevitability and naturality about induction and recursion.  Don’t talk about proof as a separate topic!  Don’t operationalize recursion beyond the bounds of the language model.  Simply do the natural thing, and it will all seem natural and obvious … and fun!

Moreover, it scales.  Our high-water mark prior to the midterm is to develop and debug a continuation-based regular expression matcher that has a subtle bug in it.  We state carefully the specification of the matcher, and use an inductive proof attempt to reveal the error.  The proof does not go through!  And the failure makes clear that there is a bug that can be fixed in a number of ways, repairing the proof along with the code.  I will detail this another time, but the impatient reader may wish to consult my JFP paper entitled “Proof-directed debugging” for an earlier version of what we taught this semester.

Let me close with two remarks.

Matthias Felleisen at Northeastern has been advocating essentially this methodology for decades; his work has strongly influenced mine.  His results are nothing short of spectacular, and with a much broader range of students than I ordinarily face here at Carnegie Mellon.  So, no excuses: read his How to Design Programs, and follow it!

Finally, this article explains why Haskell is not suitable for my purposes: the principle of induction is never valid for a Haskell program!  The problem is that Haskell is a lazy language, rather than a language with laziness.  It forces on you the existence of “undefined” values of every type, invalidating the bedrock principle of proof by induction.  Put in other terms suggested to me by John Launchbury, Haskell has a paucity of types, lacking even the bog standard type of natural numbers.  Imagine!

(And please note the analogy with dynamic languages, as opposed to languages with dynamic types. It’s the same point, albeit in another guise, namely the paucity of types.)

Follow