Just out, an essay on the Cambridge University Press author’s blog about “programming paradigms”, and why I did not structure Practical Foundations for Programming Languages around them.

Thoughts from an existential type.

Just out, an essay on the Cambridge University Press author’s blog about “programming paradigms”, and why I did not structure Practical Foundations for Programming Languages around them.

Advertisements

It is well-known that constructivists renounce “proof by contradiction”, and that classicists scoff at the critique. “Those constructivists,” the criticism goes, “want to rule out proofs by contradiction. How absurd! Look, Pythagoras showed that the square root of two is irrational by deriving a contradiction from the assumption that it is rational. There is nothing wrong with this. Ignore them!”

On examination that sort of critique fails, because *a proof by contradiction is not a proof that derives a contradiction*. Pythagoras’s proof is valid, one of the eternal gems of mathematics. No one questions the validity of that argument, even if they question proof by contradiction.

Pythagoras’s Theorem expresses a negation: *it is not the case that* the square root of two can be expressed as the ratio of two integers. Assume that it can be so represented. A quick deduction shows that this is impossible. So the assumption is false. Done. This is a* direct proof* of a negative assertion; it is *not* a “proof by contradiction”.

What, then, *is* a proof by contradiction? It is the *affirmation* of a positive statement by refutation of its denial. It is a *direct proof* of the negation of a negative assertion that is then pressed into service as a *direct proof* of the assertion, which it is not.* *Anyone is free to ignore the distinction for the sake of convenience, as a philosophical issue, or as a sly use of “goto” in a proof, but the distinction nevertheless exists and is important. Indeed, part of the beauty of constructive mathematics is that one can draw such distinctions, for, once drawn, one can selectively disregard them. Once blurred, forever blurred, a pure loss of expressiveness.

For the sake of explanation, let me rehearse a standard example of a proof by contradiction. The claim is that there exists irrationals a and b such that a to the b power is rational. Here is an indirect proof, a true proof by contradiction. Move number one, let us prove instead that it is impossible that any two irrationals a and b are such that a to the b is irrational. This is a negative statement, so of course one proves it be deriving a contradiction from assuming it. But it is not the original statement! This will be clear from examining the information content of the proof.

Suppose, for a contradiction, that every two irrationals a and b are such that a to the b power is irrational. We know from Pythagoras that root two is irrational, so plug it in for both a and b, and conclude that root two to the root two power is irrational. Now use the assumption again, taking a to be root two to the root two, and b to be root two. Calculate a to the power of b, it is two, which is eminently rational. Contradiction.

We have now proved that it is not the case that every pair of irrationals, when exponentiated, give an irrational. There is nothing questionable about this proof as far as I am aware. But it does not prove that there are two irrationals whose exponent is rational! If you think it does, then I ask you, please name them for me. That information is not in this proof (there are other proofs that do name them, but that is not relevant for my purposes). You may, if you wish, disregard the distinction I am drawing, that is your prerogative, and neither I nor anyone has any problem with that. But you cannot claim that it is a *direct proof*, it is rather an *indirect proof*, that proceeds by refuting the negative of the intended assertion.

So why am I writing this? Because I have learned, to my dismay, that in U.S. computer science departments–of all places!–students are being taught, *erroneously,* that any proof that derives a contradiction is a “proof by contradiction”. It is not. Any proof of a negative must proceed by contradiction. A proof by contradiction in the long-established sense of the term is, contrarily, an indirect proof of a positive by refutation of the negative. This distinction is important, even if you want to “mod out” by it in your work, for it is only by drawing the distinction that one can even define the equivalence with which to quotient.

That’s my main point. But for those who may not be familiar with the distinction between direct and indirect proof, let me take the opportunity to comment on why one might care to draw such a distinction. It is entirely a matter of intellectual honesty: the information content of the foregoing indirect proof does not fulfill the expectation stated in the theorem. It is a kind of boast, an overstatement, to claim otherwise. Compare the original statement with the reformulation used in the proof. The claim that it is not the case that every pair of irrationals exponentiate to an irrational is uncontroversial. The proof proves it directly, and there is nothing particularly surprising about it. One would even wonder why anyone would bother to state it. Yet the supposedly equivalent claim stated at the outset appears much more fascinating, because most people cannot easily think up an example of two irrationals that exponentiate to rationals. Nor does the proof provide one. Once, when shown the indirect proof, a student of mine blurted out “oh that’s so cheap.” Precisely.

Why should you care? Maybe you don’t, but there are nice benefits to keeping the distinction, because it demarcates the boundary between constructive proofs, which have direct interpretation as functional programs, and classical proofs, which have only an indirect such interpretation (using continuations, to be precise, and giving up canonicity). Speaking as a computer scientist, this distinction matters, and it’s not costly to maintain. May I ask that you adhere to it?

*Edit: **rewrote final paragraph, sketchy and irrelevant, and improved prose throughout. *

I am building a web page devoted to the 2nd edition of *Practical Foundations for Programming Languages*, recently published by Cambridge University Press. Besides an errata, the web site features a commentary on the text explaining major design decisions and suggesting alternatives. I also plan to include additional exercises and to make sample solutions available to faculty teaching from the book.

The purpose of the commentary is to provide the “back story” for the development, which is often only hinted at, or is written between the lines, in *PFPL* itself. To emphasize enduring principles over passing fads, I have refrained from discussing particular languages in the book. But this makes it difficult for many readers to see the relevance. One purpose of the commentary is to clarify these connections by explaining *why* I said what I said.

As a starting point, I explain why I ignore the familiar concept of a “paradigm” in my account of languages. The idea seems to have been inspired by Kuhn’s (in)famous book *The Structure of Scientific Revolutions*, and was perhaps a useful device at one time. But by now the idea of a paradigm is just too vague to be useful, and there are many better ways to explain and systematize language structure. And so I have avoided it.

I plan for the commentary to be a living document that I will revise and expand as the need arises. I hope for it to provide some useful background for readers in general, and teachers in particular. I wish for the standard undergraduate PL course to evolve from a superficial taxonomy of the weird animals in the language zoo to a systematic study of the general theory of computation. Perhaps *PFPL* can contribute to effecting that change.

Today I received my copies of *Practical Foundations for Programming Languages, Second Edition* on Cambridge University Press. The new edition represents a substantial revision and expansion of the first edition, including these:

- A new chapter on type refinements has been added, complementing previous chapters on dynamic typing and on sub-typing.
- Two old chapters were removed (general pattern matching, polarization), and several chapters were very substantially rewritten (higher kinds, inductive and co-inductive types, concurrent and distributed Algol).
- The parallel abstract machine was revised to correct an implied extension that would have been impossible to carry out.
- Numerous corrections and improvements were made throughout, including memorable and pronounceable names for languages.
- Exercises were added to the end of each chapter (but the last). Solutions are available separately.
- The index was revised and expanded, and some conventions systematized.
- An inexcusably missing easter egg was inserted.

I am grateful to many people for their careful reading of the text and their suggestions for correction and improvement.

In writing this book I have attempted to organize a large body of material on programming language concepts, all presented in the unifying framework of type systems and structural operational semantics. My goal is to give precise definitions that provide a clear basis for discussion and a foundation for both analysis and implementation. The field needs such a foundation, and I hope to have helped provide one.

A recent discussion of introductory computer science education led to the topic of teaching recursion. I was surprised to learn that students are being taught that recursion requires understanding something called a “stack” that is nowhere in evidence in their code. Few, if any, students master the concept, which is usually “covered” only briefly. Worst, they are encouraged to believe that recursion is a mysterious bit of esoterica that is best ignored.

And thus is lost one of the most important and beautiful concepts in computing.

The discussion then moved on to the implementation of recursion in certain inexplicably popular languages for teaching programming. As it turns out, the compilers mis-implement recursion, causing unwarranted space usage in common cases. Recursion is dismissed as problematic and unimportant, and the compiler error is elevated to a “design principle” — to be snake-like is to do it wrong.

And thus is lost one of the most important and beautiful concepts in computing.

And yet, for all the stack-based resistance to the concept, *recursion** has nothing to do with a stack*. Teaching recursion does not need any mumbo-jumbo about “stacks”. Implementing recursion does not require a “stack”. The idea that the two concepts are related is simply mistaken.

What, then, is recursion? It is nothing more than *self-reference*, the ability to name a computation for use within the computation itself. *Recursion is what it is*, and nothing more. No stacks, no tail calls, no proper or improper forms, no optimizations, just self-reference pure and simple. Recursion is not tied to “procedures” or “functions” or “methods”; one can have self-referential values of all types.

Somehow these very simple facts, which date back to the early 1930’s, have been replaced by damaging myths that impede teaching and using recursion in programs. It is both a conceptual and a practical loss. For example, the most effective methods for expressing parallelism in programs rely heavily on recursive self-reference; much would be lost without it. And the allegation that “real programmers don’t use recursion” is beyond absurd: the very concept of a digital computer is grounded in recursive self-reference (the cross-connection of gates to form a latch). (Which, needless to say, does not involve a stack.) Not only do real programmers use recursion, there could not even be programmers were it not for recursion.

I have no explanation for why this terrible misconception persists. But I do know that when it comes to programming languages, attitude trumps reality every time. Facts? We don’t need no stinking facts around here, amigo. You must be some kind of mathematician.

If all the textbooks are wrong, what is right? How *should* one explain recursion? It’s simple. If you want to refer to yourself, you need to give yourself a name. “I” will do, but so will any other name, by the miracle of α-conversion. A computation is given a name using a *fixed point* (not *fixpoint*, dammit) operator: *fix x is e* stands for the expression *e* named *x* for use within *e*. Using it, the textbook example of the factorial function is written thus:

fix f is fun n : nat in case n {zero => 1 | succ(n') => n * f n'}.

Let us call this whole expression *fact,* for convenience. If we wish to evaluate it, perhaps because we wish to apply it to an argument, its value is

fun n : nat in case n {zero => 1 | succ(n') => n *factn'}.

The recursion has been *unrolled* one step ahead of execution. If we reach *fact* again, as we will for a positive argument, *fact* is evaluated again, in the same way, and the computation continues. *There are no stacks involved in this explanation*.

Nor is there a stack involved in the implementation of fixed points. It is only necessary to make sure that the named computation does indeed name itself. This can be achieved by a number of means, including circular data structures (non-well-founded abstract syntax), but the most elegant method is by *self-application*. Simply arrange that a self-referential computation has an implicit argument with which it refers to itself. Any use of the computation unrolls the self-reference, ensuring that the invariant is maintained. No storage allocation is required.

Consequently, a self-referential functions such as

fix f is fun (n : nat, m:nat) in case n {zero => m | succ(n') => f (n',n*m)}

execute without needing any asymptotically significant space. It is quite literally a loop, and *no special arrangement* is required to make sure that this is the case. All that is required is to implement recursion properly (as self-reference), and you’re done. *There is no such thing as tail-call optimization. *It’s not a matter of optimization, but of proper implementation. Calling it an optimization suggests it is optional, or unnecessary, or provided only as a favor, when it is more accurately described as a matter of getting it right.

So what, then, is the source of the confusion? The problem seems to be a too-close association between compound expressions and recursive functions or procedures. Consider the classic definition of factorial given earlier. The body of the definition involves the expression

n *factn'

where there is a pending multiplication to be accounted for. Once the recursive call (to itself) completes, the multiplication can be carried out, and it is necessary to keep track of this pending obligation. *But this phenomenon has nothing whatsoever to do with recursion.* If you write

n *squaren'

then it is equally necessary to record where the external call is to return its value. In typical accounts of recursion, the two issues get confused, a regrettable tragedy of error.

Really, the need for a stack arises the moment one introduces compound expressions. This can be explained in several ways, none of which need pictures or diagrams or any discussion about frames or pointers or any extra-linguistic concepts whatsoever. The best way, in my opinion, is to use Plotkin’s structural operational semantics, as described in my *Practical Foundations for Programming Languages (Second Edition)* on Cambridge University Press.

There is no reason, nor any possibility, to avoid recursion in programming. But folk wisdom would have it otherwise. That’s just the trouble with folk wisdom, everyone knows it’s true, even when it’s not.

*Update*: Dan Piponi and Andreas Rossberg called attention to a pertinent point regarding stacks and recursion. The conventional notion of a run-time stack records two distinct things, the *control state* of the program (such as subroutine return addresses, or, more abstractly, pending computations, or continuations), and the *data state* of the program (a term I just made up because I don’t know a better one, for managing multiple simultaneous activations of a given procedure or function). Fortran (back in the day) didn’t permit multiple activations, meaning that at most one instance of a procedure can be in play at a given time. One consequence is that α-equivalence can be neglected: the arguments of a procedure can be placed in a statically determined spot for the call. As a member of the Algol-60 design committee Dijkstra argued, successfully, for admitting multiple procedure activations (and hence, with a little extra arrangement, recursive/self-referential procedures). Doing so requires that α-equivalence be implemented properly; two activations of the same procedure cannot share the same argument locations. The data stack implements α-equivalence using de Bruijn indices (stack slots); arguments are passed on the data stack using activation records in the now-classic manner invented by Dijkstra for the purpose. It is not self-reference that gives rise to the need for a stack, but rather re-entrancy of procedures, which can arise in several ways, not just recursion. Moreover, recursion does not always require re-entrancy—the so-called tail call optimization is just the observation that certain recursive procedures are not, in fact, re-entrant. (Every looping construct illustrates this principle, albeit on an *ad hoc* basis, rather than as a general principle.)

Having just returned from the annual Oregon Programming Languages Summer School, at which I teach every year, I am once again very impressed with the impressive growth in the technical sophistication of the field and with its ability to attract brilliant young students whose enthusiasm and idealism are inspiring. Eugene was, as ever, an ideal setting for the summer school, providing a gorgeous setting for work and relaxation. I was particularly glad for the numerous chances to talk with students outside of the classroom, usually over beer, and I enjoyed, as usual, the superb cycling conditions in Eugene and the surrounding countryside. Many students commented to me that the atmosphere at the summer school is wonderful, filled with people who are passionate about programming languages research, and suffused with a spirit of cooperation and sharing of ideas.

Started by Zena Ariola a dozen years ago, this year’s instance was organized by Greg Morrisett and Amal Ahmed in consultation with Zena. As usual, the success of the school depended critically on the dedication of Jim Allen, who has been the *de facto* chief operating officer since it’s inception. Without Jim, OPLSS could not exist. His attention to detail, and his engagement with the students are legendary. Support from the National Science Foundation CISE Division, ACM SIGPLAN, Microsoft Research, Jane Street Capital, and BAE Systems was essential for providing an excellent venue, for supporting a roster of first-rate lecturers, and for supporting the participation of students who might otherwise not have been able to attend. And, of course, an outstanding roster of lecturers donated their time to come to Eugene for a week to share their ideas with the students and their fellow lecturers.

The schedule of lectures is posted on the web site, all of which were taped, and are made available on the web. In addition many speakers provided course notes, software, and other backing materials that are also available online. So even if you were not able to attend, you can still benefit from the summer school, and perhaps feel more motivated to come next summer. Greg and I will be organizing, in consultation with Zena. Applying the principle “don’t fix what isn’t broken”, we do not anticipate major changes, but there is always room for improvement and the need to freshen up the content every year. For me the central idea of the summer school is the applicability of deep theory to everyday practice. Long a dream held by researchers such as me, these connections become more “real” every year as the theoretical abstractions of yesterday become the concrete practices of today. It’s breathtaking to see how far we’ve come from the days when I was a student just beginning to grasp the opportunities afforded by ideas from proof theory, type theory, and category theory (the Holy Trinity) to building beautiful software systems. No longer the abstruse fantasies of mad (computer) scientists, these ideas are the very air we breathe in PL research. Gone are the days of *ad hoc* language designs done in innocence of the foundations on which they rest. Nowadays serious industrial-strength languages are emerging that are grounded in theory and informed by practice.

Two examples have arisen just this summer, Rust (from Mozila) and Swift (from Apple), that exemplify the trend. Although I have not had time to study them carefully, much less write serious code using them, it is evident from even a brief review of their web sites that these are serious languages that take account of the academic developments of the last couple of decades in formulating new language designs to address new classes of problems that have arisen in programming practice. These languages are type safe, a basic criterion of sensibility, and feature sophisticated type systems that include ideas such as sum types, which have long been missing from commercial languages, or provided only in comically obtuse ways (such as objects). The infamous null pointer mistakes have been eradicated, and the importance of pattern matching (in the sense of the ML family of languages) is finally being appreciated as the cure for Boolean blindness. For once I can look at new industrial languages without an overwhelming sense of disappointment, but instead with optimism and enthusiasm that important ideas are finally, at long last, being recognized and adopted. As has often been observed, it takes 25 years for an academic language idea to make it into industrial practice. With Java it was simply the 1970’s idea of automatic storage management; with languages such as Rust and Swift we are seeing ideas from the 80’s and 90’s make their way into industrial practice. It’s cause for celebration, and encouragement for those entering the field: the right ideas do win out in the end, one just has to have the courage to be irrelevant.

I hope to find the time to comment more meaningfully on the recent developments in practical programming languages, including Rust and Swift, but also languages such as Go and OCaml that are also making inroads into programming practice. (The overwhelming success and future dominance of Haskell is self-evident. Kudos!) But for now, let me say that the golden age of programming language research is here and now, and promises to continue indefinitely.

*Update*: word smithing.