## Summer of Programming Languages

July 6, 2014

Having just returned from the annual Oregon Programming Languages Summer School, at which I teach every year, I am once again very impressed with the impressive growth in the technical sophistication of the field and with its ability to attract brilliant young students whose enthusiasm and idealism are inspiring.  Eugene was, as ever, an ideal setting for the summer school, providing a gorgeous setting for work and relaxation.  I was particularly glad for the numerous chances to talk with students outside of the classroom, usually over beer, and I enjoyed, as usual, the superb cycling conditions in Eugene and the surrounding countryside.  Many students commented to me that the atmosphere at the summer school is wonderful, filled with people who are passionate about programming languages research, and suffused with a spirit of cooperation and sharing of ideas.

Started by Zena Ariola a dozen years ago, this year’s instance was organized by Greg Morrisett and Amal Ahmed in consultation with Zena.  As usual, the success of the school depended critically on the dedication of Jim Allen, who has been the de facto chief operating officer since it’s inception.  Without Jim, OPLSS could not exist.  His attention to detail, and his engagement with the students are legendary.   Support from the National Science Foundation CISE Division, ACM SIGPLANMicrosoft Research, Jane Street Capital, and BAE Systems was essential for providing an excellent venue,  for supporting a roster of first-rate lecturers, and for supporting the participation of students who might otherwise not have been able to attend.  And, of course, an outstanding roster of lecturers donated their time to come to Eugene for a week to share their ideas with the students and their fellow lecturers.

The schedule of lectures is posted on the web site, all of which were taped, and are made available on the web.  In addition many speakers provided course notes, software, and other backing materials that are also available online.  So even if you were not able to attend, you can still benefit from the summer school, and perhaps feel more motivated to come next summer.  Greg and I will be organizing, in consultation with Zena.  Applying the principle “don’t fix what isn’t broken”, we do not anticipate major changes, but there is always room for improvement and the need to freshen up the content every year.  For me the central idea of the summer school is the applicability of deep theory to everyday practice.  Long a dream held by researchers such as me, these connections become more “real” every year as the theoretical abstractions of yesterday become the concrete practices of today.  It’s breathtaking to see how far we’ve come from the days when I was a student just beginning to grasp the opportunities afforded by ideas from proof theory, type theory, and category theory (the Holy Trinity) to building beautiful software systems.  No longer the abstruse fantasies of mad (computer) scientists, these ideas are the very air we breathe in PL research.  Gone are the days of ad hoc language designs done in innocence of the foundations on which they rest.  Nowadays serious industrial-strength languages are emerging that are grounded in theory and informed by practice.

Two examples have arisen just this summer, Rust (from Mozila) and Swift (from Apple), that exemplify the trend.  Although I have not had time to study them carefully, much less write serious code using them, it is evident from even a brief review of their web sites that these are serious languages that take account of the academic developments of the last couple of decades in formulating new language designs to address new classes of problems that have arisen in programming practice.  These languages are type safe, a basic criterion of sensibility, and feature sophisticated type systems that include ideas such as sum types, which have long been missing from commercial languages, or provided only in comically obtuse ways (such as objects).  The infamous null pointer mistakes have been eradicated, and the importance of pattern matching (in the sense of the ML family of languages) is finally being appreciated as the cure for Boolean blindness.  For once I can look at new industrial languages without an overwhelming sense of disappointment, but instead with optimism and enthusiasm that important ideas are finally, at long last, being recognized and adopted.  As has often been observed, it takes 25 years for an academic language idea to make it into industrial practice.  With Java it was simply the 1970’s idea of automatic storage management; with languages such as Rust and Swift we are seeing ideas from the 80’s and 90’s make their way into industrial practice.  It’s cause for celebration, and encouragement for those entering the field: the right ideas do win out in the end, one just has to have the courage to be irrelevant.

I hope to find the time to comment more meaningfully on the recent developments in practical programming languages, including Rust and Swift, but also languages such as Go and OCaml that are also making inroads into programming practice.  (I’ve had quite enough to say about Haskell for the time being, so I’ll give that one a rest, but with a tip of the hat to its enormous popularity and influence, despite my criticisms.)  But for now, let me say that the golden age of programming language research is here and now, and promises to continue indefinitely as we develop a grand unified theory of programming and mathematics.

## Bellman Confirms A Suspicion

April 21, 2014

As is by now well-known, I regard the supposed opposition between static and dynamic languages as a fallacy: the latter, being a special case of the former, can scarcely be an alternative to it.  I cannot tell you how many times I’ve been handed arguments along the lines of “oh, static languages are just fine, but I want something more dynamic,” the speaker not quite realizing the absurdity of what they are saying.  Yet somehow this sort of argument has an appeal, and I’ve often wondered why.  I think it’s mostly just semantic innocence, but I’ve long had the suspicion that part of it is that it sounds good to be dynamic (active, outgoing, nimble) rather than static (passive, boring, staid).  As we all know, much of the popularity of programming languages comes down to such superficialities and misunderstandings, so what else is new?

Well, nothing, really, except that I recently learned (from Guy Blelloch) the origin of the notably inapt term dynamic programming for a highly useful method of memoization invented by Richard Bellman that is consonant with my suspicion.  Bellman, it turns out, had much the same thought as mine about the appeal of the word “dynamic”, and used it consciously to further his own ends:

“I spent the Fall quarter (of 1950) at RAND. My first task was to find a name for multistage decision processes.

“An interesting question is, ‘Where did the name, dynamic programming, come from?’ The 1950s were not good years for mathematical research. We had a very interesting gentleman in Washington named Wilson. He was Secretary of Defense, and he actually had a pathological fear and hatred of the word, research. I’m not using the term lightly; I’m using it precisely. His face would suffuse, he would turn red, and he would get violent if people used the term, research, in his presence. You can imagine how he felt, then, about the term, mathematical. The RAND Corporation was employed by the Air Force, and the Air Force had Wilson as its boss, essentially. Hence, I felt I had to do something to shield Wilson and the Air Force from the fact that I was really doing mathematics inside the RAND Corporation. What title, what name, could I choose? In the first place I was interested in planning, in decision making, in thinking. But planning, is not a good word for various rea- sons. I decided therefore to use the word, ‘programming.’ I wanted to get across the idea that this was dynamic, this was multistage, this was time-varying—I thought, let’s kill two birds with one stone. Let’s take a word that has an absolutely precise meaning, namely dynamic, in the classical physical sense. It also has a very interesting property as an adjective, and that is it’s impossible to use the word, dynamic, in a pejorative sense. Try thinking of some combination that will possibly give it a pejorative meaning. It’s impossible. Thus, I thought dynamic programming was a good name. It was something not even a Congressman could object to. So I used it as an umbrella for my activities” (p. 159).

Hilarious, or what?  It explains a lot, I must say, and confirms a long-standing suspicion of mine about the persistent belief in a non-existent opposition.

Update: does anyone know why we say “memoization” rather than “memorization”?

## Old neglected theorems are still theorems

March 20, 2014

I have very recently been thinking about the question of partiality vs totality in programming languages, a perennial topic in PL’s that every generation thinks it discovers for itself.  And this got me to remembering an old theorem that, it seems, hardly anyone knows ever existed in the first place.  What I like about the theorem is that it says something specific and technically accurate about the sizes of programs in total languages compared to those in partial languages.  The theorem provides some context for discussion that does not just amount to opinion or attitude (and attitude alway seems to abound when this topic arises).

The advantage of a total programming language such as Goedel’s T is that it ensures, by type checking, that every program terminates, and that every function is total. There is simply no way to have a well-typed program that goes into an infinite loop. This may seem appealing, until one considers that the upper bound on the time to termination can be quite large, so large that some terminating programs might just as well diverge as far as we humans are concerned. But never mind that, let us grant that it is a virtue of  T that it precludes divergence.

Why, then, bother with a language such as PCF that does not rule out divergence? After all, infinite loops are invariably bugs, so why not rule them out by type checking? (Don’t be fooled by glib arguments about useful programs, such as operating systems, that “run forever”. After all, infinite streams are programmable in the language M of inductive and coinductive types in which all functions terminate. Computing infinitely does not mean running forever, it just means “for as long as one wishes, without bound.”)  The notion does seem appealing until one actually tries to write a program in a language such as T.

Consider computing the greatest common divisor (GCD) of two natural numbers. This can be easily programmed in PCF by solving the following equations using general recursion:

$\begin{array}{rcl} \textit{gcd}(m,0) & = & m \\ \textit{gcd}(0,m) & = & m \\ \textit{gcd}(m,n) & = & \textit{gcd}(m-n,n) \quad \text{if}\ m>n \\ \textit{gcd}(m,n) & = & \textit{gcd}(m,n-m) \quad \text{if}\ m

The type of $\textit{gcd}$ defined in this manner has partial function type $(\mathbb{N}\times \mathbb{N})\rightharpoonup \mathbb{N}$, which suggests that it may not terminate for some inputs. But we may prove by induction on the sum of the pair of arguments that it is, in fact, a total function.

Now consider programming this function in T. It is, in fact, programmable using only primitive recursion, but the code to do it is rather painful (try it!). One way to see the problem is that in T the only form of looping is one that reduces a natural number by one on each recursive call; it is not (directly) possible to make a recursive call on a smaller number other than the immediate predecessor. In fact one may code up more general patterns of terminating recursion using only primitive recursion as a primitive, but if you examine the details, you will see that doing so comes at a significant price in performance and program complexity. Program complexity can be mitigated by building libraries that codify standard patterns of reasoning whose cost of development should be amortized over all programs, not just one in particular. But there is still the problem of performance. Indeed, the encoding of more general forms of recursion into primitive recursion means that, deep within the encoding, there must be “timer” that “goes down by ones” to ensure that the program terminates. The result will be that programs written with such libraries will not be nearly as fast as they ought to be.  (It is actually quite fun to derive “course of values” recursion from primitive recursion, and then to observe with horror what is actually going on, computationally, when using this derived notion.)

But, one may argue, T is simply not a serious language. A more serious total programming language would admit sophisticated patterns of control without performance penalty. Indeed, one could easily envision representing the natural numbers in binary, rather than unary, and allowing recursive calls to be made by halving to achieve logarithmic complexity. This is surely possible, as are numerous other such techniques. Could we not then have a practical language that rules out divergence?

We can, but at a cost.  One limitation of total programming languages is that they are not universal: you cannot write an interpreter for T within T (see Chapter 9 of PFPL for a proof).  More importantly, this limitation extends to any total language whatever.  If this limitation does not seem important, then consider the Blum Size Theorem (BST) (from 1967), which places a very different limitation on total languages.  Fix any total language, L, that permits writing functions on the natural numbers. Pick any blowup factor, say $2^{2^n}$, or however expansive you wish to be.  The BST states that there is a total function on the natural numbers that is programmable in L, but whose shortest program in L is larger by the given blowup factor than its shortest program in PCF!

The underlying idea of the proof is that in a total language the proof of termination of a program must be baked into the code itself, whereas in a partial language the termination proof is an external verification condition left to the programmer. Roughly speaking, there are, and always will be, programs whose termination proof is rather complicated to express, if you fix in advance the means by which it may be proved total. (In T it was primitive recursion, but one can be more ambitious, yet still get caught by the BST.)  But if you leave room for ingenuity, then programs can be short, precisely because they do not have to embed the proof of their termination in their own running code.

There are ways around the BST, of course, and I am not saying otherwise.  For example, the BST merely guarantees the existence of a bad case, so one can always argue that such a case will never arise in practice.  Could be, but I did mention the GCD in T problem for a reason: there are natural problems that are difficult to express in a language such as T.  By fixing the possible termination arguments in advance, one is tempting fate, for there are many problems, such as the Collatz Conjecture, for which the termination proof of a very simple piece of code has been an open problem for decades, and has resisted at least some serious attempts on it.  One could argue that such a function is of no practical use.  I agree, but I point out the example not to say that it is useful, but to say that it is likely that its eventual termination proof will be quite nasty, and that this will have to be reflected in the program itself if you are limited to a T-like language (rendering it, once again, useless).  For another example, there is no inherent reason why termination need be assured by means similar to that used in T.  We got around this issue in NuPRL by separating the code from the proof, using a type theory based on a partial programming language, not a total one.  The proof of termination is still required for typing in the core theory (but not in the theory with “bar types” for embracing partiality).  But it’s not baked into the code itself, affecting its run-time; it is “off to the side”, large though it may be).

Updates: word smithing, fixed bad link, corrected gcd, removed erroneous parenthetical reference to Coq, fixed LaTeX problems.

## Homotopy type theory lectures and notes on-line

December 4, 2013

My activity on this blog has been reduced to nil recently because I have been spending my time preparing a series of lectures on homotopy type theory, starting from basic principles and ending with the application of univalence and higher inductive types to algebraic topology.  The course web page contains links to the video-taped lectures and to the course notes prepared by the students this semester.  These will be available indefinitely and are accessible to anyone interested in the course.  My hope is that these will provide a useful introduction to the HoTT Book, which is available for free on the web and may be printed on demand.  My intention is to write an introduction to dependent type theory for a CS audience, which will serve as a reference for much ongoing work in the area and as a prolegomenon to the HoTT Book itself.

## What, If Anything, Is A Declarative Language?

July 18, 2013

Back in the 1980’s it was very fashionable to talk about “declarative” programming languages.  But to my mind there was never a clear definition of a “declarative language”, and hence no way to tell what is declarative and what is not.  Lacking any clear meaning, the term came to refer to the arbitrary conflation of functional with logic programming to such an extent that “functional-and-logic-programming” almost became a Germanic thing-in-itself (ding an sich).  Later, as the logic programming wave subsided, the term “declarative”,  like “object-oriented”, came to be an expression of approval, and then, mercifully, died out.

Or so I had thought.  Earlier this week I attended a thriller of an NSF-sponsored workshop on high-level programming models for parallelism, where I was surprised by the declarative zombie once again coming to eat our brains.  This got me to thinking, again, about whether the term has any useful meaning.  For what it’s worth, and perhaps to generate useful debate, here’re some things that I think people mean, and why I don’t think they mean very much.

1. “Declarative” means “high-level”.  This just seems to replace one vague term by another.
2. “Declarative” means “not imperative”.  But this flies in the face of reality.  Functional languages embrace and encompass imperative programming as a special case, and even Prolog has imperative features, such as assert and retract, that have imperative meaning.
3. “Declarative” means “functional”.  OK, but then we don’t really need another word.
4. “Declarative” means “what, not how”.  But every language has an operational semantics that defines how to execute its programs, and you must be aware of that semantics to understand programs written in it.  Haskell has a definite evaluation order, just as much as ML has a different one, and even Prolog execution is defined by a clear operational semantics that determines the clause order and that can be influenced by “cut”.
5. “Declarative” means “equational”.  This does not distinguish anything, because there is a well-defined notion of equivalence for any programming language, namely observational equivalence.  Different languages induce different equivalences, of course, but how does one say that one equivalence is “better” than another?  At any rate, I know of no stress on equational properties of logic programs, so either logic programs are not “declarative” or “equational reasoning” is not their defining characteristic.
6. “Declarative” means “referentially transparent”.  The misappropriation of Quine’s terminology only confuses matters.  All I’ve been able to make of this is that “referentially transparent” means that beta-equivalence is valid.  But beta equivalence is not a property of an arbitrary programming language, nor in any case is it clear why this equivalence is first among equals.  In any case why you would decide a priori on what equivalences you want before you even know what it means to run a program?
7. “Declarative” means “has a denotation”.  This gets closer to the point, I think, because we might well say that a declarative semantics is one that gives meaning to programs as some kind of mapping between some sort of spaces.  In other words, it would be a synonym for “denotational semantics”.  But every language has a denotational semantics (perhaps by interpretation into a Kripke structure to impose sequencing), so having one does not seem to distinguish a useful class of languages.  Moreover, even in the case of purely functional programs, the usual denotational semantics (as continuous maps) is not fully abstract, and the fully abstract semantics (as games) is highly operational.  Perhaps a language is declarative in proportion to being able to give it semantics in some “familiar” mathematical setting?
8. “Declarative” means “implicitly parallelizable“.  This was certainly the sense intended at the NSF meeting, but no two “declarative” languages seemed to have much in common.  Charlie Garrod proposes just “implicit”, which is pretty much synonymous with “high level”, and may be the most precise sense there is to the term.

No doubt this list is not exhaustive, but I think it covers many of the most common interpretations.  It seems to me that none of them have a clear meaning or distinguish a well-defined class of languages.  Which leads me to ask, is there any such thing as a declarative programming language?

[Thanks to the late Steve Gould for inspiring the title of this post.]

[Update: wordsmithing.]

## Exceptions are shared secrets

December 3, 2012

It’s quite obvious to me that the treatment of exceptions in Haskell is wrong. Setting aside the example I gave before of an outright unsoundness, exceptions in Haskell are nevertheless done improperly, even if they happen to be sound. One reason is that the current formulation is not stable under seemingly mild extensions to Haskell that one might well want to consider, notably any form of parameterized module or any form of shadowing of exception declarations. For me this is enough to declare the whole thing wrong, but as it happens Haskell is too feeble to allow full counterexamples to be formulated, so one may still claim that what is there now is ok … for now.

But I wish to argue that Haskell exceptions are nevertheless poorly designed, because it misses the crucial point that exception values are shared secrets. Let us distinguish two parties, the raiser of the exception, and the handler of it. The fundamental idea of exceptions is to transfer a value from the raiser to the handler without the possibility of interception by another party. While the language of secrecy seems appropriately evocative, I hasten to add that I am not here concerned with “attackers” or suchlike, but merely with the difficulties of ensuring modular composition of programs from components. In such a setting the “attacker” is yourself, who is not malicious, but who is fallible.

By raising an exception the raiser is “contacting” a handler with a message. The raiser wishes to limit which components of a program may intercept that message. More precisely, the raiser wishes to ensure that only certain previously agreed-upon components may handle that exception, perhaps only one. This property should remain stable under extension to the program or composition with any other component. It should not be possible for an innocent third party to accidentally intercept a message that was not intended for it.

Achieving this requires a secrecy mechanism that allows the raiser and the handler(s) to agree upon their cooperation. This is accomplished by dynamic classification, exactly as it is done properly in Standard ML (but not O’Caml). The idea is that the raiser has access to a dynamically generated constructor for exception values, and any handler has access to the corresponding dynamically generated matcher for exception values. This means that the handler, and only the handler, can decode the message sent by the raiser; no other party can do anything with it other than pass it along unexamined. It is “perfectly encrypted” and cannot be deciphered by any unintended component.

The usual exception mechanisms, as distinct from exception values, allow for “wild-card handlers”, which means that an exception can be intercepted by a third party. This means that the raiser cannot ensure that the handler actually receives the message, but it can ensure, using dynamic classification, that only a legitimate handler may decipher it. Decades of experience with Standard ML shows that this is a very useful thing indeed, and has application far beyond just the simple example considered here. For full details, see my forthcoming book, for a full discussion of dynamic classification and its role for ensuring integrity and confidentiality in a program. Dynamic classification is not just for “security”, but is rather a good tool for everyday programming.

So why does Haskell not do it this way? Well, I’m not the one to answer that question, but my guess is that doing so conflicts with the monadic separation of effects. To do exceptions properly requires dynamic allocation, and this would force code that is otherwise functional into the IO monad. Alternatively, one would have to use unsafePerformIO—as in ezyang’s implementation—to “hide” the effects of exception allocation. But this would then be further evidence that the strict monadic separation of effects is untenable.

Update: Reworked last paragraph to clarify the point I am making; the previous formulation appears to have invited misinterpretation.

Update: This account of exceptions also makes clear why the perennial suggestion to put exception-raising information into types makes no sense to me. I will write more about this in a future post, but meanwhile contemplate that a computation may raise an exception that is not even in principle nameable in the type. That is, it is not conservativity that’s at issue, it’s the very idea.

## PFPL is out!

December 3, 2012

Practical Foundations for Programming Languages, published by Cambridge University Press, is now available in print! It can be ordered from the usual sources, and maybe some unusual ones as well. If you order directly from Cambridge using this link, you will get a 20% discount on the cover price (pass it on).

Since going to press I have, inevitably, been informed of some (so far minor) errors that are corrected in the online edition. These corrections will make their way into the second printing. If you see something fishy-looking, compare it with the online edition first to see whether I may have already corrected the mistake. Otherwise, send your comments to me.

By the way, the cover artwork is by Scott Draves, a former student in my group, who is now a professional artist as well as a researcher at Google in NYC. Thanks, Scott!

Update: The very first author’s copy hit my desk today!

## Introductory FP Course Materials

September 15, 2012

The course materials for our first-semester introductory programming course (which I’ve discussed elsewhere on this blog) are now available here.

The course materials for our second-semester data structures and algorithms course are available here.

Thanks to Dan Licata and Guy Blelloch for helping make these available.  Comments are most welcome.

## Intro Curriculum Update

August 17, 2012

In previous posts I have talked about the new introductory CS curriculum under development at Carnegie Mellon. After a year or so of planning, we began to roll out the new curriculum in the Spring of 2011, and have by now completed the transition. As mentioned previously, the main purpose is to bring the introductory sequence up to date, with particular emphasis on introducing parallelism and verification. A secondary purpose was to restore the focus on computing fundamentals, and correct the drift towards complex application frameworks that offer the students little sense of what is really going on. (The poster child was a star student who admitted that, although she had built a web crawler the previous semester, she in fact has no idea how to build a web crawler.) A particular problem is that what should have been a grounding in the fundamentals of algorithms and data structures turned into an exercise in object-oriented programming, swamping the core content with piles of methodology of dubious value to beginning students. (There is a new, separate, upper-division course on oo methodology for students interested in this topic.) A third purpose was to level the playing field, so that students who had learned about programming on the street were equally as challenged, if not more so, than students without much or any such experience. One consequence would be to reduce the concomitant bias against women entering CS, many fewer of whom having prior computing experience than the men.

The solution was a complete do-over, jettisoning the traditional course completely, and starting from scratch. The most important decision was to emphasize functional programming right from the start, and to build on this foundation for teaching data structures and algorithms. Not only does FP provide a much more natural starting point for teaching programming, it is infinitely more amenable to rigorous verification, and provides a natural model for parallel computation. Every student comes to university knowing some algebra, and they are therefore familiar with the idea of computing by calculation (after all, the word algebra derives from the Arabic al jabr, meaning system of calculation). Functional programming is a generalization of algebra, with a richer variety of data structures and a richer set of primitives, so we can build on that foundation. It is critically important that variables in FP are, in fact, mathematical variables, and not some distorted facsimile thereof, so all of their mathematical intuitions are directly applicable. So we can immediately begin discussing verification as a natural part of programming, using principles such as mathematical induction and equational reasoning to guide their thinking. Moreover, there are natural concepts of sequential time complexity, given by the number of steps required to calculate an answer, and parallel time complexity, given by the data dependencies in a computation (often made manifest by the occurrences of variables). These central concepts are introduced in the first week, and amplified throughout the semester.

Two major homework exercises embody the high points of the first-semester course, one to do with co-development of code with proof, the other to do with parallelism. Co-development of program and proof is illustrated by an online regular expression matcher. The problem is a gem for several reasons. One is that it is essentially impossible for anyone to solve by debugging a blank screen. This sets us up nicely for explaining the importance of specification and proof as part of the development process. Another is that it is very easy, almost inevitable, for students to make mistakes that are not easily caught or diagnosed by testing. We require the students to carry out a proof of the correctness of the matcher, and in the process discover a point where the proof breaks down, which then leads to a proper solution. (See my JFP paper “Proof-Directed Debugging” for a detailed development of the main ideas.) The matcher also provides a very nice lesson in the use of higher-order functions to capture patterns of control, resulting in an extremely clean and simple matcher whose correctness proof is highly non-trivial.

The main parallelism example is the Barnes-Hut algorithm for solving the n-body problem in physics. Barnes-Hut is an example of a “tree-based” method, invented by Andrew Appel, for solving this well-known problem. At a high level the main idea is to decompose space into octants (or quadrants if you’re working in the plane), recursively solving the problem for each octant and then combining the solutions to make an overall solution. The key idea is to use an approximation for bodies that are far enough away—a distant constellation can be regarded as an atomic body for the purposes of calculating the effects of its stars on the sun, say. The problem is naturally parallelizable, because of the recursive decomposition. Moreover, it provides a very nice link to their high school physics. Since FP is just an extension of mathematics, the equations specifying the force law and Newton’s Law carry over directly to the code. This is an important sense in which FP builds on and amplifies their prior mathematical experience, and shows how one may connect computer science with other sciences in a rather direct manner.

The introductory FP course establishes the basis for the new data structures and algorithms course that most students take in either the third or fourth semester. This course represents a radical departure from tradition in several ways. First, it is a highly rigorous course in algorithms that rivals the upper-division course taught at most universities (including our own) for the depth and breadth of ideas it develops. Second, it takes the stance that all algorithms are parallel algorithms, with sequential being but a special case of parallel. Of course some algorithms have a better parallelizability ratio (a precise technical characterization of the ability to make use of parallelism), and this can be greatly affected by data structure selection, among other considerations. Third, the emphasis is on persistent abstract types, which are indispensable for parallel computing. No more linked lists, no more null pointer nonsense, no more mutation. Instead we consider abstract types of graphs, trees, heaps, and so forth, all with a persistent semantics (operations create “new” ones, rather than modify “old” ones). Fourth, we build on the reasoning methods introduced in the first semester course to prove the correctness and the efficiency of algorithms. Functional programming makes all of this possible. Programs are concise and amenable to proof, they are naturally parallel, and they enjoy powerful typing properties that enforce abstraction in a mathematically rigorous manner. Fifth, there is a strong emphasis on problems of practical interest. For example, homework 1 is the shotgun method for genome sequencing, a parallel algorithm of considerable practical importance and renown.

There is a third introductory course in imperative programming, taken in either the first or second semester (alternating with the functional programming course at the student’s discretion), that teaches the “old ways” of doing things using a “safe” subset of C. Personally, I think this style of programming is obsolete, but there are many good reasons to continue to teach it, the biggest probably being the legacy problem. The emphasis is on verification, using simple assertions that are checked at run-time and which may be verified statically in some situations. It is here that students learn how to do things “the hard way” using low-level representations. This course is very much in the style of the 1970’s era data structures course, the main difference being that the current incarnation of Pascal has curly braces, rather than begin-end.

For the sake of those readers who may not be up to date on such topics, it seems important to emphasize that functional programming subsumes imperative programming. Every functional language is capable of expressing the old-fashioned sequential pointer-hacking implementation of data structures. You can even reproduce Tony Hoare’s self-described “billion dollar mistake” of the cursed “null pointer” if you’d like! But the whole point is that it is rarely useful, and almost never elegant, to work that way. (Curiously, the “monad mania” in the Haskell community stresses an imperative, sequential style of programming that is incompatible with parallelism, but this is not forced on you as it is in the imperative world.) From this point of view there no loss, and considerable gain, in teaching functional programming from the start. It puts a proper emphasis on mathematically sane programming methods, but allows for mutation-based programming where appropriate (for example, in engendering “side effects” on users).

I encourage readers to review the syllabi and course materials. There is quite a large body of material in place that we plan to expand into textbook-level treatments in the near future. We also plan to write a journal article documenting our experiences with these courses.

I am very grateful to my colleagues Guy Blelloch, Dan Licata, and Frank Pfenning for their efforts in helping to develop the new introductory curriculum at Carnegie Mellon, and to the teaching assistants whose hard work and dedication put the ideas into practice.

Update: Unfortunately, the homework assignments for these courses are not publicly available, because we want to minimize the temptation for students to make use of old assignments and solutions in preparing their own work.  I am working with my colleagues to find some way in which we can promote the ideas without sacrificing too much of the integrity of the course materials.  I apologize for the inconvenience.

## Church’s Law

August 9, 2012

A new feature of this year’s summer school was a reduction in the number of lectures, and an addition of daily open problem sessions for reviewing the day’s material. This turned out to be a great idea for everyone, because it gave us more informal time together, and gave the students a better chance at digesting a mountain of material. It also turned out to be a bit of an embarrassment for me, because I posed a question off the top of my head for which I thought I had two proofs, neither of which turned out to be valid. The claimed theorem is, in fact, true, and one of my proofs is easily corrected to resolve the matter (the other, curiously, remains irredeemable for reasons I’ll explain shortly). The whole episode is rather interesting, so let me recount a version of it here for your enjoyment.

The context of the discussion was extensional type theory, or ETT, which is characterized by an identification of judgemental with propositional equality: if you can prove that two objects are equal,then they are interchangeable for all purposes without specific arrangement. The alternative, intensional type theory,or ITT, regards judgemental equality as definitional equality (symbolic evaluation), and gives computational meaning to proofs of equality of objects of a type, allowing in particular transport across two instances of a family whose indices are equal. NuPRL is an example of an ETT; CiC is an example of an ITT.

Within the framework of ETT, the principle of function extensionality comes “for free”, because you can prove it to hold within the theory. Function extensionality states that $f=g:A\to B$ whenever $x:A\vdash f(x)=g(x):B$. That is, two functions are if they are equal on all arguments (and, implicitly, respect equality of arguments). Function extensionality holds definitionally if your definitional equivalence includes the $\eta$ and $\xi$ rules, but in any case does not have the same force as extensional equality. Function extensionality as a principle of equality cannot be derived in ITT, but must be added as an additional postulate (or derived from a stronger postulate, such as univalence or the existence of a one-dimensional interval type).

Regardless of whether we are working in an extensional or an intensional theory, it is easy to see that all functions of type $N\to N$ definable in type theory are computable. For example, we may show that all such functions may be encoded as recursive functions in the sense of Kleene, or in a more modern formulation we may give a structural operational semantics that provides a deterministic execution model for such functions (given $n:N$, run $f:N\to N$ on $n$ until it stops, and yield that as result). Of course the proof relies on some fairly involved meta-theory, but it is all constructively valid (in an informal sense) and hence provides a legitimate computational interpretation of the theory. Another way to say the same thing is to say that the comprehension principles of type theory are such that every object deemed to exist has a well-defined computational meaning, so it follows that all functions defined within it are going to be computable.

This is all just another instance of Church’s Law, the scientific law stating that any formalism for defining computable functions will turn out to be equivalent to, say, the λ-calculus when it comes to definability of number-theoretic functions. (Ordinarily Church’s Law is called Church’s Thesis, but for reasons given in my Practical Foundations book, I prefer to give it the full status of a scientific law.) Type theory is, in this respect, no better than any other formalism for defining computable functions. By now we have such faith in Church’s Law that this remark is completely unsurprising, even boring to state explicitly.

So it may come as a surprise to learn that Church’s Law is false. I’m being provocative here, so let me explain what I mean before I get flamed to death on the internet.   The point I wish to make is that there is an important distinction between the external and the internal properties of a theory. For example, in first-order logic the Löwenheim-Skolem Theorem tells us that if a first-order theory has an infinite model, then it has a countable model. This implies that, externally to ZF set theory, there are only countably many sets, even though internally to ZF set theory we can carry out Cantor’s argument to show that the powerset operation takes us to exponentially higher cardinalities far beyond the countable. One may say that the “reason” is that the evidence for the countability of sets is a bijection that is not definable within the theory, so that it cannot “understand” its own limitations. This is a good thing.

The situation with Church’s Law in type theory is similar. Externally we know that every function on the natural numbers is computable. But what about internally? The internal statement of Church’s Law is this: $\Pi f:N\to N.\Sigma n:N. n\Vdash f$, where the notation $n\Vdash f$ means, informally, that $n$ is the code of a program that, when executed on input $m:N$, evaluates to $f(m)$. In Kleene’s original notation this would be rendered as $\Pi m:N.\Sigma p:N.T(n,m,p)\wedge Id(U(p),f(m))$, where the $T$ predicate encodes the operational semantics, and the $U$ predicate extracts the answer from a successful computation. Note that the expansion makes use of the identity type at the type $N$. The claim is that Church’s Law, stated as a type (proposition) within ETT, is false, which is to say that it entails a contradiction.

When I posed this as an exercise at the summer school, I had in mind two different proofs, which I will now sketch. Neither is valid, but there is a valid proof that I’ll come to afterwards.

Both proofs begin by applying the so-called Axiom of Choice. For those not familiar with type theory, the “axiom” of choice is in fact a theorem, stating that every total binary relation contains a function. Explicitly,

$(\Pi x:A.\Sigma y:B.R(x,y)) \to \Sigma f:A\to B.\Pi x:A.R(x,f(x)).$

The function $f$ is the “choice function” that associates a witness to the totality of $R$ to each argument $x$. In the present case if we postulate Church’s Law, then by the axiom of choice we have

$\Sigma F:(N\to N)\to N.\Pi f:N\to N. F(f)\Vdash f$.

That is, the functional $F$ picks out, for each function $f$ in $N\to N$, a (code for a) program that witnesses the computability of $f$. This should already seem suspicious, because by function extensionality the functional $F$ must assign the same program to any two extensionally equal functions.

We may easily see that $F$ is injective, for if $F(f)$ is $F(g)$, then both track both $f$ and $g$, and hence $f$ and $g$ are (extensionally) equal. Thus we have an injection from $N\to N$ into $N$, which seems “impossible” … except that it is not! Let’s try the proof that this is impossible, and see where it breaks down. Suppose that $i:(N\to N)\to N$ is injective. Define $d(x)=i^{-1}(x)(x)+1$, and consider $d(i(d))=i^{-1}(i(d))(i(d))+1=d(i(d))+1$ so $0=1$ and we are done. Not so fast! Since $i$ is only injective, and not necessarily surjective, it is not clear how to define $i^{-1}$. The obvious idea is to send $x=i(f)$ to $f$, and any $x$ outside the image of $i$ to, say, the identity. But there is no reason to suppose that the image of $i$ is decidable, so the attempted definition breaks down. I hacked around with this for a while, trying to exploit properties of $F$ to repair the proof (rather than work with a general injection, focus on the specific functional $F$), but failed. Andrej Bauer pointed out to me, to my surprise, that there is a model of ETT (which he constructed) that contains an injection of $N\to N$ into $N$! So there is no possibility of rescuing this line of argument.

(Incidentally, we can show within ETT that there is no bijection between $N$ and $N\to N$, using surjectivity to rescue the proof attempt above. Curiously, Lawvere has shown that there can be no surjection from $N$ onto $N\to N$, but this does not seem to help in the present situation. This shows that the concept of countability is more subtle in the constructive setting than in the classical setting.)

But I had another argument in mind, so I was not worried. The functional $F$ provides a decision procedure for equality for the type $N\to N$: given $f,g:N\to N$, compare $F(f)$ with $F(g)$. Surely this is impossible! But one cannot prove within type theory that $\textrm{Id}_{N\to N}(-,-)$ is undecidable, because type theory is consistent with the law of the excluded middle, which states that every proposition is decidable. (Indeed, type theory proves that excluded middle is irrefutable for any particular proposition $P$: $\neg\neg(P\vee\neg P)$.) So this proof also fails!

At this point it started to seem as though Church’s Law could be independent of ETT, as startling as that sounds. For ITT it is more plausible: equality of functions is definitional, so one could imagine associating an index with each function without disrupting anything. But for ETT this seemed implausible to me. Andrej pointed me to a paper by Maietti and Sambin that states that Church’s Law is incompatible with function extensionality and choice. So there must be another proof that refutes Church’s Law, and indeed there is one based on the aforementioned decidability of function equivalence (but with a slightly different line of reasoning than the one I suggested).

First, note that we can use the equality test for functions in $N\to N$ to check for halting. Using the $T$ predicate described above, we can define a function that is constantly $0$ iff a given (code of a) program never halts on given input. We may then use the above-mentioned equality test to check for halting. So it suffices to show that the halting problem for (codes of) functions and inputs is not computable to complete the refutation of the internal form of Church’s Law.

Specifically, assume given $h:N\times N\to N$ that, given a code for a function and an input, yields $0$ or $1$ according to whether or not that function halts when applied to that input. Define $d:N\to N$ by $\lambda x:N.\neg h(x,x)$, the usual diagonal function. Now apply the functional $F$ obtained from Church’s Law using the Axiom of Choice to obtain $n=F(d)$, the code for the function $d$, and consider $h(n,n)$ to derive the needed contradiction. Notice that we have used Church’s Law here to obtain a code for the type-theoretic diagonal function, which is then passed to the halting tester in the usual way.

As you can see, the revised argument follows along lines similar to what I had originally envisioned (in the second version), but requires a bit more effort to push through the proof properly. (Incidentally, I don’t think the argument can be made to work in pure ITT, but perhaps it would go through for ITT enriched with function extensionality.)

Thus, Church’s Law is false internally to extensional type theory, even though it is evidently true externally for that theory. You can see the similarity to the situation in first-order logic described earlier. Even though all functions of type $N\to N$ are computable, type theory itself is not capable of recognizing this fact (at least, not in the extensional case). And this is a good thing, not a bad thing! The whole beauty of constructive mathematics lies in the fact that it is just mathematics, free of any self-conscious recognition that we are writing programs when proving theorems constructively. We never have to reason about machine indices or any such nonsense, we just do mathematics under the discipline of not assuming that every proposition is decidable. One benefit is that the same mathematics admits interpretation not only in terms of computability, but also in terms of continuity in topological spaces, establishing a deep connection between two seemingly disparate topics.

(Hat tip to Andrej Bauer for help in sorting all this out. Here’s a link to a talk and a paper about the construction of a model of ETT in which there is an injection from $N\to N$ to $N$.)

Update: word-smithing.

Follow