Yet Another Reason Not To Be Lazy Or Imperative

August 26, 2012

In an earlier post I argued that, contrary to much of the literature in the area, parallelism is all about efficiency, and has little or nothing to do with concurrency.  Concurrency is concerned with controlling non-determinism, which can arise in all sorts of situations having nothing to do with parallelism.  Process calculi, for example, are best viewed as expressing substructural composition of programs, and have very little to do with parallel computing.  (See my PFPL and Rob Simmons’ forthcoming Ph.D. dissertation for more on this perspective.)  Parallelism, on the other hand, is captured by analyzing the cost of a computation whose meaning is independent of its parallel execution.  A cost semantics specifies the abstract cost of a program that is validated by a provable implementation that transfers the abstract cost to a precise concrete cost on a particular platform.  The cost of parallel execution is neatly captured by the concept of a cost graph that captures the dynamic data dependencies among subcomputations.  Details such as the number of processors or the nature of the interconnect are factored into the provable implementation, which predicts the asymptotic behavior of a program on a hardware platform based on its cost graph.  One advantage of cost semantics for parallelism is that it is easy to teach freshmen how to write parallel programs; we’ve been doing this successfully for two years now, with little fuss or bother.

This summer Guy Blelloch and I began thinking about other characterizations of the complexity of programs besides the familiar abstractions of execution time and space requirements of a computation.  One important measure, introduced by Jeff Vitter, is called I/O Complexity.  It measures the efficiency of algorithms with respect to memory traffic, a very significant determiner of performance of programs.  The model is sufficiently abstract as to encompass several different interpretations of I/O complexity.  Basically, the model assumes an unbounded main memory in which all data is ultimately stored, and considers a cache of M=c\times B blocked into chunks of size B that provides quick access to main memory.  The complexity of algorithms is analyzed in terms of these parameters, under the assumption that in-cache accesses are cost-free, so that the only significant costs are those incurred by loading and flushing the cache.  You may interpret the abstract concepts of main memory and cache in the standard way as a two-level hierarchy representing, say, on- and off-chip memory access, or instead as representing a disk (or other storage medium) loaded into memory for processing.  The point is that the relative costs of processing cached versus uncached data is huge, and worth considering as a measure of the efficiency of an algorithm.

As usual in the algorithms world Vitter makes use of a low-level machine model in which to express and evaluate algorithms.  Using this model Vitter obtains a lower-bound for sorting in the I/O model, and a matching upper bound using a k-way merge sort, where k is chosen as a function of M and B (that is, it is not cache oblivious in the sense of Leiserson, et al.)  Although such models provide a concrete, and well-understood, basis for analyzing algorithms, we all know full well that programming at such a low-level is at best a tedious exercise.  Worse, machine models provide no foundation for composition of programs, the single most important characteristic of higher-level language models.  (Indeed, the purpose of types is to mediate composition of components; without types, you’re toast.)

The point of Guy’s and my work this summer is to adapt the I/O model to functional programs, avoiding the mess, bother, and futility of trying to work at the machine level.  You might think that it would be impossible to reason about the cache complexity of a functional program (especially if you’re under the impression that functional programming necessarily has something to do with Haskell, which it does not, though you may perhaps say that Haskell has something to do with functional programming).  Traditional algorithms work, particularly as concerns cache complexity, is extremely finicky about memory management in order to ensure that reasonable bounds are met, and you might reasonably suspect that it will ever be thus.  The point of our paper, however, is to show that the same asymptotic bounds obtained by Vitter in the I/O model may be met using purely functional programming, provided that the functional language is (a) non-lazy (of course), and (b) implemented properly (as we describe).

Specifically, we give a cost semantics for functional programs (in the paper, a fragment of ML) that takes account of the memory traffic engendered by evaluation, and a provable implementation that validates the cost semantics by describing how to implement it on a machine-like model.  The crux of the matter is to account for the cache effects that arise from maintaining a control stack during evaluation, even though the abstract semantics has no concept of a stack (it’s part of the implementation, and cannot be avoided).  The cost semantics makes explicit the reading and allocation of values in the store (using Felleisen, Morrisett, and H’s “Abstract Models of Memory Management”), and imposes enough structure on the store to capture the critical concept of locality that is required to ensure good cache (or I/O) behavior.  The abstract model is parameterized by M and B described above, but interpreted as representing the number of objects in the cache and the neighborhood of an object in memory (the objects that are allocated near it, and that are therefore fetched along with the object whenever the cache is loaded).

The provable implementation is given in two steps.  First, we show how to transfer the abstract cost assigned to a computation into the amount of memory traffic incurred on an abstract machine with an explicit control stack.  The key idea here is an amortization argument that allows us to obtain tight bounds on the overhead required to maintain the stack.  Second, we show how to implement the crucial read and allocate operations that underpin the abstract semantics and the abstract machine.  Here we rely on a competitive analysis, given by Sleator, et al., of the ideal cache model, and on an amortization of the cost of garbage collection in the style of Appel.  We also make use of an original (as far as I know) technique for implementing the control stack so as to avoid unnecessary interference with the data objects in cache.  The net result is that the cost semantics provides an accurate asymptotic analysis of the I/O complexity of a functional algorithm, provided that it is implemented in the manner we describe in the paper (which, in fact, is not far from standard implementation techniques, the only trick being how to manage the control stack properly).  We then use the model to derive bounds for several algorithms that are comparable to those obtained by Vitter using a low-level machine model.

The upshot of all of this is that we can reason about the I/O or cache complexity of functional algorithms, much as we can reason about the parallel complexity of functional algorithms, namely by using a cost semantics.  There is no need to drop down to a low-level machine model to get a handle on this important performance metric for your programs, provided, of course, that you’re not stuck with a lazy language (for those poor souls, there is no hope).


Believing in Computer Science

August 25, 2012

It’s not every day that I can say that I agree with Bertrand Meyer, but today is an exception. Meyer has written an opinion piece in the current issue of C.ACM about science funding that I think is worth amplifying. His main point is that funding agencies, principally the NSF and the ERC, are constantly pushing for “revolutionary” research, at the expense of “evolutionary” research. Yet we all (including the funding agencies) know full well that, in almost every case, real progress is made by making seemingly small advances on what is already known, and that whether a body of research is revolutionary or not can only be assessed with considerable hindsight. Meyer cites the example of Hoare’s formulation of his logic of programs, which was at the time a relatively small increment on Floyd’s method for proving properties of programs. For all his brilliance, Hoare didn’t just invent this stuff out of thin air, he built on and improved upon the work that had gone before, as of course have hundreds of others built on his in turn. This all goes without saying, or ought to, but as Meyer points out, we computer scientists are constantly bombarded with direct and indirect exhortations to abandon all that has gone before, and to make promises that no one can honestly keep.

Meyer’s rallying cry is for incrementalism. It’s a tough row to hoe. Who could possibly argue against fostering earth-shattering research that breaks new ground and summarily refutes all that has gone before? And who could possibly defend work that is obviously just another slice of the same salami, perhaps with a bit of mustard this time? And yet what he says is obviously true. Funding agencies routinely beg the very question under consideration by stipulating a priori that there is something wrong with a field, and that an entirely new approach is required. With all due respect to the people involved, I would say that calls such as these are both ill-informed and outrageously arrogant.

But where does this attitude come from? Meyer cites “market envy” as one particularly powerful influence. Funding agencies wish to see themselves as analogous to venture capitalists investing in the next big thing, losing track of the fundamental differences between basic research and product development. (We see this sort of nonsense all the time in national politics; there is always a constituency for the absurd proposition that a government should be run like a business, as if there were any similarity at all between the two. What they really mean is, turn the government’s money over to business, but that can’t be said too loudly for fear that people will catch on.)

Another influence, which Meyer doesn’t mention, seems to be a long-standing problem in computer science. It seems to me that many researchers who move into political and administrative roles are either bored with, or do not believe in, computer science as an academic discipline. Their own research area is, or maybe always was, boring, or has been obviated by technological developments or scientific advances. So they move into politics, perhaps carrying a sense that there is something wrong with the field that can only be corrected by radical surgery. They then demand that researchers do what they themselves never did in their own careers, kicking the ladder out from behind them.

From this somewhat contentious premise, much follows. The constant implication, stated and unstated, that CS research is not worth doing for its own sake. The emphasis on interdisciplinarity as an end in itself, rather than a means to an end. The emphasis on applications to other disciplines being more important than CS itself. The sense that “broader impacts” are far more important than the “innovative claims” (the actual work) in a research proposal.

Seen from this perspective, Meyer’s position is much more easily defensible. When funding agencies are talking about “breakthroughs” and “paradigm shifts”, what they really mean is “anything but computer science”. When Meyer talks about incrementalism, what he really means is “computer science is worth doing for its own sake”.

And I, for once, agree with him.

Polarity in Type Theory

August 25, 2012

There has recently arisen some misguided claims about a supposed opposition between functional and object-oriented programming.  The claims amount to a belated recognition of a fundamental structure in type theory first elucidated by Jean-Marc Andreoli, and developed in depth by Jean-Yves Girard in the context of logic, and by Paul Blain-Levy and Noam Zeilberger in the context of programming languages.  In keeping with the general principle of computational trinitarianism, the concept of polarization has meaning in proof theory, category theory, and type theory, a sure sign of its fundamental importance.

Polarization is not an issue of language design, it is an issue of type structure.  The main idea is that types may be classified as being positive or negative, with the positive being characterized by their structure and the negative being characterized by their behavior.  In a sufficiently rich type system one may consider, and make effective use of, both positive and negative types.  There is nothing remarkable or revolutionary about this, and, truly, there is nothing really new about it, other than the terminology.  But through the efforts of the above-mentioned researchers, and others, we have learned quite a lot about the importance of polarization in logic, languages, and semantics.  I find it particularly remarkable that Andreoli’s work on proof search turned out to also be of deep significance for programming languages.  This connection was developed and extended by Zeilberger, on whose dissertation I am basing this post.

The simplest and most direct way to illustrate the ideas is to consider the product type, which corresponds to conjunction in logic.  There are two possible ways that one can formulate the rules for the product type that from the point of view of inhabitation are completely equivalent, but from the point of view of computation are quite distinct.  Let us first state them as rules of logic, then equip these rules with proof terms so that we may study their operational behavior.  For the time being I will refer to these as Method 1 and Method 2, but after we examine them more carefully, we will find more descriptive names for them.

Method 1 of defining conjunction is perhaps the most familiar.  It consists of this introduction rule

\displaystyle\frac{\Gamma\vdash A\;\textsf{true}\quad\Gamma\vdash B\;\textsf{true}}{\Gamma\vdash A\wedge B\;\textsf{true}}

and the following two elimination rules

\displaystyle\frac{\Gamma\vdash A\wedge B\;\textsf{true}}{\Gamma\vdash A\;\textsf{true}}\qquad\frac{\Gamma\vdash A\wedge B\;\textsf{true}}{\Gamma\vdash B\;\textsf{true}}.

Method 2 of defining conjunction is only slightly different.  It consists of the same introduction

\displaystyle \frac{\Gamma\vdash A\;\textsf{true}\quad\Gamma\vdash B\;\textsf{true}}{\Gamma\vdash A\wedge B\;\textsf{true}}

and one elimination rule

\displaystyle\frac{\Gamma\vdash A\wedge B\;\textsf{true} \quad \Gamma,A\;\textsf{true},B\;\textsf{true}\vdash C\;\textsf{true}}{\Gamma\vdash C\;\textsf{true}}.

From a logical point of view the two formulations are interchangeable in that the rules of the one are admissible with respect to the rules of the other, given the usual structural properties of entailment, specifically reflexivity and transitivity.  However, one can discern a difference in “attitude” in the two formulations that will turn out to be a manifestation of the concept of polarity.

Method 1 is a formulation of the idea that a proof of a conjunction is anything that behaves conjunctively, which means that it supports the two elimination rules given in the definition.  There is no commitment to the internal structure of a proof, nor to the details of how projection operates; as long as there are projections, then we are satisfied that the connective is indeed conjunction.  We may consider that the elimination rules define the connective, and that the introduction rule is derived from that requirement.  Equivalently we may think of the proofs of conjunction as being coinductively defined to be as large as possible, as long as the projections are available.  Zeilberger calls this the pragmatist interpretation, following Count Basie’s principle, “if it sounds good, it is good.”

Method 2 is a direct formulation of the idea that the proofs of a conjunction are inductively defined to be as small as possible, as long as the introduction rule is valid.  Specifically, the single introduction rule may be understood as defining the structure of the sole form of proof of a conjunction, and the single elimination rule expresses the induction, or recursion, principle associated with that viewpoint.  Specifically, to reason from the fact that A\wedge B\;\textsf{true} to derive C\;\textsf{true}, it is enough to reason from the data that went into the proof of the conjunction to derive C\;\textsf{true}.  We may consider that the introduction rule defines the connective, and that the elimination rule is derived from that definition.  Zeilberger calls this the verificationist interpretation.

These two perspectives may be clarified by introducing proof terms, and the associated notions of reduction that give rise to a dynamics of proofs.

When reformulated with explicit proofs, the rules of Method 1 are the familiar rules for ordered pairs:

\displaystyle\frac{\Gamma\vdash M:A\quad\Gamma\vdash N:B}{\Gamma\vdash \langle M, N\rangle:A\wedge B}

\displaystyle\frac{\Gamma\vdash M:A\wedge B}{\Gamma\vdash \textsf{fst}(M):A}\qquad\frac{\Gamma\vdash M:A\wedge B}{\Gamma\vdash \textsf{snd}(M):B}.

The associated reduction rules specify that the elimination rules are post-inverse to the introduction rules:

\displaystyle\textsf{fst}(\langle M,N\rangle)\mapsto M \qquad \textsf{snd}(\langle M,N\rangle)\mapsto N.

In this formulation the proposition A\wedge B is often written A\times B, since it behaves like a Cartesian product of proofs.

When formulated with explicit proofs, Method 2 looks like this:

\displaystyle \frac{\Gamma\vdash M:A\quad\Gamma\vdash M:B}{\Gamma\vdash M\otimes N:A\wedge B}

\displaystyle\frac{\Gamma\vdash M:A\wedge B \quad \Gamma,x:A,y:B\vdash N:C}{\Gamma\vdash \textsf{split}(M;x,y.N):C}

with the reduction rule

\displaystyle\textsf{split}(M\otimes N;x,y.P)\mapsto [M,N/x,y]P.

With this formulation it is natural to write A\wedge B as A\otimes B, since it behaves like a tensor product of proofs.

Since the two formulations of “conjunction” have different internal structure, we may consider them as two different connectives.  This may, at first, seem pointless, because it is easily seen that x:A\times B\vdash M:A\otimes B for some M and that x:A\otimes B\vdash N:A\times B for some N, so that the two connectives are logically equivalent, and hence interchangeable in any proof.  But there is nevertheless a reason to draw the distinction, namely that they have different dynamics.

It is easy to see why.  From the pragmatic perspective, since the projections act independently of one another, there is no reason to insist that the components of a pair be evaluated before they are used.  Quite possibly we may only ever project the first component, so why bother with the second?  From the verificationist perspective, however, we are pattern matching against the proof of the conjunction, and are demanding both components at once, so it makes sense to evaluate both components of a pair in anticipation of future pattern matching.  (Admittedly, in a structural type theory one may immediately drop one of the variables on the floor and never use it, but then why give it a name at all?  In a substructural type theory such as linear type theory, this is not a possibility, and the interpretation is forced.)  Thus, the verficationist formulation corresponds to eager evaluation of pairing, and the pragmatist formulation to lazy evaluation of pairing.

Having distinguished the two forms of conjunction by their operational behavior, it is immediately clear that both forms are useful, and are by no means opposed to one another.  This is why, for example, the concept of a lazy language makes no sense, rather one should instead speak of lazy types, which are perfectly useful, but by no means the only types one should ever consider.  Similarly, the concept of an object-oriented language seems misguided, because it emphasizes the pragmatist conception, to the exclusion of the verificationist, by insisting that everything be an object characterized by its methods.

More broadly, it is useful to classify types into two polarities, the positive and the negative, corresponding to the verificationist and pragmatist perspectives.  Positive types are inductively defined by their introduction forms; they correspond to colimits, or direct limits, in category theory.  Negative types are coinductively defined by their elimination forms; they correspond to limits, or inverse limits, in category theory.  The concept of polarity is intimately related to the concept of focusing, which in logic sharpens the concept of a cut-free proof and elucidates the distinction between synchronous and asynchronous connectives, and which in programming languages provides an elegant account of pattern matching, continuations, and effects.

As ever, enduring principles emerge from the interplay between proof theory, category theory, and type theory.  Such concepts are found in nature, and do not depend on cults of personality or the fads of the computer industry for their existence or importance.

Update: word-smithing.

Intro Curriculum Update

August 17, 2012

In previous posts I have talked about the new introductory CS curriculum under development at Carnegie Mellon. After a year or so of planning, we began to roll out the new curriculum in the Spring of 2011, and have by now completed the transition. As mentioned previously, the main purpose is to bring the introductory sequence up to date, with particular emphasis on introducing parallelism and verification. A secondary purpose was to restore the focus on computing fundamentals, and correct the drift towards complex application frameworks that offer the students little sense of what is really going on. (The poster child was a star student who admitted that, although she had built a web crawler the previous semester, she in fact has no idea how to build a web crawler.) A particular problem is that what should have been a grounding in the fundamentals of algorithms and data structures turned into an exercise in object-oriented programming, swamping the core content with piles of methodology of dubious value to beginning students. (There is a new, separate, upper-division course on oo methodology for students interested in this topic.) A third purpose was to level the playing field, so that students who had learned about programming on the street were equally as challenged, if not more so, than students without much or any such experience. One consequence would be to reduce the concomitant bias against women entering CS, many fewer of whom having prior computing experience than the men.

The solution was a complete do-over, jettisoning the traditional course completely, and starting from scratch. The most important decision was to emphasize functional programming right from the start, and to build on this foundation for teaching data structures and algorithms. Not only does FP provide a much more natural starting point for teaching programming, it is infinitely more amenable to rigorous verification, and provides a natural model for parallel computation. Every student comes to university knowing some algebra, and they are therefore familiar with the idea of computing by calculation (after all, the word algebra derives from the Arabic al jabr, meaning system of calculation). Functional programming is a generalization of algebra, with a richer variety of data structures and a richer set of primitives, so we can build on that foundation. It is critically important that variables in FP are, in fact, mathematical variables, and not some distorted facsimile thereof, so all of their mathematical intuitions are directly applicable. So we can immediately begin discussing verification as a natural part of programming, using principles such as mathematical induction and equational reasoning to guide their thinking. Moreover, there are natural concepts of sequential time complexity, given by the number of steps required to calculate an answer, and parallel time complexity, given by the data dependencies in a computation (often made manifest by the occurrences of variables). These central concepts are introduced in the first week, and amplified throughout the semester.

Two major homework exercises embody the high points of the first-semester course, one to do with co-development of code with proof, the other to do with parallelism. Co-development of program and proof is illustrated by an online regular expression matcher. The problem is a gem for several reasons. One is that it is essentially impossible for anyone to solve by debugging a blank screen. This sets us up nicely for explaining the importance of specification and proof as part of the development process. Another is that it is very easy, almost inevitable, for students to make mistakes that are not easily caught or diagnosed by testing. We require the students to carry out a proof of the correctness of the matcher, and in the process discover a point where the proof breaks down, which then leads to a proper solution. (See my JFP paper “Proof-Directed Debugging” for a detailed development of the main ideas.) The matcher also provides a very nice lesson in the use of higher-order functions to capture patterns of control, resulting in an extremely clean and simple matcher whose correctness proof is highly non-trivial.

The main parallelism example is the Barnes-Hut algorithm for solving the n-body problem in physics. Barnes-Hut is an example of a “tree-based” method, invented by Andrew Appel, for solving this well-known problem. At a high level the main idea is to decompose space into octants (or quadrants if you’re working in the plane), recursively solving the problem for each octant and then combining the solutions to make an overall solution. The key idea is to use an approximation for bodies that are far enough away—a distant constellation can be regarded as an atomic body for the purposes of calculating the effects of its stars on the sun, say. The problem is naturally parallelizable, because of the recursive decomposition. Moreover, it provides a very nice link to their high school physics. Since FP is just an extension of mathematics, the equations specifying the force law and Newton’s Law carry over directly to the code. This is an important sense in which FP builds on and amplifies their prior mathematical experience, and shows how one may connect computer science with other sciences in a rather direct manner.

The introductory FP course establishes the basis for the new data structures and algorithms course that most students take in either the third or fourth semester. This course represents a radical departure from tradition in several ways. First, it is a highly rigorous course in algorithms that rivals the upper-division course taught at most universities (including our own) for the depth and breadth of ideas it develops. Second, it takes the stance that all algorithms are parallel algorithms, with sequential being but a special case of parallel. Of course some algorithms have a better parallelizability ratio (a precise technical characterization of the ability to make use of parallelism), and this can be greatly affected by data structure selection, among other considerations. Third, the emphasis is on persistent abstract types, which are indispensable for parallel computing. No more linked lists, no more null pointer nonsense, no more mutation. Instead we consider abstract types of graphs, trees, heaps, and so forth, all with a persistent semantics (operations create “new” ones, rather than modify “old” ones). Fourth, we build on the reasoning methods introduced in the first semester course to prove the correctness and the efficiency of algorithms. Functional programming makes all of this possible. Programs are concise and amenable to proof, they are naturally parallel, and they enjoy powerful typing properties that enforce abstraction in a mathematically rigorous manner. Fifth, there is a strong emphasis on problems of practical interest. For example, homework 1 is the shotgun method for genome sequencing, a parallel algorithm of considerable practical importance and renown.

There is a third introductory course in imperative programming, taken in either the first or second semester (alternating with the functional programming course at the student’s discretion), that teaches the “old ways” of doing things using a “safe” subset of C. Personally, I think this style of programming is obsolete, but there are many good reasons to continue to teach it, the biggest probably being the legacy problem. The emphasis is on verification, using simple assertions that are checked at run-time and which may be verified statically in some situations. It is here that students learn how to do things “the hard way” using low-level representations. This course is very much in the style of the 1970’s era data structures course, the main difference being that the current incarnation of Pascal has curly braces, rather than begin-end.

For the sake of those readers who may not be up to date on such topics, it seems important to emphasize that functional programming subsumes imperative programming. Every functional language is capable of expressing the old-fashioned sequential pointer-hacking implementation of data structures. You can even reproduce Tony Hoare’s self-described “billion dollar mistake” of the cursed “null pointer” if you’d like! But the whole point is that it is rarely useful, and almost never elegant, to work that way. (Curiously, the “monad mania” in the Haskell community stresses an imperative, sequential style of programming that is incompatible with parallelism, but this is not forced on you as it is in the imperative world.) From this point of view there no loss, and considerable gain, in teaching functional programming from the start. It puts a proper emphasis on mathematically sane programming methods, but allows for mutation-based programming where appropriate (for example, in engendering “side effects” on users).

To learn more about these courses, please see the course web pages:

I encourage readers to review the syllabi and course materials. There is quite a large body of material in place that we plan to expand into textbook-level treatments in the near future. We also plan to write a journal article documenting our experiences with these courses.

I am very grateful to my colleagues Guy Blelloch, Dan Licata, and Frank Pfenning for their efforts in helping to develop the new introductory curriculum at Carnegie Mellon, and to the teaching assistants whose hard work and dedication put the ideas into practice.

Update: Unfortunately, the homework assignments for these courses are not publicly available, because we want to minimize the temptation for students to make use of old assignments and solutions in preparing their own work.  I am working with my colleagues to find some way in which we can promote the ideas without sacrificing too much of the integrity of the course materials.  I apologize for the inconvenience.

Haskell Is Exceptionally Unsafe

August 14, 2012

It is well known that Haskell is not type safe.  The most blatant violation is the sometimes-necessary, and aptly named, unsafePerformIO operation.  You are enjoined not to use this in an unsafe manner, and must be careful to ensure that the encapsulated computation may be executed at any time because of the inherent unpredictability of lazy evaluation.  (The analogous operation in monadic ML, safePerformIO, is safe, because of the value restriction on polymorphism.)  A less blatant violation is that the equational theory of the language with seq is different from the equational theory without it, and the last I knew the GHC compiler was willing to make transformations that are valid only in the absence of this construct.  This too is well known.  A proper reformulation of the equational theory was given by Patricia Johann a few years ago as a step towards solving it.  (If the GHC compiler is no longer unsafe in this respect, it would be good to know.)

I’ve asked around a little bit, including some of the Haskell insiders, whether it is equally well known that the typed exception mechanism in Haskell is unsound.  From what I can tell this seems not to be as well-understood, so let me explain the situation as I see it, and I am sure I will be quickly corrected if I am wrong.  The starting point for me was the realization that in Haskell pure code (outside of the IO monad) may raise an exception, and, importantly for my point, the exceptions are user-defined.  Based on general semantic considerations, it seemed to me that this cannot possibly be sound, and Karl Crary helped me to isolate the source of the problem.

The difficulty is really nothing to do with exceptions per se, but rather with exception values.  (So my point has nothing whatsoever to do with imprecise exceptions.)  It seems to me that the root cause is an all-too-common misunderstanding of the concept of typed exceptions as they occur, for example, in Standard ML.  The mistake is a familiar one, the confusion of types with classes; it arises often in discussions related to oop.  To clarify the situation let me begin with a few remarks exception values in ML, and then move on to the issue at hand.

The first point, which is not particularly relevant to the present discussion, is that exceptions have nothing to do with dynamic binding.  The full details are in my book, so I will only summarize the situation here.  Many seem to have the idea that an exception handler, which typically looks like a sequence of clauses consisting of an exception and an action, amounts to a dynamic binding of each exception to its associated handler.  This is not so.  In fact the left-hand side of an exception clause is a pattern, including a variable (of which a wild card is one example), or nested patterns of exceptions and values.  I realize that one may implement the exception mechanism using a single dynamically bound symbol holding the current exception handler, but this implementation is but one of many possible ones, and does not in any case define the abstraction of exceptions.  Exceptions have no more to do with dynamic binding than does the familiar if-then-else available in any language.

The second point, which is pertinent to this discussion, is that exceptions have only one type.  You read that right: the typed exception mechanism in Standard ML is, in fact, uni-typed (where I have I heard that before?).  It is perfectly true that in Standard ML one may associate a value of any type you like with an exception, and this value may be communicated to the handler of that exception.  But it is perfectly false to say that there are multiple types of exceptions; there is only one, but it has many (in fact, “dynamically many”) classes.  When you declare, say, exception Error of string in Standard ML you are introducing a new class of type string->exn, so that Error s is an exception value of type exn carrying a string.  An exception handler associates to a computation of type α a function of type exn->α that handles any exceptions thrown by that computation.  To propagate uncaught exceptions, the handler is  implicitly post-composed with the handler fn x => raise x.  Pattern matching recovers the type of the associated value in the branch that handles a particular exception.  So, for example, a handler of the form Error x => exp propagates the fact that x has type string into the expression exp, and similarly for any other exception carrying any other type.  (Incidentally, another perfectly good handler is Error “abcdef” => exp, which handles only an Error exception with associated value “abcdef”, and no other.  This debunks the dynamic binding interpretation mentioned earlier.)

The third point is that exception classes are dynamically generated in the sense that each evaluation of an exception declaration generates a fresh exception.  This is absolutely essential for modularity, and is extremely useful for a wide variety of other purposes, including managing information flow security within a program. (See Chapter 34 of my book for a fuller discussion.)  O’Caml attempted to impose a static form of exceptions, but it is now recognized that this does not work properly in the presence of functors or separate compilation.  This means that exception declarations are inherently stateful and cannot be regarded as pure.

This got me to wondering how Haskell could get away with user-defined typed exceptions in “pure” code.  The answer seems to be “it can’t”, as the following example illustrates:

import Control.Exception

import Data.Typeable

newtype Foo = Foo (() -> IO ())

{- set Foo’s TypeRep to be the same as ErrorCall’s -}

instance Typeable Foo where

  typeOf _ = typeOf (undefined :: ErrorCall)

instance Show Foo where  show _ = “”

instance Exception Foo

main = Control.Exception.catch (error “kaboom”) (\ (Foo f) -> f ())

If you run this code in GHC, you get “internal error: stg_ap_v_ret”, which amounts to “going wrong.”  The use of exceptions is really incidental, except insofar as it forces the use of the class Typeable, which is exploited here.

How are we to understand this unsoundness?  My own diagnosis is that typed exceptions are mis-implemented in Haskell using types, rather than classes.  There is no need in ML for any form of type casting to implement exceptions, because there is exactly one type of exception values, albeit one with many classes.  Haskell, on the other hand, tries to have exceptions with different types.  This immediately involves us in type casting, and hilarity ensues.

The problem appears to be difficult to fix, because to do so would require that exception values be declared within the IO monad, which runs against the design principle (unavoidable, in my opinion) that exceptions are permissible in pure code.  (Exceptions are, according to Aleks Nanevski, co-monadic, not monadic.)  Alternatively, since Haskell lacks modularity, a possible fix is to permit only static exceptions in essentially the style proposed in O’Caml.  This amounts to collecting up the exception declarations on a whole-program basis, and implicitly making a global data declaration that declares all the exception constructors “up front”.  Exception handlers would then work by pattern matching, and all would be well.  Why this is not done already is a mystery to me; the current strategy looks a lot like a kludge once you see the problem with safety.

Insofar as my interpretation of what is going on here is correct, the example once again calls into question the dogma of the separation of Church from state as it is realized in Haskell.  As beguiling as it is, the idea, in its current form, simply does not work.  Dave MacQueen recently described it to me as a “tar baby”, from the Brer Rabbit story: the more you get involved with it, the more entangled you become.

Update: minor re-wording.

Extensionality, Intensionality, and Brouwer’s Dictum

August 11, 2012

There seems to be a popular misunderstanding about the propositions-as-types principle that has led some to believe that intensional type theory (ITT) is somehow preferable to or more sensible than extensional type theory (ETT).  Now, as a practical matter, few would dispute that ETT is much easier to use than ITT for mechanizing everyday mathematics.  Some justification for this will be given below, but I am mainly concerned with matters of principle.  Specifically, I wish to dispute the claim that t ETT is somehow “wrong” compared to ITT.  The root of the problem appears to be a misunderstanding of the fundamental ideas of intuitionism, which are expressed by the proposition-as-types principle.

The most popular conception appears to be the trivial one, namely that certain inductively defined formal systems of logic correspond syntactically to certain inductively defined formal systems of typing.  Such correspondences are not terribly interesting, because they can easily be made to hold by construction: all you need to do is to introduce proof terms that summarize a derivation, and then note that the proofs of a proposition correspond to the terms of the associated type.  In this form the propositions-as-types principle is often dubbed, rather grandly, the Curry-Howard Isomorphism.  It’s a truism that most things in mathematics are named after anyone but their discoverers, and that goes double in this case.  Neither Curry nor Howard discovered the principle (Howard himself disclaims credit for it), though they both did make contributions to it.  Moreover, this unfortunate name deprives credit to those who did the real work in inventing the concept, including Brouwer, Heyting, Kolmogorov, deBruijn, and Martin-Löf.  (Indeed, it is sometimes called the BHK Correspondence, which is both more accurate and less grandiose.)  Worse, there is an “isomorphism” only in the most trivial sense of an identity by definition, hardly worth emphasizing.

The interesting conception of the propositions-as-types principle is what I call Brouwer’s Dictum, which states that all of mathematics, including the concept of a proof, is to be derived from the concept of a construction, a computation classified by a type.  In intuitionistic mathematics proofs are themselves “first-class” mathematical objects that inhabit types that may as well be identified with the proposition that they prove.  Proving a proposition is no different than constructing a program of a type.  In this sense logic is a branch of mathematics, the branch concerned with those constructions that are proofs.  And mathematics is itself a branch of computer science, since according to Brouwer’s Dictum all of mathematics is to be based on the concept of computation.  But notice as well that there are many more constructions than those that correspond to proofs.  Numbers, for example, are perhaps the most basic ones, as would be any inductive or coinductive types, or even more exotic objects such as Brouwer’s own choice sequences.  From this point of view the judgement M\in A stating that M is a construction of type A is of fundamental importance, since it encompasses not only the formation of “ordinary” mathematical constructions, but also those that are distinctively intuitionistic, namely mathematical proofs.

An often misunderstood point that must be clarified before we continue is that the concept of proof in intuitionism is not to be identified with the concept of a formal proof in a fixed formal system.  What constitutes a proof of a proposition is a judgement, and there is no reason to suppose a priori that this judgement ought to be decidable.  It should be possible to recognize a proof when we see one, but it is not required that we be able to rule out what is a proof in all cases.  In contrast formal proofs are inductively defined and hence fully circumscribed, and we expect it to be decidable whether or not a purported formal proof is in fact a formal proof, that is whether it is well-formed according to the given inductively defined rules.  But the upshot of Gödel’s Theorem is that as soon as we fix the concept of formal proof, it is immediate that it is not an adequate conception of proof simpliciter, because there are propositions that are true, which is to say have a proof, but have no formal proof according to the given rules.  The concept of truth, even in the intuitionistic setting, eludes formalization, and it will ever be thus.  Putting all this another way, according to the intuitionistic viewpoint (and the mathematical practices that it codifies), there is no truth other than that given by proof.  Yet the rules of proof cannot be given in decidable form without missing the point.

It is for this reason that the first sense of the propositions-as-types principle discussed above is uninteresting, for it only ever codifies a decidable, and hence incomplete, conception of proof.  Moreover, the emphasis on an isomorphism between propositions and types also misses the point, because it fails to account for the many forms of type that do not correspond to propositions.  The formal correspondence is useful in some circumstances, namely those in which the object of study is a formal system.  So, for example, in LF the goal is to encode formal systems, and hence it is essential in the LF methodology that type checking be decidable.  But when one is talking about a general theory of computation, which is to say a general theory of mathematical constructions, there is no reason to expect either an isomorphism or decidability.  (So please stop referring to propositions-as-types as “the Curry-Howard Isomorphism”!)

We are now in a position to discuss the relationship between ITT and ETT, and to correct the misconception that ETT is somehow “wrong” because the typing judgement is not decidable.  The best way to understand the proper relationship between the two is to place them into the broader context of homotopy type theory, or HTT.  From the point of view of homotopy type theory ITT and ETT represent extremal points along a spectrum of type theories, which is to say a spectrum of conceptions of mathematical construction in Brouwer’s sense.  Extensional type theory is the theory of homotopy sets, or hSets for short, which are spaces that are homotopically discrete, meaning that the only path (evidence for equivalence) of two elements is in fact the trivial self-loop between an element and itself.  Therefore if we have a path between x and y in A, which is to say a proof that they are equivalent, then x and y are equal, and hence interchangeable in all contexts.  The bulk of everyday mathematics takes place within the universe of hSets, and hence is most appropriately expressed within ETT, and experience has born this out.  But it is also interesting to step outside of this framework and consider richer conceptions of type.

For example, as soon as we introduce universes, one is immediately confronted with the need to admit types that are not hSets.  A universe of hSets naturally includes non-trivial paths between elements witnessing their isomorphism as hSets, and hence their interchangeability in all contexts.  Taking a single universe of hSets as the sole source of such additional structure leads to (univalent) two-dimensional type theory.  In this terminology ETT is then to be considered as one-dimensional type theory.  Universes are not the only source of higher dimensionality.  For example, the interval has two elements, 0 and 1 connected by a path, the segment between them, which may be seen as evidence for their interchangeability (we can slide them along the segment one to the other).  Similarly, the circle S^1 is a two-dimensional inductively defined type with one element, a base point, and one path, a non-reflexive self-loop from the base point to itself.  It is now obvious that one may consider three-dimensional type theory, featuring types such as S^2, the sphere, and so forth.  Continuing this through all finite dimensions, we obtain finite-dimensional type theory, which is just ITT (type theory with no discreteness at any dimension).

From this perspective one can see more clearly why it has proved so awkward to formalize everyday mathematics in ITT.  Most such work takes place in the universe of hSets, and makes no use of higher-dimensional structure.  The natural setting for such things is therefore ETT, the theory of types as homotopically discrete sets.  By formalizing such mathematics within ITT one is paying the full cost of higher-dimensionality without enjoying any of its benefits.  This neatly confirms experience with using NuPRL as compared to using Coq for formulating the mathematics of homotopy sets, and why even die-hard ITT partisans find themselves wanting to switch to ETT for doing real work (certain ideological commitments notwithstanding).  On the other hand, as higher-dimensional structure becomes more important to the work we are doing, something other than ETT is required.  One candidate is a formulation of type theory with explicit levels, representing the dimensionality restriction appropriate to the problem domain.  So work with discrete sets would take place within level 1, which is just extensional type theory.  Level 2 is two-dimensional type theory, and so forth, and the union of all finite levels is something like ITT.  To make this work requires that there be a theory of cumulativity of levels, a theory of resizing that allows us to move work at a higher level to a lower level at which it still makes sense, and a theory of truncation that allows suppression of higher-dimensional structure (generalizing proof irrelevance and “squash” types).

However this may turn out, it is clear that the resulting type theory will be far richer than merely the codification of the formal proofs of some logical system.  Types such as the geometric spaces mentioned above do not arise as the types of proofs of propositions, but rather are among the most basic of mathematical constructions, in complete accordance with Brouwer’s dictum.

Church’s Law

August 9, 2012

A new feature of this year’s summer school was a reduction in the number of lectures, and an addition of daily open problem sessions for reviewing the day’s material. This turned out to be a great idea for everyone, because it gave us more informal time together, and gave the students a better chance at digesting a mountain of material. It also turned out to be a bit of an embarrassment for me, because I posed a question off the top of my head for which I thought I had two proofs, neither of which turned out to be valid. The claimed theorem is, in fact, true, and one of my proofs is easily corrected to resolve the matter (the other, curiously, remains irredeemable for reasons I’ll explain shortly). The whole episode is rather interesting, so let me recount a version of it here for your enjoyment.

The context of the discussion was extensional type theory, or ETT, which is characterized by an identification of judgemental with propositional equality: if you can prove that two objects are equal,then they are interchangeable for all purposes without specific arrangement. The alternative, intensional type theory,or ITT, regards judgemental equality as definitional equality (symbolic evaluation), and gives computational meaning to proofs of equality of objects of a type, allowing in particular transport across two instances of a family whose indices are equal. NuPRL is an example of an ETT; CiC is an example of an ITT.

Within the framework of ETT, the principle of function extensionality comes “for free”, because you can prove it to hold within the theory. Function extensionality states that f=g:A\to B whenever x:A\vdash f(x)=g(x):B. That is, two functions are if they are equal on all arguments (and, implicitly, respect equality of arguments). Function extensionality holds definitionally if your definitional equivalence includes the \eta and \xi rules, but in any case does not have the same force as extensional equality. Function extensionality as a principle of equality cannot be derived in ITT, but must be added as an additional postulate (or derived from a stronger postulate, such as univalence or the existence of a one-dimensional interval type).

Regardless of whether we are working in an extensional or an intensional theory, it is easy to see that all functions of type N\to N definable in type theory are computable. For example, we may show that all such functions may be encoded as recursive functions in the sense of Kleene, or in a more modern formulation we may give a structural operational semantics that provides a deterministic execution model for such functions (given n:N, run f:N\to N on n until it stops, and yield that as result). Of course the proof relies on some fairly involved meta-theory, but it is all constructively valid (in an informal sense) and hence provides a legitimate computational interpretation of the theory. Another way to say the same thing is to say that the comprehension principles of type theory are such that every object deemed to exist has a well-defined computational meaning, so it follows that all functions defined within it are going to be computable.

This is all just another instance of Church’s Law, the scientific law stating that any formalism for defining computable functions will turn out to be equivalent to, say, the λ-calculus when it comes to definability of number-theoretic functions. (Ordinarily Church’s Law is called Church’s Thesis, but for reasons given in my Practical Foundations book, I prefer to give it the full status of a scientific law.) Type theory is, in this respect, no better than any other formalism for defining computable functions. By now we have such faith in Church’s Law that this remark is completely unsurprising, even boring to state explicitly.

So it may come as a surprise to learn that Church’s Law is false. I’m being provocative here, so let me explain what I mean before I get flamed to death on the internet.   The point I wish to make is that there is an important distinction between the external and the internal properties of a theory. For example, in first-order logic the Löwenheim-Skolem Theorem tells us that if a first-order theory has an infinite model, then it has a countable model. This implies that, externally to ZF set theory, there are only countably many sets, even though internally to ZF set theory we can carry out Cantor’s argument to show that the powerset operation takes us to exponentially higher cardinalities far beyond the countable. One may say that the “reason” is that the evidence for the countability of sets is a bijection that is not definable within the theory, so that it cannot “understand” its own limitations. This is a good thing.

The situation with Church’s Law in type theory is similar. Externally we know that every function on the natural numbers is computable. But what about internally? The internal statement of Church’s Law is this: \Pi f:N\to N.\Sigma n:N. n\Vdash f, where the notation n\Vdash f means, informally, that n is the code of a program that, when executed on input m:N, evaluates to f(m). In Kleene’s original notation this would be rendered as \Pi m:N.\Sigma p:N.T(n,m,p)\wedge Id(U(p),f(m)), where the T predicate encodes the operational semantics, and the U predicate extracts the answer from a successful computation. Note that the expansion makes use of the identity type at the type N. The claim is that Church’s Law, stated as a type (proposition) within ETT, is false, which is to say that it entails a contradiction.

When I posed this as an exercise at the summer school, I had in mind two different proofs, which I will now sketch. Neither is valid, but there is a valid proof that I’ll come to afterwards.

Both proofs begin by applying the so-called Axiom of Choice. For those not familiar with type theory, the “axiom” of choice is in fact a theorem, stating that every total binary relation contains a function. Explicitly,

(\Pi x:A.\Sigma y:B.R(x,y)) \to \Sigma f:A\to B.\Pi x:A.R(x,f(x)).

The function f is the “choice function” that associates a witness to the totality of R to each argument x. In the present case if we postulate Church’s Law, then by the axiom of choice we have

\Sigma F:(N\to N)\to N.\Pi f:N\to N. F(f)\Vdash f.

That is, the functional F picks out, for each function f in N\to N, a (code for a) program that witnesses the computability of f. This should already seem suspicious, because by function extensionality the functional F must assign the same program to any two extensionally equal functions.

We may easily see that F is injective, for if F(f) is F(g), then both track both f and g, and hence f and g are (extensionally) equal. Thus we have an injection from N\to N into N, which seems “impossible” … except that it is not! Let’s try the proof that this is impossible, and see where it breaks down. Suppose that i:(N\to N)\to N is injective. Define d(x)=i^{-1}(x)(x)+1, and consider d(i(d))=i^{-1}(i(d))(i(d))+1=d(i(d))+1 so 0=1 and we are done. Not so fast! Since i is only injective, and not necessarily surjective, it is not clear how to define i^{-1}. The obvious idea is to send x=i(f) to f, and any x outside the image of i to, say, the identity. But there is no reason to suppose that the image of i is decidable, so the attempted definition breaks down. I hacked around with this for a while, trying to exploit properties of F to repair the proof (rather than work with a general injection, focus on the specific functional F), but failed. Andrej Bauer pointed out to me, to my surprise, that there is a model of ETT (which he constructed) that contains an injection of N\to N into N! So there is no possibility of rescuing this line of argument.

(Incidentally, we can show within ETT that there is no bijection between N and N\to N, using surjectivity to rescue the proof attempt above. Curiously, Lawvere has shown that there can be no surjection from N onto N\to N, but this does not seem to help in the present situation. This shows that the concept of countability is more subtle in the constructive setting than in the classical setting.)

But I had another argument in mind, so I was not worried. The functional F provides a decision procedure for equality for the type N\to N: given f,g:N\to N, compare F(f) with F(g). Surely this is impossible! But one cannot prove within type theory that \textrm{Id}_{N\to N}(-,-) is undecidable, because type theory is consistent with the law of the excluded middle, which states that every proposition is decidable. (Indeed, type theory proves that excluded middle is irrefutable for any particular proposition P: \neg\neg(P\vee\neg P).) So this proof also fails!

At this point it started to seem as though Church’s Law could be independent of ETT, as startling as that sounds. For ITT it is more plausible: equality of functions is definitional, so one could imagine associating an index with each function without disrupting anything. But for ETT this seemed implausible to me. Andrej pointed me to a paper by Maietti and Sambin that states that Church’s Law is incompatible with function extensionality and choice. So there must be another proof that refutes Church’s Law, and indeed there is one based on the aforementioned decidability of function equivalence (but with a slightly different line of reasoning than the one I suggested).

First, note that we can use the equality test for functions in N\to N to check for halting. Using the T predicate described above, we can define a function that is constantly 0 iff a given (code of a) program never halts on given input. We may then use the above-mentioned equality test to check for halting. So it suffices to show that the halting problem for (codes of) functions and inputs is not computable to complete the refutation of the internal form of Church’s Law.

Specifically, assume given h:N\times N\to N that, given a code for a function and an input, yields 0 or 1 according to whether or not that function halts when applied to that input. Define d:N\to N by \lambda x:N.\neg h(x,x), the usual diagonal function. Now apply the functional F obtained from Church’s Law using the Axiom of Choice to obtain n=F(d), the code for the function d, and consider h(n,n) to derive the needed contradiction. Notice that we have used Church’s Law here to obtain a code for the type-theoretic diagonal function, which is then passed to the halting tester in the usual way.

As you can see, the revised argument follows along lines similar to what I had originally envisioned (in the second version), but requires a bit more effort to push through the proof properly. (Incidentally, I don’t think the argument can be made to work in pure ITT, but perhaps it would go through for ITT enriched with function extensionality.)

Thus, Church’s Law is false internally to extensional type theory, even though it is evidently true externally for that theory. You can see the similarity to the situation in first-order logic described earlier. Even though all functions of type N\to N are computable, type theory itself is not capable of recognizing this fact (at least, not in the extensional case). And this is a good thing, not a bad thing! The whole beauty of constructive mathematics lies in the fact that it is just mathematics, free of any self-conscious recognition that we are writing programs when proving theorems constructively. We never have to reason about machine indices or any such nonsense, we just do mathematics under the discipline of not assuming that every proposition is decidable. One benefit is that the same mathematics admits interpretation not only in terms of computability, but also in terms of continuity in topological spaces, establishing a deep connection between two seemingly disparate topics.

(Hat tip to Andrej Bauer for help in sorting all this out. Here’s a link to a talk and a paper about the construction of a model of ETT in which there is an injection from N\to N to N.)

Update: word-smithing.