## Old neglected theorems are still theorems

March 20, 2014

I have very recently been thinking about the question of partiality vs totality in programming languages, a perennial topic in PL’s that every generation thinks it discovers for itself.  And this got me to remembering an old theorem that, it seems, hardly anyone knows ever existed in the first place.  What I like about the theorem is that it says something specific and technically accurate about the sizes of programs in total languages compared to those in partial languages.  The theorem provides some context for discussion that does not just amount to opinion or attitude (and attitude alway seems to abound when this topic arises).

The advantage of a total programming language such as Goedel’s T is that it ensures, by type checking, that every program terminates, and that every function is total. There is simply no way to have a well-typed program that goes into an infinite loop. This may seem appealing, until one considers that the upper bound on the time to termination can be quite large, so large that some terminating programs might just as well diverge as far as we humans are concerned. But never mind that, let us grant that it is a virtue of  T that it precludes divergence.

Why, then, bother with a language such as PCF that does not rule out divergence? After all, infinite loops are invariably bugs, so why not rule them out by type checking? (Don’t be fooled by glib arguments about useful programs, such as operating systems, that “run forever”. After all, infinite streams are programmable in the language M of inductive and coinductive types in which all functions terminate. Computing infinitely does not mean running forever, it just means “for as long as one wishes, without bound.”)  The notion does seem appealing until one actually tries to write a program in a language such as T.

Consider computing the greatest common divisor (GCD) of two natural numbers. This can be easily programmed in PCF by solving the following equations using general recursion:

$\begin{array}{rcl} \textit{gcd}(m,0) & = & m \\ \textit{gcd}(0,m) & = & m \\ \textit{gcd}(m,n) & = & \textit{gcd}(m-n,n) \quad \text{if}\ m>n \\ \textit{gcd}(m,n) & = & \textit{gcd}(m,n-m) \quad \text{if}\ m

The type of $\textit{gcd}$ defined in this manner has partial function type $(\mathbb{N}\times \mathbb{N})\rightharpoonup \mathbb{N}$, which suggests that it may not terminate for some inputs. But we may prove by induction on the sum of the pair of arguments that it is, in fact, a total function.

Now consider programming this function in T. It is, in fact, programmable using only primitive recursion, but the code to do it is rather painful (try it!). One way to see the problem is that in T the only form of looping is one that reduces a natural number by one on each recursive call; it is not (directly) possible to make a recursive call on a smaller number other than the immediate predecessor. In fact one may code up more general patterns of terminating recursion using only primitive recursion as a primitive, but if you examine the details, you will see that doing so comes at a significant price in performance and program complexity. Program complexity can be mitigated by building libraries that codify standard patterns of reasoning whose cost of development should be amortized over all programs, not just one in particular. But there is still the problem of performance. Indeed, the encoding of more general forms of recursion into primitive recursion means that, deep within the encoding, there must be “timer” that “goes down by ones” to ensure that the program terminates. The result will be that programs written with such libraries will not be nearly as fast as they ought to be.  (It is actually quite fun to derive “course of values” recursion from primitive recursion, and then to observe with horror what is actually going on, computationally, when using this derived notion.)

But, one may argue, T is simply not a serious language. A more serious total programming language would admit sophisticated patterns of control without performance penalty. Indeed, one could easily envision representing the natural numbers in binary, rather than unary, and allowing recursive calls to be made by halving to achieve logarithmic complexity. This is surely possible, as are numerous other such techniques. Could we not then have a practical language that rules out divergence?

We can, but at a cost.  One limitation of total programming languages is that they are not universal: you cannot write an interpreter for T within T (see Chapter 9 of PFPL for a proof).  More importantly, this limitation extends to any total language whatever.  If this limitation does not seem important, then consider the Blum Size Theorem (BST) (from 1967), which places a very different limitation on total languages.  Fix any total language, L, that permits writing functions on the natural numbers. Pick any blowup factor, say $2^{2^n}$, or however expansive you wish to be.  The BST states that there is a total function on the natural numbers that is programmable in L, but whose shortest program in L is larger by the given blowup factor than its shortest program in PCF!

The underlying idea of the proof is that in a total language the proof of termination of a program must be baked into the code itself, whereas in a partial language the termination proof is an external verification condition left to the programmer. Roughly speaking, there are, and always will be, programs whose termination proof is rather complicated to express, if you fix in advance the means by which it may be proved total. (In T it was primitive recursion, but one can be more ambitious, yet still get caught by the BST.)  But if you leave room for ingenuity, then programs can be short, precisely because they do not have to embed the proof of their termination in their own running code.

There are ways around the BST, of course, and I am not saying otherwise.  For example, the BST merely guarantees the existence of a bad case, so one can always argue that such a case will never arise in practice.  Could be, but I did mention the GCD in T problem for a reason: there are natural problems that are difficult to express in a language such as T.  By fixing the possible termination arguments in advance, one is tempting fate, for there are many problems, such as the Collatz Conjecture, for which the termination proof of a very simple piece of code has been an open problem for decades, and has resisted at least some serious attempts on it.  One could argue that such a function is of no practical use.  I agree, but I point out the example not to say that it is useful, but to say that it is likely that its eventual termination proof will be quite nasty, and that this will have to be reflected in the program itself if you are limited to a T-like language (rendering it, once again, useless).  For another example, there is no inherent reason why termination need be assured by means similar to that used in T.  We got around this issue in NuPRL by separating the code from the proof, using a type theory based on a partial programming language, not a total one.  The proof of termination is still required for typing in the core theory (but not in the theory with “bar types” for embracing partiality).  But it’s not baked into the code itself, affecting its run-time; it is “off to the side”, large though it may be).

Updates: word smithing, fixed bad link, corrected gcd, removed erroneous parenthetical reference to Coq, fixed LaTeX problems.

## What, If Anything, Is A Declarative Language?

July 18, 2013

Back in the 1980’s it was very fashionable to talk about “declarative” programming languages.  But to my mind there was never a clear definition of a “declarative language”, and hence no way to tell what is declarative and what is not.  Lacking any clear meaning, the term came to refer to the arbitrary conflation of functional with logic programming to such an extent that “functional-and-logic-programming” almost became a Germanic thing-in-itself (ding an sich).  Later, as the logic programming wave subsided, the term “declarative”,  like “object-oriented”, came to be an expression of approval, and then, mercifully, died out.

Or so I had thought.  Earlier this week I attended a thriller of an NSF-sponsored workshop on high-level programming models for parallelism, where I was surprised by the declarative zombie once again coming to eat our brains.  This got me to thinking, again, about whether the term has any useful meaning.  For what it’s worth, and perhaps to generate useful debate, here’re some things that I think people mean, and why I don’t think they mean very much.

1. “Declarative” means “high-level”.  This just seems to replace one vague term by another.
2. “Declarative” means “not imperative”.  But this flies in the face of reality.  Functional languages embrace and encompass imperative programming as a special case, and even Prolog has imperative features, such as assert and retract, that have imperative meaning.
3. “Declarative” means “functional”.  OK, but then we don’t really need another word.
4. “Declarative” means “what, not how”.  But every language has an operational semantics that defines how to execute its programs, and you must be aware of that semantics to understand programs written in it.  Haskell has a definite evaluation order, just as much as ML has a different one, and even Prolog execution is defined by a clear operational semantics that determines the clause order and that can be influenced by “cut”.
5. “Declarative” means “equational”.  This does not distinguish anything, because there is a well-defined notion of equivalence for any programming language, namely observational equivalence.  Different languages induce different equivalences, of course, but how does one say that one equivalence is “better” than another?  At any rate, I know of no stress on equational properties of logic programs, so either logic programs are not “declarative” or “equational reasoning” is not their defining characteristic.
6. “Declarative” means “referentially transparent”.  The misappropriation of Quine’s terminology only confuses matters.  All I’ve been able to make of this is that “referentially transparent” means that beta-equivalence is valid.  But beta equivalence is not a property of an arbitrary programming language, nor in any case is it clear why this equivalence is first among equals.  In any case why you would decide a priori on what equivalences you want before you even know what it means to run a program?
7. “Declarative” means “has a denotation”.  This gets closer to the point, I think, because we might well say that a declarative semantics is one that gives meaning to programs as some kind of mapping between some sort of spaces.  In other words, it would be a synonym for “denotational semantics”.  But every language has a denotational semantics (perhaps by interpretation into a Kripke structure to impose sequencing), so having one does not seem to distinguish a useful class of languages.  Moreover, even in the case of purely functional programs, the usual denotational semantics (as continuous maps) is not fully abstract, and the fully abstract semantics (as games) is highly operational.  Perhaps a language is declarative in proportion to being able to give it semantics in some “familiar” mathematical setting?
8. “Declarative” means “implicitly parallelizable“.  This was certainly the sense intended at the NSF meeting, but no two “declarative” languages seemed to have much in common.  Charlie Garrod proposes just “implicit”, which is pretty much synonymous with “high level”, and may be the most precise sense there is to the term.

No doubt this list is not exhaustive, but I think it covers many of the most common interpretations.  It seems to me that none of them have a clear meaning or distinguish a well-defined class of languages.  Which leads me to ask, is there any such thing as a declarative programming language?

[Thanks to the late Steve Gould for inspiring the title of this post.]

[Update: wordsmithing.]

## Introductory FP Course Materials

September 15, 2012

The course materials for our first-semester introductory programming course (which I’ve discussed elsewhere on this blog) are now available here.

The course materials for our second-semester data structures and algorithms course are available here.

Thanks to Dan Licata and Guy Blelloch for helping make these available.  Comments are most welcome.

## Intro Curriculum Update

August 17, 2012

In previous posts I have talked about the new introductory CS curriculum under development at Carnegie Mellon. After a year or so of planning, we began to roll out the new curriculum in the Spring of 2011, and have by now completed the transition. As mentioned previously, the main purpose is to bring the introductory sequence up to date, with particular emphasis on introducing parallelism and verification. A secondary purpose was to restore the focus on computing fundamentals, and correct the drift towards complex application frameworks that offer the students little sense of what is really going on. (The poster child was a star student who admitted that, although she had built a web crawler the previous semester, she in fact has no idea how to build a web crawler.) A particular problem is that what should have been a grounding in the fundamentals of algorithms and data structures turned into an exercise in object-oriented programming, swamping the core content with piles of methodology of dubious value to beginning students. (There is a new, separate, upper-division course on oo methodology for students interested in this topic.) A third purpose was to level the playing field, so that students who had learned about programming on the street were equally as challenged, if not more so, than students without much or any such experience. One consequence would be to reduce the concomitant bias against women entering CS, many fewer of whom having prior computing experience than the men.

The solution was a complete do-over, jettisoning the traditional course completely, and starting from scratch. The most important decision was to emphasize functional programming right from the start, and to build on this foundation for teaching data structures and algorithms. Not only does FP provide a much more natural starting point for teaching programming, it is infinitely more amenable to rigorous verification, and provides a natural model for parallel computation. Every student comes to university knowing some algebra, and they are therefore familiar with the idea of computing by calculation (after all, the word algebra derives from the Arabic al jabr, meaning system of calculation). Functional programming is a generalization of algebra, with a richer variety of data structures and a richer set of primitives, so we can build on that foundation. It is critically important that variables in FP are, in fact, mathematical variables, and not some distorted facsimile thereof, so all of their mathematical intuitions are directly applicable. So we can immediately begin discussing verification as a natural part of programming, using principles such as mathematical induction and equational reasoning to guide their thinking. Moreover, there are natural concepts of sequential time complexity, given by the number of steps required to calculate an answer, and parallel time complexity, given by the data dependencies in a computation (often made manifest by the occurrences of variables). These central concepts are introduced in the first week, and amplified throughout the semester.

Two major homework exercises embody the high points of the first-semester course, one to do with co-development of code with proof, the other to do with parallelism. Co-development of program and proof is illustrated by an online regular expression matcher. The problem is a gem for several reasons. One is that it is essentially impossible for anyone to solve by debugging a blank screen. This sets us up nicely for explaining the importance of specification and proof as part of the development process. Another is that it is very easy, almost inevitable, for students to make mistakes that are not easily caught or diagnosed by testing. We require the students to carry out a proof of the correctness of the matcher, and in the process discover a point where the proof breaks down, which then leads to a proper solution. (See my JFP paper “Proof-Directed Debugging” for a detailed development of the main ideas.) The matcher also provides a very nice lesson in the use of higher-order functions to capture patterns of control, resulting in an extremely clean and simple matcher whose correctness proof is highly non-trivial.

The main parallelism example is the Barnes-Hut algorithm for solving the n-body problem in physics. Barnes-Hut is an example of a “tree-based” method, invented by Andrew Appel, for solving this well-known problem. At a high level the main idea is to decompose space into octants (or quadrants if you’re working in the plane), recursively solving the problem for each octant and then combining the solutions to make an overall solution. The key idea is to use an approximation for bodies that are far enough away—a distant constellation can be regarded as an atomic body for the purposes of calculating the effects of its stars on the sun, say. The problem is naturally parallelizable, because of the recursive decomposition. Moreover, it provides a very nice link to their high school physics. Since FP is just an extension of mathematics, the equations specifying the force law and Newton’s Law carry over directly to the code. This is an important sense in which FP builds on and amplifies their prior mathematical experience, and shows how one may connect computer science with other sciences in a rather direct manner.

The introductory FP course establishes the basis for the new data structures and algorithms course that most students take in either the third or fourth semester. This course represents a radical departure from tradition in several ways. First, it is a highly rigorous course in algorithms that rivals the upper-division course taught at most universities (including our own) for the depth and breadth of ideas it develops. Second, it takes the stance that all algorithms are parallel algorithms, with sequential being but a special case of parallel. Of course some algorithms have a better parallelizability ratio (a precise technical characterization of the ability to make use of parallelism), and this can be greatly affected by data structure selection, among other considerations. Third, the emphasis is on persistent abstract types, which are indispensable for parallel computing. No more linked lists, no more null pointer nonsense, no more mutation. Instead we consider abstract types of graphs, trees, heaps, and so forth, all with a persistent semantics (operations create “new” ones, rather than modify “old” ones). Fourth, we build on the reasoning methods introduced in the first semester course to prove the correctness and the efficiency of algorithms. Functional programming makes all of this possible. Programs are concise and amenable to proof, they are naturally parallel, and they enjoy powerful typing properties that enforce abstraction in a mathematically rigorous manner. Fifth, there is a strong emphasis on problems of practical interest. For example, homework 1 is the shotgun method for genome sequencing, a parallel algorithm of considerable practical importance and renown.

There is a third introductory course in imperative programming, taken in either the first or second semester (alternating with the functional programming course at the student’s discretion), that teaches the “old ways” of doing things using a “safe” subset of C. Personally, I think this style of programming is obsolete, but there are many good reasons to continue to teach it, the biggest probably being the legacy problem. The emphasis is on verification, using simple assertions that are checked at run-time and which may be verified statically in some situations. It is here that students learn how to do things “the hard way” using low-level representations. This course is very much in the style of the 1970’s era data structures course, the main difference being that the current incarnation of Pascal has curly braces, rather than begin-end.

For the sake of those readers who may not be up to date on such topics, it seems important to emphasize that functional programming subsumes imperative programming. Every functional language is capable of expressing the old-fashioned sequential pointer-hacking implementation of data structures. You can even reproduce Tony Hoare’s self-described “billion dollar mistake” of the cursed “null pointer” if you’d like! But the whole point is that it is rarely useful, and almost never elegant, to work that way. (Curiously, the “monad mania” in the Haskell community stresses an imperative, sequential style of programming that is incompatible with parallelism, but this is not forced on you as it is in the imperative world.) From this point of view there no loss, and considerable gain, in teaching functional programming from the start. It puts a proper emphasis on mathematically sane programming methods, but allows for mutation-based programming where appropriate (for example, in engendering “side effects” on users).

I encourage readers to review the syllabi and course materials. There is quite a large body of material in place that we plan to expand into textbook-level treatments in the near future. We also plan to write a journal article documenting our experiences with these courses.

I am very grateful to my colleagues Guy Blelloch, Dan Licata, and Frank Pfenning for their efforts in helping to develop the new introductory curriculum at Carnegie Mellon, and to the teaching assistants whose hard work and dedication put the ideas into practice.

Update: Unfortunately, the homework assignments for these courses are not publicly available, because we want to minimize the temptation for students to make use of old assignments and solutions in preparing their own work.  I am working with my colleagues to find some way in which we can promote the ideas without sacrificing too much of the integrity of the course materials.  I apologize for the inconvenience.

## Words matter

February 1, 2012

Yesterday, during a very nice presentation by Ohad Kammar at Carnegie Mellon, the discussion got derailed, in part, because of a standard, and completely needless, terminological confusion involving the word “variable”.  I’m foolish enough to try to correct it.

The problem is that we’ve all been taught to confuse variables with variables—that is, program variables with mathematical variables.  The distinction is basic.  Since time immemorial (well, at least since al Khwarizmi) we have had the notion of a variable, properly so-called, which is given meaning by substitution.  A variable is an unknown, or indeterminate, quantity that can be replaced by any value of its type (a type being, at least since Russell, the range of significance of a variable).  Frege gave the first systematic study of the quantifiers, and Church exploited the crucial concept of a variable to give the most sharply original and broadly applicable model of computation, the $\lambda$-calculus.

Since the dawn of Fortran something that is not a variable has come to be called a variable.  A program variable, in the sense of Fortran and every imperative language since, is not given meaning by substitution.  Rather, it is given meaning by (at least) two operations associated with it, one to get its contents and one to put new contents into it.  (And, maybe, an operation to form a reference to it, as in C or even Algol.)  Now as many of you know, I think that the concept of a program variable in this sense is by and large a bad idea, or at any rate not nearly as important as it has been made out to be in conventional (including object-oriented) languages, but that’s an argument for another occasion.

Instead, I’m making a plea.  Let’s continue to call variables variables.  It’s a perfectly good name, and refers to what is perhaps one of the greatest achievements of the human mind, the fundamental concept of algebra, the variable.  But let’s stop calling those other things variables!  In my Practical Foundations for Programming Languages I coined (as far as I know) a word that seems perfectly serviceable, namely an assignable.  The things called variables in imperative languages should, rather, be called assignables.  The word is only a tad longer than variable, and rolls off the tongue just as easily, and has the advantage of being an accurate description of what it really is.  What’s not to like?

Why bother?  For one thing, some languages have both concepts, a necessity if you want your language to be mathematically civilized (and you do).  For another, in the increasingly important world of program verification, the specification formalisms, being mathematical in nature, make use of variables, which most definitely are not assignables!  But the real reason to make the distinction is, after all, because words matter.  Two different things deserve to have two different names, and it only confuses matters to use the same word for both.  This week’s confusion was only one example of many that I have seen over the years.

So, my suggestion: let’s call variables variables, and let’s call those other things assignables.  In the fullnesss of time (i.e., once the scourge of imperative programming has been lifted) we may not need the distinction any longer.  But until then, why not draw the distinction properly?

Addendum: It seems worth mentioning that in PFPL I have a novel (afaik) treatment of the concept of a reference, which is clarified in a subsequent post.

## The semester’s over

May 4, 2011

One reason that I started this blog was to record my experiences with a new course on functional programming for freshmen at Carnegie Mellon.  Classes have ended, the final exam is graded, and we are in the process of taking stock of what we accomplished, and what we could do better.  This was the first instance of the course, taught to 83 first-year computer science students who volunteered to be part of the new curriculum.  (This represents about 2/3 of the freshman class.)  All had taken a new course on imperative programming taught by Frank Pfenning the previous semester, and all were simultaneously taking 251, the legendary theory foundation course developed here over the last dozen or so years.

Starting this fall the course will be part of the standard curriculum, and will be taught to a broader range of students, including the electrical engineers and the full class of CS students.  We’re still working out whether the standard order will be imperative, then functional, or functional, then imperative.  Much depends on the mathematical maturity of the incoming students.  Those with weak mathematical skills (the vast majority), but some programming experience, will likely start with imperative programming, which they will take simultaneously with a standard discrete math course taught in the mathematics department.  Those with strong mathematical skills, or high ambitions, will start with functional programming.  A small number will park for one semester, taking discrete math and a programming course primarily intended for non-majors, to bring themselves up to speed.

Our new data structures and algorithms course, being developed by Guy Blelloch, will be offered in the fall for students who have already have the IP and FP classes.  Guy plans to make heavy use of functional programming, and will be stressing parallel algorithms on persistent data structures as the common case, leaving the traditional sequential, ephemeral case to the imperative programming class.  Guy’s course is essentially a continuation (ahem) of the FP class that emphasizes more sophisticated data structures and algorithms, and more complex programming problems.

For those who might be interested, I’m attaching the final exam for the course.  The students did exceptionally well (average score of 88%), even though we feared it was too long and difficult beforehand.  The spread was very narrow, perhaps because the exam was too easy, or because this being a class of volunteers, their skills were unusually strong and uniform.  We’ll know more after the fall once we’ve taught the class to a broader range of students.  I do think the exam is representative of what we expected them to be able to do at the end, and they showed us that they can do it.  A clear success!

15-150 Final Exam (Spring 2011)

## Of Course ML Has Monads!

May 1, 2011

Examined from the point of view of ML, monads are but a particular of use of modules.  The signature of monads is given by the definition

val ret : 'a -> 'a monad
end

There are many, many, many structures that satisfy this signature; I needn’t (and, in any case, can’t) rehearse them all here.  One particularly simple example should suffice to give the general idea:

structure Option : MONAD = struct
type 'a monad = 'a option
fun ret x = SOME x
fun bnd (SOME x) k = k x
| bnd NONE k = NONE
end

This is of course the option monad, which is sometimes used to model the data flow aspects of exceptions, perhaps with some elaboration of the NONE case to associate an exceptional value with a non-result.  (The control flow aspects are not properly modeled this way, however.  For that one needs, in addition, access to some sort of jump mechanism.)

Examples like this one proliferate.  A monad is represented by a structure.  Any structure that provides the facilities specified by the MONAD signature gives rise to the characteristic sequentialization mechanisms codified by it.  Monad transformers are functors that transform one monad into another, with no fuss or bother, and no ad hoc mechanisms required.  Standard modular programming techniques suffice to represent monads; moreover, the techniques involved are fully general, and are equally applicable to other signatures of interest (arrows, or quivers, or bows, or what have you).  Moreover, it is shown in my paper with Chakravarty and Dreyer how to integrate modules into the type inference mechanism of ML so that one can get automatic functor instantiation in those limited cases where it is self-evident what is intended.  This has been implemented by Karl Crary in a prototype compiler for an extension of Standard ML, and it would be good to see this supported in more broadly available compilers for the language.

The bulk of the mania about monads is therefore accounted for by modules.  I have no doubt, however, that you are wondering about the infamous IO monad in Haskell (and it’s associated work-around, unsafePerformIO).  Isn’t that a fundamental feature of the language that cannot be replicated in ML?  Hardly!  It’s entirely a matter of designing the signatures of the standard basis library modules, and nothing more.  The default basis library does not attempt to segregate effects into a monad, but it is perfectly straightforward to do this yourself, by providing your own layer over the standard basis, or to reorganize the standard basis to enforce the separation.  For example, the signature of reference cells might look like this:

signature REF = sig
type 'a ref
val ref : 'a -> 'a ref IO.monad
val ! : 'a ref -> 'a IO.monad
val := : 'a ref -> 'a -> unit IO.monad
end

Here we are presuming that we have a fixed declaration

structure IO : MONAD = ...

that packages up the basic IO primitives that are already implemented in the run-time system of ML, more or less like in Haskell.  The other signatures, such as those for mutable arrays or for performing input and output, would be modified in a similar manner to push effects into the IO monad.  Et voila, you have monadic effects, just like in Haskell.

There’s really nothing to it.  In fact, the whole exercise was carried out by a Carnegie Mellon student, Phillippe Ajoux, a couple of years ago.  He also wrote a number of programs in this style just to see how it all goes: swimmingly.  He also devised syntactic extensions to the Moscow ML compiler that provide a nicer notation for programming with monads, much as in Haskell, but better aligned with ML’s conventions.  (Ideally it should be possible to provide syntactic support for any signature, not just monads, but I’m not aware of a worked-out design for the general case, involving as it would an intermixing of parsing and elaboration.)

My point is that the ML module system can be deployed by you to impose the sorts of effect segregation imposed on you by default in Haskell.  There is nothing special about Haskell that makes this possible, and nothing special about ML that inhibits it.  It’s all a mode of use of modules.

So why don’t we do this by default?  Because it’s not such a great idea.  Yes, I know it sounds wonderful at first, but then you realize that it’s pretty horrible.  Once you’re in the IO monad, you’re stuck there forever, and are reduced to Algol-style imperative programming.  You cannot easily convert between functional and monadic style without a radical restructuring of code.  And you inevitably need unsafePerformIO to get anything serious done.  In practical terms, you are deprived of the useful concept of a benign effect, and that just stinks!

The moral of the story is that of course ML “has monads”, just like Haskell.  Whether you want to use them is up to you; they are just as useful, and just as annoying, in ML as they are in Haskell.  But they are not forced on you by the language designers!

Update: This post should’ve been called “ML Has Monads, Why Not?”, or “Of Course ML Has Comonads!”, but then no one was wondering about that.

Update: I now think that the last sentence is excessive.  My main point is simply that it’s very simple to go one way or the other with effects, if you have modules to structure things; it’s all a matter of library design.  A variant of ML that enforced the separation of effects is very easily constructed; the question is whether it is useful or not.  I’ve suggested that the monadic separation is beguiling, but not clearly a great idea.  Alternatively, one can say that we’re not that far away from eliminating laziness from Haskell, at least in this respect: just re-do the standard basis library in ML, and you’re a good ways there.  Plus you have modules, and we understand how to integrate type classes with modules, so the gap is rather small.

## Persistence of Memory

April 9, 2011

Salvador Dali’s masterpiece, The Persistence of Memory, is one of the most recognizable works of art in the world.  The critic Dawn Ades described it as “a Surrealist meditation on the collapse of our notions of a fixed cosmic order” induced by Einstein’s penetrating analysis of the concepts of time and space in the physical world.  Just as Dali’s Persistence of Memory demarcated the transition from age-old conceptions of time and space in physics, so does the computational concept of persistence of memory mark the transition from sequential time and mutable space to parallel time and immutable space.

A short while ago I described the distinction between parallelism and concurrency in terms of a cost model that assigns a parallel, as well as sequential, time complexity to a program.  Parallelism is all about efficiency, not semantics; the meaning of a program is not affected by whether it is executed on one processor or many.  Functional languages expose parallelism by limiting sequential dependencies among the parts of a computation; imperative languages introduce inessential dependencies that impede parallelism.

Another critical ingredient for parallelism is the concept of a persistent data structure, one for which its associated operations transform, rather than alter, it.  A persistent dictionary, for example, has the characteristic that inserting an element results in a new dictionary with an expanded domain; the old dictionary persists, and is still available for further transformation.  When calculating the sum of 2 and 2, resulting in 4, no one imagines that the 2’s are “used up” in the process!  Nor does one worry whether the sum of 1 and 3 is the “same” 4 or a “different” 4!  The very question is absurd (or, more precisely, trivial).  So why do we worry about whether the result of inserting 2 into dict “uses up” the old dict?  And why do we worry about whether inserting 2 into the empty dictionary twice results in the “same” dictionary or a “different” one?

Yet both academic and practical computing has all but confined itself to ephemeral data structures, which exhibit precisely such behavior.  Operations on an ephemeral data structure “use up” the data structure, making it unavailable for further use without going to some trouble to get it back.  The pathologies resulting from this abound.  Standard compiler texts, for example, devote a chapter to concepts like “block structured symbol tables”, which are, in fact, nothing more than persistent dictionaries done the hard way.  More generally, whenever a data structure has multiple futures, such as when backtracking or exploiting parallelism, ephemeral data structures get in the way.  Indeed,  the bulk of object-oriented programming, with its absurd over-emphasis on the “message passing” metaphor, stress the alteration of objects as the central organizing principle, confounding parallelism and complicating simple algorithms.

A prime virtue of functional languages is that persistence is the default case, but they can as readily support ephemeral data structures as any imperative (including object-oriented) language.  All functional languages include types of mutable cells and mutable arrays, and provide support for conventional, sequential, imperative programming with semicolons and even curly braces!  (Some do this better than others; Haskell is, in my view, the world’s best imperative programming language, and second-best functional language, but that’s a subject for another post.)  But why would you want to? Why deprive yourself of the benefits of persistence, and insist instead on an ephemeral data structure?

This question came up recently in our planning for the Functional Programming class that we are teaching this semester for freshmen at Carnegie Mellon.  All semester we have been using functional programming techniques to build clean, verifiable, modular, parallel programs.  The students routinely prove theorems about their code, structure programs using abstract types, and exploit parallelism to improve the asymptotic performance of their programs.  Recent homework assignments include the implementation of the parallel Barnes-Hut algorithm for the n-body problem in physics, and the parallel Jamboree algorithm for game-tree search in perfect information games.  Persistent data structures are the key to making this possible; just try to code Barnes-Hut in an imperative language, and you will find yourself in a morass worrying about concurrency when you should instead be envisioning a recursive tree decomposition of space, and the computation of forces using formulas from high school physics.

We tried hard to find good motivations for using an ephemeral data structure when you can just as easily (actually, much more easily) use a persistent one.  As we went through them, we realized that all of the standard arguments are questionable or false.  The usual one is some vague notion of “efficiency” in either time or space.  While I concede that one can, in principle, solve a particular, confined problem more efficiently by doing absolutely everything by hand (memory management, scheduling, arithmetic), in the overwhelming majority of cases the demands of evolution of code far outweigh the potential advantages of doing everything by hand.  Modularity matters most when it comes to building and evolving large systems; functional languages, with persistent data structures, support modularity the best.  (I’ll have more to say about this in a future post.)

Most of the arguments about efficiency, though, ignore questions of functionality.  It is senseless to compare the “efficiency” of one data structure that provides different functionality than another.  A persistent data structure does more for you than does an ephemeral one.  It allows you to have multiple futures, including those that evolve in parallel with one another.  It makes no sense to insist that some ephemeral approximation of such a data structure is “more efficient” if it does not provide those capabilities!  And making it do so is, invariably, a bitch!  Conventional ephemeral data structures are not readily parallelizable; it’s often a publishable result to get a decent degree of parallelism using imperative methods.  By contrast, even freshmen (admittedly, Carnegie Mellon freshmen) can implement a parallel game tree search or a tree-based solution to the n-body problem in a one-week homework assignment.

So what’s the deal?  Why should we care about ephemeral data structures at all?  I have no idea.  Mostly, it’s a legacy cost imposed on us by the overzealous emphasis on imperative methods and machine-based, rather than language-based, models of computation.  But this will change.  Starting this fall, introductory data structures and algorithms will be liberated from the limitations of imperative, object-oriented programming, and will instead stress persistent (as well as ephemeral) data structures, and parallel (including as a special case sequential) algorithms.  The future of computing depends on parallelism (for efficiency), distribution (for scale), and verification (for quality).  Only functional languages support all three naturally and conveniently; the old ones just don’t cut it.

Update: Here’s a chart summarizing the situation as I see it:

$\displaystyle \begin{array}{|c|c|c|} \hline & \text{Ephemeral} & \text{Persistent} \\ \hline \text{Sequential} & \textit{imperative} & \textit{benign effects} \\ \hline \text{Parallel} & \textit{hard to get right} & \textit{functional} \\ \hline \end{array}$

Conventional imperative programming works well for the ephemeral, sequential case; it is notoriously hard to use imperative methods for parallelism.  Benign effects, as exemplified by self-adjusting data structures, can be used to give rise to persistent behavior in the sequential setting, but the use of effects impedes parallelism.

## Functions Are Values

April 2, 2011

After the midterm the course staff for the Functional Programming class had a T-shirt made up saying “Functions Are Values”.  What prompted this is a curious blind spot shared by nearly every student in the class about the nature of functions in ML (or, for that matter, in math).  The students performed admirably on the midterm examination, capably writing higher-order functions, proving correctness of programs by structural induction, and even using higher-order functions to express staged computation.  And yet nearly every student got the following simple “gimme” question wrong:

What is the type and value of the expression “fn x => raise Fail”?

I sensed trouble when, during the examination, a student asked me to clarify the question for him.  Needless to say, I was at a loss for words!  Sure enough, all but one student got this simple question wrong, sometimes spectacularly.

Many said “it has no value” and offered a guess as to its type.  Others said “it raises an exception” and therefore has type “exn”.  (For those who don’t know ML, the type exn is the type of values associated with an exception.)  Still others said “the compiler will notice that it always raises an exception, and will therefore reject it”, or words to that effect.  Where the hell did they get that bizarre notion?  Whatever the deficiencies of our teaching may be, we certainly never said anything close to that!  Others got the type right, but still could not explain what is its value.

Given that they are clearly capable of writing higher-order functional programs, how could this happen?  Obviously the fault is not with our students, but with ourselves.  But what did we do wrong?  How did we manage to stump so many students with what we thought was a free 3-pointer?  To be honest, I don’t know.  But I have a few theories that I thought I would air here in hopes of helping others avoid my mistakes, or nip misunderstandings in the bud.

Throughout the course we have been very careful to develop a rigorous language-based model of computation using structural operational semantics.  We assign meaning to the program you write, not to its translation into some mysterious underlying machine that doesn’t really exist anyway (in other words, we avoided telling the conventional lies).  Nowhere did we ever explain anything by reference to “the compiler”, except at one point, which I now realize was a cardinal mistake.  Don’t make it yourself.  Here’s how it happened.  We meant well, but the spirit is weak, and sometimes we stray.  Forgive me Father, for I have sinned…..

Here’s where we went wrong, and I think invited wild speculation that led to the mistakes on the examination.  One cool use of higher-order functions is to stage computation.  A good example is provided by a continuation-based regular expression matcher, which I will not explain in any detail here.  The crucial point is that it has the type

regexp -> string -> (string -> bool) -> bool

which says that it accepts a regular expression, a string, and a continuation, matching an initial segment of the string, if possible, and passing the rest to the continuation, or returning false if not.  As this description suggests we can think of the matcher as a curried form of a three-argument function, and that’s that.  But a more interesting formulation stages the computation so that, given the regexp, it computes a matcher for that regular expression that may be applied repeatedly to many different candidate strings without reprocessing the regular expression.  (The code for this is quite beautiful; it’s available on our course web site if you’re interested.)

All this can be explained quite neatly using operational semantics, showing how the recursion over the regexp is completed, yielding a matching function as result.  It’s all very cool, except for one little thing: the result contains numerous beta redices that can be simplified to obtain a clearer and simpler representation of the matching code.  Since the equations involved are all so self-evident, we stated (and here’s our mistake) that “the compiler can simplify these away to obtain the following code”, which was, of course, much clearer.  What we said was perfectly true, but unfortunately it opened the floodgates of incomprehension.  The trouble is, the students have no idea what a compiler is or how it works, so for them it’s a “magic wand” that somehow manages to get their ML code executed on an x86.  And we invited them to speculate on what this mystical compiler does, and thereby (I conjecture) invited them to think that (a) what the compiler does is paramount for understanding anything, and (b) the compiler does whatever they wish to imagine it does.  Somehow the students read our simple three-pointer as a “trick question” that was supposed to involve some mumbo-jumbo about the compiler, and that’s what we got, alas.

So, my first bit of advice is, don’t mention the compiler as an explanation for anything!  They’ll have plenty of chances later in their careers to learn about compilers, and to understand how semantics informs compilation, what is valid equational reasoning, and so forth.  But for freshmen, stick with the model.  Otherwise you’re courting disaster.  (I’ve already mentioned other situations where mentioning “the compiler” only muddies the waters and confuses students, the teaching of recursion being a prime example.  I intend to discuss others in future posts.)

Now that I’ve taken the blame for my mistakes, I feel less guilty about shifting some of it elsewhere.  I have two thoughts about why students resist the idea of functions being values of no fundamentally different character than any others.  One source, no doubt, is that many of them have “learned” programming on the street, and have developed all sorts of misconceptions that are being applied here.  Most popular languages make it hard to handle functions as values: you have to name them, you have to go to special trouble to pass them as arguments or return them as results, they are called “methods” and have a strange semantics involving state, and on and on.  From this students acquire the sense that functions are “special”, and cannot be thought of like other things.

Another source of misunderstanding is that elementary mathematics (up through freshman year, at least) stresses the pointful style, always speaking of f(x), rather than f itself, and carrying this forward through calculus, writing d f(x) / dx for the derivative, confusing the function with its values, and so forth.  Here again the message is clear: functions are “special” and are not to be treated on the same footing as numbers.  It’s a pity, because I think that the point-free style is clearer and more natural, if non-standard.  The differential is an operator acting on functions that produces a linear approximation to the function at each point in its domain (if the function in fact has such).  Be that as it may, the consequence of the pointful style is that students develop early on a “fear of functions” that is hard to break.

## What is a functional language?

March 16, 2011

I mentioned in a previous post that I am developing a new course on functional programming for first-year students.  The immediate question, which none of you asked, is “what do you mean by functional programming?”.  Good question!  Now, if only I had a crisp answer, I’d be in good shape.

Socially speaking, the first problem I run into is a fundamental misunderstanding about the term “functional programming”.  Most people think that it stands for an ideological position that is defined in opposition to what everyone else thinks and does.  So in the mind of a typical colleague, when I speak of functional programming, she hears “deviant religious doctrine” and “knows” instinctively to disregard it.  Yet what I mean by functional programming subsumes what everyone else is doing as a narrow special case.  For as readers of this blog surely know, there is no better imperative language than a functional language!  In fact I often say that Haskell is the world’s best imperative programming language; I’m only half (if that much) joking.

So if functional programming subsumes imperative programming, then what the hell am I talking about when I advocate for functional programming at the introductory level?  And why, then, do we also have an imperative programming course?  Very good questions.  I’m not sure I have a definitive answer, but this being a blog, I can air my opinions.

Were it up to me, we would have one introductory course, probably two semesters long, on programming, which would, of course, emphasize functional as well as imperative techniques, all in one language.  Personally, I think this would be the right way to go, and would prepare the students for all future courses in our curriculum, and for work out there in the “real world.”  Not everyone believes me (and sometimes I am not sure I believe myself), so we must compromise.  While it surely sounds all nice and ecumenical to teach both imperative and functional programming, the reality is that we’re teaching old and new programming methods.  It’s not so much the imperative part that counts, but their obsolescence.  It is deemed important (and, to be honest, I can see the merit in the argument) that students learn the “old ways” because that’s how most of the world works.  So we have to use a language that is ill-defined (admits programs that cannot be given any meaning in terms of the language itself), that forces sequential, rather than parallel thinking, that demands manual allocation of memory, and that introduces all sorts of complications, such as null pointers, that need not exist at all.  And then we have to teach about things called “tools” that help manage the mess we’ve created—tools that do things like recover from Boolean blindness or check for null pointer errors.  To be sure, these tools are totally amazing, but then so are hot metal typesetting machines and steam locomotives.

So what, then, is a functional language?  I can do no better than list some criteria that are important for what I do; the languages that satisfy these criteria are commonly called functional languages, so that’s what I’ll call them too.

1. It should have a well-defined semantics expressed in terms of the programs you write, not in terms of how it is “compiled” onto some hypothetical hardware and software platform.  I want to be able to reason about the code I’m writing, not about the code the compiler writes.  And I want to define the meaning of my program in terms of the constructs I’m using, not in terms of some supposed underlying mechanism that implements it.  This is the essence of abstraction.
2. It should support both computation by evaluation and computation by execution.  Evaluation is a smooth generalization of high school- and university-level mathematics, and is defined in terms of standard mathematical concepts such as tuples, functions, numbers, and so forth.  Variables are given meaning by substitution, and evaluation consists of simplifying expressions.  Execution is the action of a program on another agent or data structure conceived of as existing independently of the program itself.  The data lives “over there” and the program “over here” diddles that data.  Assignables (my word for what in imperative languges are wrongly called variables) are given meaning by get and put operations (fetch and store), not by substitution.  Execution consists of performing these operations.
3. It should be natural to consider both persistent and ephemeral data structures.  This is essentially a restatement of (2).  Persistent data structures are values that can be manipulated like any other value, regardless of their apparent “size” or “complexity”.  We can as easily pass functions as arguments and return them as results as we can do the same with integers.  We can think of trees or streams as values in the same manner.  Ephemeral data structures are what you learn about in textbooks.  They are external to the program, and manipulated destructively by execution.  These are useful, but not nearly as central as the current texts would have you believe (for lack of consideration of any other kind).
4. It should have a natural parallel cost model based on the  dependencies among computations so that one may talk about parallel algorithms and their complexity as naturally as one currently discusses sequential algorithms.  Imperative-only programming languages fail this criterion miserably, because there are too many implicit dependencies among the components of a computation, with the result that sequentiality is forced on you by the artificial constraints of imperative thinking.
5. It should have a rich type structure that permits introduction of new abstract types and which supports modular program development.  By giving short shrift to expressions and evaluation, imperative language have an impoverished notion of type—and not support for modularity.  Worse, object-oriented programming, a species of imperative programming, is fundamentally antimodular because of the absurd emphasis on inheritance and the reliance on class-based organizations in which functionality metastasizes throughout a program.

What languages satisfy these criteria best?  Principally ML (both Standard ML and Objective Caml, the “objective” part notwithstanding), and Haskell. (Unfortunately, Haskell loses on the cost model, about which more in another post.)

This is why we teach ML.

Follow