## Old neglected theorems are still theorems

March 20, 2014

I have very recently been thinking about the question of partiality vs totality in programming languages, a perennial topic in PL’s that every generation thinks it discovers for itself.  And this got me to remembering an old theorem that, it seems, hardly anyone knows ever existed in the first place.  What I like about the theorem is that it says something specific and technically accurate about the sizes of programs in total languages compared to those in partial languages.  The theorem provides some context for discussion that does not just amount to opinion or attitude (and attitude alway seems to abound when this topic arises).

The advantage of a total programming language such as Goedel’s T is that it ensures, by type checking, that every program terminates, and that every function is total. There is simply no way to have a well-typed program that goes into an infinite loop. This may seem appealing, until one considers that the upper bound on the time to termination can be quite large, so large that some terminating programs might just as well diverge as far as we humans are concerned. But never mind that, let us grant that it is a virtue of  T that it precludes divergence.

Why, then, bother with a language such as PCF that does not rule out divergence? After all, infinite loops are invariably bugs, so why not rule them out by type checking? (Don’t be fooled by glib arguments about useful programs, such as operating systems, that “run forever”. After all, infinite streams are programmable in the language M of inductive and coinductive types in which all functions terminate. Computing infinitely does not mean running forever, it just means “for as long as one wishes, without bound.”)  The notion does seem appealing until one actually tries to write a program in a language such as T.

Consider computing the greatest common divisor (GCD) of two natural numbers. This can be easily programmed in PCF by solving the following equations using general recursion:

$\begin{array}{rcl} \textit{gcd}(m,0) & = & m \\ \textit{gcd}(0,m) & = & m \\ \textit{gcd}(m,n) & = & \textit{gcd}(m-n,n) \quad \text{if}\ m>n \\ \textit{gcd}(m,n) & = & \textit{gcd}(m,n-m) \quad \text{if}\ m

The type of $\textit{gcd}$ defined in this manner has partial function type $(\mathbb{N}\times \mathbb{N})\rightharpoonup \mathbb{N}$, which suggests that it may not terminate for some inputs. But we may prove by induction on the sum of the pair of arguments that it is, in fact, a total function.

Now consider programming this function in T. It is, in fact, programmable using only primitive recursion, but the code to do it is rather painful (try it!). One way to see the problem is that in T the only form of looping is one that reduces a natural number by one on each recursive call; it is not (directly) possible to make a recursive call on a smaller number other than the immediate predecessor. In fact one may code up more general patterns of terminating recursion using only primitive recursion as a primitive, but if you examine the details, you will see that doing so comes at a significant price in performance and program complexity. Program complexity can be mitigated by building libraries that codify standard patterns of reasoning whose cost of development should be amortized over all programs, not just one in particular. But there is still the problem of performance. Indeed, the encoding of more general forms of recursion into primitive recursion means that, deep within the encoding, there must be “timer” that “goes down by ones” to ensure that the program terminates. The result will be that programs written with such libraries will not be nearly as fast as they ought to be.  (It is actually quite fun to derive “course of values” recursion from primitive recursion, and then to observe with horror what is actually going on, computationally, when using this derived notion.)

But, one may argue, T is simply not a serious language. A more serious total programming language would admit sophisticated patterns of control without performance penalty. Indeed, one could easily envision representing the natural numbers in binary, rather than unary, and allowing recursive calls to be made by halving to achieve logarithmic complexity. This is surely possible, as are numerous other such techniques. Could we not then have a practical language that rules out divergence?

We can, but at a cost.  One limitation of total programming languages is that they are not universal: you cannot write an interpreter for T within T (see Chapter 9 of PFPL for a proof).  More importantly, this limitation extends to any total language whatever.  If this limitation does not seem important, then consider the Blum Size Theorem (BST) (from 1967), which places a very different limitation on total languages.  Fix any total language, L, that permits writing functions on the natural numbers. Pick any blowup factor, say $2^{2^n}$, or however expansive you wish to be.  The BST states that there is a total function on the natural numbers that is programmable in L, but whose shortest program in L is larger by the given blowup factor than its shortest program in PCF!

The underlying idea of the proof is that in a total language the proof of termination of a program must be baked into the code itself, whereas in a partial language the termination proof is an external verification condition left to the programmer. Roughly speaking, there are, and always will be, programs whose termination proof is rather complicated to express, if you fix in advance the means by which it may be proved total. (In T it was primitive recursion, but one can be more ambitious, yet still get caught by the BST.)  But if you leave room for ingenuity, then programs can be short, precisely because they do not have to embed the proof of their termination in their own running code.

There are ways around the BST, of course, and I am not saying otherwise.  For example, the BST merely guarantees the existence of a bad case, so one can always argue that such a case will never arise in practice.  Could be, but I did mention the GCD in T problem for a reason: there are natural problems that are difficult to express in a language such as T.  By fixing the possible termination arguments in advance, one is tempting fate, for there are many problems, such as the Collatz Conjecture, for which the termination proof of a very simple piece of code has been an open problem for decades, and has resisted at least some serious attempts on it.  One could argue that such a function is of no practical use.  I agree, but I point out the example not to say that it is useful, but to say that it is likely that its eventual termination proof will be quite nasty, and that this will have to be reflected in the program itself if you are limited to a T-like language (rendering it, once again, useless).  For another example, there is no inherent reason why termination need be assured by means similar to that used in T.  We got around this issue in NuPRL by separating the code from the proof, using a type theory based on a partial programming language, not a total one.  The proof of termination is still required for typing in the core theory (but not in the theory with “bar types” for embracing partiality).  But it’s not baked into the code itself, affecting its run-time; it is “off to the side”, large though it may be).

Updates: word smithing, fixed bad link, corrected gcd, removed erroneous parenthetical reference to Coq, fixed LaTeX problems.

## What, If Anything, Is A Declarative Language?

July 18, 2013

Back in the 1980’s it was very fashionable to talk about “declarative” programming languages.  But to my mind there was never a clear definition of a “declarative language”, and hence no way to tell what is declarative and what is not.  Lacking any clear meaning, the term came to refer to the arbitrary conflation of functional with logic programming to such an extent that “functional-and-logic-programming” almost became a Germanic thing-in-itself (ding an sich).  Later, as the logic programming wave subsided, the term “declarative”,  like “object-oriented”, came to be an expression of approval, and then, mercifully, died out.

Or so I had thought.  Earlier this week I attended a thriller of an NSF-sponsored workshop on high-level programming models for parallelism, where I was surprised by the declarative zombie once again coming to eat our brains.  This got me to thinking, again, about whether the term has any useful meaning.  For what it’s worth, and perhaps to generate useful debate, here’re some things that I think people mean, and why I don’t think they mean very much.

1. “Declarative” means “high-level”.  This just seems to replace one vague term by another.
2. “Declarative” means “not imperative”.  But this flies in the face of reality.  Functional languages embrace and encompass imperative programming as a special case, and even Prolog has imperative features, such as assert and retract, that have imperative meaning.
3. “Declarative” means “functional”.  OK, but then we don’t really need another word.
4. “Declarative” means “what, not how”.  But every language has an operational semantics that defines how to execute its programs, and you must be aware of that semantics to understand programs written in it.  Haskell has a definite evaluation order, just as much as ML has a different one, and even Prolog execution is defined by a clear operational semantics that determines the clause order and that can be influenced by “cut”.
5. “Declarative” means “equational”.  This does not distinguish anything, because there is a well-defined notion of equivalence for any programming language, namely observational equivalence.  Different languages induce different equivalences, of course, but how does one say that one equivalence is “better” than another?  At any rate, I know of no stress on equational properties of logic programs, so either logic programs are not “declarative” or “equational reasoning” is not their defining characteristic.
6. “Declarative” means “referentially transparent”.  The misappropriation of Quine’s terminology only confuses matters.  All I’ve been able to make of this is that “referentially transparent” means that beta-equivalence is valid.  But beta equivalence is not a property of an arbitrary programming language, nor in any case is it clear why this equivalence is first among equals.  In any case why you would decide a priori on what equivalences you want before you even know what it means to run a program?
7. “Declarative” means “has a denotation”.  This gets closer to the point, I think, because we might well say that a declarative semantics is one that gives meaning to programs as some kind of mapping between some sort of spaces.  In other words, it would be a synonym for “denotational semantics”.  But every language has a denotational semantics (perhaps by interpretation into a Kripke structure to impose sequencing), so having one does not seem to distinguish a useful class of languages.  Moreover, even in the case of purely functional programs, the usual denotational semantics (as continuous maps) is not fully abstract, and the fully abstract semantics (as games) is highly operational.  Perhaps a language is declarative in proportion to being able to give it semantics in some “familiar” mathematical setting?
8. “Declarative” means “implicitly parallelizable“.  This was certainly the sense intended at the NSF meeting, but no two “declarative” languages seemed to have much in common.  Charlie Garrod proposes just “implicit”, which is pretty much synonymous with “high level”, and may be the most precise sense there is to the term.

No doubt this list is not exhaustive, but I think it covers many of the most common interpretations.  It seems to me that none of them have a clear meaning or distinguish a well-defined class of languages.  Which leads me to ask, is there any such thing as a declarative programming language?

[Thanks to the late Steve Gould for inspiring the title of this post.]

[Update: wordsmithing.]

## Introductory FP Course Materials

September 15, 2012

The course materials for our first-semester introductory programming course (which I’ve discussed elsewhere on this blog) are now available here.

The course materials for our second-semester data structures and algorithms course are available here.

Thanks to Dan Licata and Guy Blelloch for helping make these available.  Comments are most welcome.

## Intro Curriculum Update

August 17, 2012

In previous posts I have talked about the new introductory CS curriculum under development at Carnegie Mellon. After a year or so of planning, we began to roll out the new curriculum in the Spring of 2011, and have by now completed the transition. As mentioned previously, the main purpose is to bring the introductory sequence up to date, with particular emphasis on introducing parallelism and verification. A secondary purpose was to restore the focus on computing fundamentals, and correct the drift towards complex application frameworks that offer the students little sense of what is really going on. (The poster child was a star student who admitted that, although she had built a web crawler the previous semester, she in fact has no idea how to build a web crawler.) A particular problem is that what should have been a grounding in the fundamentals of algorithms and data structures turned into an exercise in object-oriented programming, swamping the core content with piles of methodology of dubious value to beginning students. (There is a new, separate, upper-division course on oo methodology for students interested in this topic.) A third purpose was to level the playing field, so that students who had learned about programming on the street were equally as challenged, if not more so, than students without much or any such experience. One consequence would be to reduce the concomitant bias against women entering CS, many fewer of whom having prior computing experience than the men.

The solution was a complete do-over, jettisoning the traditional course completely, and starting from scratch. The most important decision was to emphasize functional programming right from the start, and to build on this foundation for teaching data structures and algorithms. Not only does FP provide a much more natural starting point for teaching programming, it is infinitely more amenable to rigorous verification, and provides a natural model for parallel computation. Every student comes to university knowing some algebra, and they are therefore familiar with the idea of computing by calculation (after all, the word algebra derives from the Arabic al jabr, meaning system of calculation). Functional programming is a generalization of algebra, with a richer variety of data structures and a richer set of primitives, so we can build on that foundation. It is critically important that variables in FP are, in fact, mathematical variables, and not some distorted facsimile thereof, so all of their mathematical intuitions are directly applicable. So we can immediately begin discussing verification as a natural part of programming, using principles such as mathematical induction and equational reasoning to guide their thinking. Moreover, there are natural concepts of sequential time complexity, given by the number of steps required to calculate an answer, and parallel time complexity, given by the data dependencies in a computation (often made manifest by the occurrences of variables). These central concepts are introduced in the first week, and amplified throughout the semester.

Two major homework exercises embody the high points of the first-semester course, one to do with co-development of code with proof, the other to do with parallelism. Co-development of program and proof is illustrated by an online regular expression matcher. The problem is a gem for several reasons. One is that it is essentially impossible for anyone to solve by debugging a blank screen. This sets us up nicely for explaining the importance of specification and proof as part of the development process. Another is that it is very easy, almost inevitable, for students to make mistakes that are not easily caught or diagnosed by testing. We require the students to carry out a proof of the correctness of the matcher, and in the process discover a point where the proof breaks down, which then leads to a proper solution. (See my JFP paper “Proof-Directed Debugging” for a detailed development of the main ideas.) The matcher also provides a very nice lesson in the use of higher-order functions to capture patterns of control, resulting in an extremely clean and simple matcher whose correctness proof is highly non-trivial.

The main parallelism example is the Barnes-Hut algorithm for solving the n-body problem in physics. Barnes-Hut is an example of a “tree-based” method, invented by Andrew Appel, for solving this well-known problem. At a high level the main idea is to decompose space into octants (or quadrants if you’re working in the plane), recursively solving the problem for each octant and then combining the solutions to make an overall solution. The key idea is to use an approximation for bodies that are far enough away—a distant constellation can be regarded as an atomic body for the purposes of calculating the effects of its stars on the sun, say. The problem is naturally parallelizable, because of the recursive decomposition. Moreover, it provides a very nice link to their high school physics. Since FP is just an extension of mathematics, the equations specifying the force law and Newton’s Law carry over directly to the code. This is an important sense in which FP builds on and amplifies their prior mathematical experience, and shows how one may connect computer science with other sciences in a rather direct manner.

The introductory FP course establishes the basis for the new data structures and algorithms course that most students take in either the third or fourth semester. This course represents a radical departure from tradition in several ways. First, it is a highly rigorous course in algorithms that rivals the upper-division course taught at most universities (including our own) for the depth and breadth of ideas it develops. Second, it takes the stance that all algorithms are parallel algorithms, with sequential being but a special case of parallel. Of course some algorithms have a better parallelizability ratio (a precise technical characterization of the ability to make use of parallelism), and this can be greatly affected by data structure selection, among other considerations. Third, the emphasis is on persistent abstract types, which are indispensable for parallel computing. No more linked lists, no more null pointer nonsense, no more mutation. Instead we consider abstract types of graphs, trees, heaps, and so forth, all with a persistent semantics (operations create “new” ones, rather than modify “old” ones). Fourth, we build on the reasoning methods introduced in the first semester course to prove the correctness and the efficiency of algorithms. Functional programming makes all of this possible. Programs are concise and amenable to proof, they are naturally parallel, and they enjoy powerful typing properties that enforce abstraction in a mathematically rigorous manner. Fifth, there is a strong emphasis on problems of practical interest. For example, homework 1 is the shotgun method for genome sequencing, a parallel algorithm of considerable practical importance and renown.

There is a third introductory course in imperative programming, taken in either the first or second semester (alternating with the functional programming course at the student’s discretion), that teaches the “old ways” of doing things using a “safe” subset of C. Personally, I think this style of programming is obsolete, but there are many good reasons to continue to teach it, the biggest probably being the legacy problem. The emphasis is on verification, using simple assertions that are checked at run-time and which may be verified statically in some situations. It is here that students learn how to do things “the hard way” using low-level representations. This course is very much in the style of the 1970’s era data structures course, the main difference being that the current incarnation of Pascal has curly braces, rather than begin-end.

For the sake of those readers who may not be up to date on such topics, it seems important to emphasize that functional programming subsumes imperative programming. Every functional language is capable of expressing the old-fashioned sequential pointer-hacking implementation of data structures. You can even reproduce Tony Hoare’s self-described “billion dollar mistake” of the cursed “null pointer” if you’d like! But the whole point is that it is rarely useful, and almost never elegant, to work that way. (Curiously, the “monad mania” in the Haskell community stresses an imperative, sequential style of programming that is incompatible with parallelism, but this is not forced on you as it is in the imperative world.) From this point of view there no loss, and considerable gain, in teaching functional programming from the start. It puts a proper emphasis on mathematically sane programming methods, but allows for mutation-based programming where appropriate (for example, in engendering “side effects” on users).

I encourage readers to review the syllabi and course materials. There is quite a large body of material in place that we plan to expand into textbook-level treatments in the near future. We also plan to write a journal article documenting our experiences with these courses.

I am very grateful to my colleagues Guy Blelloch, Dan Licata, and Frank Pfenning for their efforts in helping to develop the new introductory curriculum at Carnegie Mellon, and to the teaching assistants whose hard work and dedication put the ideas into practice.

Update: Unfortunately, the homework assignments for these courses are not publicly available, because we want to minimize the temptation for students to make use of old assignments and solutions in preparing their own work.  I am working with my colleagues to find some way in which we can promote the ideas without sacrificing too much of the integrity of the course materials.  I apologize for the inconvenience.

## Words matter

February 1, 2012

Yesterday, during a very nice presentation by Ohad Kammar at Carnegie Mellon, the discussion got derailed, in part, because of a standard, and completely needless, terminological confusion involving the word “variable”.  I’m foolish enough to try to correct it.

The problem is that we’ve all been taught to confuse variables with variables—that is, program variables with mathematical variables.  The distinction is basic.  Since time immemorial (well, at least since al Khwarizmi) we have had the notion of a variable, properly so-called, which is given meaning by substitution.  A variable is an unknown, or indeterminate, quantity that can be replaced by any value of its type (a type being, at least since Russell, the range of significance of a variable).  Frege gave the first systematic study of the quantifiers, and Church exploited the crucial concept of a variable to give the most sharply original and broadly applicable model of computation, the $\lambda$-calculus.

Since the dawn of Fortran something that is not a variable has come to be called a variable.  A program variable, in the sense of Fortran and every imperative language since, is not given meaning by substitution.  Rather, it is given meaning by (at least) two operations associated with it, one to get its contents and one to put new contents into it.  (And, maybe, an operation to form a reference to it, as in C or even Algol.)  Now as many of you know, I think that the concept of a program variable in this sense is by and large a bad idea, or at any rate not nearly as important as it has been made out to be in conventional (including object-oriented) languages, but that’s an argument for another occasion.

Instead, I’m making a plea.  Let’s continue to call variables variables.  It’s a perfectly good name, and refers to what is perhaps one of the greatest achievements of the human mind, the fundamental concept of algebra, the variable.  But let’s stop calling those other things variables!  In my Practical Foundations for Programming Languages I coined (as far as I know) a word that seems perfectly serviceable, namely an assignable.  The things called variables in imperative languages should, rather, be called assignables.  The word is only a tad longer than variable, and rolls off the tongue just as easily, and has the advantage of being an accurate description of what it really is.  What’s not to like?

Why bother?  For one thing, some languages have both concepts, a necessity if you want your language to be mathematically civilized (and you do).  For another, in the increasingly important world of program verification, the specification formalisms, being mathematical in nature, make use of variables, which most definitely are not assignables!  But the real reason to make the distinction is, after all, because words matter.  Two different things deserve to have two different names, and it only confuses matters to use the same word for both.  This week’s confusion was only one example of many that I have seen over the years.

So, my suggestion: let’s call variables variables, and let’s call those other things assignables.  In the fullnesss of time (i.e., once the scourge of imperative programming has been lifted) we may not need the distinction any longer.  But until then, why not draw the distinction properly?

Addendum: It seems worth mentioning that in PFPL I have a novel (afaik) treatment of the concept of a reference, which is clarified in a subsequent post.

## The semester’s over

May 4, 2011

One reason that I started this blog was to record my experiences with a new course on functional programming for freshmen at Carnegie Mellon.  Classes have ended, the final exam is graded, and we are in the process of taking stock of what we accomplished, and what we could do better.  This was the first instance of the course, taught to 83 first-year computer science students who volunteered to be part of the new curriculum.  (This represents about 2/3 of the freshman class.)  All had taken a new course on imperative programming taught by Frank Pfenning the previous semester, and all were simultaneously taking 251, the legendary theory foundation course developed here over the last dozen or so years.

Starting this fall the course will be part of the standard curriculum, and will be taught to a broader range of students, including the electrical engineers and the full class of CS students.  We’re still working out whether the standard order will be imperative, then functional, or functional, then imperative.  Much depends on the mathematical maturity of the incoming students.  Those with weak mathematical skills (the vast majority), but some programming experience, will likely start with imperative programming, which they will take simultaneously with a standard discrete math course taught in the mathematics department.  Those with strong mathematical skills, or high ambitions, will start with functional programming.  A small number will park for one semester, taking discrete math and a programming course primarily intended for non-majors, to bring themselves up to speed.

Our new data structures and algorithms course, being developed by Guy Blelloch, will be offered in the fall for students who have already have the IP and FP classes.  Guy plans to make heavy use of functional programming, and will be stressing parallel algorithms on persistent data structures as the common case, leaving the traditional sequential, ephemeral case to the imperative programming class.  Guy’s course is essentially a continuation (ahem) of the FP class that emphasizes more sophisticated data structures and algorithms, and more complex programming problems.

For those who might be interested, I’m attaching the final exam for the course.  The students did exceptionally well (average score of 88%), even though we feared it was too long and difficult beforehand.  The spread was very narrow, perhaps because the exam was too easy, or because this being a class of volunteers, their skills were unusually strong and uniform.  We’ll know more after the fall once we’ve taught the class to a broader range of students.  I do think the exam is representative of what we expected them to be able to do at the end, and they showed us that they can do it.  A clear success!

15-150 Final Exam (Spring 2011)

## Of Course ML Has Monads!

May 1, 2011

Examined from the point of view of ML, monads are but a particular of use of modules.  The signature of monads is given by the definition

signature MONAD = sig
val ret : 'a -> 'a monad
end

There are many, many, many structures that satisfy this signature; I needn’t (and, in any case, can’t) rehearse them all here.  One particularly simple example should suffice to give the general idea:

structure Option : MONAD = struct
type 'a monad = 'a option
fun ret x = SOME x
fun bnd (SOME x) k = k x
| bnd NONE k = NONE
end

This is of course the option monad, which is sometimes used to model the data flow aspects of exceptions, perhaps with some elaboration of the NONE case to associate an exceptional value with a non-result.  (The control flow aspects are not properly modeled this way, however.  For that one needs, in addition, access to some sort of jump mechanism.)

Examples like this one proliferate.  A monad is represented by a structure.  Any structure that provides the facilities specified by the MONAD signature gives rise to the characteristic sequentialization mechanisms codified by it.  Monad transformers are functors that transform one monad into another, with no fuss or bother, and no ad hoc mechanisms required.  Standard modular programming techniques suffice to represent monads; moreover, the techniques involved are fully general, and are equally applicable to other signatures of interest (arrows, or quivers, or bows, or what have you).  Moreover, it is shown in my paper with Chakravarty and Dreyer how to integrate modules into the type inference mechanism of ML so that one can get automatic functor instantiation in those limited cases where it is self-evident what is intended.  This has been implemented by Karl Crary in a prototype compiler for an extension of Standard ML, and it would be good to see this supported in more broadly available compilers for the language.

The bulk of the mania about monads is therefore accounted for by modules.  I have no doubt, however, that you are wondering about the infamous IO monad in Haskell (and it’s associated work-around, unsafePerformIO).  Isn’t that a fundamental feature of the language that cannot be replicated in ML?  Hardly!  It’s entirely a matter of designing the signatures of the standard basis library modules, and nothing more.  The default basis library does not attempt to segregate effects into a monad, but it is perfectly straightforward to do this yourself, by providing your own layer over the standard basis, or to reorganize the standard basis to enforce the separation.  For example, the signature of reference cells might look like this:

signature REF = sig
type 'a ref
val ref : 'a -> 'a ref IO.monad
val ! : 'a ref -> 'a IO.monad
val := : 'a ref -> 'a -> unit IO.monad
end

Here we are presuming that we have a fixed declaration

structure IO : MONAD = ...

that packages up the basic IO primitives that are already implemented in the run-time system of ML, more or less like in Haskell.  The other signatures, such as those for mutable arrays or for performing input and output, would be modified in a similar manner to push effects into the IO monad.  Et voila, you have monadic effects, just like in Haskell.

There’s really nothing to it.  In fact, the whole exercise was carried out by a Carnegie Mellon student, Phillippe Ajoux, a couple of years ago.  He also wrote a number of programs in this style just to see how it all goes: swimmingly.  He also devised syntactic extensions to the Moscow ML compiler that provide a nicer notation for programming with monads, much as in Haskell, but better aligned with ML’s conventions.  (Ideally it should be possible to provide syntactic support for any signature, not just monads, but I’m not aware of a worked-out design for the general case, involving as it would an intermixing of parsing and elaboration.)

My point is that the ML module system can be deployed by you to impose the sorts of effect segregation imposed on you by default in Haskell.  There is nothing special about Haskell that makes this possible, and nothing special about ML that inhibits it.  It’s all a mode of use of modules.

So why don’t we do this by default?  Because it’s not such a great idea.  Yes, I know it sounds wonderful at first, but then you realize that it’s pretty horrible.  Once you’re in the IO monad, you’re stuck there forever, and are reduced to Algol-style imperative programming.  You cannot easily convert between functional and monadic style without a radical restructuring of code.  And you inevitably need unsafePerformIO to get anything serious done.  In practical terms, you are deprived of the useful concept of a benign effect, and that just stinks!

The moral of the story is that of course ML “has monads”, just like Haskell.  Whether you want to use them is up to you; they are just as useful, and just as annoying, in ML as they are in Haskell.  But they are not forced on you by the language designers!

Update: This post should’ve been called “ML Has Monads, Why Not?”, or “Of Course ML Has Comonads!”, but then no one was wondering about that.

Update: I now think that the last sentence is excessive.  My main point is simply that it’s very simple to go one way or the other with effects, if you have modules to structure things; it’s all a matter of library design.  A variant of ML that enforced the separation of effects is very easily constructed; the question is whether it is useful or not.  I’ve suggested that the monadic separation is beguiling, but not clearly a great idea.  Alternatively, one can say that we’re not that far away from eliminating laziness from Haskell, at least in this respect: just re-do the standard basis library in ML, and you’re a good ways there.  Plus you have modules, and we understand how to integrate type classes with modules, so the gap is rather small.

Follow