## Exceptions are shared secrets

December 3, 2012

It’s quite obvious to me that the treatment of exceptions in Haskell is wrong. Setting aside the example I gave before of an outright unsoundness, exceptions in Haskell are nevertheless done improperly, even if they happen to be sound. One reason is that the current formulation is not stable under seemingly mild extensions to Haskell that one might well want to consider, notably any form of parameterized module or any form of shadowing of exception declarations. For me this is enough to declare the whole thing wrong, but as it happens Haskell is too feeble to allow full counterexamples to be formulated, so one may still claim that what is there now is ok … for now.

But I wish to argue that Haskell exceptions are nevertheless poorly designed, because it misses the crucial point that exception values are shared secrets. Let us distinguish two parties, the raiser of the exception, and the handler of it. The fundamental idea of exceptions is to transfer a value from the raiser to the handler without the possibility of interception by another party. While the language of secrecy seems appropriately evocative, I hasten to add that I am not here concerned with “attackers” or suchlike, but merely with the difficulties of ensuring modular composition of programs from components. In such a setting the “attacker” is yourself, who is not malicious, but who is fallible.

By raising an exception the raiser is “contacting” a handler with a message. The raiser wishes to limit which components of a program may intercept that message. More precisely, the raiser wishes to ensure that only certain previously agreed-upon components may handle that exception, perhaps only one. This property should remain stable under extension to the program or composition with any other component. It should not be possible for an innocent third party to accidentally intercept a message that was not intended for it.

Achieving this requires a secrecy mechanism that allows the raiser and the handler(s) to agree upon their cooperation. This is accomplished by dynamic classification, exactly as it is done properly in Standard ML (but not O’Caml). The idea is that the raiser has access to a dynamically generated constructor for exception values, and any handler has access to the corresponding dynamically generated matcher for exception values. This means that the handler, and only the handler, can decode the message sent by the raiser; no other party can do anything with it other than pass it along unexamined. It is “perfectly encrypted” and cannot be deciphered by any unintended component.

The usual exception mechanisms, as distinct from exception values, allow for “wild-card handlers”, which means that an exception can be intercepted by a third party. This means that the raiser cannot ensure that the handler actually receives the message, but it can ensure, using dynamic classification, that only a legitimate handler may decipher it. Decades of experience with Standard ML shows that this is a very useful thing indeed, and has application far beyond just the simple example considered here. For full details, see my forthcoming book, for a full discussion of dynamic classification and its role for ensuring integrity and confidentiality in a program. Dynamic classification is not just for “security”, but is rather a good tool for everyday programming.

So why does Haskell not do it this way? Well, I’m not the one to answer that question, but my guess is that doing so conflicts with the monadic separation of effects. To do exceptions properly requires dynamic allocation, and this would force code that is otherwise functional into the IO monad. Alternatively, one would have to use unsafePerformIO—as in ezyang’s implementation—to “hide” the effects of exception allocation. But this would then be further evidence that the strict monadic separation of effects is untenable.

Update: Reworked last paragraph to clarify the point I am making; the previous formulation appears to have invited misinterpretation.

Update: This account of exceptions also makes clear why the perennial suggestion to put exception-raising information into types makes no sense to me. I will write more about this in a future post, but meanwhile contemplate that a computation may raise an exception that is not even in principle nameable in the type. That is, it is not conservativity that’s at issue, it’s the very idea.

## PFPL is out!

December 3, 2012

Practical Foundations for Programming Languages, published by Cambridge University Press, is now available in print! It can be ordered from the usual sources, and maybe some unusual ones as well. If you order directly from Cambridge using this link, you will get a 20% discount on the cover price (pass it on).

Since going to press I have, inevitably, been informed of some (so far minor) errors that are corrected in the online edition. These corrections will make their way into the second printing. If you see something fishy-looking, compare it with the online edition first to see whether I may have already corrected the mistake. Otherwise, send your comments to me.

By the way, the cover artwork is by Scott Draves, a former student in my group, who is now a professional artist as well as a researcher at Google in NYC. Thanks, Scott!

Update: The very first author’s copy hit my desk today!

## Yet Another Reason Not To Be Lazy Or Imperative

August 26, 2012

In an earlier post I argued that, contrary to much of the literature in the area, parallelism is all about efficiency, and has little or nothing to do with concurrency.  Concurrency is concerned with controlling non-determinism, which can arise in all sorts of situations having nothing to do with parallelism.  Process calculi, for example, are best viewed as expressing substructural composition of programs, and have very little to do with parallel computing.  (See my PFPL and Rob Simmons’ forthcoming Ph.D. dissertation for more on this perspective.)  Parallelism, on the other hand, is captured by analyzing the cost of a computation whose meaning is independent of its parallel execution.  A cost semantics specifies the abstract cost of a program that is validated by a provable implementation that transfers the abstract cost to a precise concrete cost on a particular platform.  The cost of parallel execution is neatly captured by the concept of a cost graph that captures the dynamic data dependencies among subcomputations.  Details such as the number of processors or the nature of the interconnect are factored into the provable implementation, which predicts the asymptotic behavior of a program on a hardware platform based on its cost graph.  One advantage of cost semantics for parallelism is that it is easy to teach freshmen how to write parallel programs; we’ve been doing this successfully for two years now, with little fuss or bother.

This summer Guy Blelloch and I began thinking about other characterizations of the complexity of programs besides the familiar abstractions of execution time and space requirements of a computation.  One important measure, introduced by Jeff Vitter, is called I/O Complexity.  It measures the efficiency of algorithms with respect to memory traffic, a very significant determiner of performance of programs.  The model is sufficiently abstract as to encompass several different interpretations of I/O complexity.  Basically, the model assumes an unbounded main memory in which all data is ultimately stored, and considers a cache of $M=c\times B$ blocked into chunks of size $B$ that provides quick access to main memory.  The complexity of algorithms is analyzed in terms of these parameters, under the assumption that in-cache accesses are cost-free, so that the only significant costs are those incurred by loading and flushing the cache.  You may interpret the abstract concepts of main memory and cache in the standard way as a two-level hierarchy representing, say, on- and off-chip memory access, or instead as representing a disk (or other storage medium) loaded into memory for processing.  The point is that the relative costs of processing cached versus uncached data is huge, and worth considering as a measure of the efficiency of an algorithm.

As usual in the algorithms world Vitter makes use of a low-level machine model in which to express and evaluate algorithms.  Using this model Vitter obtains a lower-bound for sorting in the I/O model, and a matching upper bound using a $k$-way merge sort, where $k$ is chosen as a function of $M$ and $B$ (that is, it is not cache oblivious in the sense of Leiserson, et al.)  Although such models provide a concrete, and well-understood, basis for analyzing algorithms, we all know full well that programming at such a low-level is at best a tedious exercise.  Worse, machine models provide no foundation for composition of programs, the single most important characteristic of higher-level language models.  (Indeed, the purpose of types is to mediate composition of components; without types, you’re toast.)

The point of Guy’s and my work this summer is to adapt the I/O model to functional programs, avoiding the mess, bother, and futility of trying to work at the machine level.  You might think that it would be impossible to reason about the cache complexity of a functional program (especially if you’re under the impression that functional programming necessarily has something to do with Haskell, which it does not, though you may perhaps say that Haskell has something to do with functional programming).  Traditional algorithms work, particularly as concerns cache complexity, is extremely finicky about memory management in order to ensure that reasonable bounds are met, and you might reasonably suspect that it will ever be thus.  The point of our paper, however, is to show that the same asymptotic bounds obtained by Vitter in the I/O model may be met using purely functional programming, provided that the functional language is (a) non-lazy (of course), and (b) implemented properly (as we describe).

Specifically, we give a cost semantics for functional programs (in the paper, a fragment of ML) that takes account of the memory traffic engendered by evaluation, and a provable implementation that validates the cost semantics by describing how to implement it on a machine-like model.  The crux of the matter is to account for the cache effects that arise from maintaining a control stack during evaluation, even though the abstract semantics has no concept of a stack (it’s part of the implementation, and cannot be avoided).  The cost semantics makes explicit the reading and allocation of values in the store (using Felleisen, Morrisett, and H’s “Abstract Models of Memory Management”), and imposes enough structure on the store to capture the critical concept of locality that is required to ensure good cache (or I/O) behavior.  The abstract model is parameterized by $M$ and $B$ described above, but interpreted as representing the number of objects in the cache and the neighborhood of an object in memory (the objects that are allocated near it, and that are therefore fetched along with the object whenever the cache is loaded).

The provable implementation is given in two steps.  First, we show how to transfer the abstract cost assigned to a computation into the amount of memory traffic incurred on an abstract machine with an explicit control stack.  The key idea here is an amortization argument that allows us to obtain tight bounds on the overhead required to maintain the stack.  Second, we show how to implement the crucial read and allocate operations that underpin the abstract semantics and the abstract machine.  Here we rely on a competitive analysis, given by Sleator, et al., of the ideal cache model, and on an amortization of the cost of garbage collection in the style of Appel.  We also make use of an original (as far as I know) technique for implementing the control stack so as to avoid unnecessary interference with the data objects in cache.  The net result is that the cost semantics provides an accurate asymptotic analysis of the I/O complexity of a functional algorithm, provided that it is implemented in the manner we describe in the paper (which, in fact, is not far from standard implementation techniques, the only trick being how to manage the control stack properly).  We then use the model to derive bounds for several algorithms that are comparable to those obtained by Vitter using a low-level machine model.

The upshot of all of this is that we can reason about the I/O or cache complexity of functional algorithms, much as we can reason about the parallel complexity of functional algorithms, namely by using a cost semantics.  There is no need to drop down to a low-level machine model to get a handle on this important performance metric for your programs, provided, of course, that you’re not stuck with a lazy language (for those poor souls, there is no hope).

## Polarity in Type Theory

August 25, 2012

There has recently arisen some misguided claims about a supposed opposition between functional and object-oriented programming.  The claims amount to a belated recognition of a fundamental structure in type theory first elucidated by Jean-Marc Andreoli, and developed in depth by Jean-Yves Girard in the context of logic, and by Paul Blain-Levy and Noam Zeilberger in the context of programming languages.  In keeping with the general principle of computational trinitarianism, the concept of polarization has meaning in proof theory, category theory, and type theory, a sure sign of its fundamental importance.

Polarization is not an issue of language design, it is an issue of type structure.  The main idea is that types may be classified as being positive or negative, with the positive being characterized by their structure and the negative being characterized by their behavior.  In a sufficiently rich type system one may consider, and make effective use of, both positive and negative types.  There is nothing remarkable or revolutionary about this, and, truly, there is nothing really new about it, other than the terminology.  But through the efforts of the above-mentioned researchers, and others, we have learned quite a lot about the importance of polarization in logic, languages, and semantics.  I find it particularly remarkable that Andreoli’s work on proof search turned out to also be of deep significance for programming languages.  This connection was developed and extended by Zeilberger, on whose dissertation I am basing this post.

The simplest and most direct way to illustrate the ideas is to consider the product type, which corresponds to conjunction in logic.  There are two possible ways that one can formulate the rules for the product type that from the point of view of inhabitation are completely equivalent, but from the point of view of computation are quite distinct.  Let us first state them as rules of logic, then equip these rules with proof terms so that we may study their operational behavior.  For the time being I will refer to these as Method 1 and Method 2, but after we examine them more carefully, we will find more descriptive names for them.

Method 1 of defining conjunction is perhaps the most familiar.  It consists of this introduction rule

$\displaystyle\frac{\Gamma\vdash A\;\textsf{true}\quad\Gamma\vdash B\;\textsf{true}}{\Gamma\vdash A\wedge B\;\textsf{true}}$

and the following two elimination rules

$\displaystyle\frac{\Gamma\vdash A\wedge B\;\textsf{true}}{\Gamma\vdash A\;\textsf{true}}\qquad\frac{\Gamma\vdash A\wedge B\;\textsf{true}}{\Gamma\vdash B\;\textsf{true}}$.

Method 2 of defining conjunction is only slightly different.  It consists of the same introduction

$\displaystyle \frac{\Gamma\vdash A\;\textsf{true}\quad\Gamma\vdash B\;\textsf{true}}{\Gamma\vdash A\wedge B\;\textsf{true}}$

and one elimination rule

$\displaystyle\frac{\Gamma\vdash A\wedge B\;\textsf{true} \quad \Gamma,A\;\textsf{true},B\;\textsf{true}\vdash C\;\textsf{true}}{\Gamma\vdash C\;\textsf{true}}$.

From a logical point of view the two formulations are interchangeable in that the rules of the one are admissible with respect to the rules of the other, given the usual structural properties of entailment, specifically reflexivity and transitivity.  However, one can discern a difference in “attitude” in the two formulations that will turn out to be a manifestation of the concept of polarity.

Method 1 is a formulation of the idea that a proof of a conjunction is anything that behaves conjunctively, which means that it supports the two elimination rules given in the definition.  There is no commitment to the internal structure of a proof, nor to the details of how projection operates; as long as there are projections, then we are satisfied that the connective is indeed conjunction.  We may consider that the elimination rules define the connective, and that the introduction rule is derived from that requirement.  Equivalently we may think of the proofs of conjunction as being coinductively defined to be as large as possible, as long as the projections are available.  Zeilberger calls this the pragmatist interpretation, following Count Basie’s principle, “if it sounds good, it is good.”

Method 2 is a direct formulation of the idea that the proofs of a conjunction are inductively defined to be as small as possible, as long as the introduction rule is valid.  Specifically, the single introduction rule may be understood as defining the structure of the sole form of proof of a conjunction, and the single elimination rule expresses the induction, or recursion, principle associated with that viewpoint.  Specifically, to reason from the fact that $A\wedge B\;\textsf{true}$ to derive $C\;\textsf{true}$, it is enough to reason from the data that went into the proof of the conjunction to derive $C\;\textsf{true}$.  We may consider that the introduction rule defines the connective, and that the elimination rule is derived from that definition.  Zeilberger calls this the verificationist interpretation.

These two perspectives may be clarified by introducing proof terms, and the associated notions of reduction that give rise to a dynamics of proofs.

When reformulated with explicit proofs, the rules of Method 1 are the familiar rules for ordered pairs:

$\displaystyle\frac{\Gamma\vdash M:A\quad\Gamma\vdash N:B}{\Gamma\vdash \langle M, N\rangle:A\wedge B}$

$\displaystyle\frac{\Gamma\vdash M:A\wedge B}{\Gamma\vdash \textsf{fst}(M):A}\qquad\frac{\Gamma\vdash M:A\wedge B}{\Gamma\vdash \textsf{snd}(M):B}$.

The associated reduction rules specify that the elimination rules are post-inverse to the introduction rules:

$\displaystyle\textsf{fst}(\langle M,N\rangle)\mapsto M \qquad \textsf{snd}(\langle M,N\rangle)\mapsto N$.

In this formulation the proposition $A\wedge B$ is often written $A\times B$, since it behaves like a Cartesian product of proofs.

When formulated with explicit proofs, Method 2 looks like this:

$\displaystyle \frac{\Gamma\vdash M:A\quad\Gamma\vdash M:B}{\Gamma\vdash M\otimes N:A\wedge B}$

$\displaystyle\frac{\Gamma\vdash M:A\wedge B \quad \Gamma,x:A,y:B\vdash N:C}{\Gamma\vdash \textsf{split}(M;x,y.N):C}$

with the reduction rule

$\displaystyle\textsf{split}(M\otimes N;x,y.P)\mapsto [M,N/x,y]P$.

With this formulation it is natural to write $A\wedge B$ as $A\otimes B$, since it behaves like a tensor product of proofs.

Since the two formulations of “conjunction” have different internal structure, we may consider them as two different connectives.  This may, at first, seem pointless, because it is easily seen that $x:A\times B\vdash M:A\otimes B$ for some $M$ and that $x:A\otimes B\vdash N:A\times B$ for some $N$, so that the two connectives are logically equivalent, and hence interchangeable in any proof.  But there is nevertheless a reason to draw the distinction, namely that they have different dynamics.

It is easy to see why.  From the pragmatic perspective, since the projections act independently of one another, there is no reason to insist that the components of a pair be evaluated before they are used.  Quite possibly we may only ever project the first component, so why bother with the second?  From the verificationist perspective, however, we are pattern matching against the proof of the conjunction, and are demanding both components at once, so it makes sense to evaluate both components of a pair in anticipation of future pattern matching.  (Admittedly, in a structural type theory one may immediately drop one of the variables on the floor and never use it, but then why give it a name at all?  In a substructural type theory such as linear type theory, this is not a possibility, and the interpretation is forced.)  Thus, the verficationist formulation corresponds to eager evaluation of pairing, and the pragmatist formulation to lazy evaluation of pairing.

Having distinguished the two forms of conjunction by their operational behavior, it is immediately clear that both forms are useful, and are by no means opposed to one another.  This is why, for example, the concept of a lazy language makes no sense, rather one should instead speak of lazy types, which are perfectly useful, but by no means the only types one should ever consider.  Similarly, the concept of an object-oriented language seems misguided, because it emphasizes the pragmatist conception, to the exclusion of the verificationist, by insisting that everything be an object characterized by its methods.

More broadly, it is useful to classify types into two polarities, the positive and the negative, corresponding to the verificationist and pragmatist perspectives.  Positive types are inductively defined by their introduction forms; they correspond to colimits, or direct limits, in category theory.  Negative types are coinductively defined by their elimination forms; they correspond to limits, or inverse limits, in category theory.  The concept of polarity is intimately related to the concept of focusing, which in logic sharpens the concept of a cut-free proof and elucidates the distinction between synchronous and asynchronous connectives, and which in programming languages provides an elegant account of pattern matching, continuations, and effects.

As ever, enduring principles emerge from the interplay between proof theory, category theory, and type theory.  Such concepts are found in nature, and do not depend on cults of personality or the fads of the computer industry for their existence or importance.

Update: word-smithing.

August 14, 2012

It is well known that Haskell is not type safe.  The most blatant violation is the all too necessary, but aptly named, unsafePerformIO operation.  You are enjoined not to use this in an unsafe manner, and must be careful to ensure that the encapsulated computation may be executed at any time because of the inherent unpredictability of lazy evaluation.  (The analogous operation in monadic ML, safePerformIO, is safe, because of the value restriction on polymorphism.)  A less blatant violation is that the equational theory of the language with seq is different from the equational theory without it, and the last I knew the GHC compiler was willing to make transformations that are valid only in the absence of this construct.  This too is well known.  A proper reformulation of the equational theory was given by Patricia Johann a few years ago as a step towards solving it.  (If the GHC compiler is no longer unsafe in this respect, it would be good to know.)

I’ve asked around a little bit, including some of the Haskell insiders, whether it is equally well known that the typed exception mechanism in Haskell is unsound.  From what I can tell this seems not to be as well-understood, so let me explain the situation as I see it, and I am sure I will be quickly corrected if I am wrong.  The starting point for me was the realization that in Haskell pure code (outside of the IO monad) may raise an exception, and, importantly for my point, the exceptions are user-defined.  Based on general semantic considerations, it seemed to me that this cannot possibly be sound, and Karl Crary helped me to isolate the source of the problem.

The difficulty is really nothing to do with exceptions per se, but rather with exception values.  (So my point has nothing whatsoever to do with imprecise exceptions.)  It seems to me that the root cause is an all-too-common misunderstanding of the concept of typed exceptions as they occur, for example, in Standard ML.  The mistake is a familiar one, the confusion of types with classes; it arises often in discussions related to oop.  To clarify the situation let me begin with a few remarks exception values in ML, and then move on to the issue at hand.

The first point, which is not particularly relevant to the present discussion, is that exceptions have nothing to do with dynamic binding.  The full details are in my book, so I will only summarize the situation here.  Many seem to have the idea that an exception handler, which typically looks like a sequence of clauses consisting of an exception and an action, amounts to a dynamic binding of each exception to its associated handler.  This is not so.  In fact the left-hand side of an exception clause is a pattern, including a variable (of which a wild card is one example), or nested patterns of exceptions and values.  I realize that one may implement the exception mechanism using a single dynamically bound symbol holding the current exception handler, but this implementation is but one of many possible ones, and does not in any case define the abstraction of exceptions.  Exceptions have no more to do with dynamic binding than does the familiar if-then-else available in any language.

The second point, which is pertinent to this discussion, is that exceptions have only one type.  You read that right: the typed exception mechanism in Standard ML is, in fact, uni-typed (where I have I heard that before?).  It is perfectly true that in Standard ML one may associate a value of any type you like with an exception, and this value may be communicated to the handler of that exception.  But it is perfectly false to say that there are multiple types of exceptions; there is only one, but it has many (in fact, “dynamically many”) classes.  When you declare, say, exception Error of string in Standard ML you are introducing a new class of type string->exn, so that Error s is an exception value of type exn carrying a string.  An exception handler associates to a computation of type α a function of type exn->α that handles any exceptions thrown by that computation.  To propagate uncaught exceptions, the handler is  implicitly post-composed with the handler fn x => raise x.  Pattern matching recovers the type of the associated value in the branch that handles a particular exception.  So, for example, a handler of the form Error x => exp propagates the fact that x has type string into the expression exp, and similarly for any other exception carrying any other type.  (Incidentally, another perfectly good handler is Error “abcdef” => exp, which handles only an Error exception with associated value “abcdef”, and no other.  This debunks the dynamic binding interpretation mentioned earlier.)

The third point is that exception classes are dynamically generated in the sense that each evaluation of an exception declaration generates a fresh exception.  This is absolutely essential for modularity, and is extremely useful for a wide variety of other purposes, including managing information flow security within a program. (See Chapter 34 of my book for a fuller discussion.)  O’Caml attempted to impose a static form of exceptions, but it is now recognized that this does not work properly in the presence of functors or separate compilation.  This means that exception declarations are inherently stateful and cannot be regarded as pure.

This got me to wondering how Haskell could get away with user-defined typed exceptions in “pure” code.  The answer seems to be “it can’t”, as the following example illustrates:

import Control.Exception

import Data.Typeable

newtype Foo = Foo (() -> IO ())

{- set Foo’s TypeRep to be the same as ErrorCall’s -}

instance Typeable Foo where

typeOf _ = typeOf (undefined :: ErrorCall)

instance Show Foo where  show _ = “”

instance Exception Foo

main = Control.Exception.catch (error “kaboom”) (\ (Foo f) -> f ())

If you run this code in GHC, you get “internal error: stg_ap_v_ret”, which amounts to “going wrong.”  The use of exceptions is really incidental, except insofar as it forces the use of the class Typeable, which is exploited here.

How are we to understand this unsoundness?  My own diagnosis is that typed exceptions are mis-implemented in Haskell using types, rather than classes.  There is no need in ML for any form of type casting to implement exceptions, because there is exactly one type of exception values, albeit one with many classes.  Haskell, on the other hand, tries to have exceptions with different types.  This immediately involves us in type casting, and hilarity ensues.

The problem appears to be difficult to fix, because to do so would require that exception values be declared within the IO monad, which runs against the design principle (unavoidable, in my opinion) that exceptions are permissible in pure code.  (Exceptions are, according to Aleks Nanevski, co-monadic, not monadic.)  Alternatively, since Haskell lacks modularity, a possible fix is to permit only static exceptions in essentially the style proposed in O’Caml.  This amounts to collecting up the exception declarations on a whole-program basis, and implicitly making a global data declaration that declares all the exception constructors “up front”.  Exception handlers would then work by pattern matching, and all would be well.  Why this is not done already is a mystery to me; the current strategy looks a lot like a kludge once you see the problem with safety.

Insofar as my interpretation of what is going on here is correct, the example once again calls into question the dogma of the separation of Church from state as it is realized in Haskell.  As beguiling as it is, the idea, in its current form, simply does not work.  Dave MacQueen recently described it to me as a “tar baby”, from the Brer Rabbit story: the more you get involved with it, the more entangled you become.

## для моих русских читателей

February 28, 2012

Владислав Натаров <4kcheshirecat@gmail.com> has kindly translated my blog posts to date into Russian.  I cannot vouch for the accuracy of the translation, but I am grateful for the effort, and hope that this will be of interest to my Russian readers.

## Words matter

February 1, 2012

Yesterday, during a very nice presentation by Ohad Kammar at Carnegie Mellon, the discussion got derailed, in part, because of a standard, and completely needless, terminological confusion involving the word “variable”.  I’m foolish enough to try to correct it.

The problem is that we’ve all been taught to confuse variables with variables—that is, program variables with mathematical variables.  The distinction is basic.  Since time immemorial (well, at least since al Khwarizmi) we have had the notion of a variable, properly so-called, which is given meaning by substitution.  A variable is an unknown, or indeterminate, quantity that can be replaced by any value of its type (a type being, at least since Russell, the range of significance of a variable).  Frege gave the first systematic study of the quantifiers, and Church exploited the crucial concept of a variable to give the most sharply original and broadly applicable model of computation, the $\lambda$-calculus.

Since the dawn of Fortran something that is not a variable has come to be called a variable.  A program variable, in the sense of Fortran and every imperative language since, is not given meaning by substitution.  Rather, it is given meaning by (at least) two operations associated with it, one to get its contents and one to put new contents into it.  (And, maybe, an operation to form a reference to it, as in C or even Algol.)  Now as many of you know, I think that the concept of a program variable in this sense is by and large a bad idea, or at any rate not nearly as important as it has been made out to be in conventional (including object-oriented) languages, but that’s an argument for another occasion.

Instead, I’m making a plea.  Let’s continue to call variables variables.  It’s a perfectly good name, and refers to what is perhaps one of the greatest achievements of the human mind, the fundamental concept of algebra, the variable.  But let’s stop calling those other things variables!  In my Practical Foundations for Programming Languages I coined (as far as I know) a word that seems perfectly serviceable, namely an assignable.  The things called variables in imperative languages should, rather, be called assignables.  The word is only a tad longer than variable, and rolls off the tongue just as easily, and has the advantage of being an accurate description of what it really is.  What’s not to like?

Why bother?  For one thing, some languages have both concepts, a necessity if you want your language to be mathematically civilized (and you do).  For another, in the increasingly important world of program verification, the specification formalisms, being mathematical in nature, make use of variables, which most definitely are not assignables!  But the real reason to make the distinction is, after all, because words matter.  Two different things deserve to have two different names, and it only confuses matters to use the same word for both.  This week’s confusion was only one example of many that I have seen over the years.

So, my suggestion: let’s call variables variables, and let’s call those other things assignables.  In the fullnesss of time (i.e., once the scourge of imperative programming has been lifted) we may not need the distinction any longer.  But until then, why not draw the distinction properly?

Addendum: It seems worth mentioning that in PFPL I have a novel (afaik) treatment of the concept of a reference, which is clarified in a subsequent post.

## Heartbeat

September 26, 2011

Just a note in response to several requests to say that I am still around, and plan to resume blogging soon.   I have been distracted by more pressing concerns, including teaching at the Oregon Programming Languages Summer School (OPLSS), preparing a submission to  POPL, visiting the Max Planck Institute for Software Systems, and trying hard to finish my book (on which I’ve made substantial progress over the summer).

## Of Course ML Has Monads!

May 1, 2011

Examined from the point of view of ML, monads are but a particular of use of modules.  The signature of monads is given by the definition

signature MONAD = sig
val ret : 'a -> 'a monad
end

There are many, many, many structures that satisfy this signature; I needn’t (and, in any case, can’t) rehearse them all here.  One particularly simple example should suffice to give the general idea:

structure Option : MONAD = struct
type 'a monad = 'a option
fun ret x = SOME x
fun bnd (SOME x) k = k x
| bnd NONE k = NONE
end

This is of course the option monad, which is sometimes used to model the data flow aspects of exceptions, perhaps with some elaboration of the NONE case to associate an exceptional value with a non-result.  (The control flow aspects are not properly modeled this way, however.  For that one needs, in addition, access to some sort of jump mechanism.)

Examples like this one proliferate.  A monad is represented by a structure.  Any structure that provides the facilities specified by the MONAD signature gives rise to the characteristic sequentialization mechanisms codified by it.  Monad transformers are functors that transform one monad into another, with no fuss or bother, and no ad hoc mechanisms required.  Standard modular programming techniques suffice to represent monads; moreover, the techniques involved are fully general, and are equally applicable to other signatures of interest (arrows, or quivers, or bows, or what have you).  Moreover, it is shown in my paper with Chakravarty and Dreyer how to integrate modules into the type inference mechanism of ML so that one can get automatic functor instantiation in those limited cases where it is self-evident what is intended.  This has been implemented by Karl Crary in a prototype compiler for an extension of Standard ML, and it would be good to see this supported in more broadly available compilers for the language.

The bulk of the mania about monads is therefore accounted for by modules.  I have no doubt, however, that you are wondering about the infamous IO monad in Haskell (and it’s associated work-around, unsafePerformIO).  Isn’t that a fundamental feature of the language that cannot be replicated in ML?  Hardly!  It’s entirely a matter of designing the signatures of the standard basis library modules, and nothing more.  The default basis library does not attempt to segregate effects into a monad, but it is perfectly straightforward to do this yourself, by providing your own layer over the standard basis, or to reorganize the standard basis to enforce the separation.  For example, the signature of reference cells might look like this:

signature REF = sig
type 'a ref
val ref : 'a -> 'a ref IO.monad
val ! : 'a ref -> 'a IO.monad
val := : 'a ref -> 'a -> unit IO.monad
end

Here we are presuming that we have a fixed declaration

structure IO : MONAD = ...

that packages up the basic IO primitives that are already implemented in the run-time system of ML, more or less like in Haskell.  The other signatures, such as those for mutable arrays or for performing input and output, would be modified in a similar manner to push effects into the IO monad.  Et voila, you have monadic effects, just like in Haskell.

There’s really nothing to it.  In fact, the whole exercise was carried out by a Carnegie Mellon student, Phillippe Ajoux, a couple of years ago.  He also wrote a number of programs in this style just to see how it all goes: swimmingly.  He also devised syntactic extensions to the Moscow ML compiler that provide a nicer notation for programming with monads, much as in Haskell, but better aligned with ML’s conventions.  (Ideally it should be possible to provide syntactic support for any signature, not just monads, but I’m not aware of a worked-out design for the general case, involving as it would an intermixing of parsing and elaboration.)

My point is that the ML module system can be deployed by you to impose the sorts of effect segregation imposed on you by default in Haskell.  There is nothing special about Haskell that makes this possible, and nothing special about ML that inhibits it.  It’s all a mode of use of modules.

So why don’t we do this by default?  Because it’s not such a great idea.  Yes, I know it sounds wonderful at first, but then you realize that it’s pretty horrible.  Once you’re in the IO monad, you’re stuck there forever, and are reduced to Algol-style imperative programming.  You cannot easily convert between functional and monadic style without a radical restructuring of code.  And you inevitably need unsafePerformIO to get anything serious done.  In practical terms, you are deprived of the useful concept of a benign effect, and that just stinks!

The moral of the story is that of course ML “has monads”, just like Haskell.  Whether you want to use them is up to you; they are just as useful, and just as annoying, in ML as they are in Haskell.  But they are not forced on you by the language designers!

Update: This post should’ve been called “ML Has Monads, Why Not?”, or “Of Course ML Has Comonads!”, but then no one was wondering about that.

Update: I now think that the last sentence is excessive.  My main point is simply that it’s very simple to go one way or the other with effects, if you have modules to structure things; it’s all a matter of library design.  A variant of ML that enforced the separation of effects is very easily constructed; the question is whether it is useful or not.  I’ve suggested that the monadic separation is beguiling, but not clearly a great idea.  Alternatively, one can say that we’re not that far away from eliminating laziness from Haskell, at least in this respect: just re-do the standard basis library in ML, and you’re a good ways there.  Plus you have modules, and we understand how to integrate type classes with modules, so the gap is rather small.

## The Point of Laziness

April 24, 2011

As I’ve discussed previously, there are a number of good reasons why Haskell is not suitable for teaching introductory functional programming.  Chief among these is laziness, which in the context of a pure functional language has fatal side effects.  First, Haskell suffers from a paucity of types.  It is not possible in Haskell to define the type of natural numbers, nor the type of lists of natural numbers (or lists of anything else), nor any other inductive type!  (In Carollian style there are types called naturals and lists, but that’s only what they’re called, it’s not what they are.)  Second, the language has problematic cost model.  It is monumentally difficult to reason about the time, and especially space, usage of a Haskell program.  Worse, parallelism arises naturally in an eager, not a lazy, language—for example, computing every element of a finite sequence is fundamental to parallel computing, yet is not compatible with the ideology of laziness, which specifies that we should only compute those elements that are required later.

The arguments in favor of laziness never seem convincing to me.  One claim is that the equational theory of lazy programs is said to be more convenient; for example, beta reduction holds without restriction.  But this is significant only insofar as you ignore the other types in the language.  As Andrzej Filinski pointed out decades ago, whereas lazy languages have products, but not sums, eager languages have sums, but not products.  Take your pick.  Similarly, where lazy languages rely on strictness conditions, eager languages rely on totality conditions.  The costs and benefits are dual, and there seems to be no reason to insist a priori on one set of equations as being more important than the other.

Another claim is that laziness supports the definition of infinite data types, such as infinite sequences of values of some type.  But laziness is not essential, or even particularly useful, for this purpose.  For example, the type nat->nat is a natural representation of infinite sequences of natural numbers that supports many, but not all, of the operations that finite sequences (but not, for example, operations such a reverse, which make no sense in the infinite case).   More generally, there is no inherent connection between laziness and such infinitary types.  Noam Zeilberger has developed an elegant theory of eager and lazy types based on distinguishing positive from negative polarities of type constructors, the positive including the inductive and the negative including the coinductive.   Coinductive types are no more about laziness than inductive types are about pointers.

I wish to argue that laziness is important, but not for pure functional programming, but rather only in conjunction with effects.  This is the Kahn-MacQueen Principle introduced in the 1970′s by Gilles Kahn and David MacQueen in their seminal paper on recursive networks of stream transducers.  Dan Licata and I have emphasized this viewpoint in our lectures on laziness in our new course on functional programming for freshmen.

Let’s use streams as a motivating example, contrasting them with lists, with which they are confused in Haskell.  A list is an example of a positive type, one that is defined by its membership conditions (constructors).  Defining a function on a list amounts to pattern matching, giving one case for each constructor (nil and cons), and using recursion to apply the function to the tail of the list.  A stream is an example of a negative type, one that is defined by its behavioral conditions (destructors).  Defining a stream amounts to defining how it behaves when its head and tail are computed.  The crucial thing about lists, or any positive type, is that they are colimits; we know as part of their semantics how a value of list type are constructed.  The crucial thing about streams, or any negative type, is that they are limits; we know as part of their semantics how they behave when destructed.

Since we have no access to the “inside” of a stream, we should think of it not as a static data structure, but as a dynamic process that produces, upon request, successive elements of the stream.  Internally, the stream keeps track of whatever is necessary to determine successive outputs; it has its own state that is not otherwise visible from the outside.  But if a stream is to be thought of as given by a process of generation, then it is inherently an ephemeral data structure.  Interacting with a stream changes its state; the “old” stream is lost when the “new” stream is created.  But, as we have discussed previously, ephemeral data structures are of limited utility.  The role of memoization is to transform an ephemeral process into a persistent data structure by recording the successive values produced by the process so that they can be “replayed” as necessary to permit the stream to have multiple futures.  Thus, rather than being a matter of efficiency, memoization is a matter of functionality, providing a persistent interface to an underlying ephemeral process.

To see how this works in practice, let’s review the signatures PROCESS and STREAM that Dan Licata and I developed for our class.  Here’s a snippet of the signature of processes:

signature PROCESS = sig
type 'a process = unit -> 'a option
val stdin : char process
val random : real process
end

A process is a function that, when applied, generates a value of some type, or indicates that it is finished.  The process stdin represents the Unix standard input; the process random is a random number generator.  The signature of streams looks essentially like this:

signature STREAM = sig
type 'a stream
datatype 'a front = Nil | Cons of 'a * 'a stream
val expose : 'a stream -> 'a front
val memo : 'a Process.process -> 'a stream
val fix : ('a stream -> 'a stream) -> 'a stream
end

The type ‘a front is the type of values that arise when a stream is exposed; it can either terminate, or present an element and another stream.  The memo constructor creates a persistent stream from an ephemeral process of creation for its elements.  The fix operation is used to create recursive networks of streams.  There are other operations as well, but these illustrate the essence of the abstraction.

Using these signatures as a basis, it is extremely easy to put together a package of routines for scripting.  The fundamental components are processes that generate the elements of a stream.  Combinators on streams, such as composition or mapping and reducing, are readily definable, and may be deployed to build up higher levels of abstraction.  For example, Unix utilities, such as grep, are stream transducers that take streams as inputs and produce streams as outputs.  These utilities do not perform input/output; they merely transform streams.  Moreover, since streams are persistent, there is never any issue with “buffering” or “lookahead” or “backtracking”; you just manipulate the stream like any other (persistent) data structure, and everything works automagically.  The classical Bell Labs style of intermixing I/O with processing is eliminated, leading not only to cleaner code, but also greater flexibility and re-use.  This is achieved not by the double-backflips required by the inheritance mechanisms of oopl’s, but rather by making a crisp semantic distinction between the processing of streams and the streaming of processes.  True reuse operates at the level of abstractions, not at the level of the code that gives rise to them.

Update: It seems worthwhile to point out that memoization to create a persistent from an ephemeral data structure is a premier example of a benign effect, the use of state to evince functional behavior.  But benign effects are not programmable in Haskell, because of the segregation of effects into the IO monad.

Update: Lennart Augustsson gives his reasons for liking laziness.

Follow