Referential transparency

February 9, 2012

After reading some of the comments on my Words Matter post, I realized that it might be worthwhile to clarify the treatment of references in PFPL. Most people are surprised to learn that I have separated the concept of a reference from the concept of mutable storage. I realize that my treatment is not standard, but I consider that one thing I learned in writing the book is how to do formulate references so that the concept of a reference is divorced from the thing to which it refers. Doing so allows for a nicely uniform account of references arising in many disparate situations, and is the basis for, among other things, a non-standard, but I think clearer, treatment of communication in process calculi.

It’s not feasible for me to reproduce the entire story here in a short post, so I will confine myself to a few remarks that will perhaps serve as encouragement to read the full story in PFPL.

There is, first of all, a general concept of a symbol, or name, on which much of the development builds. Perhaps the most important thing to realize about symbols in my account is that symbols are not forms of value. They are, rather, a indices for an infinite, open-ended (expansible) family of operators for forming expressions. For example, when one introduces a symbol a as the name of an assignable, what one is doing is introducing two new operators, get[a] and set[a], for getting and setting the contents of that assignable. Similarly, when one introduces a symbol a to be used as a channel name, then one is introducing operators send[a] and receive[a] that act on the channel named a. The point is that these operators interpret the symbol; for example, in the case of mutable state, the get[a] and set[a] operators are what make it be state, as opposed to some other use of the symbol a as an index for some other operator. There is no requirement that the state be formulated using “cells”; one could as well use the framework of algebraic effects pioneered by Plotkin and Power to give a dynamics to these operators. For present purposes I don’t care about how the dynamics is formulated; I only care about the laws that it obeys (this is the essence of the Plotkin/Power approach as I understand it).

Associated with each symbol a is a type that is, I hasten to emphasize not the type of the symbol a itself (the symbol inhabits no type), but is merely associated with it. For example, if a has the associated type nat, then get[a] will be a command returning a number, and set[a](e) will take a number, e, as argument. Similarly for channels, and other uses of symbols as indices of other operators.

Given a symbol a with associated type T, one may form a reference to a, written &a, as a value whose type, in the case of a reference to an assignable, will be, say, T assignable reference, or similar terminology. This type is interpreted by elimination forms that take a value as argument, and transition to the corresponding operator for that assignable: getref(&a) transitions to get[a], and setref(&a;e) transitions to set[a](e). Similar rules govern other uses of symbols. The “ref” operations simply dereference the symbol (determine which assignable it references, in the case of assignables), and defer to the underlying interpretation of the referenced symbol. In the case of process calculi channel passing is effected using references to channels, and there are operations that determine an event based on the referent of a value of channel reference type. (This greatly clarifies, in my view, some messiness in the π-calculus, and leads to a much better-behaved theory of process equivalence.)

The scoping of symbols, or names, is handled independently of their interpretation. One may use either a scoped or a scope-free interpretation of symbols. Scoped symbols have a limited extent; free symbols have unlimited extent. The concept of a mobile type, adapted from our work on ML5, manages the interaction between scoped declarations and evaluation (for which see the book). One consequence is that one may form references to both scoped and unscoped assignables, independently of the scope discipline. In other words I am deliberately breaking the conventional (and, I think, mistaken) linkage between the extent of an assignable and the formation of a reference to it. Usually people conflate reference with state and with global extent; I am deliberately not following that convention, because I think it muddles things up in ways that are better handled in the manner I’m describing. So, for example, in my account of Algol I have no difficulty in allowing references to scoped assignables, and passing them as arguments to procedures, for example. I found this surprising, at first, but then quite natural once I had dissected the situation more carefully.

Finally, let me mention that the critical property of assignables (in fact, symbols in general) is that they admit disequality. Two different assignables, say a and b, can never, under any execution scenario, be aliases for one another: an assignment to one will never affect the content of the other. Aliasing is a property of references, because two references, say bound to variables x and y, may refer to the same underlying assignable, but two different assignables can never be aliases for one another. This is why “assignment conversion” as it is called in Scheme is not in any way an adequate response to my plea for a change of terminology! Assignables and variables are different things, and references are yet different from those.


Words matter

February 1, 2012

Yesterday, during a very nice presentation by Ohad Kammar at Carnegie Mellon, the discussion got derailed, in part, because of a standard, and completely needless, terminological confusion involving the word “variable”.  I’m foolish enough to try to correct it.

The problem is that we’ve all been taught to confuse variables with variables—that is, program variables with mathematical variables.  The distinction is basic.  Since time immemorial (well, at least since al Khwarizmi) we have had the notion of a variable, properly so-called, which is given meaning by substitution.  A variable is an unknown, or indeterminate, quantity that can be replaced by any value of its type (a type being, at least since Russell, the range of significance of a variable).  Frege gave the first systematic study of the quantifiers, and Church exploited the crucial concept of a variable to give the most sharply original and broadly applicable model of computation, the \lambda-calculus.

Since the dawn of Fortran something that is not a variable has come to be called a variable.  A program variable, in the sense of Fortran and every imperative language since, is not given meaning by substitution.  Rather, it is given meaning by (at least) two operations associated with it, one to get its contents and one to put new contents into it.  (And, maybe, an operation to form a reference to it, as in C or even Algol.)  Now as many of you know, I think that the concept of a program variable in this sense is by and large a bad idea, or at any rate not nearly as important as it has been made out to be in conventional (including object-oriented) languages, but that’s an argument for another occasion.

Instead, I’m making a plea.  Let’s continue to call variables variables.  It’s a perfectly good name, and refers to what is perhaps one of the greatest achievements of the human mind, the fundamental concept of algebra, the variable.  But let’s stop calling those other things variables!  In my Practical Foundations for Programming Languages I coined (as far as I know) a word that seems perfectly serviceable, namely an assignable.  The things called variables in imperative languages should, rather, be called assignables.  The word is only a tad longer than variable, and rolls off the tongue just as easily, and has the advantage of being an accurate description of what it really is.  What’s not to like?

Why bother?  For one thing, some languages have both concepts, a necessity if you want your language to be mathematically civilized (and you do).  For another, in the increasingly important world of program verification, the specification formalisms, being mathematical in nature, make use of variables, which most definitely are not assignables!  But the real reason to make the distinction is, after all, because words matter.  Two different things deserve to have two different names, and it only confuses matters to use the same word for both.  This week’s confusion was only one example of many that I have seen over the years.

So, my suggestion: let’s call variables variables, and let’s call those other things assignables.  In the fullnesss of time (i.e., once the scourge of imperative programming has been lifted) we may not need the distinction any longer.  But until then, why not draw the distinction properly?

Addendum: It seems worth mentioning that in PFPL I have a novel (afaik) treatment of the concept of a reference, which is clarified in a subsequent post.


Heartbeat

September 26, 2011

Just a note in response to several requests to say that I am still around, and plan to resume blogging soon.   I have been distracted by more pressing concerns, including teaching at the Oregon Programming Languages Summer School (OPLSS), preparing a submission to  POPL, visiting the Max Planck Institute for Software Systems, and trying hard to finish my book (on which I’ve made substantial progress over the summer).


Types and Cells

June 7, 2011

The doctrine of computational trinitarianism implies that there should be a categorial analogue of two-dimensional type theory … and indeed there is.  It is called, oddly enough, two-dimensional category theory, which arose first for numerous reasons.  What is two-dimensional category theory?  And how does it relate to two-dimensional type theory?

Two-dimensional category theory may be seen as an instance of the general concept of enriched categories in which the collection of morphisms between any two objects is endowed with some additional structure. A locally small category is one for which the maps between any two objects form a set, called the hom set.  An order-enriched category is one in which the morphisms form an ordered set, which is a set equipped with a pre-order (reflexive and transitive relation) on maps such that composition is monotone.  Order-enriched categories were used by Mitchell Wand and Gordon Plotkin and Mike Smyth in their development of the category-theoretic interpretation of recursive types.  Specifically, an O-category is an order-enriched category whose objects are cpo’s (pre-orders closed under sup’s of chains, and hence containing a least element) and whose hom sets are themselves cpo’s and for which composition is continuous (monotone and preserves suprema of chains).

More generally still, one may  consider (different forms of) category-enriched categories, in which the hom sets are structured as categories with their own vertical notion of identity and composition that meshes well with the horizontal notion of identity and composition of the category itself.  (I will not give here the precise meaning of “meshes well with”, since there are several possible notions and I will be discussing this structure in the type-theoretic setting below.)  The vertical maps are maps between maps that can be interpreted in several ways.  One way is as “witnesses” to an ordering relation among maps, viewing a category as a “pre-order with witnesses”, perhaps an approximation relation among maps such as arises in domain theory.  Another is to view these maps as homotopies between continuous functions on a space; these are the continuous deformations of one map into another.  (This is usually expressed by introducing a “time” coordinate ranging over the unit interval, and stating that a homotopy h between maps f and g is a continuous mapping such that at time zero h agrees with f, at time 1 agrees with g, and smoothly varies at times between.)  An important special case of category-enriched categories are those whose morphisms form (not just a category but also) a groupoid, a category in which each map is invertible.  These can be thought of as “equivalence relations with evidence”; the vertical maps are “proofs” that two horizontal maps are equivalent (for example, homotopic).

The sort of category-enriched categories that I have in mind are called (strict) 2-categories.  And as the terminology suggests, there is nothing special about the number 2 here; one can consider n-categories for any number n, and even \infty-categories that incorporate all of these into one grand structure (strictly or weakly).  To help manage this structure, the categorists have a uniform terminology that I wish to introduce here, called cells.  The objects in a category are the 0-cells, the maps are the 1-cells, the maps between maps (transformations, homotopies, …) are the 2-cells, and so forth.  A 2-category is one for which we stop at 2-cells; all higher cells being degenerate and hence ignored.  Often the 2-cells (and higher) form groupoids (in one of two senses that I will come back to later); a (2,1)-category is a 2-category in which the 2-cells form a groupoid (but the 1-cells need only form a category).  The class of (\infty,1)-categories are the subject of intense interest these days, in part because of their importance in homotopy theory.  They are, however, notoriously hard to manage, and, from what I understand, it is not yet clear what is even the correct definition.

Returning to two-dimensional type theory, and applying the types-as-categories correspondence, we get that types are 0-cells, terms (with free variables) are 1-cells, and transformations are 2-cells.  General terms are not invertible, but transformations are, giving rise to a (strict) groupoid structure expressed by the judgement \alpha::M\simeq N:A.  The rules for composition of transformations expresses the vertical structure as forming a strict groupoid.  Ordinary substitution expresses the horizontal structure of terms as maps (plugging in terms for free variables).  But how do these interact?  In strict two-dimensional type theory this interaction is expressed by definitional equalities that express the interchange law between horizontal and vertical composition.  Specifically, we have the following rule:

\displaystyle{  {x:B\vdash \alpha::M\simeq N:A \qquad \vdash \beta::P\simeq Q:B}\over{\alpha[\beta]::\textit{map}\{x:B.A\}[\beta](M[P/x])\simeq N[Q/x]:A[Q/x]}  }

(More generally one must consider arbitrary contexts, and transformations between substitutions, but this instance of the general case expresses the essential idea.)

This “substitution” principle must satisfy some equational laws that amount to the following conditions:

  1. Identity: \alpha[\textit{id}]\equiv\alpha.
  2. Composition: \alpha[\beta[\gamma]]\equiv \alpha[\beta][\gamma].
  3. Interchange: (\alpha\circ\alpha')[\beta\circ\beta'] = \alpha[\beta] \circ \textit{resp}\{map[\beta]\} (\alpha'[\beta']).
  4. Delegation: (\textit{id}_M)[\delta]\equiv M[\delta].

(Here \textit{resp} is the application of a transformation to a term; it is there to “get the types right”.)  The interchange law is the type-theoretic analogue of the required interaction between 1-composition and 2-composition in a (strict) 2-category.

This is the judgmental structure of strict two-dimensional type theory.  There are several directions we can go from here.  One is to clarify the strict/weak distinction that has been underlying our discussions so far.  Another is to consider how to extend this to three- and higher-dimensional (or, dare I say it, infinite-dimensional) type theory.  The crucial ingredient for both of these discussions is one particular type constructor (more precisely, family of type constructors) that internalizes the hom structure as a type.  In the groupoidal case, where the higher structure is symmetric (invertible), this corresponds to (a version of) the Martin-Löf identity type.  The introduction rule is reflexivity, and the elimination rule is, in essence, the Yoneda Lemma from category theory.  When interpreted in terms of homotopy theory, this is the type of paths in a space, which is critical for defining the fundamental group(oid) of a space, and to clarifying the hinted-at distinction between weak and strict structure in higher dimensions.  More on that later!


Transformations as strict groupoids

May 30, 2011

The distinguishing feature of higher-dimensional type theory is the concept of equivalence of the members of a type that must be respected by all families of types.  To be sufficiently general it is essential to regard equivalence as a structure, rather than a property.  This is expressed by the judgement

\displaystyle \Gamma\vdash \alpha::M\simeq N:A

which states that M and N are equivalent members of type A, as evidenced by the transformation \alpha.  Respect for equivalence is ensured by the rule

\displaystyle{{\Gamma,x:A\vdash B\,\textsf{type}\quad  \Gamma\vdash \alpha :: M\simeq N:A \quad  \Gamma\vdash P:B[M/x]}\over  {\Gamma\vdash \textit{map}\{x:A.B\}[\alpha](P):B[N/x]}},

which states that equivalent members determine equivalent instances of a family of types.  The equivalence between instances is mediated by the operation \textit{map}\{x:A.B\}[\alpha](-), which sends members of B[M/x] to members of B[N/x].  We call this mapping the action of the family x:A.B on the transformation \alpha.

For reasons that will only become apparent as we go along, it is important that “equivalence” really be an equivalence: it must be, in an appropriate sense, reflexive, symmetric, and transitive.  The “appropriate sense” is precisely that we require the existence of transformations

\displaystyle{\Gamma\vdash \textit{id}::M\simeq M:A}

\displaystyle{{\Gamma\vdash\alpha::M\simeq N:A}\over{\Gamma\vdash\alpha^{-1}::N\simeq M:A}}

\displaystyle{{\Gamma\vdash \beta:N\simeq P:A\quad  \Gamma\vdash \alpha:M\simeq N:A}\over{\Gamma\vdash\beta\circ\alpha:M\simeq P:A}}

Moreover, these transformations must be respected by the action of any family, in a sense that we shall make clear momentarily.  Before doing so, let us observe that these transformations constitute the operations of a groupoid, which we may think of either as an equivalence relation equipped with evidence or a category in which every map is invertible (a generalized group).  While the former interpretation may not suggest it, the latter formulation implies that we should impose some requirements on how these transformations interact, namely the axioms of a groupoid:

  1. Composition (multiplication) is associative: \gamma\circ(\beta\circ\alpha)\equiv (\gamma\circ\beta)\circ\alpha::M\simeq N:A.
  2. Identity is the unit of composition: \textit{id}\circ\alpha\equiv\alpha::M\simeq N:A and \alpha\circ\textit{id}\equiv\alpha::M\simeq N:A.
  3. Inverses cancel: \alpha^{-1}\circ\alpha\equiv\textit{id}::M\simeq M:A and \alpha\circ\alpha^{-1}\equiv\textit{id}::N\simeq N:A.
These conditions, which impose equalities on transformations, demand that the second-dimensional structure of a type form a strict groupoid.  I will come back to an important weakening of these requirements later.

We further require that the action of a type family preserve the groupoid structure.  For this it is enough to require that it preserve identities and composition:

\displaystyle{\textit{map}\{x:A.B\}[\textit{id}](-) \equiv \textit{id}(-):B[M/x]}

and

\displaystyle{\begin{array}{c}\textit{map}\{x:A.B\}[\beta\circ\alpha](-)\\\equiv\\\textit{map}\{x:A.B\}[\beta](\textit{map}\{x:A.B\}[\alpha](-))\end{array}}.

Thinking of a groupoid as a category, these conditions state that the action of a type family be (strictly) functorial.  (Here again we are imposing strong requirements in order to facilitate the exposition; eventually we will consider a relaxation of these conditions that will admit a richer range of applications.)

(The alert reader will note that I have not formally introduced the concept of a transformation between types, nor the equality of these, into the theory.  There are different ways to skin this cat; for now, I will be a bit loose about the axiomatics in order to focus attention on the main ideas.  Rest assured that everything can be made precise!)

By demanding that the groupoid axioms hold strictly (as equalities) and that the action of families be strictly functorial, we have simplified the theory considerably by restricting it to dimension 2.  To relax these restrictions requires higher dimensions.  For example, we may demand only that the groupoid conditions hold up to a transformation of transformations, but hold strictly from then on; this is the 3-dimensional case.  Or we can relax all such conditions to hold only up to a higher transformation, resulting in finite dimensional type theory.  Similar considerations will apply to other conditions that we shall impose on the action of families, in particular to specify the action of type constructors on transformations, which I will discuss next time.  The presentation of finite-dimensional type theory will be aided by the introduction of identity types (also called path types).  Identity types avoid the need for an ever-expanding nesting of transformations between transformations between ….  More on that later!


Higher-Dimensional Type Theory

May 30, 2011

Ideas have their time, and it’s not for us to choose when they arrive.  But when they do, they almost always occur to many people at more or less the same time, often in a slightly disguised form whose underlying unity becomes apparent only later.  This is perhaps not too surprising, the same seeds taking root in many a fertile mind.  A bit harder to explain, though, is the moment in time when an idea comes to fruition.  Often all of the ingredients are available, and yet no one thinks to put two-and-two together and draw what seems, in retrospect, to be the obvious inference.  Until, suddenly, everyone does.  Why didn’t we think of that ages ago?  Nothing was stopping us, we just didn’t notice the opportunity!

The recent development of higher-dimensional structure in type theory seems to be a good example.  All of the ingredients have been present since the 1970′s, yet as far as I know no one, until quite recently, no one quite put together all the pieces to expose the beautiful structure that has been sitting there all along.  Like many good ideas, one can see clearly that the ideas were foreshadowed by many earlier developments whose implications are only now becoming understood.  My plan is to explain higher type theory (HTT) to the well-informed non-expert, building on ideas developed by various researchers, including Thorsten Altenkirch, Steve Awodey, Richard Garner, Martin Hofmann, Dan Licata, Peter Lumsdaine, Per Martin-Löf, Mike Shulman, Thomas Streicher, Vladimir Voevodsky, and Michael Warren.  It will be useful in the sequel to be familiar with The Holy Trinity, at least superficially, and preferably well enough to be able to move back and forth between the three manifestations that I’ve previously outlined.

One-dimensional dependent type theory is defined by derivation rules for these four fundamental forms of judgement (and, usually, some others that we suppress here for the sake of concision):

\displaystyle \Gamma\vdash A\,\mathsf{type}

\displaystyle \Gamma\vdash M : A

\displaystyle \Gamma\vdash M \equiv N : A

\displaystyle \Gamma\vdash A\equiv B

A context, \Gamma, consists of a sequence of declarations of variables of the form x_1:A_1,\dots,x_n:A_n, where it is presupposed, for each 1\leq i\leq n, that x_1:A_1,\dots,x_{i-1}:A_{i-1}\vdash A_i\,\mathsf{type} is derivable.

The key notion of dependent type theory is that of a family of types indexed by (zero or more) variables ranging over a type.  The judgement \Gamma\vdash A\,\mathsf{type} states that A is a family of types indexed by the variables given by \Gamma.  For example, we may have \vdash\textit{Nat}\,\textsf{type}, specifying that \textit{Nat} is a closed type (a degenerate family of types), and x{:}\textit{Nat}\vdash\textit{Seq}(x)\,\textsf{type}, specifying that \textit{Seq}(n) is a type (say, of sequences of naturals of length n) for each \vdash n:\textit{Nat}.  The rules of type theory ensure, either directly or indirectly, that the structural properties of the hypothetical/general judgement are valid.  In particular families of types respect equality of indices:

\displaystyle{{\Gamma,x:A\vdash B\,\textsf{type}\quad \Gamma\vdash M\equiv N:A \quad \Gamma\vdash P:B[M/x]}\over  {\Gamma\vdash P:B[N/x]}}.

In words, if B is a family of types indexed by A, and if M and N are equal members of type A, then every member of B[M/x] is also a member of B[N/x].

The generalization to two- (and higher-) dimensional type theory can be motivated in several ways.  One natural source of higher-dimensional structure is a universe, a type whose elements correspond to types.  For example, we may have a universe of sets given as follows:

\displaystyle \vdash \textit{Set}\,\textsf{type}

\displaystyle x:\textit{Set}\vdash \textit{Elt}(x)\,\textsf{type}

\displaystyle \vdash \textit{nat}:\textit{Set}

\displaystyle \vdash \textit{Elt}(\textit{nat})\equiv\textit{Nat}

\displaystyle a:\textit{Set},b:\textit{Set}\vdash a\times b : \textit{Set}

\displaystyle a:\textit{Set},b:\textit{Set}\vdash \textit{Elt}(a\times b)\equiv \textit{Elt}(a)\times\textit{Elt}(b)

and so forth, ensuring that \textit{Set} is closed under typical set-forming operations whose interpretations are given by \textit{Elt} in terms of standard type-theoretic concepts.

In many situations, including much of informal (yet entirely rigorous) mathematics, it is convenient to identify sets that are isomorphic, so that, for example, the sets \textit{nat}\times\textit{nat} and \textit{2}\to\textit{nat} would be interchangeable.  In particular, these sets should have the “same” (type of) elements.  But obviously these two sets do not have the same elements (one consists of pairs, the other of functions, under the natural interpretation of the sets as types), so we cannot hope to treat \textit{Elt}(\textit{nat}\times\textit{nat}) and \textit{Elt}(\textit{2}\to\textit{nat}) as equal, though we may wish to regard them as equivalent in some sense.  Moreover, since two sets can be isomorphic in different ways, isomorphism must be considered a structure on sets, rather than a property of sets.  For example, \textit{2} is isomorphic to itself in two different ways, by the identity and by negation (swapping).  Thus, equivalence of the elements of two isomorphic sets must take account of the isomorphism itself, and hence must have computational significance.

It is precisely the desire to accommodate equivalences such as this that gives rise to higher dimensions in type theory.  Specifically, we introduce two-dimensional structure by adding a new judgement to type theory stating that two members of a type are related by a specified transformation:

\displaystyle \Gamma\vdash \alpha :: M\simeq N : A

Crucially, families of types must respect transformation:

\displaystyle{{\Gamma,x:A\vdash B\,\textsf{type}\quad \Gamma\vdash \alpha :: M\simeq N:A \quad \Gamma\vdash P:B[M/x]}\over  {\Gamma\vdash \textit{map}\{x:A.B\}[\alpha](P):B[N/x]}}.

A transformation should be thought of as evidence of interchangeability of the members of a type; the map operation puts the evidence to work.

Returning to our example of the universe of sets, let us specify that a transformation from one set to another is an pair of functions constituting a bijection between the elements of the two sets:

\displaystyle{  {\begin{array}{c}  \Gamma,x:\textit{Elt}(a)\vdash f(x):\textit{Elt}(b) \\  \Gamma,x:\textit{Elt}(b)\vdash g(x):\textit{Elt}(a) \\  \Gamma,x:\textit{Elt}(a)\vdash g(f(x))\equiv x:\textit{Elt}(a) \\  \Gamma,x:\textit{Elt}(b)\vdash f(g(x))\equiv x:\textit{Elt}(b)  \end{array}}  \over  {\Gamma\vdash\textit{iso}(f,g)::a\simeq b:\textit{Set}}}

(The equational conditions here are rather strong; I will return to this point in a future post.  For now, let us just take this as the defining criterion of isomorphism between two sets.)

Evidence for the isomorphism of two sets induces a transformation on types given by the following equation:

\displaystyle{  {\Gamma\vdash M:\textit{Elt}(a)}\over  {\Gamma\vdash \textit{map}\{\textit{Elt}\}[\textit{iso}(f,g)](M)\equiv f(M) : \textit{Elt}(b)}}

(suppressing the obvious typing premises for f and g).  In words an isomorphism between sets a and b induces a transformation between their elements given by the isomorphism itself.

This, then, is the basic structure of two-dimensional type theory, but there is much more to say!  In future posts I intend to develop the ideas further, including a discussion of these topics:

  1. The definition of \textit{map}\{x:A.B\} is given by induction over the structure of x:A.B.  The above equation covers only one case; there are more, corresponding to each way of forming a family of types x:A.B.  The extension to function types will expose the role of the inverse of the isomorphism between sets.
  2. The judgement \alpha::M\simeq N:A may be internalized as a type, which will turn out to correspond to the identity type in Martin-Löf’s type theory, albeit with a different interpretation given by Altenkirch.  The identity type plays an important role in the extension to all higher dimensions.
  3. To ensure coherence and to allow for greater expressiveness we must also discuss equality and equivalence of transformations and how these influence the induced transformation of families of types.  In particular transformations admit a groupoid structure which expresses reflexivity, symmetry, and transitivity of transformation; these conditions can be considered to hold strongly or weakly, giving rise to different applications and interpretations.
  4. Higher-dimensional type theory admits a fascinating interpretation in terms of homotopy theory which types are interpreted as spaces, members as points in those spaces, and transformations as paths, or homotopies.  This, together with a generalization of the treatment of universes outlined above, is the basis for Voevodsky’s work on univalent foundations of mathematics.
  5. One may consider relaxing the groupoid structure on transformations to a “monoidoid” (that is, category) structure by not requiring symmetry (inverses).  The structure of type theory changes significantly in the absence of symmetry, posing significant open problems, but admitting a wider range of applications of higher-dimensional structure in both CS and mathematics.
To keep up to date with the latest developments in this area, please visit the Homotopy Type Theory blog!

The semester’s over

May 4, 2011

One reason that I started this blog was to record my experiences with a new course on functional programming for freshmen at Carnegie Mellon.  Classes have ended, the final exam is graded, and we are in the process of taking stock of what we accomplished, and what we could do better.  This was the first instance of the course, taught to 83 first-year computer science students who volunteered to be part of the new curriculum.  (This represents about 2/3 of the freshman class.)  All had taken a new course on imperative programming taught by Frank Pfenning the previous semester, and all were simultaneously taking 251, the legendary theory foundation course developed here over the last dozen or so years.

Starting this fall the course will be part of the standard curriculum, and will be taught to a broader range of students, including the electrical engineers and the full class of CS students.  We’re still working out whether the standard order will be imperative, then functional, or functional, then imperative.  Much depends on the mathematical maturity of the incoming students.  Those with weak mathematical skills (the vast majority), but some programming experience, will likely start with imperative programming, which they will take simultaneously with a standard discrete math course taught in the mathematics department.  Those with strong mathematical skills, or high ambitions, will start with functional programming.  A small number will park for one semester, taking discrete math and a programming course primarily intended for non-majors, to bring themselves up to speed.

Our new data structures and algorithms course, being developed by Guy Blelloch, will be offered in the fall for students who have already have the IP and FP classes.  Guy plans to make heavy use of functional programming, and will be stressing parallel algorithms on persistent data structures as the common case, leaving the traditional sequential, ephemeral case to the imperative programming class.  Guy’s course is essentially a continuation (ahem) of the FP class that emphasizes more sophisticated data structures and algorithms, and more complex programming problems.

For those who might be interested, I’m attaching the final exam for the course.  The students did exceptionally well (average score of 88%), even though we feared it was too long and difficult beforehand.  The spread was very narrow, perhaps because the exam was too easy, or because this being a class of volunteers, their skills were unusually strong and uniform.  We’ll know more after the fall once we’ve taught the class to a broader range of students.  I do think the exam is representative of what we expected them to be able to do at the end, and they showed us that they can do it.  A clear success!

15-150 Final Exam (Spring 2011)


Of Course ML Has Monads!

May 1, 2011

A popular meme in the world of PL’s is that “Haskell has monads”, with the implication that this is a distinctive feature of the language, separate from all others.  While it is true that Haskell has popularized the use of monads as a program structuring device, the idea of a monad is not so much an issue of language design (apart from the ad hoc syntactic support provided by Haskell), but rather one of library design.  After all, a monad is just one of a zillion signatures (type classes) with which to structure programs, and there is no particular reason why this one cannot be used in any language that supports even a modicum of modularity.  There is a particular reason why monads had to arise in Haskell, though, which is to defeat the scourge of laziness.  But even in the presence of monads, one still needs things like seq to sequentialize evaluation, because the lazy cost model is so disadvantageous.  In an ironic twist the emphasis on monads in Haskell means that programming in Haskell is rather like programming in an updated dialect of Algol with a richer type structure than the original, but the same overall structure.

Examined from the point of view of ML, monads are but a particular of use of modules.  The signature of monads is given by the definition

signature MONAD = sig
  type 'a monad
  val ret : 'a -> 'a monad
  val bnd : 'a monad -> ('a -> 'b monad) -> 'b monad
end

There are many, many, many structures that satisfy this signature; I needn’t (and, in any case, can’t) rehearse them all here.  One particularly simple example should suffice to give the general idea:

structure Option : MONAD = struct
  type 'a monad = 'a option
  fun ret x = SOME x
  fun bnd (SOME x) k = k x
    | bnd NONE k = NONE
end

This is of course the option monad, which is sometimes used to model the data flow aspects of exceptions, perhaps with some elaboration of the NONE case to associate an exceptional value with a non-result.  (The control flow aspects are not properly modeled this way, however.  For that one needs, in addition, access to some sort of jump mechanism.)

Examples like this one proliferate.  A monad is represented by a structure.  Any structure that provides the facilities specified by the MONAD signature gives rise to the characteristic sequentialization mechanisms codified by it.  Monad transformers are functors that transform one monad into another, with no fuss or bother, and no ad hoc mechanisms required.  Standard modular programming techniques suffice to represent monads; moreover, the techniques involved are fully general, and are equally applicable to other signatures of interest (arrows, or quivers, or bows, or what have you).  Moreover, it is shown in my paper with Chakravarty and Dreyer how to integrate modules into the type inference mechanism of ML so that one can get automatic functor instantiation in those limited cases where it is self-evident what is intended.  This has been implemented by Karl Crary in a prototype compiler for an extension of Standard ML, and it would be good to see this supported in more broadly available compilers for the language.

The bulk of the mania about monads is therefore accounted for by modules.  I have no doubt, however, that you are wondering about the infamous IO monad in Haskell (and it’s associated work-around, unsafePerformIO).  Isn’t that a fundamental feature of the language that cannot be replicated in ML?  Hardly!  It’s entirely a matter of designing the signatures of the standard basis library modules, and nothing more.  The default basis library does not attempt to segregate effects into a monad, but it is perfectly straightforward to do this yourself, by providing your own layer over the standard basis, or to reorganize the standard basis to enforce the separation.  For example, the signature of reference cells might look like this:

signature REF = sig
  type 'a ref
  val ref : 'a -> 'a ref IO.monad
  val ! : 'a ref -> 'a IO.monad
  val := : 'a ref -> 'a -> unit IO.monad
end

Here we are presuming that we have a fixed declaration

structure IO : MONAD = ...

that packages up the basic IO primitives that are already implemented in the run-time system of ML, more or less like in Haskell.  The other signatures, such as those for mutable arrays or for performing input and output, would be modified in a similar manner to push effects into the IO monad.  Et voila, you have monadic effects, just like in Haskell.

There’s really nothing to it.  In fact, the whole exercise was carried out by a Carnegie Mellon student, Phillippe Ajoux, a couple of years ago.  He also wrote a number of programs in this style just to see how it all goes: swimmingly.  He also devised syntactic extensions to the Moscow ML compiler that provide a nicer notation for programming with monads, much as in Haskell, but better aligned with ML’s conventions.  (Ideally it should be possible to provide syntactic support for any signature, not just monads, but I’m not aware of a worked-out design for the general case, involving as it would an intermixing of parsing and elaboration.)

My point is that the ML module system can be deployed by you to impose the sorts of effect segregation imposed on you by default in Haskell.  There is nothing special about Haskell that makes this possible, and nothing special about ML that inhibits it.  It’s all a mode of use of modules.

So why don’t we do this by default?  Because it’s not such a great idea.  Yes, I know it sounds wonderful at first, but then you realize that it’s pretty horrible.  Once you’re in the IO monad, you’re stuck there forever, and are reduced to Algol-style imperative programming.  You cannot easily convert between functional and monadic style without a radical restructuring of code.  And you inevitably need unsafePerformIO to get anything serious done.  In practical terms, you are deprived of the useful concept of a benign effect, and that just stinks!

The moral of the story is that of course ML “has monads”, just like Haskell.  Whether you want to use them is up to you; they are just as useful, and just as annoying, in ML as they are in Haskell.  But they are not forced on you by the language designers!

Update: This post should’ve been called “ML Has Monads, Why Not?”, or “Of Course ML Has Comonads!”, but then no one was wondering about that.

Update: I now think that the last sentence is excessive.  My main point is simply that it’s very simple to go one way or the other with effects, if you have modules to structure things; it’s all a matter of library design.  A variant of ML that enforced the separation of effects is very easily constructed; the question is whether it is useful or not.  I’ve suggested that the monadic separation is beguiling, but not clearly a great idea.  Alternatively, one can say that we’re not that far away from eliminating laziness from Haskell, at least in this respect: just re-do the standard basis library in ML, and you’re a good ways there.  Plus you have modules, and we understand how to integrate type classes with modules, so the gap is rather small.


The Point of Laziness

April 24, 2011

As I’ve discussed previously, there are a number of good reasons why Haskell is not suitable for teaching introductory functional programming.  Chief among these is laziness, which in the context of a pure functional language has fatal side effects.  First, Haskell suffers from a paucity of types.  It is not possible in Haskell to define the type of natural numbers, nor the type of lists of natural numbers (or lists of anything else), nor any other inductive type!  (In Carollian style there are types called naturals and lists, but that’s only what they’re called, it’s not what they are.)  Second, the language has problematic cost model.  It is monumentally difficult to reason about the time, and especially space, usage of a Haskell program.  Worse, parallelism arises naturally in an eager, not a lazy, language—for example, computing every element of a finite sequence is fundamental to parallel computing, yet is not compatible with the ideology of laziness, which specifies that we should only compute those elements that are required later.

The arguments in favor of laziness never seem convincing to me.  One claim is that the equational theory of lazy programs is said to be more convenient; for example, beta reduction holds without restriction.  But this is significant only insofar as you ignore the other types in the language.  As Andrzej Filinski pointed out decades ago, whereas lazy languages have products, but not sums, eager languages have sums, but not products.  Take your pick.  Similarly, where lazy languages rely on strictness conditions, eager languages rely on totality conditions.  The costs and benefits are dual, and there seems to be no reason to insist a priori on one set of equations as being more important than the other.

Another claim is that laziness supports the definition of infinite data types, such as infinite sequences of values of some type.  But laziness is not essential, or even particularly useful, for this purpose.  For example, the type nat->nat is a natural representation of infinite sequences of natural numbers that supports many, but not all, of the operations that finite sequences (but not, for example, operations such a reverse, which make no sense in the infinite case).   More generally, there is no inherent connection between laziness and such infinitary types.  Noam Zeilberger has developed an elegant theory of eager and lazy types based on distinguishing positive from negative polarities of type constructors, the positive including the inductive and the negative including the coinductive.   Coinductive types are no more about laziness than inductive types are about pointers.

I wish to argue that laziness is important, but not for pure functional programming, but rather only in conjunction with effects.  This is the Kahn-MacQueen Principle introduced in the 1970′s by Gilles Kahn and David MacQueen in their seminal paper on recursive networks of stream transducers.  Dan Licata and I have emphasized this viewpoint in our lectures on laziness in our new course on functional programming for freshmen.

Let’s use streams as a motivating example, contrasting them with lists, with which they are confused in Haskell.  A list is an example of a positive type, one that is defined by its membership conditions (constructors).  Defining a function on a list amounts to pattern matching, giving one case for each constructor (nil and cons), and using recursion to apply the function to the tail of the list.  A stream is an example of a negative type, one that is defined by its behavioral conditions (destructors).  Defining a stream amounts to defining how it behaves when its head and tail are computed.  The crucial thing about lists, or any positive type, is that they are colimits; we know as part of their semantics how a value of list type are constructed.  The crucial thing about streams, or any negative type, is that they are limits; we know as part of their semantics how they behave when destructed.

Since we have no access to the “inside” of a stream, we should think of it not as a static data structure, but as a dynamic process that produces, upon request, successive elements of the stream.  Internally, the stream keeps track of whatever is necessary to determine successive outputs; it has its own state that is not otherwise visible from the outside.  But if a stream is to be thought of as given by a process of generation, then it is inherently an ephemeral data structure.  Interacting with a stream changes its state; the “old” stream is lost when the “new” stream is created.  But, as we have discussed previously, ephemeral data structures are of limited utility.  The role of memoization is to transform an ephemeral process into a persistent data structure by recording the successive values produced by the process so that they can be “replayed” as necessary to permit the stream to have multiple futures.  Thus, rather than being a matter of efficiency, memoization is a matter of functionality, providing a persistent interface to an underlying ephemeral process.

To see how this works in practice, let’s review the signatures PROCESS and STREAM that Dan Licata and I developed for our class.  Here’s a snippet of the signature of processes:

signature PROCESS = sig
  type 'a process = unit -> 'a option
  val stdin : char process
  val random : real process
end

A process is a function that, when applied, generates a value of some type, or indicates that it is finished.  The process stdin represents the Unix standard input; the process random is a random number generator.  The signature of streams looks essentially like this:

signature STREAM = sig
  type 'a stream
  datatype 'a front = Nil | Cons of 'a * 'a stream
  val expose : 'a stream -> 'a front
  val memo : 'a Process.process -> 'a stream
  val fix : ('a stream -> 'a stream) -> 'a stream
end

The type ‘a front is the type of values that arise when a stream is exposed; it can either terminate, or present an element and another stream.  The memo constructor creates a persistent stream from an ephemeral process of creation for its elements.  The fix operation is used to create recursive networks of streams.  There are other operations as well, but these illustrate the essence of the abstraction.

Using these signatures as a basis, it is extremely easy to put together a package of routines for scripting.  The fundamental components are processes that generate the elements of a stream.  Combinators on streams, such as composition or mapping and reducing, are readily definable, and may be deployed to build up higher levels of abstraction.  For example, Unix utilities, such as grep, are stream transducers that take streams as inputs and produce streams as outputs.  These utilities do not perform input/output; they merely transform streams.  Moreover, since streams are persistent, there is never any issue with “buffering” or “lookahead” or “backtracking”; you just manipulate the stream like any other (persistent) data structure, and everything works automagically.  The classical Bell Labs style of intermixing I/O with processing is eliminated, leading not only to cleaner code, but also greater flexibility and re-use.  This is achieved not by the double-backflips required by the inheritance mechanisms of oopl’s, but rather by making a crisp semantic distinction between the processing of streams and the streaming of processes.  True reuse operates at the level of abstractions, not at the level of the code that gives rise to them.

Update: It seems worthwhile to point out that memoization to create a persistent from an ephemeral data structure is a premier example of a benign effect, the use of state to evince functional behavior.  But benign effects are not programmable in Haskell, because of the segregation of effects into the IO monad.

Update: Lennart Augustsson gives his reasons for liking laziness.


Some thoughts on teaching FP

April 17, 2011

A number of people have asked some very good questions about the details of how we teach certain concepts in our new functional programming class for freshmen at Carnegie Mellon.  Rather than spray responses among the various comments, I’ll summarize a few major points here in hopes of helping others who may wish to teach a similar course.  So this post is not really meant for a broad audience, but rather for the specialist; feel free to skip it if it seems too focused for your interests.  I promise to write some more controversial material of more general interest soon!  Meanwhile, here are a few thoughts presented in no particular order of importance.

Because the class is intended for beginners, we start from scratch with a language-based model of computation.  This means that, with one regrettable misstep on our part, we never talk about extra-linguistic concepts like “run-time stacks” or “compilers.”  The students are taught to think in terms of the language itself, and to reason directly about both the correctness and efficiency of the code they actually write, not the code that it allegedly compiles to or translates to.  One beautiful feature of the language-based approach is that we start with a very familiar model of computation, the evaluation of polynomials over the reals.  It’s very familiar for all students, and I think they find it very satisfying precisely because it has a direct computational flavor.  You plug in for variables and simplify, and out comes the answer.  We can draw on this as our starting point; programs are just generalized polynomials.  In particular, in a functional language variables are variables: the mathematical concept of variables, which is given meaning by substitution, is precisely the programming concept of variable.  It’s not analogous, it’s the same.  So we can draw on their experience and ask them to prove things about programs using methods that build directly on what they already know.  It’s extremely natural, and very beautiful, and leads easily to an integrated view of mathematics and programming.  Moreover, it’s a level playing field.  Students with prior “programming experience” are, if anything, at a disadvantage, because they think they know things that are either wrong or inapplicable.  One consequence is gender equity, because even with American society being what it is, the women have no particular disadvantage with respect to the men when it comes to our style of thinking and programming.  It’s a win-win-win situation.

Moving to a more technical level, the use of structural operational semantics is indispensable for providing a rigorous foundation for understanding program execution, reasoning about program correctness, and for defining a cost model to support reasoning about asymptotic complexity.  There is no substitute for this!  Without a crisp formulation of the semantics of the language, it is impossible to discuss any of these issues in a meaningful and technically precise way.  With it you can readily resolve any questions about “what happens if …”, giving the students a tool that they can use themselves to answer such questions.  Moreover, as program verification becomes more important in industrial practice, as well as academic research, it is essential that students be educated in the tools of semantics.  Structural operational semantics is very easy to teach, and presents no problems for the students.  We just use it, and they get it without any fuss or bother.  It is a natural extension of their experience with high school algebra.  Be not afraid of using these tools to teach programming!

As I’ve explained previously, it is a very good idea to avoid Booleans as much as possible.  And, above all, don’t mention equality!  The equals sign in programming languages is not the equals sign of mathematics.  Propositions are not Booleans, and it only confuses matters to use notations that encourage this misconception.  Related to this, avoid if-then-else entirely, and instead use only case analysis for branching, even when the value to be discriminated is a Boolean.  We consistently write things like

case Int.compare(x,y) of
  LESS => ...
| GREATER => ...
| EQUAL => ...

rather than a nested conditional branch.  It encourages students to think in terms of pattern matching, and prepares the ground for later developments, including a smooth transition to pattern matching over more complex data structures and reasoning inductively when programming recursively.

Teaching parallelism is completely straightforward, because the model of computation inherently avoids unnatural and unnecessary problems of interference, and focuses attention on the critical issue of data dependencies among computations in a program.  Students have no trouble computing the work (sequential time complexity) or span (parallel time complexity) of a program, and have no problems reading off recurrences for the respective time complexities.  Later, when we introduce sequences, the idea of computing in parallel with the entire sequence, rather than item-by-iterm (as encouraged by the dreadful iterators so beloved in the oo world), comes naturally and easily.  The key to this, of course, is that data structures in a functional language are naturally persistent; it is monstrously hard to use ephemeral data structures in a parallel computation, and is not something we could hope to teach freshmen.

A major decision for us is how to teach the expression and enforcement of abstraction in a program.  In a departure from our previous approach, we have decided against using opaque ascription (sealing) as a means of enforcing abstraction.  It has its virtues, but the problem is that it does not mesh well with other language features, in particular with substructures and type classes (views).  For example, consider the signature of a mapping whose domain is an ordered type of keys:

signature MAPPING = sig
  structure Key : ORDERED
  type 'a mapping
  val lookup : Key.t -> 'a mapping -> 'a
  ...
end

Unfortunately, sealing a structure with this signature renders the module useless:

structure IntMapping :> MAPPING = struct
 structure Key = Int
 type 'a mapping = 'a bst
 ...
end

The trouble is that not only is the type ‘a IntMapping.mapping abstract, as intended, but so is IntMapping.Key.t, which is not at all intended!  To get around this we we must create a specialization of the signature MAPPING using one of several means such as

signature INT_MAPPING = MAPPING where type Key.t=int
structure IntMapping :> INT_MAPPING = ...

Of course one need not name the signature, but this illustrates the general problem.  As things get more complicated, you have more and more clauses that specify the types of things (sharing specifications).

The alternative, which has worked very well for us, is to eschew opaque ascription, and instead rely on the datatype mechanism to make types abstract.  So to give an implementation of the abstract type of mappings with keys being integers, we proceed as follows:

structure IntMapping : MAPPING = struct
  structure Key : ORDERED = Int
  datatype 'a bst = Empty | Node of 'a bst * (Key.t * 'a) * 'a bst
  type 'a mapping = 'a bst
  val insert = ...
end

The point is that since the constructors of the type ‘a bst are not exported in the interface, the type ‘a IntMapping.mapping is abstract.  Note as well that the use of transparent ascription on the structure Key ensures that keys really are integers (of type Int.int), and are not abstract, exactly as intended.  This formulation allows us to state simple rules of signature matching (every specification in the signature has a corresponding declaration in the structure), and allows us to enforce abstraction boundaries with a minimum of fuss.  The students have had absolutely no trouble with this at all, and we have had no trouble structuring our code this way.

When using functors (parameterized modules) to combine modules it is, of course, necessary to impose sharing constraints to ensure that only coherent compositions are possible.  (Rather than take the space to explain this here, I will rely on the reader’s experience to understand what is involved here.)  These sorts of sharing specifications are perfectly natural, easily explained, and have presented absolutely no difficulties for the students.  We illustrated their use in our game tree search example, in which the “referee” module is parameterized by the two “player” modules, which must of course cohere on their concept of a “game” (it’s no use pitting a chess player against a checkers player!).  The code looks like this

functor Referee
  (structure Player1 : PLAYER and Player2 : PLAYER
   sharing type Player1.Game.t = Player2.Game.t) : REFEREE = ...

The sharing specification states precisely and concisely the natural coherence constraint governing the two players.  Here again, the dog we feared never barked, and the students found it all quite intuitive and unproblematic.  This allowed them to expend their time on the actual complexities of the problem at hand, such as how to think about alpha-beta pruning in a parallel game-tree search, rather than get all tied up with the bureaucracy of structuring the code itself.

The virtue of teaching bright young students is that they are bright and they are young.  Their brilliance is, of course, a pleasure.  We have to work hard to come up with sufficiently challenging exercises, and many students challenge us with their solutions to our problems.  Their youth means that they come to us with a minimum of misconceptions and misinformation that they’ve picked up on the street, and are open to learning methods that are entirely non-standard (at least for now) with respect to what their friends are learning at other universities.  What makes Carnegie Mellon a special place is precisely that the students are pushed into thinking hard in ways that they might not be.  Personally, I hope that more universities worldwide build on what we have started, and give their students the same benefits that ours are already enjoying.


Follow

Get every new post delivered to your Inbox.

Join 40 other followers