## Sequentiality as the Essence of Parallelism

November 4, 2017

I recently thought of a nice way to structure a language for parallel programming around the concept of sequential composition.  Think of parallelism as the default—evaluate everything in parallel unless the semantics of the situation precludes it: sums are posterior to summands, but the summands can be evaluated simultaneously.  You need a way to express the necessary dependencies without introducing any spurious ones.

There’s a tool for that, called lax logic, introduced by Fairtlough and Mendler and elaborated by Davies and Pfenning, which I use extensively in PFPL.  The imperative language Modernized Algol is formulated in the lax style, distinguishing two modes, or levels, of syntax, the (pure) expressions and the (impure) commands.  The lax modality, which links the two layers, behaves roughly like a monad, but, all the hype notwithstanding, it is not the central player.  It’s the modes, not the modality, that matter.  (See the Commentary on PFPL for more.)

The lax modality is just the ticket for expressing parallelism.  Rather than separate expressions from commands, here we distinguish between values and computations.  The names are important, to avoid possible confusion.  Values are fully evaluated; they are not a source of parallelism.  (If values were called “pure”, it would be irresistible to think otherwise.)  Computations have yet to be evaluated; they engender parallelism by sequential composition.  What?  No, you didn’t nod off! Let me explain.

Parallelism is all about the join points.  If parallel execution is the default, then the job of the programmer is not to induce parallelism, but to harness it.  And you do that by saying, “this computation depends on these others.”  Absent that, there is nothing else to say, just go for it.  No sub-languages.  No program analysis.  No escaping the monad.  Just express the necessary dependencies, and you’re good to go.

So, what are the join points?  They are the elimination forms for two parallel modalities.  They generalize the sequential case to allow for statically and dynamically determined parallelism.  A value of parallel product type is a tuple of unevaluated computations, a kind of “lazy” tuple (but not that kind of laziness, here I just mean unevaluated components).  The elimination form evaluates all of the component computations in parallel, creates a value tuple from their values, and passes it to the body of the form.  Similarly, a value of parallel sequence type is a generator consisting of two values, a natural number n indicating its size, and a function determining the ith component computation for each 0≤i<n.  The elimination form activates all n component computations, binds their values to a value sequence, and passes it to the body of the form.
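Here is a minimal sketch of the two join points in Python, under some loud assumptions: a computation is modeled as a zero-argument function, the names `join_product` and `join_sequence` are made up for illustration, components are indexed from 0, and a thread pool stands in for "evaluate in parallel" (it models the join structure only, not real speedup).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence, Tuple

# Assumption of this sketch: a "computation" is a thunk (zero-argument
# function), and a "value" is whatever the thunk returns once run.
Comp = Callable[[], Any]


def join_product(comps: Sequence[Comp], body: Callable[[Tuple[Any, ...]], Any]) -> Any:
    """Eliminate a parallel product: run every component computation,
    collect the results into a tuple of values, and pass it to `body`."""
    if not comps:
        return body(())
    with ThreadPoolExecutor(max_workers=len(comps)) as pool:
        futures = [pool.submit(c) for c in comps]
        # The join point: wait for all components before continuing.
        values = tuple(f.result() for f in futures)
    return body(values)


def join_sequence(n: int, gen: Callable[[int], Comp], body: Callable[[List[Any]], Any]) -> Any:
    """Eliminate a parallel sequence given by a size n and a generator:
    activate the n component computations, then pass the list of their
    values to `body`."""
    return join_product([gen(i) for i in range(n)], lambda vs: body(list(vs)))
```

For instance, `join_product([lambda: 1 + 2, lambda: 3 * 4], lambda vs: vs[0] + vs[1])` joins two computations and adds their values, and `join_sequence(4, lambda i: (lambda: i * i), lambda vs: vs)` builds a value sequence whose length is only known at run time.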

The join point effects a change of type, from encapsulated computations to evaluated values, neatly generalizing sequential composition from a unary to a multiway join.  If you’d like, the parallel products and parallel sequences are “generalized monads” that encapsulate not just one, but many, unevaluated computations.  But they are no more monads than they are in any other functional language: the categorial equational laws need not hold in the presence of, say, divergence, or exceptions.

The dynamics assigns costs to computations, not to values, whose cost of creation has already been paid.  The computation that just returns a value has unit work and span.  Primitive operations take unit work and span.  The sequential composition of a parallel product with n components induces span one more than the maximum span of the constituents, and induces work one more than the sum of their work.  The dynamics of sequential composition for parallel sequences is similar, with the “arity” being determined dynamically rather than statically.
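The cost rules just stated can be written down as a tiny calculator.  This is only a sketch: the names `Cost`, `UNIT`, and `join_cost` are hypothetical, and the cost of the body that follows the join is deliberately left out.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Cost:
    work: int  # total number of steps performed
    span: int  # length of the longest chain of dependent steps


# Returning a value, or performing a primitive operation, costs
# unit work and unit span.
UNIT = Cost(work=1, span=1)


def join_cost(components: Sequence[Cost]) -> Cost:
    """Cost of joining the given component computations (body excluded):
    work is one more than the sum of the components' work, and span is
    one more than the maximum of their spans."""
    work = 1 + sum(c.work for c in components)
    span = 1 + max((c.span for c in components), default=0)
    return Cost(work, span)
```

Joining components with costs `Cost(3, 2)`, `Cost(5, 4)`, and `UNIT` gives work 1 + (3 + 5 + 1) = 10 and span 1 + max(2, 4, 1) = 5.  The same function serves for parallel sequences; only the length of `components` is determined at run time.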

Programming in this style means making the join points explicit.  If you don’t like that, you can easily define derived forms—and derived costs—for constructs that do it for you.    For example, a pair of computations might be rendered as activating a parallel pair of its components, then returning the resulting value pair.  And so on and so forth.  It’s no big deal.

En passant the modal formulation of parallelism solves a nasty technical problem in a substitution-based cost semantics that does not make the modal distinction.  The issue is, how to distinguish between the creation of a value, and the many re-uses of it arising from substitution?  It’s not correct to charge again and again for creating the value each time you see it (this cost can be asymptotically significant), but you do have to charge for creating it somewhere (it’s not free, and it can matter).  And, anyway, how is one to account for the cost of assessing whether an expression is, in fact, a value?  The usual move is to use an environment semantics to manage sharing.  But you don’t have to: the modal framework solves the problem by distinguishing between a value per se; the computation that returns it fully created; and the computation that incrementally constructs it from its constituent parts.  It’s the old cons-vs-dotted pair issue, neatly resolved.

Please see Section 10 of the Commentary on PFPL for a fuller account.  The main idea is to generalize a type of single unevaluated computations, which arises in lax logic, to types of statically- and dynamically many unevaluated computations.  The bind operation becomes a join operation for these computations, turning a “lazy” tuple or sequence into eager tuples or sequences.

Updates: word-smithing, added cite to Davies-Pfenning, replaced cite of course notes with reference to commentary.

## What, if anything, is a programming paradigm?

May 1, 2017

Just out, an essay on the Cambridge University Press author’s blog about “programming paradigms”, and why I did not structure Practical Foundations for Programming Languages around them.

## Structure and Efficiency of Computer Programs

September 28, 2014

For decades my colleague, Guy Blelloch, and I have promoted a grand synthesis of the two “theories” of computer science, combinatorial theory and logical theory.  It is only a small exaggeration to say that these two schools of thought work in isolation.  The combinatorial theorists concern themselves with efficiency, based on hypothetical translations of high-level algorithms to low-level machines, and have no useful theory of composition, the most important tool for developing large software systems.  Logical theorists concern themselves with composition, emphasizing the analysis of the properties of components of systems and how those components are combined; the heart of logic is a theory of composition (entailment).  But relatively scant attention is paid to efficiency, and, to a distressingly large extent, the situation is worsening, and not improving.

Guy and I have argued, through our separate and joint work, for the applicability of PL ideas to algorithm design, leading, for example, to the concept of adaptive programming that Umut Acar has pursued aggressively over the last dozen years.  And we have argued for the importance of cost analysis, for various measures of cost, at the level of the code that one actually writes, and not how it is compiled.  Last spring, prompted by discussions with Anindya Banerjee at NSF in the winter of 2014, I decided to write a position paper on the topic, outlining the scientific opportunities and challenges that would arise in an attempt to unify the two, disparate theories of computing.  I circulated the first draft privately in May, and revised it in July to prepare for a conference call among algorithms and PL researchers (sponsored by NSF) to find common ground and isolate key technical challenges to achieving its goals.

There are serious obstacles to be overcome if a grand synthesis of the “two theories” is to be achieved.  The first step is to get the right people together to discuss the issues and to formulate a unified vision of what are the core problems, and what are promising directions for short- and long-term research.  The position paper is not a proposal for funding, but is rather a proposal for a meeting designed to bring together two largely (but not entirely) disparate communities.  In summer of 2014 NSF hosted a three-hour long conference call among a number of researchers in both areas with a view towards hosting a workshop proposal in the near future.  Please keep an eye out for future developments.

I am grateful to Anindya Banerjee at NSF for initiating the discussion last winter that led to the paper and discussion, and I am grateful to Swarat Chaudhuri for his helpful comments on the proposal.

[Update: word smithing, corrections, updating, removed discussion of cost models for fuller treatment later, fixed incoherence after revision.]

## Summer of Programming Languages

July 6, 2014

Having just returned from the annual Oregon Programming Languages Summer School, at which I teach every year, I am once again very impressed with the growth in the technical sophistication of the field and with its ability to attract brilliant young students whose enthusiasm and idealism are inspiring.  Eugene was, as ever, an ideal setting for the summer school, providing a gorgeous setting for work and relaxation.  I was particularly glad for the numerous chances to talk with students outside of the classroom, usually over beer, and I enjoyed, as usual, the superb cycling conditions in Eugene and the surrounding countryside.  Many students commented to me that the atmosphere at the summer school is wonderful, filled with people who are passionate about programming languages research, and suffused with a spirit of cooperation and sharing of ideas.

Started by Zena Ariola a dozen years ago, this year’s instance was organized by Greg Morrisett and Amal Ahmed in consultation with Zena.  As usual, the success of the school depended critically on the dedication of Jim Allen, who has been the de facto chief operating officer since its inception.  Without Jim, OPLSS could not exist.  His attention to detail, and his engagement with the students are legendary.  Support from the National Science Foundation CISE Division, ACM SIGPLAN, Microsoft Research, Jane Street Capital, and BAE Systems was essential for providing an excellent venue, for supporting a roster of first-rate lecturers, and for supporting the participation of students who might otherwise not have been able to attend.  And, of course, an outstanding roster of lecturers donated their time to come to Eugene for a week to share their ideas with the students and their fellow lecturers.

The schedule of lectures is posted on the web site; all of the lectures were taped, and the recordings are available on the web.  In addition many speakers provided course notes, software, and other backing materials that are also available online.  So even if you were not able to attend, you can still benefit from the summer school, and perhaps feel more motivated to come next summer.  Greg and I will be organizing, in consultation with Zena.  Applying the principle “don’t fix what isn’t broken”, we do not anticipate major changes, but there is always room for improvement and the need to freshen up the content every year.  For me the central idea of the summer school is the applicability of deep theory to everyday practice.  Long a dream held by researchers such as me, these connections become more “real” every year as the theoretical abstractions of yesterday become the concrete practices of today.  It’s breathtaking to see how far we’ve come from the days when I was a student just beginning to grasp the opportunities afforded by ideas from proof theory, type theory, and category theory (the Holy Trinity) to building beautiful software systems.  No longer the abstruse fantasies of mad (computer) scientists, these ideas are the very air we breathe in PL research.  Gone are the days of ad hoc language designs done in innocence of the foundations on which they rest.  Nowadays serious industrial-strength languages are emerging that are grounded in theory and informed by practice.

Two examples have arisen just this summer, Rust (from Mozilla) and Swift (from Apple), that exemplify the trend.  Although I have not had time to study them carefully, much less write serious code using them, it is evident from even a brief review of their web sites that these are serious languages that take account of the academic developments of the last couple of decades in formulating new language designs to address new classes of problems that have arisen in programming practice.  These languages are type safe, a basic criterion of sensibility, and feature sophisticated type systems that include ideas such as sum types, which have long been missing from commercial languages, or provided only in comically obtuse ways (such as objects).  The infamous null pointer mistakes have been eradicated, and the importance of pattern matching (in the sense of the ML family of languages) is finally being appreciated as the cure for Boolean blindness.  For once I can look at new industrial languages without an overwhelming sense of disappointment, but instead with optimism and enthusiasm that important ideas are finally, at long last, being recognized and adopted.  As has often been observed, it takes 25 years for an academic language idea to make it into industrial practice.  With Java it was simply the 1970’s idea of automatic storage management; with languages such as Rust and Swift we are seeing ideas from the 80’s and 90’s make their way into industrial practice.  It’s cause for celebration, and encouragement for those entering the field: the right ideas do win out in the end, one just has to have the courage to be irrelevant.

I hope to find the time to comment more meaningfully on the recent developments in practical programming languages, including Rust and Swift, but also languages such as Go and OCaml that are also making inroads into programming practice.  (The overwhelming success and future dominance of Haskell is self-evident.  Kudos!) But for now, let me say that the golden age of programming language research is here and now, and promises to continue indefinitely.

Update: word smithing.

## Old Neglected Theorems Are Still Theorems

March 20, 2014

I have very recently been thinking about the question of partiality vs totality in programming languages, a perennial topic in PL’s that every generation thinks it discovers for itself.  And this got me to remembering an old theorem that, it seems, hardly anyone knows ever existed in the first place.  What I like about the theorem is that it says something specific and technically accurate about the sizes of programs in total languages compared to those in partial languages.  The theorem provides some context for discussion that does not just amount to opinion or attitude (and attitude always seems to abound when this topic arises).

The advantage of a total programming language such as Goedel’s T is that it ensures, by type checking, that every program terminates, and that every function is total. There is simply no way to have a well-typed program that goes into an infinite loop. This may seem appealing, until one considers that the upper bound on the time to termination can be quite large, so large that some terminating programs might just as well diverge as far as we humans are concerned. But never mind that, let us grant that it is a virtue of  T that it precludes divergence.

Why, then, bother with a language such as PCF that does not rule out divergence? After all, infinite loops are invariably bugs, so why not rule them out by type checking? (Don’t be fooled by glib arguments about useful programs, such as operating systems, that “run forever”. After all, infinite streams are programmable in the language M of inductive and coinductive types in which all functions terminate. Computing infinitely does not mean running forever, it just means “for as long as one wishes, without bound.”)  The notion does seem appealing until one actually tries to write a program in a language such as T.

Consider computing the greatest common divisor (GCD) of two natural numbers. This can be easily programmed in PCF by solving the following equations using general recursion:

$\begin{array}{rcl} \textit{gcd}(m,0) & = & m \\ \textit{gcd}(0,m) & = & m \\ \textit{gcd}(m,n) & = & \textit{gcd}(m-n,n) \quad \text{if}\ m>n \\ \textit{gcd}(m,n) & = & \textit{gcd}(m,n-m) \quad \text{if}\ m<n \end{array}$

The type of $\textit{gcd}$ defined in this manner has partial function type $(\mathbb{N}\times \mathbb{N})\rightharpoonup \mathbb{N}$, which suggests that it may not terminate for some inputs. But we may prove by induction on the sum of the pair of arguments that it is, in fact, a total function.
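To make the equations concrete, here is a minimal sketch in Python rather than PCF, taking the last clause to be the case m < n.  The clause for m = n > 0 is an assumption of this sketch, added so that every input is covered.

```python
def gcd(m: int, n: int) -> int:
    """GCD of two natural numbers by general recursion.

    Nothing in the syntax bounds the number of recursive calls; that the
    function is total is an external fact, proved by induction on m + n.
    """
    if n == 0:
        return m
    if m == 0:
        return n
    if m > n:
        return gcd(m - n, n)   # m + n strictly decreases, since n > 0
    if m < n:
        return gcd(m, n - m)   # m + n strictly decreases, since m > 0
    return m                   # m == n > 0 (assumed clause, see lead-in)
```

Each recursive call is made on a pair with a strictly smaller sum, which is exactly the induction measure mentioned above.  Python's recursion limit makes this suitable only for small inputs; it is meant to show the shape of the definition, not to be an efficient GCD.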

Now consider programming this function in T. It is, in fact, programmable using only primitive recursion, but the code to do it is rather painful (try it!). One way to see the problem is that in T the only form of looping is one that reduces a natural number by one on each recursive call; it is not (directly) possible to make a recursive call on a smaller number other than the immediate predecessor. In fact one may code up more general patterns of terminating recursion using only primitive recursion as a primitive, but if you examine the details, you will see that doing so comes at a significant price in performance and program complexity. Program complexity can be mitigated by building libraries that codify standard patterns of reasoning whose cost of development should be amortized over all programs, not just one in particular. But there is still the problem of performance. Indeed, the encoding of more general forms of recursion into primitive recursion means that, deep within the encoding, there must be a “timer” that “goes down by ones” to ensure that the program terminates. The result will be that programs written with such libraries will not be nearly as fast as they ought to be.  (It is actually quite fun to derive “course of values” recursion from primitive recursion, and then to observe with horror what is actually going on, computationally, when using this derived notion.)
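The “timer” can be seen in a small sketch.  This is Python imitating the style of T, not T itself: `natrec` is a hypothetical stand-in for the recursor, and Python's built-in subtraction and comparison are used as a simplification (in T they would themselves be defined by primitive recursion).

```python
from typing import Callable, Tuple, TypeVar

A = TypeVar("A")


def natrec(n: int, base: A, step: Callable[[int, A], A]) -> A:
    """Primitive recursion on n: start from `base` and apply `step`
    once for each predecessor 0, 1, ..., n - 1.  This is the only loop."""
    acc = base
    for k in range(n):
        acc = step(k, acc)
    return acc


def gcd_with_timer(m: int, n: int) -> int:
    """GCD by repeated subtraction, driven by a timer of m + n ticks."""

    def step(_tick: int, state: Tuple[int, int]) -> Tuple[int, int]:
        a, b = state
        if a == 0 or b == 0 or a == b:
            return state        # finished, but the timer keeps ticking
        if a > b:
            return (a - b, b)
        return (a, b - a)

    # m + n ticks suffice: each subtraction lowers a + b by at least one.
    a, b = natrec(m + n, (m, n), step)
    return a if (b == 0 or a == b) else b
```

For `gcd_with_timer(12, 18)` only two subtractions are needed, yet the loop runs all 30 ticks.  The termination argument (the sum decreases) has become part of the running code, which is the performance cost described above.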

But, one may argue, T is simply not a serious language. A more serious total programming language would admit sophisticated patterns of control without performance penalty. Indeed, one could easily envision representing the natural numbers in binary, rather than unary, and allowing recursive calls to be made by halving to achieve logarithmic complexity. This is surely possible, as are numerous other such techniques. Could we not then have a practical language that rules out divergence?

We can, but at a cost.  One limitation of total programming languages is that they are not universal: you cannot write an interpreter for T within T (see Chapter 9 of PFPL for a proof).  More importantly, this limitation extends to any total language whatever.  If this limitation does not seem important, then consider the Blum Size Theorem (BST) (from 1967), which places a very different limitation on total languages.  Fix any total language, L, that permits writing functions on the natural numbers. Pick any blowup factor, say $2^{2^n}$, or however expansive you wish to be.  The BST states that there is a total function on the natural numbers that is programmable in L, but whose shortest program in L is larger by the given blowup factor than its shortest program in PCF!
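One way to put the statement in symbols, offered as a paraphrase of the paragraph above rather than as Blum's original formulation: write $|p|$ for the size of a program $p$, and let $f$ be the chosen blowup function, for example $f(n)=2^{2^n}$, reading $n$ as the size of a PCF program (that reading is an assumption of this rendering).  Then there is a total function $g$ on the natural numbers, programmable in L, such that

$\min\{\,|p| : p\ \text{in L computes}\ g\,\} \;>\; f\bigl(\min\{\,|q| : q\ \text{in PCF computes}\ g\,\}\bigr)$

The function $g$ depends on both L and $f$; the theorem promises that some such $g$ exists, not that any particular function of interest behaves this way.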

The underlying idea of the proof is that in a total language the proof of termination of a program must be baked into the code itself, whereas in a partial language the termination proof is an external verification condition left to the programmer. Roughly speaking, there are, and always will be, programs whose termination proof is rather complicated to express, if you fix in advance the means by which it may be proved total. (In T it was primitive recursion, but one can be more ambitious, yet still get caught by the BST.)  But if you leave room for ingenuity, then programs can be short, precisely because they do not have to embed the proof of their termination in their own running code.

There are ways around the BST, of course, and I am not saying otherwise.  For example, the BST merely guarantees the existence of a bad case, so one can always argue that such a case will never arise in practice.  Could be, but I did mention the GCD in T problem for a reason: there are natural problems that are difficult to express in a language such as T.  By fixing the possible termination arguments in advance, one is tempting fate, for there are many problems, such as the Collatz Conjecture, for which the termination proof of a very simple piece of code has been an open problem for decades, and has resisted at least some serious attempts on it.  One could argue that such a function is of no practical use.  I agree, but I point out the example not to say that it is useful, but to say that it is likely that its eventual termination proof will be quite nasty, and that this will have to be reflected in the program itself if you are limited to a T-like language (rendering it, once again, useless).  For another example, there is no inherent reason why termination need be assured by means similar to those used in T.  We got around this issue in NuPRL by separating the code from the proof, using a type theory based on a partial programming language, not a total one.  The proof of termination is still required for typing in the core theory (but not in the theory with “bar types” for embracing partiality).  But it’s not baked into the code itself, affecting its run-time; it is “off to the side”, large though it may be.
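For concreteness, here is the “very simple piece of code” in question, as a short Python sketch (the name `collatz_steps` is made up for illustration).  The loop is trivial to write in a partial language; whether it halts for every positive input is exactly the open Collatz Conjecture.

```python
def collatz_steps(n: int) -> int:
    """Count the Collatz steps needed to bring n down to 1.

    No proof is known that the loop terminates for every n >= 1, so
    nothing here certifies totality; the code simply runs.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps
```

Starting from 6 the sequence runs 6, 3, 10, 5, 16, 8, 4, 2, 1, so `collatz_steps(6)` is 8.  A T-like language would demand a bound on the number of iterations up front, and no one knows how to supply one.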

Updates: word smithing, fixed bad link, corrected gcd, removed erroneous parenthetical reference to Coq, fixed LaTeX problems.

## There and Back Again

August 6, 2012

Last fall it became clear to me that it was “now or never” time for completing Practical Foundations for Programming Languages, so I put just about everything else aside and made the big push to completion.  The copy editing phase is now complete, the cover design (by Scott Draves) is finished, and it’s now in the final stages of publication.  You can even pre-order a copy on Amazon; it’s expected to be out in November.

I can already think of ways to improve it, but at some point I had to declare victory and save some powder for future editions.  My goal in writing the book is to organize as wide a body of material as I could manage in a single unifying framework based on structural operational semantics and structural type systems.  At over 600 pages the manuscript is at the upper limit of what one can reasonably consider a single book, even though I strived for concision throughout.

Quite a lot of the technical development does not follow along traditional lines.  For example, I completely decouple the concepts of assignment, reference, and storage class (heap or stack) from one another, which makes clear that one may have references to stack-allocated assignables, or make use of heap-allocated assignables without having references to them.  As another example, my treatment of concurrency, while grounded in the process calculus tradition, coheres with my treatment of assignables, but differs sharply from standard accounts (and avoids some of the complications in the treatment of process equivalences).

An explicit goal was to avoid the computational cladistics that characterizes many treatments of PL concepts.  For example, there are no chapters on “paradigms” such as “functional programming” or “object-oriented programming”, although many of their underlying concepts are treated in the text.  So there are chapters on higher-order functions and dynamic dispatch, for example, but these are not elevated to language design principles, but rather are analyses of computational phenomena “found in nature”.

With the first edition behind me, I intend to resume blogging.  I have a few topics lined up in my head, including an update on our new undergraduate curriculum at Carnegie Mellon (going smashingly), and more posts on fundamentals of logic and PL’s.  So stay tuned!

Update: word-smithing.