Bellman on “Dynamic Programming”

April 21, 2014

Everyone who has studied algorithms has wondered “why the hell is Bellman’s memorization technique called dynamic programming?”.  I recently learned the answer from my colleague, Guy Blelloch, who dug up the explanation from Richard Bellman himself:

“I spent the Fall quarter (of 1950) at RAND. My first task was to find a name for multistage decision processes.

“An interesting question is, ‘Where did the name, dynamic programming, come from?’ The 1950s were not good years for mathematical research. We had a very interesting gentleman in Washington named Wilson. He was Secretary of Defense, and he actually had a pathological fear and hatred of the word, research. I’m not using the term lightly; I’m using it precisely. His face would suffuse, he would turn red, and he would get violent if people used the term, research, in his presence. You can imagine how he felt, then, about the term, mathematical. The RAND Corporation was employed by the Air Force, and the Air Force had Wilson as its boss, essentially. Hence, I felt I had to do something to shield Wilson and the Air Force from the fact that I was really doing mathematics inside the RAND Corporation. What title, what name, could I choose? In the first place I was interested in planning, in decision making, in thinking. But planning, is not a good word for various rea- sons. I decided therefore to use the word, ‘programming.’ I wanted to get across the idea that this was dynamic, this was multistage, this was time-varying—I thought, let’s kill two birds with one stone. Let’s take a word that has an absolutely precise meaning, namely dynamic, in the classical physical sense. It also has a very interesting property as an adjective, and that is it’s impossible to use the word, dynamic, in a pejorative sense. Try thinking of some combination that will possibly give it a pejorative meaning. It’s impossible. Thus, I thought dynamic programming was a good name. It was something not even a Congressman could object to. So I used it as an umbrella for my activities” (p. 159).

As with algorithms, so too with dynamic languages?

Update: why is it called “memoization” and not “memorization”?

Update: rewrite of the commentary.


Old Neglected Theorems Are Still Theorems

March 20, 2014

I have very recently been thinking about the question of partiality vs totality in programming languages, a perennial topic in PL’s that every generation thinks it discovers for itself.  And this got me to remembering an old theorem that, it seems, hardly anyone knows ever existed in the first place.  What I like about the theorem is that it says something specific and technically accurate about the sizes of programs in total languages compared to those in partial languages.  The theorem provides some context for discussion that does not just amount to opinion or attitude (and attitude alway seems to abound when this topic arises).

The advantage of a total programming language such as Goedel’s T is that it ensures, by type checking, that every program terminates, and that every function is total. There is simply no way to have a well-typed program that goes into an infinite loop. This may seem appealing, until one considers that the upper bound on the time to termination can be quite large, so large that some terminating programs might just as well diverge as far as we humans are concerned. But never mind that, let us grant that it is a virtue of  T that it precludes divergence.

Why, then, bother with a language such as PCF that does not rule out divergence? After all, infinite loops are invariably bugs, so why not rule them out by type checking? (Don’t be fooled by glib arguments about useful programs, such as operating systems, that “run forever”. After all, infinite streams are programmable in the language M of inductive and coinductive types in which all functions terminate. Computing infinitely does not mean running forever, it just means “for as long as one wishes, without bound.”)  The notion does seem appealing until one actually tries to write a program in a language such as T.

Consider computing the greatest common divisor (GCD) of two natural numbers. This can be easily programmed in PCF by solving the following equations using general recursion:

\begin{array}{rcl}    \textit{gcd}(m,0) & = & m \\    \textit{gcd}(0,m) & = & m \\    \textit{gcd}(m,n) & = & \textit{gcd}(m-n,n) \quad \text{if}\ m>n \\    \textit{gcd}(m,n) & = & \textit{gcd}(m,n-m) \quad \text{if}\ m<n    \end{array}

The type of \textit{gcd} defined in this manner has partial function type (\mathbb{N}\times \mathbb{N})\rightharpoonup \mathbb{N}, which suggests that it may not terminate for some inputs. But we may prove by induction on the sum of the pair of arguments that it is, in fact, a total function.

Now consider programming this function in T. It is, in fact, programmable using only primitive recursion, but the code to do it is rather painful (try it!). One way to see the problem is that in T the only form of looping is one that reduces a natural number by one on each recursive call; it is not (directly) possible to make a recursive call on a smaller number other than the immediate predecessor. In fact one may code up more general patterns of terminating recursion using only primitive recursion as a primitive, but if you examine the details, you will see that doing so comes at a significant price in performance and program complexity. Program complexity can be mitigated by building libraries that codify standard patterns of reasoning whose cost of development should be amortized over all programs, not just one in particular. But there is still the problem of performance. Indeed, the encoding of more general forms of recursion into primitive recursion means that, deep within the encoding, there must be “timer” that “goes down by ones” to ensure that the program terminates. The result will be that programs written with such libraries will not be nearly as fast as they ought to be.  (It is actually quite fun to derive “course of values” recursion from primitive recursion, and then to observe with horror what is actually going on, computationally, when using this derived notion.)

But, one may argue, T is simply not a serious language. A more serious total programming language would admit sophisticated patterns of control without performance penalty. Indeed, one could easily envision representing the natural numbers in binary, rather than unary, and allowing recursive calls to be made by halving to achieve logarithmic complexity. This is surely possible, as are numerous other such techniques. Could we not then have a practical language that rules out divergence?

We can, but at a cost.  One limitation of total programming languages is that they are not universal: you cannot write an interpreter for T within T (see Chapter 9 of PFPL for a proof).  More importantly, this limitation extends to any total language whatever.  If this limitation does not seem important, then consider the Blum Size Theorem (BST) (from 1967), which places a very different limitation on total languages.  Fix any total language, L, that permits writing functions on the natural numbers. Pick any blowup factor, say 2^{2^n}, or however expansive you wish to be.  The BST states that there is a total function on the natural numbers that is programmable in L, but whose shortest program in L is larger by the given blowup factor than its shortest program in PCF!

The underlying idea of the proof is that in a total language the proof of termination of a program must be baked into the code itself, whereas in a partial language the termination proof is an external verification condition left to the programmer. Roughly speaking, there are, and always will be, programs whose termination proof is rather complicated to express, if you fix in advance the means by which it may be proved total. (In T it was primitive recursion, but one can be more ambitious, yet still get caught by the BST.)  But if you leave room for ingenuity, then programs can be short, precisely because they do not have to embed the proof of their termination in their own running code.

There are ways around the BST, of course, and I am not saying otherwise.  For example, the BST merely guarantees the existence of a bad case, so one can always argue that such a case will never arise in practice.  Could be, but I did mention the GCD in T problem for a reason: there are natural problems that are difficult to express in a language such as T.  By fixing the possible termination arguments in advance, one is tempting fate, for there are many problems, such as the Collatz Conjecture, for which the termination proof of a very simple piece of code has been an open problem for decades, and has resisted at least some serious attempts on it.  One could argue that such a function is of no practical use.  I agree, but I point out the example not to say that it is useful, but to say that it is likely that its eventual termination proof will be quite nasty, and that this will have to be reflected in the program itself if you are limited to a T-like language (rendering it, once again, useless).  For another example, there is no inherent reason why termination need be assured by means similar to that used in T.  We got around this issue in NuPRL by separating the code from the proof, using a type theory based on a partial programming language, not a total one.  The proof of termination is still required for typing in the core theory (but not in the theory with “bar types” for embracing partiality).  But it’s not baked into the code itself, affecting its run-time; it is “off to the side”, large though it may be).

Updates: word smithing, fixed bad link, corrected gcd, removed erroneous parenthetical reference to Coq, fixed LaTeX problems.

Homotopy Type Theory Lectures and Notes On-Line

December 4, 2013

My activity on this blog has been reduced to nil recently because I have been spending my time preparing a series of lectures on homotopy type theory, starting from basic principles and ending with the application of univalence and higher inductive types to algebraic topology.  The course web page contains links to the video-taped lectures and to the course notes prepared by the students this semester.  These will be available indefinitely and are accessible to anyone interested in the course.  My hope is that these will provide a useful introduction to the HoTT Book, which is available for free on the web and may be printed on demand.  My intention is to write an introduction to dependent type theory for a CS audience, which will serve as a reference for much ongoing work in the area and as a prolegomenon to the HoTT Book itself.

What, If Anything, Is A Declarative Language?

July 18, 2013

Back in the 1980’s it was very fashionable to talk about “declarative” programming languages.  But to my mind there was never a clear definition of a “declarative language”, and hence no way to tell what is declarative and what is not.  Lacking any clear meaning, the term came to refer to the arbitrary conflation of functional with logic programming to such an extent that “functional-and-logic-programming” almost became a Germanic thing-in-itself (ding an sich).  Later, as the logic programming wave subsided, the term “declarative”,  like “object-oriented”, came to be an expression of approval, and then, mercifully, died out.

Or so I had thought.  Earlier this week I attended a thriller of an NSF-sponsored workshop on high-level programming models for parallelism, where I was surprised by the declarative zombie once again coming to eat our brains.  This got me to thinking, again, about whether the term has any useful meaning.  For what it’s worth, and perhaps to generate useful debate, here’re some things that I think people mean, and why I don’t think they mean very much.

  1. “Declarative” means “high-level”.  This just seems to replace one vague term by another.
  2. “Declarative” means “not imperative”.  But this flies in the face of reality.  Functional languages embrace and encompass imperative programming as a special case, and even Prolog has imperative features, such as assert and retract, that have imperative meaning.
  3. “Declarative” means “functional”.  OK, but then we don’t really need another word.
  4. “Declarative” means “what, not how”.  But every language has an operational semantics that defines how to execute its programs, and you must be aware of that semantics to understand programs written in it.  Haskell has a definite evaluation order, just as much as ML has a different one, and even Prolog execution is defined by a clear operational semantics that determines the clause order and that can be influenced by “cut”.
  5. “Declarative” means “equational”.  This does not distinguish anything, because there is a well-defined notion of equivalence for any programming language, namely observational equivalence.  Different languages induce different equivalences, of course, but how does one say that one equivalence is “better” than another?  At any rate, I know of no stress on equational properties of logic programs, so either logic programs are not “declarative” or “equational reasoning” is not their defining characteristic.
  6. “Declarative” means “referentially transparent”.  The misappropriation of Quine’s terminology only confuses matters.  All I’ve been able to make of this is that “referentially transparent” means that beta-equivalence is valid.  But beta equivalence is not a property of an arbitrary programming language, nor in any case is it clear why this equivalence is first among equals.  In any case why you would decide a priori on what equivalences you want before you even know what it means to run a program?
  7. “Declarative” means “has a denotation”.  This gets closer to the point, I think, because we might well say that a declarative semantics is one that gives meaning to programs as some kind of mapping between some sort of spaces.  In other words, it would be a synonym for “denotational semantics”.  But every language has a denotational semantics (perhaps by interpretation into a Kripke structure to impose sequencing), so having one does not seem to distinguish a useful class of languages.  Moreover, even in the case of purely functional programs, the usual denotational semantics (as continuous maps) is not fully abstract, and the fully abstract semantics (as games) is highly operational.  Perhaps a language is declarative in proportion to being able to give it semantics in some “familiar” mathematical setting?
  8. “Declarative” means “implicitly parallelizable“.  This was certainly the sense intended at the NSF meeting, but no two “declarative” languages seemed to have much in common.  Charlie Garrod proposes just “implicit”, which is pretty much synonymous with “high level”, and may be the most precise sense there is to the term.

No doubt this list is not exhaustive, but I think it covers many of the most common interpretations.  It seems to me that none of them have a clear meaning or distinguish a well-defined class of languages.  Which leads me to ask, is there any such thing as a declarative programming language?

[Thanks to the late Steve Gould for inspiring the title of this post.]

[Update: wordsmithing.]

[Update: add a remark about semantics, add another candidate meaning.]

[Update: added link to previous post.]

Exceptions Are Shared Secrets

December 3, 2012

Exceptions are commonly criticized as being the “goto’s” of modern programming languages.  Raising an exception transfers control to an unknown destination, it is said, and this is a bad thing on engineering grounds.  I disagree.  It is perfectly predictable where a raised exception will be handled, provided that exceptions are done properly, which, unfortunately, is not always the case.

The crucial point that exception values are shared secrets. Let us distinguish two parties, the raiser of the exception, and the handler of it. The fundamental idea of exceptions is to transfer a value from the raiser to the handler without the possibility of interception by another party. While the language of secrecy seems appropriately evocative, I hasten to add that I am not here concerned with “attackers” or suchlike, but merely with the difficulties of ensuring modular composition of programs from components. In such a setting the “attacker” is yourself, who is not malicious, but who is fallible.

By raising an exception the raiser is “contacting” a handler with a message. The raiser wishes to limit which components of a program may intercept that message. More precisely, the raiser wishes to ensure that only certain previously agreed-upon components may handle that exception, perhaps only one. This property should remain stable under extension to the program or composition with any other component. It should not be possible for an innocent third party to accidentally intercept a message that was not intended for it.

Achieving this requires a secrecy mechanism that allows the raiser and the handler(s) to agree upon their cooperation. This is accomplished by dynamic classification, exactly as it is done properly in Standard ML (but not O’Caml). The idea is that the raiser has access to a dynamically generated constructor for exception values, and any handler has access to the corresponding dynamically generated matcher for exception values. This means that the handler, and only the handler, can decode the message sent by the raiser; no other party can do anything with it other than pass it along unexamined. It is “perfectly encrypted” and cannot be deciphered by any unintended component.

The usual exception mechanisms, as distinct from exception values, allow for “wild-card handlers”, which means that an exception can be intercepted by a third party. This means that the raiser cannot ensure that the handler actually receives the message, but it can ensure, using dynamic classification, that only a legitimate handler may decipher it. Decades of experience with Standard ML shows that this is a very useful thing indeed, and has application far beyond just the simple example considered here. For full details, see my forthcoming book, for a full discussion of dynamic classification and its role for ensuring integrity and confidentiality in a program. Dynamic classification is not just for “security”, but is rather a good tool for everyday programming.

Haskell is one language that does not get exceptions right.  Allocating an exception incurs a storage effect, and so would have to be confined to the IO monad in Haskell.  But this would destroy the utility of exceptions in pure code.  The result is an exception mechanism that is arguably broken and that does not provide the guarantees that make exceptions a perfectly pleasant and useful way to write code.

Update: Reworked last paragraph to clarify the point I am making; the previous formulation appears to have invited misinterpretation.

Update: This account of exceptions also makes clear why the perennial suggestion to put exception-raising information into types makes no sense to me. I will write more about this in a future post, but meanwhile contemplate that a computation may raise an exception that is not even in principle nameable in the type. That is, it is not conservativity that’s at issue, it’s the very idea.

Update: Wordsmithing, removal of irrelevant remarks to focus on the main point about dynamic exceptions.

Practical Foundations for Programming Languages Is Out

December 3, 2012

Practical Foundations for Programming Languages, published by Cambridge University Press, is now available in print! It can be ordered from the usual sources, and maybe some unusual ones as well. If you order directly from Cambridge using this link, you will get a 20% discount on the cover price (pass it on).

Since going to press I have, inevitably, been informed of some (so far minor) errors that are corrected in the online edition. These corrections will make their way into the second printing. If you see something fishy-looking, compare it with the online edition first to see whether I may have already corrected the mistake. Otherwise, send your comments to me.

By the way, the cover artwork is by Scott Draves, a former student in my group, who is now a professional artist as well as a researcher at Google in NYC. Thanks, Scott!

Update: The very first author’s copy hit my desk today!

Introductory FP Course Materials

September 15, 2012

The course materials for our first-semester introductory programming course (which I’ve discussed elsewhere on this blog) are now available here.

The course materials for our second-semester data structures and algorithms course are available here.

Thanks to Dan Licata and Guy Blelloch for helping make these available.  Comments are most welcome.

Update: I will be giving a talk about this at Google Pittsburgh on Thursday the 20th of September.