Old Neglected Theorems Are Still Theorems

I have very recently been thinking about the question of partiality vs totality in programming languages, a perennial topic in PL’s that every generation thinks it discovers for itself.  And this got me to remembering an old theorem that, it seems, hardly anyone knows ever existed in the first place.  What I like about the theorem is that it says something specific and technically accurate about the sizes of programs in total languages compared to those in partial languages.  The theorem provides some context for discussion that does not just amount to opinion or attitude (and attitude alway seems to abound when this topic arises).

The advantage of a total programming language such as Goedel’s T is that it ensures, by type checking, that every program terminates, and that every function is total. There is simply no way to have a well-typed program that goes into an infinite loop. This may seem appealing, until one considers that the upper bound on the time to termination can be quite large, so large that some terminating programs might just as well diverge as far as we humans are concerned. But never mind that, let us grant that it is a virtue of  T that it precludes divergence.

Why, then, bother with a language such as PCF that does not rule out divergence? After all, infinite loops are invariably bugs, so why not rule them out by type checking? (Don’t be fooled by glib arguments about useful programs, such as operating systems, that “run forever”. After all, infinite streams are programmable in the language M of inductive and coinductive types in which all functions terminate. Computing infinitely does not mean running forever, it just means “for as long as one wishes, without bound.”)  The notion does seem appealing until one actually tries to write a program in a language such as T.

Consider computing the greatest common divisor (GCD) of two natural numbers. This can be easily programmed in PCF by solving the following equations using general recursion:

\begin{array}{rcl}    \textit{gcd}(m,0) & = & m \\    \textit{gcd}(0,m) & = & m \\    \textit{gcd}(m,n) & = & \textit{gcd}(m-n,n) \quad \text{if}\ m>n \\    \textit{gcd}(m,n) & = & \textit{gcd}(m,n-m) \quad \text{if}\ m<n    \end{array}

The type of \textit{gcd} defined in this manner has partial function type (\mathbb{N}\times \mathbb{N})\rightharpoonup \mathbb{N}, which suggests that it may not terminate for some inputs. But we may prove by induction on the sum of the pair of arguments that it is, in fact, a total function.

Now consider programming this function in T. It is, in fact, programmable using only primitive recursion, but the code to do it is rather painful (try it!). One way to see the problem is that in T the only form of looping is one that reduces a natural number by one on each recursive call; it is not (directly) possible to make a recursive call on a smaller number other than the immediate predecessor. In fact one may code up more general patterns of terminating recursion using only primitive recursion as a primitive, but if you examine the details, you will see that doing so comes at a significant price in performance and program complexity. Program complexity can be mitigated by building libraries that codify standard patterns of reasoning whose cost of development should be amortized over all programs, not just one in particular. But there is still the problem of performance. Indeed, the encoding of more general forms of recursion into primitive recursion means that, deep within the encoding, there must be “timer” that “goes down by ones” to ensure that the program terminates. The result will be that programs written with such libraries will not be nearly as fast as they ought to be.  (It is actually quite fun to derive “course of values” recursion from primitive recursion, and then to observe with horror what is actually going on, computationally, when using this derived notion.)

But, one may argue, T is simply not a serious language. A more serious total programming language would admit sophisticated patterns of control without performance penalty. Indeed, one could easily envision representing the natural numbers in binary, rather than unary, and allowing recursive calls to be made by halving to achieve logarithmic complexity. This is surely possible, as are numerous other such techniques. Could we not then have a practical language that rules out divergence?

We can, but at a cost.  One limitation of total programming languages is that they are not universal: you cannot write an interpreter for T within T (see Chapter 9 of PFPL for a proof).  More importantly, this limitation extends to any total language whatever.  If this limitation does not seem important, then consider the Blum Size Theorem (BST) (from 1967), which places a very different limitation on total languages.  Fix any total language, L, that permits writing functions on the natural numbers. Pick any blowup factor, say 2^{2^n}, or however expansive you wish to be.  The BST states that there is a total function on the natural numbers that is programmable in L, but whose shortest program in L is larger by the given blowup factor than its shortest program in PCF!

The underlying idea of the proof is that in a total language the proof of termination of a program must be baked into the code itself, whereas in a partial language the termination proof is an external verification condition left to the programmer. Roughly speaking, there are, and always will be, programs whose termination proof is rather complicated to express, if you fix in advance the means by which it may be proved total. (In T it was primitive recursion, but one can be more ambitious, yet still get caught by the BST.)  But if you leave room for ingenuity, then programs can be short, precisely because they do not have to embed the proof of their termination in their own running code.

There are ways around the BST, of course, and I am not saying otherwise.  For example, the BST merely guarantees the existence of a bad case, so one can always argue that such a case will never arise in practice.  Could be, but I did mention the GCD in T problem for a reason: there are natural problems that are difficult to express in a language such as T.  By fixing the possible termination arguments in advance, one is tempting fate, for there are many problems, such as the Collatz Conjecture, for which the termination proof of a very simple piece of code has been an open problem for decades, and has resisted at least some serious attempts on it.  One could argue that such a function is of no practical use.  I agree, but I point out the example not to say that it is useful, but to say that it is likely that its eventual termination proof will be quite nasty, and that this will have to be reflected in the program itself if you are limited to a T-like language (rendering it, once again, useless).  For another example, there is no inherent reason why termination need be assured by means similar to that used in T.  We got around this issue in NuPRL by separating the code from the proof, using a type theory based on a partial programming language, not a total one.  The proof of termination is still required for typing in the core theory (but not in the theory with “bar types” for embracing partiality).  But it’s not baked into the code itself, affecting its run-time; it is “off to the side”, large though it may be).

Updates: word smithing, fixed bad link, corrected gcd, removed erroneous parenthetical reference to Coq, fixed LaTeX problems.


30 Responses to Old Neglected Theorems Are Still Theorems

  1. So how does the Brown-Palsberg self-interpreter for System T affect this line of reasoning? http://math.andrej.com/2016/01/04/a-brown-palsberg-self-interpreter-for-godels-system-t/ explicitly links back to here, and mentions that PFFL probably has some implicit assumptions in its proof.

    I do realize that this in no way invalidates the core of the argument in this post.

    • If you mean PFPL, then I would say that the setup is explicit, and makes no hidden assumptions that I am aware of. In fact, it is just standard elementary computability theory, placed into the context of Goedel’s T.

      The Palsberg work is where one has to be careful about assumptions. They use the word “self-interpreter” in a way that is so general you can even write a self-interpreter for a total language in that language. I don’t find the abuse of terminology helpful. Whatever they are doing, it does not contradict the established facts for the established terminology.

  2. pressron says:

    Pragmatic considerations aside, I think that another important theorem speaks to the theoretical utility of total languages. While it is true that total languages avoid the halting problem, they do not avoid its much more generally useful generalization of “bounded halting”, which is at the core of the time-hierarchy theorem. Crucially, the diagonalization proof of bounded halting does not require that the language be able to interpret itself, merely interpret some computation up to a given n number of steps. Bounded halting, therefore, applies to total languages exactly in the same manner it applies to partial languages, and by the same diagonalization proof. As it is bounded halting that yields the theoretical bounds on the cost of verification, it is an immediate result that total languages cannot help reduce the cost of _software_ verification (leaving aside the necessity of totality when a language is used to prove general mathematical theorems).

  3. Sandro Magi says:

    Unless I’ve missed something, this explosion argument applies to total languages in which general recursion is syntactically inexpressible, but it would not apply to total languages in which termination is checked by types, eg. via sized types or similar. The sized types correspond to the external verification of termination. Is this correct?

    • it applies whenever your language is restricted to total functions. if it’s by proof, then that proof can be, by the theorem, very large.

      the theorem simply says that there exist bad cases, but it does not say that all, or “most”, cases are bad. but i’ve found that many (most?) people these days don’t even know of the theorem at all, whence the post came.

  4. It’s also possible to equip Coq with bar types. I like the idea of having the choice of proving termination for free with the types, but being able to back out for the rare cases where it’s a pain in the ass. My point is that you want to capture totality with types where it’s natural and convenient so you don’t have the proof obligation everywhere. And in practice, most computations work just fine this way.

  5. arielbyd says:

    I’ll just like to note that the blowup factor is required to be computable – i.e. it is not the case that (the paper proved that) you can have a blowup factor of \n. BB(n)

  6. Eeyore says:

    Right. Old theorems are still theorems. Just theorems, I’d say. It’s actually quite funny because I’m writing a compiler for a total language, Aha!, in itself. Who needs an interpreter anyway?

    • vog2 says:

      That’s exactly what I thought when reading this, too. Who wants to write an interpreter anyway? Why not just writing a compiler? Are there any good reasons to demand an interpreter (instead of a compiler) for a total language?

    • The issue is not to do with interpreter vs compiler, but rather whether one can execute the compiler’s output. It is immaterial whether the compiler is the identity or something more complicated than that.

  7. Mike Shulman says:

    This is a very nice post, and it relates to something I’ve been thinking about on the mathematical side for a year or two. Mathematicians are generally indoctrinated that the idea of a “set of all sets” (or, if you like a type theory as a foundation, a universe with Type:Type) is just wrong and impossible, and if we want “universes” to exist in mathematics then we have to stratify them, Type0:Type1:Type2 etc. From a platonic point of view, what we are taught is that a set of all sets does not exist.

    However, in a type theoretic foundation, the inconsistencies arising from Type:Type (via Girard’s paradox and variations) are simply nonterminating programs. And just an infinite loop is invariably a bug, a non-normalizing proof term should not be considered a valid proof. So, by analogy to your argument for writing in partial languages, maybe we should really consider doing mathematics in an “inconsistent” foundation, and just verify “externally” that all of our proofs terminate. In other words, there’s no reason that a set of all sets can’t exist; we just have to be more careful than usual not to engage in circular reasoning when talking about it.

  8. spitters37 says:

    The verification of NuPrl in Coq is WIP by Vincent Rahli and Abhishek Anand (Cornell).

  9. Gert Smolka says:

    You write: We have already seen one limitation of total programming languages: they are not universal. You cannot write an interpreter for T within T, and this limitation extends to any total language whatever.

    I’m curious. Where can I find a proof of this claim? I see the analogy with Gödel’s 2nd incompleteness theorem, but fail to see why termination is as strong as consistency.

    • Peter Gammie says:

      See also “Incomputability” by Hoare, Allison, ACM Computing Surveys 4 (3) 1972. I don’t remember if there’s a proper proof in there, but the claim is made towards the end.

    • dobesv says:

      Isn’t it the case, however, that if the total programming language is capable of I/O it should be able to compile itself to a partial language and run the interpreter for that language?

    • A total language can’t interpret a partial language.

    • dobesv says:

      Hi Robert,

      Sorry I wasn’t clear – I meant it could instruct the operating system to run the interpreter for that language using the I/O facility provided. Well, assuming the I/O facility included a way to run an external command, which seems reasonable.

      In Haskell they have externalized the “impure” parts of the program using monadic I/O. One could also externalize the “partial” nature of a program using a total language with a partial monad. The “inside” of the program is still total and pure but with the help of the runtime the overall program is partial and impure. Just as most people agree that Haskell is a “pure language” I think the same people could be convinced that a system with a total language but partial impure monad can be called a “total language”.

      Note that I’m not saying the result that a total language cannot interpret itself or any partial language is incorrect as stated in the original paper. But in a practical total programming language implementation one would provide a total language with a way to do I/O and parse or interpret arbitrarily large data structures and languages. And it would probably still be called a “total language” nonetheless.

      On Fri, Jul 18, 2014 at 11:48 AM, Existential Type wrote:

      > Robert Harper commented: “A total language can’t interpret a partial > language.”

    • oh, constable and smith did this 25+ years ago before “monad” jargon prevailed. they called them “bar types” or “computation types”, and they do exactly what you suggest.

    • Sandro Magi says:

      You can’t use the partiality monad to interpret a partial language within a total language?

    • spitters37 says:

      Bob, do you have a reference for these “bar types”?

    • Constable and Smith had papers at LICS around 1987 or so, leading to both Smith’s and Crary’s theses on the topic. I think one paper was called Computational Foundations of Basic Recursive Function Theory, but there were certainly others. The bar types preceded and anticipated monads for partiality. In particular one type of partial functions (among many) A\rightharpoonup B consists of the total functions A\to\overline{B}—total functions from A to suspended computations of B.

  10. Todd Wilson says:

    I’m going to direct my PL students to read this post, because it makes concrete — through the BST — something I tried to impress on them less effectively in the abstract. I did, in fact, use the same example of gcd when discussing the difference between T and PCF, and sketched the admissibility of course-of-values recursion (and the associated strong-induction principle) in T, but it’s nice to be able to point here for validation and some extra information. Thanks!

  11. In Coq, at least with one popular usage mode, there isn’t actually any performance cost to using general recursion. The “Fix” combinator is written as primitive-recursive in the structure of a termination proof, and extraction to OCaml erases proofs systematically. Your GCD example would extract to the idiomatic OCaml code.

  12. spitters37 says:

    Does the recent work towards verifying NuPrl in Coq contradict your argument, or would you say this is having your cake and eating it too?
    Of course, it is not unrelated to the verification of the cubical and simplicial models in types theory.

    • Erroneous parenthetical removed. It was late and I was tired. I don’t know about the verification of NuPRL in Coq. Afaik the cubical sets are not quite a model of Coq, so I haven’t yet understood what it will mean in terms of endowing Coq with an operational semantics.

Leave a Reply to Mike Shulman Cancel reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: