Old Neglected Theorems Are Still Theorems

March 20, 2014

I have very recently been thinking about the question of partiality vs totality in programming languages, a perennial topic in PL’s that every generation thinks it discovers for itself.  And this got me to remembering an old theorem that, it seems, hardly anyone knows ever existed in the first place.  What I like about the theorem is that it says something specific and technically accurate about the sizes of programs in total languages compared to those in partial languages.  The theorem provides some context for discussion that does not just amount to opinion or attitude (and attitude alway seems to abound when this topic arises).

The advantage of a total programming language such as Goedel’s T is that it ensures, by type checking, that every program terminates, and that every function is total. There is simply no way to have a well-typed program that goes into an infinite loop. This may seem appealing, until one considers that the upper bound on the time to termination can be quite large, so large that some terminating programs might just as well diverge as far as we humans are concerned. But never mind that, let us grant that it is a virtue of  T that it precludes divergence.

Why, then, bother with a language such as PCF that does not rule out divergence? After all, infinite loops are invariably bugs, so why not rule them out by type checking? (Don’t be fooled by glib arguments about useful programs, such as operating systems, that “run forever”. After all, infinite streams are programmable in the language M of inductive and coinductive types in which all functions terminate. Computing infinitely does not mean running forever, it just means “for as long as one wishes, without bound.”)  The notion does seem appealing until one actually tries to write a program in a language such as T.

Consider computing the greatest common divisor (GCD) of two natural numbers. This can be easily programmed in PCF by solving the following equations using general recursion:

\begin{array}{rcl}    \textit{gcd}(m,0) & = & m \\    \textit{gcd}(0,m) & = & m \\    \textit{gcd}(m,n) & = & \textit{gcd}(m-n,n) \quad \text{if}\ m>n \\    \textit{gcd}(m,n) & = & \textit{gcd}(m,n-m) \quad \text{if}\ m<n    \end{array}

The type of \textit{gcd} defined in this manner has partial function type (\mathbb{N}\times \mathbb{N})\rightharpoonup \mathbb{N}, which suggests that it may not terminate for some inputs. But we may prove by induction on the sum of the pair of arguments that it is, in fact, a total function.

Now consider programming this function in T. It is, in fact, programmable using only primitive recursion, but the code to do it is rather painful (try it!). One way to see the problem is that in T the only form of looping is one that reduces a natural number by one on each recursive call; it is not (directly) possible to make a recursive call on a smaller number other than the immediate predecessor. In fact one may code up more general patterns of terminating recursion using only primitive recursion as a primitive, but if you examine the details, you will see that doing so comes at a significant price in performance and program complexity. Program complexity can be mitigated by building libraries that codify standard patterns of reasoning whose cost of development should be amortized over all programs, not just one in particular. But there is still the problem of performance. Indeed, the encoding of more general forms of recursion into primitive recursion means that, deep within the encoding, there must be “timer” that “goes down by ones” to ensure that the program terminates. The result will be that programs written with such libraries will not be nearly as fast as they ought to be.  (It is actually quite fun to derive “course of values” recursion from primitive recursion, and then to observe with horror what is actually going on, computationally, when using this derived notion.)

But, one may argue, T is simply not a serious language. A more serious total programming language would admit sophisticated patterns of control without performance penalty. Indeed, one could easily envision representing the natural numbers in binary, rather than unary, and allowing recursive calls to be made by halving to achieve logarithmic complexity. This is surely possible, as are numerous other such techniques. Could we not then have a practical language that rules out divergence?

We can, but at a cost.  One limitation of total programming languages is that they are not universal: you cannot write an interpreter for T within T (see Chapter 9 of PFPL for a proof).  More importantly, this limitation extends to any total language whatever.  If this limitation does not seem important, then consider the Blum Size Theorem (BST) (from 1967), which places a very different limitation on total languages.  Fix any total language, L, that permits writing functions on the natural numbers. Pick any blowup factor, say 2^{2^n}, or however expansive you wish to be.  The BST states that there is a total function on the natural numbers that is programmable in L, but whose shortest program in L is larger by the given blowup factor than its shortest program in PCF!

The underlying idea of the proof is that in a total language the proof of termination of a program must be baked into the code itself, whereas in a partial language the termination proof is an external verification condition left to the programmer. Roughly speaking, there are, and always will be, programs whose termination proof is rather complicated to express, if you fix in advance the means by which it may be proved total. (In T it was primitive recursion, but one can be more ambitious, yet still get caught by the BST.)  But if you leave room for ingenuity, then programs can be short, precisely because they do not have to embed the proof of their termination in their own running code.

There are ways around the BST, of course, and I am not saying otherwise.  For example, the BST merely guarantees the existence of a bad case, so one can always argue that such a case will never arise in practice.  Could be, but I did mention the GCD in T problem for a reason: there are natural problems that are difficult to express in a language such as T.  By fixing the possible termination arguments in advance, one is tempting fate, for there are many problems, such as the Collatz Conjecture, for which the termination proof of a very simple piece of code has been an open problem for decades, and has resisted at least some serious attempts on it.  One could argue that such a function is of no practical use.  I agree, but I point out the example not to say that it is useful, but to say that it is likely that its eventual termination proof will be quite nasty, and that this will have to be reflected in the program itself if you are limited to a T-like language (rendering it, once again, useless).  For another example, there is no inherent reason why termination need be assured by means similar to that used in T.  We got around this issue in NuPRL by separating the code from the proof, using a type theory based on a partial programming language, not a total one.  The proof of termination is still required for typing in the core theory (but not in the theory with “bar types” for embracing partiality).  But it’s not baked into the code itself, affecting its run-time; it is “off to the side”, large though it may be).

Updates: word smithing, fixed bad link, corrected gcd, removed erroneous parenthetical reference to Coq, fixed LaTeX problems.

More Is Not Always Better

January 28, 2013

In a previous post I discussed the status of Church’s Law in type theory, showing that it fails to hold internally to extensional type theory, even though one may see externally that the definable numeric functions in ETT are λ-definable, and hence Turing computable.  The distinction between internal and external is quite important in logic, mainly because a logical formalism may be unable to express precisely an externally meaningful concept.  The classical example is the Löwenheim-Skolem Theorem of first-order logic, which says that any theory with an infinite model has a countable model.  In particular the theory of sets has a countable model, which would seem to imply that the set of real numbers, for example, is countable.  But internally one can prove that the reals are uncountable (Cantor’s proof is readily expressed in the theory), which seems to be a paradox of some kind.  But no, all it says is that the function witnessing the countability of the term model cannot be expressed internally, and hence there is no contradiction at all.

A similar situation obtains with Church’s Law.  One may observe empirically, so to say, that Church’s Law holds externally of ETT, but this fact cannot be internalized.  There is a function given by Church’s Law that “decompiles” any (extensional) function of type N→N by providing the index for a Turing machine that computes it.  But this function cannot be definable internally to extensional type theory, because it may be used to obtain a decision procedure for halting of Turing machines, which is internally refutable by formalizing the standard undecidability proof.  In both of these examples it is the undefinability of a function that is important to the expressive power of a formalism, contrary to naïve analyses that would suggest that, when it comes to definability of functions, the more the merrier.  This is a general phenomenon in type theory.  The power of type theory arises from its strictures, not its affordances, in direct opposition to the ever-popular language design principle “first-class x” for all imaginable values of x.

Another perspective on the same issue is provided by Martin-Löf’s meaning explanation of type theory, which is closely related to the theory of realizability for constructive logic.  The high-level idea is that a justification for type theory may be obtained by starting with an untyped concept of computability (i.e., a programming language given by an operational semantics for closed terms), and then giving the meaning of the judgments of type theory in terms of such computations.  So, for example, the judgment A type, where A is a closed expression means that A evaluates to a canonical type, where the canonical types include, say, Nat, and all terms of the form A’→A”, where A’ and A” are types.  Similarly, if A is a type, the judgment a:A means that A evaluates to a canonical type A’ and that a evaluates to a canonical term a’ such that a’ is a canonical element of A’, where, say, any numeral for a natural number is a canonical member of Nat.  To give the canonical members of the function type A’→A” requires the further notion of equality of elements of a type, a=b:A, which all functions are required to respect.  A meaning explanation of this sort was suggested by Martin-Löf in his landmark paper Constructive Mathematics and Computer Programming, and is used as the basis for the NuPRL type theory, which extends that account in a number of interesting directions, including inductive and coinductive types, subset and quotient types, and partial types.

The relation to realizability emerges from applying the meaning explanation of types to the semantics of propositions given by the propositions-as-types principle (which, as I’ve previously argued, should not be called “the Curry-Howard isomorphism”).  According to this view a proposition P is identified with a type, the type of its proofs, and we say that P true iff evaluates to a canonical proposition that has a canonical member.  In particular, for implication we say that P→Q true if and only if P true implies Q true (and, in addition, the proof respects equality, a condition that I will suppress here for the sake of simplicity).  More explicitly, the implication is true exactly when the truth of the antecedent implies the truth of the consequent, which is to say that there is a constructive transformation of proofs of P into proofs of Q.

In recursive realizability one accepts Church’s Law and demands that the constructive transformation be given by the index of a Turing machine (i.e., by a program written in a fixed programming language).  This means, in particular, that if P expresses, say, the decidability of the halting problem, for which there is no recursive realizer, then the implication P→Q is vacuously true!  By taking Q to be falsehood, we obtain a realizer for the statement that the halting problem is undecidable.  More generally, any statement that is not realized is automatically false  in the recursive realizability interpretation, precisely because the realizers are identified with Turing machine indices.  Pressing a bit further, there are statements, such as the statement that every Turing machine either halts or diverges on its own input, that are true in classical logic, yet have no recursive realizer, and hence are false in the realizability interpretation.

In contrast in the meaning explanation for NuPRL Church’s Law is not assumed.  Although one may show that there is no Turing machine to decide halting for Turing machines, it is impossible to show that there is no constructive transformation that may do so.  For example, an oracle machine would be able to make the required decision.  This is entirely compatible with intuitionistic principles, because although intuitionism does not affirm LEM, neither does it deny it.  This point is often missed in some accounts, leading to endless confusions.  Intuitionistic logic, properly conceived, is compatible with classical logic in that classical logic may be seen as an idealization of intuitionistic logic in which we heuristically postulate that all propositions are decidable (all instances of LEM hold).

The crucial point distinguishing the meaning explanation from recursive realizability is precisely the refusal to accept Church’s Law, a kind of comprehension principle for functions as discussed earlier.  This refusal is often called computational open-endedness because it amounts to avoiding a commitment to the blasphemy of limiting God’s programming language to Turing machines (using an apt metaphor of Andrej Bauer’s).  Rather, we piously accept that richer notions of computation are possible, and avoid commitment to a “final theory” of computation in which Church’s Law is postulated outright.  By avoiding the witnessing function provided by Church’s Law we gain expressive power, rather than losing it, resulting in an elegant theory of constructive mathematics that enriches, rather than diminishes, classical mathematics.    In short, contrary to “common sense” (i.e., uninformed supposition), more is not always better.

Update: corrected minor technical error and some typographical errors.

Update: clarified point about incompatibility of recursive realizability with classical logic.

Polarity in Type Theory

August 25, 2012

There has recently arisen some misguided claims about a supposed opposition between functional and object-oriented programming.  The claims amount to a belated recognition of a fundamental structure in type theory first elucidated by Jean-Marc Andreoli, and developed in depth by Jean-Yves Girard in the context of logic, and by Paul Blain-Levy and Noam Zeilberger in the context of programming languages.  In keeping with the general principle of computational trinitarianism, the concept of polarization has meaning in proof theory, category theory, and type theory, a sure sign of its fundamental importance.

Polarization is not an issue of language design, it is an issue of type structure.  The main idea is that types may be classified as being positive or negative, with the positive being characterized by their structure and the negative being characterized by their behavior.  In a sufficiently rich type system one may consider, and make effective use of, both positive and negative types.  There is nothing remarkable or revolutionary about this, and, truly, there is nothing really new about it, other than the terminology.  But through the efforts of the above-mentioned researchers, and others, we have learned quite a lot about the importance of polarization in logic, languages, and semantics.  I find it particularly remarkable that Andreoli’s work on proof search turned out to also be of deep significance for programming languages.  This connection was developed and extended by Zeilberger, on whose dissertation I am basing this post.

The simplest and most direct way to illustrate the ideas is to consider the product type, which corresponds to conjunction in logic.  There are two possible ways that one can formulate the rules for the product type that from the point of view of inhabitation are completely equivalent, but from the point of view of computation are quite distinct.  Let us first state them as rules of logic, then equip these rules with proof terms so that we may study their operational behavior.  For the time being I will refer to these as Method 1 and Method 2, but after we examine them more carefully, we will find more descriptive names for them.

Method 1 of defining conjunction is perhaps the most familiar.  It consists of this introduction rule

\displaystyle\frac{\Gamma\vdash A\;\textsf{true}\quad\Gamma\vdash B\;\textsf{true}}{\Gamma\vdash A\wedge B\;\textsf{true}}

and the following two elimination rules

\displaystyle\frac{\Gamma\vdash A\wedge B\;\textsf{true}}{\Gamma\vdash A\;\textsf{true}}\qquad\frac{\Gamma\vdash A\wedge B\;\textsf{true}}{\Gamma\vdash B\;\textsf{true}}.

Method 2 of defining conjunction is only slightly different.  It consists of the same introduction

\displaystyle \frac{\Gamma\vdash A\;\textsf{true}\quad\Gamma\vdash B\;\textsf{true}}{\Gamma\vdash A\wedge B\;\textsf{true}}

and one elimination rule

\displaystyle\frac{\Gamma\vdash A\wedge B\;\textsf{true} \quad \Gamma,A\;\textsf{true},B\;\textsf{true}\vdash C\;\textsf{true}}{\Gamma\vdash C\;\textsf{true}}.

From a logical point of view the two formulations are interchangeable in that the rules of the one are admissible with respect to the rules of the other, given the usual structural properties of entailment, specifically reflexivity and transitivity.  However, one can discern a difference in “attitude” in the two formulations that will turn out to be a manifestation of the concept of polarity.

Method 1 is a formulation of the idea that a proof of a conjunction is anything that behaves conjunctively, which means that it supports the two elimination rules given in the definition.  There is no commitment to the internal structure of a proof, nor to the details of how projection operates; as long as there are projections, then we are satisfied that the connective is indeed conjunction.  We may consider that the elimination rules define the connective, and that the introduction rule is derived from that requirement.  Equivalently we may think of the proofs of conjunction as being coinductively defined to be as large as possible, as long as the projections are available.  Zeilberger calls this the pragmatist interpretation, following Count Basie’s principle, “if it sounds good, it is good.”

Method 2 is a direct formulation of the idea that the proofs of a conjunction are inductively defined to be as small as possible, as long as the introduction rule is valid.  Specifically, the single introduction rule may be understood as defining the structure of the sole form of proof of a conjunction, and the single elimination rule expresses the induction, or recursion, principle associated with that viewpoint.  Specifically, to reason from the fact that A\wedge B\;\textsf{true} to derive C\;\textsf{true}, it is enough to reason from the data that went into the proof of the conjunction to derive C\;\textsf{true}.  We may consider that the introduction rule defines the connective, and that the elimination rule is derived from that definition.  Zeilberger calls this the verificationist interpretation.

These two perspectives may be clarified by introducing proof terms, and the associated notions of reduction that give rise to a dynamics of proofs.

When reformulated with explicit proofs, the rules of Method 1 are the familiar rules for ordered pairs:

\displaystyle\frac{\Gamma\vdash M:A\quad\Gamma\vdash N:B}{\Gamma\vdash \langle M, N\rangle:A\wedge B}

\displaystyle\frac{\Gamma\vdash M:A\wedge B}{\Gamma\vdash \textsf{fst}(M):A}\qquad\frac{\Gamma\vdash M:A\wedge B}{\Gamma\vdash \textsf{snd}(M):B}.

The associated reduction rules specify that the elimination rules are post-inverse to the introduction rules:

\displaystyle\textsf{fst}(\langle M,N\rangle)\mapsto M \qquad \textsf{snd}(\langle M,N\rangle)\mapsto N.

In this formulation the proposition A\wedge B is often written A\times B, since it behaves like a Cartesian product of proofs.

When formulated with explicit proofs, Method 2 looks like this:

\displaystyle \frac{\Gamma\vdash M:A\quad\Gamma\vdash M:B}{\Gamma\vdash M\otimes N:A\wedge B}

\displaystyle\frac{\Gamma\vdash M:A\wedge B \quad \Gamma,x:A,y:B\vdash N:C}{\Gamma\vdash \textsf{split}(M;x,y.N):C}

with the reduction rule

\displaystyle\textsf{split}(M\otimes N;x,y.P)\mapsto [M,N/x,y]P.

With this formulation it is natural to write A\wedge B as A\otimes B, since it behaves like a tensor product of proofs.

Since the two formulations of “conjunction” have different internal structure, we may consider them as two different connectives.  This may, at first, seem pointless, because it is easily seen that x:A\times B\vdash M:A\otimes B for some M and that x:A\otimes B\vdash N:A\times B for some N, so that the two connectives are logically equivalent, and hence interchangeable in any proof.  But there is nevertheless a reason to draw the distinction, namely that they have different dynamics.

It is easy to see why.  From the pragmatic perspective, since the projections act independently of one another, there is no reason to insist that the components of a pair be evaluated before they are used.  Quite possibly we may only ever project the first component, so why bother with the second?  From the verificationist perspective, however, we are pattern matching against the proof of the conjunction, and are demanding both components at once, so it makes sense to evaluate both components of a pair in anticipation of future pattern matching.  (Admittedly, in a structural type theory one may immediately drop one of the variables on the floor and never use it, but then why give it a name at all?  In a substructural type theory such as linear type theory, this is not a possibility, and the interpretation is forced.)  Thus, the verficationist formulation corresponds to eager evaluation of pairing, and the pragmatist formulation to lazy evaluation of pairing.

Having distinguished the two forms of conjunction by their operational behavior, it is immediately clear that both forms are useful, and are by no means opposed to one another.  This is why, for example, the concept of a lazy language makes no sense, rather one should instead speak of lazy types, which are perfectly useful, but by no means the only types one should ever consider.  Similarly, the concept of an object-oriented language seems misguided, because it emphasizes the pragmatist conception, to the exclusion of the verificationist, by insisting that everything be an object characterized by its methods.

More broadly, it is useful to classify types into two polarities, the positive and the negative, corresponding to the verificationist and pragmatist perspectives.  Positive types are inductively defined by their introduction forms; they correspond to colimits, or direct limits, in category theory.  Negative types are coinductively defined by their elimination forms; they correspond to limits, or inverse limits, in category theory.  The concept of polarity is intimately related to the concept of focusing, which in logic sharpens the concept of a cut-free proof and elucidates the distinction between synchronous and asynchronous connectives, and which in programming languages provides an elegant account of pattern matching, continuations, and effects.

As ever, enduring principles emerge from the interplay between proof theory, category theory, and type theory.  Such concepts are found in nature, and do not depend on cults of personality or the fads of the computer industry for their existence or importance.

Update: word-smithing.

Extensionality, Intensionality, and Brouwer’s Dictum

August 11, 2012

There seems to be a popular misunderstanding about the propositions-as-types principle that has led some to believe that intensional type theory (ITT) is somehow preferable to or more sensible than extensional type theory (ETT).  Now, as a practical matter, few would dispute that ETT is much easier to use than ITT for mechanizing everyday mathematics.  Some justification for this will be given below, but I am mainly concerned with matters of principle.  Specifically, I wish to dispute the claim that t ETT is somehow “wrong” compared to ITT.  The root of the problem appears to be a misunderstanding of the fundamental ideas of intuitionism, which are expressed by the proposition-as-types principle.

The most popular conception appears to be the trivial one, namely that certain inductively defined formal systems of logic correspond syntactically to certain inductively defined formal systems of typing.  Such correspondences are not terribly interesting, because they can easily be made to hold by construction: all you need to do is to introduce proof terms that summarize a derivation, and then note that the proofs of a proposition correspond to the terms of the associated type.  In this form the propositions-as-types principle is often dubbed, rather grandly, the Curry-Howard Isomorphism.  It’s a truism that most things in mathematics are named after anyone but their discoverers, and that goes double in this case.  Neither Curry nor Howard discovered the principle (Howard himself disclaims credit for it), though they both did make contributions to it.  Moreover, this unfortunate name deprives credit to those who did the real work in inventing the concept, including Brouwer, Heyting, Kolmogorov, deBruijn, and Martin-Löf.  (Indeed, it is sometimes called the BHK Correspondence, which is both more accurate and less grandiose.)  Worse, there is an “isomorphism” only in the most trivial sense of an identity by definition, hardly worth emphasizing.

The interesting conception of the propositions-as-types principle is what I call Brouwer’s Dictum, which states that all of mathematics, including the concept of a proof, is to be derived from the concept of a construction, a computation classified by a type.  In intuitionistic mathematics proofs are themselves “first-class” mathematical objects that inhabit types that may as well be identified with the proposition that they prove.  Proving a proposition is no different than constructing a program of a type.  In this sense logic is a branch of mathematics, the branch concerned with those constructions that are proofs.  And mathematics is itself a branch of computer science, since according to Brouwer’s Dictum all of mathematics is to be based on the concept of computation.  But notice as well that there are many more constructions than those that correspond to proofs.  Numbers, for example, are perhaps the most basic ones, as would be any inductive or coinductive types, or even more exotic objects such as Brouwer’s own choice sequences.  From this point of view the judgement M\in A stating that M is a construction of type A is of fundamental importance, since it encompasses not only the formation of “ordinary” mathematical constructions, but also those that are distinctively intuitionistic, namely mathematical proofs.

An often misunderstood point that must be clarified before we continue is that the concept of proof in intuitionism is not to be identified with the concept of a formal proof in a fixed formal system.  What constitutes a proof of a proposition is a judgement, and there is no reason to suppose a priori that this judgement ought to be decidable.  It should be possible to recognize a proof when we see one, but it is not required that we be able to rule out what is a proof in all cases.  In contrast formal proofs are inductively defined and hence fully circumscribed, and we expect it to be decidable whether or not a purported formal proof is in fact a formal proof, that is whether it is well-formed according to the given inductively defined rules.  But the upshot of Gödel’s Theorem is that as soon as we fix the concept of formal proof, it is immediate that it is not an adequate conception of proof simpliciter, because there are propositions that are true, which is to say have a proof, but have no formal proof according to the given rules.  The concept of truth, even in the intuitionistic setting, eludes formalization, and it will ever be thus.  Putting all this another way, according to the intuitionistic viewpoint (and the mathematical practices that it codifies), there is no truth other than that given by proof.  Yet the rules of proof cannot be given in decidable form without missing the point.

It is for this reason that the first sense of the propositions-as-types principle discussed above is uninteresting, for it only ever codifies a decidable, and hence incomplete, conception of proof.  Moreover, the emphasis on an isomorphism between propositions and types also misses the point, because it fails to account for the many forms of type that do not correspond to propositions.  The formal correspondence is useful in some circumstances, namely those in which the object of study is a formal system.  So, for example, in LF the goal is to encode formal systems, and hence it is essential in the LF methodology that type checking be decidable.  But when one is talking about a general theory of computation, which is to say a general theory of mathematical constructions, there is no reason to expect either an isomorphism or decidability.  (So please stop referring to propositions-as-types as “the Curry-Howard Isomorphism”!)

We are now in a position to discuss the relationship between ITT and ETT, and to correct the misconception that ETT is somehow “wrong” because the typing judgement is not decidable.  The best way to understand the proper relationship between the two is to place them into the broader context of homotopy type theory, or HTT.  From the point of view of homotopy type theory ITT and ETT represent extremal points along a spectrum of type theories, which is to say a spectrum of conceptions of mathematical construction in Brouwer’s sense.  Extensional type theory is the theory of homotopy sets, or hSets for short, which are spaces that are homotopically discrete, meaning that the only path (evidence for equivalence) of two elements is in fact the trivial self-loop between an element and itself.  Therefore if we have a path between x and y in A, which is to say a proof that they are equivalent, then x and y are equal, and hence interchangeable in all contexts.  The bulk of everyday mathematics takes place within the universe of hSets, and hence is most appropriately expressed within ETT, and experience has born this out.  But it is also interesting to step outside of this framework and consider richer conceptions of type.

For example, as soon as we introduce universes, one is immediately confronted with the need to admit types that are not hSets.  A universe of hSets naturally includes non-trivial paths between elements witnessing their isomorphism as hSets, and hence their interchangeability in all contexts.  Taking a single universe of hSets as the sole source of such additional structure leads to (univalent) two-dimensional type theory.  In this terminology ETT is then to be considered as one-dimensional type theory.  Universes are not the only source of higher dimensionality.  For example, the interval has two elements, 0 and 1 connected by a path, the segment between them, which may be seen as evidence for their interchangeability (we can slide them along the segment one to the other).  Similarly, the circle S^1 is a two-dimensional inductively defined type with one element, a base point, and one path, a non-reflexive self-loop from the base point to itself.  It is now obvious that one may consider three-dimensional type theory, featuring types such as S^2, the sphere, and so forth.  Continuing this through all finite dimensions, we obtain finite-dimensional type theory, which is just ITT (type theory with no discreteness at any dimension).

From this perspective one can see more clearly why it has proved so awkward to formalize everyday mathematics in ITT.  Most such work takes place in the universe of hSets, and makes no use of higher-dimensional structure.  The natural setting for such things is therefore ETT, the theory of types as homotopically discrete sets.  By formalizing such mathematics within ITT one is paying the full cost of higher-dimensionality without enjoying any of its benefits.  This neatly confirms experience with using NuPRL as compared to using Coq for formulating the mathematics of homotopy sets, and why even die-hard ITT partisans find themselves wanting to switch to ETT for doing real work (certain ideological commitments notwithstanding).  On the other hand, as higher-dimensional structure becomes more important to the work we are doing, something other than ETT is required.  One candidate is a formulation of type theory with explicit levels, representing the dimensionality restriction appropriate to the problem domain.  So work with discrete sets would take place within level 1, which is just extensional type theory.  Level 2 is two-dimensional type theory, and so forth, and the union of all finite levels is something like ITT.  To make this work requires that there be a theory of cumulativity of levels, a theory of resizing that allows us to move work at a higher level to a lower level at which it still makes sense, and a theory of truncation that allows suppression of higher-dimensional structure (generalizing proof irrelevance and “squash” types).

However this may turn out, it is clear that the resulting type theory will be far richer than merely the codification of the formal proofs of some logical system.  Types such as the geometric spaces mentioned above do not arise as the types of proofs of propositions, but rather are among the most basic of mathematical constructions, in complete accordance with Brouwer’s dictum.

Church’s Law

August 9, 2012

A new feature of this year’s summer school was a reduction in the number of lectures, and an addition of daily open problem sessions for reviewing the day’s material. This turned out to be a great idea for everyone, because it gave us more informal time together, and gave the students a better chance at digesting a mountain of material. It also turned out to be a bit of an embarrassment for me, because I posed a question off the top of my head for which I thought I had two proofs, neither of which turned out to be valid. The claimed theorem is, in fact, true, and one of my proofs is easily corrected to resolve the matter (the other, curiously, remains irredeemable for reasons I’ll explain shortly). The whole episode is rather interesting, so let me recount a version of it here for your enjoyment.

The context of the discussion was extensional type theory, or ETT, which is characterized by an identification of judgemental with propositional equality: if you can prove that two objects are equal,then they are interchangeable for all purposes without specific arrangement. The alternative, intensional type theory,or ITT, regards judgemental equality as definitional equality (symbolic evaluation), and gives computational meaning to proofs of equality of objects of a type, allowing in particular transport across two instances of a family whose indices are equal. NuPRL is an example of an ETT; CiC is an example of an ITT.

Within the framework of ETT, the principle of function extensionality comes “for free”, because you can prove it to hold within the theory. Function extensionality states that f=g:A\to B whenever x:A\vdash f(x)=g(x):B. That is, two functions are if they are equal on all arguments (and, implicitly, respect equality of arguments). Function extensionality holds definitionally if your definitional equivalence includes the \eta and \xi rules, but in any case does not have the same force as extensional equality. Function extensionality as a principle of equality cannot be derived in ITT, but must be added as an additional postulate (or derived from a stronger postulate, such as univalence or the existence of a one-dimensional interval type).

Regardless of whether we are working in an extensional or an intensional theory, it is easy to see that all functions of type N\to N definable in type theory are computable. For example, we may show that all such functions may be encoded as recursive functions in the sense of Kleene, or in a more modern formulation we may give a structural operational semantics that provides a deterministic execution model for such functions (given n:N, run f:N\to N on n until it stops, and yield that as result). Of course the proof relies on some fairly involved meta-theory, but it is all constructively valid (in an informal sense) and hence provides a legitimate computational interpretation of the theory. Another way to say the same thing is to say that the comprehension principles of type theory are such that every object deemed to exist has a well-defined computational meaning, so it follows that all functions defined within it are going to be computable.

This is all just another instance of Church’s Law, the scientific law stating that any formalism for defining computable functions will turn out to be equivalent to, say, the λ-calculus when it comes to definability of number-theoretic functions. (Ordinarily Church’s Law is called Church’s Thesis, but for reasons given in my Practical Foundations book, I prefer to give it the full status of a scientific law.) Type theory is, in this respect, no better than any other formalism for defining computable functions. By now we have such faith in Church’s Law that this remark is completely unsurprising, even boring to state explicitly.

So it may come as a surprise to learn that Church’s Law is false. I’m being provocative here, so let me explain what I mean before I get flamed to death on the internet.   The point I wish to make is that there is an important distinction between the external and the internal properties of a theory. For example, in first-order logic the Löwenheim-Skolem Theorem tells us that if a first-order theory has an infinite model, then it has a countable model. This implies that, externally to ZF set theory, there are only countably many sets, even though internally to ZF set theory we can carry out Cantor’s argument to show that the powerset operation takes us to exponentially higher cardinalities far beyond the countable. One may say that the “reason” is that the evidence for the countability of sets is a bijection that is not definable within the theory, so that it cannot “understand” its own limitations. This is a good thing.

The situation with Church’s Law in type theory is similar. Externally we know that every function on the natural numbers is computable. But what about internally? The internal statement of Church’s Law is this: \Pi f:N\to N.\Sigma n:N. n\Vdash f, where the notation n\Vdash f means, informally, that n is the code of a program that, when executed on input m:N, evaluates to f(m). In Kleene’s original notation this would be rendered as \Pi m:N.\Sigma p:N.T(n,m,p)\wedge Id(U(p),f(m)), where the T predicate encodes the operational semantics, and the U predicate extracts the answer from a successful computation. Note that the expansion makes use of the identity type at the type N. The claim is that Church’s Law, stated as a type (proposition) within ETT, is false, which is to say that it entails a contradiction.

When I posed this as an exercise at the summer school, I had in mind two different proofs, which I will now sketch. Neither is valid, but there is a valid proof that I’ll come to afterwards.

Both proofs begin by applying the so-called Axiom of Choice. For those not familiar with type theory, the “axiom” of choice is in fact a theorem, stating that every total binary relation contains a function. Explicitly,

(\Pi x:A.\Sigma y:B.R(x,y)) \to \Sigma f:A\to B.\Pi x:A.R(x,f(x)).

The function f is the “choice function” that associates a witness to the totality of R to each argument x. In the present case if we postulate Church’s Law, then by the axiom of choice we have

\Sigma F:(N\to N)\to N.\Pi f:N\to N. F(f)\Vdash f.

That is, the functional F picks out, for each function f in N\to N, a (code for a) program that witnesses the computability of f. This should already seem suspicious, because by function extensionality the functional F must assign the same program to any two extensionally equal functions.

We may easily see that F is injective, for if F(f) is F(g), then both track both f and g, and hence f and g are (extensionally) equal. Thus we have an injection from N\to N into N, which seems “impossible” … except that it is not! Let’s try the proof that this is impossible, and see where it breaks down. Suppose that i:(N\to N)\to N is injective. Define d(x)=i^{-1}(x)(x)+1, and consider d(i(d))=i^{-1}(i(d))(i(d))+1=d(i(d))+1 so 0=1 and we are done. Not so fast! Since i is only injective, and not necessarily surjective, it is not clear how to define i^{-1}. The obvious idea is to send x=i(f) to f, and any x outside the image of i to, say, the identity. But there is no reason to suppose that the image of i is decidable, so the attempted definition breaks down. I hacked around with this for a while, trying to exploit properties of F to repair the proof (rather than work with a general injection, focus on the specific functional F), but failed. Andrej Bauer pointed out to me, to my surprise, that there is a model of ETT (which he constructed) that contains an injection of N\to N into N! So there is no possibility of rescuing this line of argument.

(Incidentally, we can show within ETT that there is no bijection between N and N\to N, using surjectivity to rescue the proof attempt above. Curiously, Lawvere has shown that there can be no surjection from N onto N\to N, but this does not seem to help in the present situation. This shows that the concept of countability is more subtle in the constructive setting than in the classical setting.)

But I had another argument in mind, so I was not worried. The functional F provides a decision procedure for equality for the type N\to N: given f,g:N\to N, compare F(f) with F(g). Surely this is impossible! But one cannot prove within type theory that \textrm{Id}_{N\to N}(-,-) is undecidable, because type theory is consistent with the law of the excluded middle, which states that every proposition is decidable. (Indeed, type theory proves that excluded middle is irrefutable for any particular proposition P: \neg\neg(P\vee\neg P).) So this proof also fails!

At this point it started to seem as though Church’s Law could be independent of ETT, as startling as that sounds. For ITT it is more plausible: equality of functions is definitional, so one could imagine associating an index with each function without disrupting anything. But for ETT this seemed implausible to me. Andrej pointed me to a paper by Maietti and Sambin that states that Church’s Law is incompatible with function extensionality and choice. So there must be another proof that refutes Church’s Law, and indeed there is one based on the aforementioned decidability of function equivalence (but with a slightly different line of reasoning than the one I suggested).

First, note that we can use the equality test for functions in N\to N to check for halting. Using the T predicate described above, we can define a function that is constantly 0 iff a given (code of a) program never halts on given input. We may then use the above-mentioned equality test to check for halting. So it suffices to show that the halting problem for (codes of) functions and inputs is not computable to complete the refutation of the internal form of Church’s Law.

Specifically, assume given h:N\times N\to N that, given a code for a function and an input, yields 0 or 1 according to whether or not that function halts when applied to that input. Define d:N\to N by \lambda x:N.\neg h(x,x), the usual diagonal function. Now apply the functional F obtained from Church’s Law using the Axiom of Choice to obtain n=F(d), the code for the function d, and consider h(n,n) to derive the needed contradiction. Notice that we have used Church’s Law here to obtain a code for the type-theoretic diagonal function, which is then passed to the halting tester in the usual way.

As you can see, the revised argument follows along lines similar to what I had originally envisioned (in the second version), but requires a bit more effort to push through the proof properly. (Incidentally, I don’t think the argument can be made to work in pure ITT, but perhaps it would go through for ITT enriched with function extensionality.)

Thus, Church’s Law is false internally to extensional type theory, even though it is evidently true externally for that theory. You can see the similarity to the situation in first-order logic described earlier. Even though all functions of type N\to N are computable, type theory itself is not capable of recognizing this fact (at least, not in the extensional case). And this is a good thing, not a bad thing! The whole beauty of constructive mathematics lies in the fact that it is just mathematics, free of any self-conscious recognition that we are writing programs when proving theorems constructively. We never have to reason about machine indices or any such nonsense, we just do mathematics under the discipline of not assuming that every proposition is decidable. One benefit is that the same mathematics admits interpretation not only in terms of computability, but also in terms of continuity in topological spaces, establishing a deep connection between two seemingly disparate topics.

(Hat tip to Andrej Bauer for help in sorting all this out. Here’s a link to a talk and a paper about the construction of a model of ETT in which there is an injection from N\to N to N.)

Update: word-smithing.

The Holy Trinity

March 27, 2011

The Christian doctrine of trinitarianism states that there is one God that is manifest in three persons, the Father, the Son, and the Holy Spirit, who together form the Holy Trinity.   The doctrine of computational trinitarianism holds that computation manifests itself in three forms: proofs of propositions, programs of a type, and mappings between structures.  These three aspects give rise to three sects of worship: Logic, which gives primacy to proofs and propositions; Languages, which gives primacy to programs and types; Categories, which gives primacy to mappings and structures.  The central dogma of computational trinitarianism holds that Logic, Languages, and Categories are but three manifestations of one divine notion of computation.  There is no preferred route to enlightenment: each aspect provides insights that comprise the experience of computation in our lives.

Computational trinitarianism entails that any concept arising in one aspect should have meaning from the perspective of the other two.  If you arrive at an insight that has importance for logic, languages, and categories, then you may feel sure that you have elucidated an essential concept of computation—you have made an enduring scientific discovery.  Advances in our understanding of computation may arise from insights gained in many ways (any data is useful and relevant), but their essential truth does not depend on their popularity.

Logic tells us what propositions exist (what sorts of thoughts we wish to express) and what constitutes a proof (how we can communicate our thoughts to others).  Languages (in the sense of programming) tells us what types exist (what computational phenomena we wish to express) and what constitutes a program (how we can give rise to that phenomenon).  Categories tell us what structures exist (what mathematical models we have to work with) and what constitutes a mapping between them (how they relate to one another).  In this sense all three have ontological force; they codify what is, not how to describe what is already given to us.  In this sense they are foundational; if we suppose that they are merely descriptive, we would be left with the question of where these previously given concepts arise, leading us back again to foundations.  It is the foundations that I wish to describe here, because I believe it will help to clarify some common misunderstandings about the notions of proposition, type, and structure.  Of particular interest here is that a “type system” is not, under this conception, an arbitrary collection of conditions imposed on a previously given notion of program (whether written with horizontal lines, or not).  It is, rather, a way to say what the programs are in the first place, and what they mean as proofs and as mappings.

Here I will outline the basic correspondences between logic, languages, and categories by examining their structural properties (and, for now, nothing more).

The fundamental notion in logic is that of entailment, written P_1,\dots,P_n\vdash P, expressing derivability of P from P_1,\dots, P_n.  This means that P is derivable from the rules of logic, given the P_i as axioms.  In contrast to admissibility (which I will not discuss further here) this form of entailment does not express implication!  In particular, an entailment is never vacuously true.  Entailment enjoys at least two crucial structural properties, making it a pre-order:

\displaystyle{\strut\over{P\vdash P}}

\displaystyle{{P\vdash Q\quad Q\vdash R}\over{P\vdash R}}.

In addition we often have the following additional structural properties:

\displaystyle{{P_1,\dots,P_n\vdash Q}\over{P_1,\dots,P_n,P_{n+1}\vdash Q}}

\displaystyle{{P_1,\dots,P_i,P_{i+1},\dots,P_n\vdash Q}\over{P_1,\dots,P_{i+1},P_{i},\dots,P_n\vdash Q}}

\displaystyle{{P_1,\dots,P_i,P_i,\dots,P_n\vdash Q}\over{P_1,\dots,P_i,\dots,P_n\vdash Q}}.

These state that “extra” axioms do not affect deduction; the “order” of axioms does not matter; “duplication” of axioms does not matter.  (These may seem inevitable, but in substructural logics any or all of these may be denied.)

In languages we have the fundamental concept of a typing judgement, written x_1{:}A_1,\dots,x_n{:} A_n\vdash M{:}A, stating that M is an expression of type $A$ involving variables x_i of type $A_i$.  A typing judgement must satisfy the following basic structural properties:

\displaystyle{\strut\over{x:A\vdash x:A}}

\displaystyle{{y:B\vdash N:C \quad x:A\vdash M:B}\over{x:A\vdash [M/y]N:C}}

We may think of the variables as names for “libraries”, in which case the first states that we may use any library we wish, and the second states closure under “linking” (as in the Unix tool ld or its relatives), with [M/x]N being the result of linking x in N to the library M.  Typically we expect analogues of the “extra”, “reordering”, and “duplication” axioms to hold as well, though this ain’t necessarily so.  I will leave their formulation as an exercise for the reader.

In categories we have the fundamental concept of a mapping f:X\longrightarrow Y between structures X and Y.  The most elementary structures, perhaps, are sets, and mappings are functions, but it is more common to consider, say, that X and Y are topological spaces, and $f$ is a continuous function between them.  Mappings satisfy analogous structural properties:

\displaystyle{\strut\over{\textit{id}_X : X \longrightarrow X}}

\displaystyle{{f:X\longrightarrow Y \quad g : Y\longrightarrow Z}\over{g\circ f:X\longrightarrow Z}}

These express, respectively, the existence of the identity map, and the closure of maps under composition.  They correspond to reflexivity and transitivity of entailment, and to the library and linking rule of languages.  As with types, one may expect additional closure conditions corresponding to the “extra”, “reordering”, and “duplication” axioms by giving suitable meaning to multiple assumptions.  I will not go into this here, but numerous standard sources treat these conditions in detail.

What I find captivating about computational trinitarianism is that it is beautiful!  Imagine a world in which logic, programming, and mathematics are unified, in which every proof corresponds to a program, every program to a mapping, every mapping to a proof!  Imagine a world in which the code is the math, in which there is no separation between the reasoning and the execution, no difference between the language of mathematics and the language of computing.  Trinitarianism is the central organizing principle of a theory of computation that integrates, unifies, and enriches the language of logic, programming, and mathematics.  It provides a framework for discovery, as well as analysis, of computational phenomena.  An innovation in one aspect must have implications for the other; a good idea is a good idea, in whatever form it may arise.  If an idea does not make good sense logically, categorially, and typically (sorry for the neologism), then it cannot be a manifestation of the divine.