A “proof by contradiction” is not a proof that ends with a contradiction

March 4, 2017

It is well-known that constructivists renounce “proof by contradiction”, and that classicists scoff at the critique.  “Those constructivists,” the criticism goes, “want to rule out proofs by contradiction.  How absurd!  Look, Pythagoras showed that the square root of two is irrational by deriving a contradiction from the assumption that it is rational.  There is nothing wrong with this.  Ignore them!”

On examination that sort of critique fails, because a proof by contradiction is not a proof that derives a contradiction.  Pythagoras’s  proof is valid, one of the eternal gems of mathematics.  No one questions the validity of that argument, even if they question proof by contradiction. 

Pythagoras’s Theorem expresses a negation: it is not the case that the square root of two can be expressed as the ratio of two integers.  Assume that it can be so represented.  A quick deduction shows that this is impossible.  So the assumption is false.  Done.  This is a direct proof of a negative assertion; it is not a “proof by contradiction”.

What, then, is a proof by contradiction?  It is the affirmation of a positive statement by refutation of its denial.  It is a direct proof of the negation of a negative assertion that is then pressed into service as a direct proof of the assertion, which it is not.  Anyone is free to ignore the distinction for the sake of convenience, as a philosophical issue, or as a sly use of “goto” in a proof, but the distinction nevertheless exists and is important.  Indeed, part of the beauty of constructive mathematics is that one can draw such distinctions, for, once drawn, one can selectively disregard them.  Once blurred, forever blurred, a pure loss of expressiveness.

For the sake of explanation, let me rehearse a standard example of a proof by contradiction.  The claim is that there exists irrationals a and b such that a to the b power is rational.  Here is an indirect proof, a true proof by contradiction.  Move number one, let us prove instead that it is impossible that any two irrationals a and b are such that a to the b is irrational.  This is a negative statement, so of course one proves it be deriving a contradiction from assuming it. But it is not the original statement!  This will be clear from examining the information content of the proof.

Suppose, for a contradiction, that every two irrationals a and b are such that a to the b power is irrational.  We know from Pythagoras that root two is irrational, so plug it in for both a and b, and conclude that root two to the root two power is irrational.  Now use the assumption again, taking a to be root two to the root two, and b to be root two.  Calculate a to the power of b, it is two, which is eminently rational.  Contradiction.

We have now proved that it is not the case that every pair of irrationals, when exponentiated, give an irrational.  There is nothing questionable about this proof as far as I am aware.  But it does not prove that there are two irrationals whose exponent is rational!  If you think it does, then I ask you, please name them for me.  That information is not in this proof (there are other proofs that do name them, but that is not relevant for my purposes).  You may, if you wish, disregard the distinction I am drawing, that is your prerogative, and neither I nor anyone has any problem with that.  But you cannot claim that it is a direct proof, it is rather an indirect proof, that proceeds by refuting the negative of the intended assertion.

So why am I writing this?  Because I have learned, to my dismay, that in U.S. computer science departments–of all places!–students are being taught, erroneously, that any proof that derives a contradiction is a “proof by contradiction”.  It is not.  Any proof of a negative must proceed by contradiction.  A proof by contradiction in the long-established sense of the term is, contrarily, an indirect proof of a positive by refutation of the negative.  This distinction is important, even if you want to “mod out” by it in your work, for it is only by drawing the distinction that one can even define the equivalence with which to quotient.

That’s my main point.  But for those who may not be familiar with the distinction between direct and indirect proof, let me take the opportunity to comment on why one might care to draw such a distinction.  It is entirely a matter of intellectual honesty: the information content of the foregoing indirect proof does not fulfill the expectation stated in the theorem.  It is a kind of boast, an overstatement, to claim otherwise.  Compare the original statement with the reformulation used in the proof.  The claim that it is not the case that every pair of irrationals exponentiate to an irrational is uncontroversial.  The proof proves it directly, and there is nothing particularly surprising about it.  One would even wonder why anyone would bother to state it.  Yet the supposedly equivalent claim stated at the outset appears much more fascinating, because most people cannot easily think up an example of two irrationals that exponentiate to rationals.  Nor does the proof provide one. Once, when shown the indirect proof, a student of mine blurted out “oh that’s so cheap.”  Precisely.

Why should you care?  Maybe you don’t, but there are nice benefits to keeping the distinction, because it demarcates the boundary between constructive proofs, which have direct interpretation as functional programs, and classical proofs, which have only an indirect such interpretation (using continuations, to be precise, and giving up canonicity).  Speaking as a computer scientist, this distinction matters, and it’s not costly to maintain.  May I ask that you adhere to it?

Edit: rewrote final paragraph, sketchy and irrelevant, and improved prose throughout. 


Cubical Higher Type Theory as a Programming Language

July 31, 2016

I gave a presentation at the Workshop on Categorical Logic and Univalent Foundations held in Leeds, UK July 27-29th 2016.  My talk, entitled Computational Higher Type Theory, concerns a formulation of higher-dimensional type theory in which terms are interpreted directly as programs and types as programs that express specifications of program behavior.  This approach to type theory, first suggested by Per Martin-Löf in his famous paper Constructive Mathematics and Computer Programming and developed more fully by The NuPRL Project, emphasizes constructive mathematics in the Brouwerian sense: proofs are programs, propositions are types.

The now more popular accounts of type theory emphasize the axiomatic freedom given by making fewer foundational commitments, such as not asserting the decidability of every type, but give only  an indirect account of their computational content, and then only in some cases.  In particular, the computational content of Voevodsky’s Univalence Axiom in Homotopy Type Theory remains unclear, though the Bezem-Coquand-Huber model in cubical sets carried out in constructive set theory gives justification for its constructivity.

To elicit the computational meaning of higher type theory more clearly, emphasis has shifted to cubical type theory (in at least two distinct forms) in which the higher-dimensional structure of types is judgmentally explicit as the higher cells of a type, which are interpreted as identifications.  In the above-linked talk I explain how to construe a cubical higher type theory directly as a programming language.  Other efforts, notably by Cohen-Coquand-Huber-Mörtberg, have similar goals, but using somewhat different methods.

For more information, please see my home page on which are linked two arXiv papers providing the mathematical details, and a 12-page paper summarizing the approach and the major results obtained so far.  These papers represent joint work with Carlo Angiuli and Todd Wilson.

PCLSRING in Semantics

July 11, 2016

PCLSRING is an operating system kernel design technique introduced in ITS for managing interruptions of long-running synchronous system calls.  It was mentioned in an infamous diatribe by Dick Gabriel, and is described in loving detail by Allen Bawden in an article for the ages.

Discussions of PCLSRING usually center on fundamental questions of systems design.  Is the ITS approach better than the Unix approach?   Should the whole issue be avoided by using asynchronous system calls, as in VMS?  And weren’t the good old days better than the bad new days anyway?

Let’s set those things aside for now and instead consider what it is, rather than what it’s for or whether it’s needed.  The crux of the matter is this.  Suppose you’re working with a system such as Unix that has synchronous system calls for file I/O, and you initiate a “large” read of n bytes into memory starting at address a.  It takes a while to perform the transfer, during which time the process making the call may be interrupted for any number of reasons.  The question is, what to do about the process state captured at the moment of the interrupt?

For various reasons it doesn’t make sense to snapshot the process while it is running inside the kernel.  One solution is to simply stop the read “in the middle” and arrange that, when the process resumes, it returns from the system call indicating that some m<=n bytes have been read.  You’re supposed to check that m=n yourself anyway, and restart the call if not.  (This is the Unix solution.)  It is all too easy to neglect the check, and the situation is made the worse because so few languages have sum types which would make it impossible to neglect the deficient return.

PCLSRING instead stops the system call in place, backs up the process PC to the system call, but with the parameters altered to read n-m bytes into location a+m, so that when the process resumes it simply makes a “fresh” system call to finish the read that was so rudely interrupted.  The one drawback, if it is one, is that your own parameters may get altered during the call, so you shouldn’t rely on them being anything in particular after it returns.  (This is all more easily visualized in assembly language, where the parameters are typically words that follow the system call itself in memory.)

While lecturing at this year’s OPLSS, it occurred to me that the dynamics of Modernized Algol in PFPL, which is given in Plotkin’s style, is essentially the same idea.  Consider the rule for executing an encapsulated command:

if mm’, then bnd(cmd(m);x.m”)bnd(cmd(m’);x.m”)

(I have suppressed the memory component of the state, which is altered as well.)  The expression cmd(m) encapsulates the command m.  The bnd command executes m and passes its result to another command, m”, via the variable x.  The above rule specifies that a step of execution of m results in a reconstruction of the entire bnd, albeit encapsulating m’ , the intermediate result, instead of m.  It’s exactly PCLSRING!  Think of m as the kernel code for the read, think of cmd as the system call, and think of the bnd as the sequential composition of commands in an imperative language.  The kernel only makes partial progress executing m before being interrupted, leaving m’ remaining to be executed to complete the call.  The “pc” is backed up to the bnd, albeit modified with m’ as the new “system call” to be executed on the next transition.

I just love this sort of thing!  The next time someone asks “what the hell is PCLSRING?”, you now have the option of explaining it in one line, without any mention of operating systems.  It’s all a matter of semantics.



PFPL Commentary

June 3, 2016

I am building a web page devoted to the 2nd edition of Practical Foundations for Programming Languages, recently published by Cambridge University Press.  Besides an errata, the web site features a commentary on the text explaining major design decisions and suggesting alternatives.  I also plan to include additional exercises and to make sample solutions available to faculty teaching from the book.

The purpose of the commentary is to provide the “back story” for the development, which is often only hinted at, or is written between the lines, in PFPL itself.  To emphasize enduring principles over passing fads, I have refrained from discussing particular languages in the book.  But this makes it difficult for many readers to see the relevance.  One purpose of the commentary is to clarify these connections by explaining why I said what I said.

As a starting point, I explain why I ignore the familiar concept of a “paradigm” in my account of languages.  The idea seems to have been inspired by Kuhn’s (in)famous book The Structure of Scientific Revolutions, and was perhaps a useful device at one time.  But by now the idea of a paradigm is just too vague to be useful, and there are many better ways to explain and systematize language structure.  And so I have avoided it.

I plan for the commentary to be a living document that I will revise and expand as the need arises.  I hope for it to provide some useful background for readers in general, and teachers in particular.  I wish for the standard undergraduate PL course to evolve from a superficial taxonomy of the weird animals in the language zoo to a systematic study of the general theory of computation.  Perhaps PFPL can contribute to effecting that change.



Practical Foundations for Programming Languages, Second Edition

April 11, 2016

Today I received my copies of Practical Foundations for Programming Languages, Second Edition on Cambridge University Press.  The new edition represents a substantial revision and expansion of the first edition, including these:

  1. A new chapter on type refinements has been added, complementing previous chapters on dynamic typing and on sub-typing.
  2. Two old chapters were removed (general pattern matching, polarization), and several chapters were very substantially rewritten (higher kinds, inductive and co-inductive types, concurrent and distributed Algol).
  3. The parallel abstract machine was revised to correct an implied extension that would have been impossible to carry out.
  4. Numerous corrections and improvements were made throughout, including memorable and pronounceable names for languages.
  5. Exercises were added to the end of each chapter (but the last).  Solutions are available separately.
  6. The index was revised and expanded, and some conventions systematized.
  7. An inexcusably missing easter egg was inserted.

I am grateful to many people for their careful reading of the text and their suggestions for correction and improvement.

In writing this book I have attempted to organize a large body of material on programming language concepts, all presented in the unifying framework of type systems and structural operational semantics.  My goal is to give precise definitions that provide a clear basis for discussion and a foundation for both analysis and implementation.  The field needs such a foundation, and I hope to have helped provide one.


The Power of Negative Thinking

January 27, 2015

Exception tracking is a well-known tar baby of type system design.  After all, if expressions can have two sorts of result, why shouldn’t the type say something about them both?  Languages such as CLU, FX, and Java, to name three, provide “throws” or “raises” clauses to the types of procedures that specify an upper bound on the exceptions that can occur when they are called.  It all seems natural and easy, but somehow the idea never really works very well.  One culprit is any form of higher-order programming, which is inherent in object-oriented and functional languages alike.  To handle the indirection requires more complex concepts, such as effect polymorphism, to make thing work reasonably well.  Or untracked exceptions are used to avoid the need to track them.  Somehow such an appealing idea seems rather stickier to realize than one might expect.  But why?

A piece of the puzzle was put into place by Xavier Leroy and François Pessaux in their paper on tracking uncaught exceptions. Their idea was to move use type-based methods to track uncaught exceptions, but to move the clever typing techniques required out of the programming language itself and into a separate analysis tool.  They make effective use of the powerful concept of row polymorphism introduced by Didier Rémy for typing records and variants in various dialects of Caml.  Moving exception tracking out of the language and into a verification tool is the decisive move, because it liberates the analyzer from any constraints that may be essential at the language level.

But why track uncaught exceptions?  That is, why track uncaught exceptions, rather than caught exceptions?  From a purely methodological viewpoint it seems more important to know that a certain code fragment cannot raise certain exceptions (such as the \texttt{match} exception in ML, which arises when a value matches no pattern in a case analysis).  In a closed world in which all of the possible exceptions are known, then tracking positive information about which exceptions might be raised amounts to the same as tracking which exceptions cannot be raised, by simply subtracting the raised set from the entire set.  As long as the raised set is an upper bound on the exceptions that might be raised, then the difference is a lower bound on the set of exceptions that cannot be raised.  Such conservative approximations are necessary because a non-trivial behavioral property of a program is always undecidable, and hence requires proof.  In practice this means that stronger invariants must be maintained than just the exception information so that one may prove, for example, that the values passed to a pattern match are limited to those that actually do satisfy some clause of an inexhaustive match.

How realistic is the closed world assumption?  For it to hold seems to require a whole-program analysis, and is therefore non-modular, a risky premise in today’s world.  Even on a whole-program basis exceptions must be static in the sense that, even if they are scoped, they may in principle be declared globally, after suitable renaming to avoid collisions.  The global declarations collectively determine the whole “world” from which positive exception tracking information may be subtracted to obtain negative exception information.  But in languages that admit multiple instantiations of modules, such as ML functors, static exceptions are not sufficient (each instance should introduce a distinct exception).  Instead, static exceptions must be replaced by dynamic exceptions that are allocated at initialization time, or even run-time, to ensure that no collisions can occur among the instances.  At that point we have an open world of exceptions, one in which there are exceptions that may be raised, but which cannot be named in any form of type that seeks to provide an upper bound on the possible uncaught exceptions that may arise.

For example consider the ML expression

  exception X
  raise X

If one were to use positive exception tracking, what would one say about the expression as a whole?  It can, in fact it does, raise the exception \texttt{X}, yet this fact is unspeakable outside of the scope of the declaration.   If a tracker does not account for this fact, it is unsound in the sense that the uncaught exceptions no longer provide an upper bound on what may be raised.  One maneuver, used in Java, for example, is to admit a class of untracked exceptions about which no static information is maintained.  This is useful, because it allows one to track those exceptions that can be tracked (by the Java type system) and to not track those that cannot.

In an open world (which includes Java, because exceptions are a form of object) positive exception tracking becomes infeasible because there is no way to name the exceptions that might be tracked.  In the above example the exception \textsf{X} is actually a bound variable bound to a reference to an exception constructor.  The name of the bound variable ought not matter, so it is not even clear what the exception raised should be called.  (It is amusing to see the messages generated by various ML compilers when reporting uncaught exceptions.  The information they provide is helpful, certainly, but is usually, strictly speaking, meaningless, involving identifiers that are not in scope.)

The methodological considerations mentioned earlier suggest a way around this difficulty.  Rather than attempt to track those exceptions that might be raised, instead track the exceptions that cannot be raised.  In the above example there is nothing to say about \texttt{X} not being raised, because it is being raised, so we’re off the hook there.  The “dual” example

  exception X

illustrates the power of negative thinking.  The body of the \textsf{let} does not raise the exception bound to \textsf{X}, and this may be recorded in a type that makes sense within the scope of \textsf{X}.  The crucial point is that when exiting its scope it is sound to drop mention of this information in a type for the entire expression.  Information is lost, but the analysis is sound.  In contrast there is no way to drop positive information without losing soundness, as the first example shows.

One way to think about the situation is in terms of type refinements, which express properties of the behavior of expressions of a type.  To see this most clearly it is useful to separate the exception mechanism into two parts, the control part and the data part.  The control aspect is essentially just a formulation of error-passing style, in which every expression has either a normal return of a specified type, or an exceptional return of the type associated to all exceptions.  (Nick Benton and Andrew Kennedy nicely formulated this view of exceptions as an extension of the concept of a monad.)

The data aspect is, for dynamic exceptions, the type of dynamically classified values, which is written \textsf{clsfd} in PFPL.  Think of it as an open-ended sum in which one can dynamically generate new classifiers (aka summands, injections, constructors, exceptions, channels, …) that carry a value of a specified type.  According to this view the exception \textsf{X} is bound to a dynamically-generated classifier carrying a value of unit type.  (Classifier allocation is a storage effect, so that the data aspect necessarily involves effects, whereas the control aspect may, and, for reasons of parallelism, be taken as pure.)  Exception constructors are used to make values of type \textsf{clsfd}, which are passed to handlers that can deconstruct those values by pattern matching.

Type refinements come into play as a means of tracking the class of a classified value.  For the purposes of exception tracking, the crucial refinements of the type \textsf{clsfd} are the positive refinement, \textsf{a}\cdot, and the negative refinement\overline{\textsf{a}\cdot}, which specify that a classified value is, or is not, of class \textsf{a}.  Positive exception tracking reduces to maintaining invariants expressed by a disjunction of positive refinements; negative exception tracking reduces to maintaining invariants expressed by a conjunction of negative refinements.  Revisiting the logic of exception tracking, the key is that the entailment

\overline{\mathsf{a}_1\cdot}\wedge \cdots\wedge \overline{\mathsf{a}_n\cdot} \leq \overline{\mathsf{a}_1\cdot} \wedge \cdots \wedge \overline{\mathsf{a}_{n-1}\cdot}

is valid, whereas the “entailment”

\mathsf{a}_1\cdot \vee \cdots \vee \mathsf{a}_n\cdot \leq \mathsf{a}_1\cdot\vee \cdots \vee \mathsf{a}_{n-1}\cdot

is not.  Thus, in the negative setting we may get ourselves out of the scope of an exception by weakening the refinement, an illustration of the power of negative thinking.

Structure and Efficiency of Computer Programs

September 28, 2014

For decades my colleague, Guy Blelloch, and I have promoted a grand synthesis of the two “theories” of computer science, combinatorial theory and logical theory.  It is only a small exaggeration to say that these two schools of thought work in isolation.  The combinatorial theorists concern themselves with efficiency, based on hypothetical translations of high-level algorithms to low-level machines, and have no useful theory of composition, the most important tool for developing large software systems.  Logical theorists concern themselves with composition, emphasizing the analysis of the properties of components of systems and how those components are combined; the heart of logic is a theory of composition (entailment).  But relatively scant attention is paid to efficiency, and, to a distressingly large extent, the situation is worsening, and not improving.

Guy and I have argued, through our separate and joint work, for the applicability of PL ideas to algorithms design, leading. for example, to the concept of adaptive programming that Umut Acar has pursued aggressively over the last dozen years.  And we have argued for the importance of cost analysis, for various measures of cost, at the level of the code that one actually writes, and not how it is compiled.  Last spring, prompted by discussions with Anindya Banerjee at NSF in the winter of 2014, I decided to write a position paper on the topic, outlining the scientific opportunities and challenges that would arise in an attempt to unify the two, disparate theories of computing.  I circulated the first draft privately in May, and revised it in July to prepare for a conference call among algorithms and PL researchers (sponsored by NSF) to find common ground and isolate key technical challenges to achieving its goals.

There are serious obstacles to be overcome if a grand synthesis of the “two theories” is to be achieved.  The first step is to get the right people together to discuss the issues and to formulate a unified vision of what are the core problems, and what are promising directions for short- and long-term research.  The position paper is not a proposal for funding, but is rather a proposal for a meeting designed to bring together two largely (but not entirely) disparate communities.  In summer of 2014 NSF hosted a three-hour long conference call among a number of researchers in both areas with a view towards hosting a workshop proposal in the near future.  Please keep an eye out for future developments.

I am grateful to Anindya Banerjee at NSF for initiating the discussion last winter that led to the paper and discussion, and I am grateful to Swarat Chaudhuri for his helpful comments on the proposal.

[Update: word smithing, corrections, updating, removed discussion of cost models for fuller treatment later, fixed incoherence after revision.]