It Is What It Is (And Nothing Else)

February 22, 2016

A recent discussion of introductory computer science education led to the topic of teaching recursion.  I was surprised to learn that students are being taught that recursion requires understanding something called a “stack” that is nowhere in evidence in their code.  Few, if any, students master the concept, which is usually “covered” only briefly.  Worst, they are encouraged to believe that recursion is a mysterious bit of esoterica that is best ignored.

And thus is lost one of the most important and beautiful concepts in computing.

The discussion then moved on to the implementation of recursion in certain inexplicably popular languages for teaching programming.  As it turns out, the compilers mis-implement recursion, causing unwarranted space usage in common cases.  Recursion is dismissed as problematic and unimportant, and the compiler error is elevated to a “design principle” — to be snake-like is to do it wrong.

And thus is lost one of the most important and beautiful concepts in computing.

And yet, for all the stack-based resistance to the concept, recursion has nothing to do with a stack.  Teaching recursion does not need any mumbo-jumbo about “stacks”.  Implementing recursion does not require a “stack”.  The idea that the two concepts are related is simply mistaken.

What, then, is recursion?  It is nothing more than self-reference, the ability to name a computation for use within the computation itself.  Recursion is what it is, and nothing more.  No stacks, no tail calls, no proper or improper forms, no optimizations, just self-reference pure and simple.  Recursion is not tied to “procedures” or “functions” or “methods”; one can have self-referential values of all types.

Somehow these very simple facts, which date back to the early 1930’s, have been replaced by damaging myths that impede teaching and using recursion in programs.  It is both a conceptual and a practical loss.  For example, the most effective methods for expressing parallelism in programs rely heavily on recursive self-reference; much would be lost without it.  And the allegation that “real programmers don’t use recursion” is beyond absurd: the very concept of a digital computer is grounded in recursive self-reference (the cross-connection of gates to form a latch).  (Which, needless to say, does not involve a stack.)  Not only do real programmers use recursion, there could not even be programmers were it not for recursion.

I have no explanation for why this terrible misconception persists.  But I do know that when it comes to programming languages, attitude trumps reality every time.  Facts?  We don’t need no stinking facts around here, amigo.  You must be some kind of mathematician.

If all the textbooks are wrong, what is right?  How should one explain recursion?  It’s simple.  If you want to refer to yourself, you need to give yourself a name.  “I” will do, but so will any other name, by the miracle of α-conversion.  A computation is given a name using a fixed point (not fixpoint, dammit) operator:  fix x is e stands for the expression e named x for use within e.  Using it, the textbook example of the factorial function is written thus:

fix f is fun n : nat in case n {zero => 1 | succ(n') => n * f n'}.

Let us call this whole expression fact, for convenience.  If we wish to evaluate it, perhaps because we wish to apply it to an argument, its value is

fun n : nat in case n {zero => 1 | succ(n') => n * fact n'}.

The recursion has been unrolled one step ahead of execution.  If we reach fact again, as we will for a positive argument,  fact is evaluated again, in the same way, and the computation continues.  There are no stacks involved in this explanation.

Nor is there a stack involved in the implementation of fixed points.  It is only necessary to make sure that the named computation does indeed name itself.  This can be achieved by a number of means, including circular data structures (non-well-founded abstract syntax), but the most elegant method is by self-application.  Simply arrange that a self-referential computation has an implicit argument with which it refers to itself.  Any use of the computation unrolls the self-reference, ensuring that the invariant is maintained.  No storage allocation is required.

Consequently, a self-referential functions such as

fix f is fun (n : nat, m:nat) in case n {zero => m | succ(n') => f (n',n*m)}

execute without needing any asymptotically significant space.  It is quite literally a loop, and no special arrangement is required to make sure that this is the case.  All that is required is to implement recursion properly (as self-reference), and you’re done.  There is no such thing as tail-call optimization.  It’s not a matter of optimization, but of proper implementation.  Calling it an optimization suggests it is optional, or unnecessary, or provided only as a favor, when it is more accurately described as a matter of getting it right.

So what, then, is the source of the confusion?  The problem seems to be a too-close association between compound expressions and recursive functions or procedures.  Consider the classic definition of factorial given earlier.  The body of the definition involves the expression

n * fact n'

where there is a pending multiplication to be accounted for.  Once the recursive call (to itself) completes, the multiplication can be carried out, and it is necessary to keep track of this pending obligation.  But this phenomenon has nothing whatsoever to do with recursion.  If you write

n * square n'

then it is equally necessary to record where the external call is to return its value.  In typical accounts of recursion, the two issues get confused, a regrettable tragedy of error.

Really, the need for a stack arises the moment one introduces compound expressions.  This can be explained in several ways, none of which need pictures or diagrams or any discussion about frames or pointers or any extra-linguistic concepts whatsoever.  The best way, in my opinion, is to use Plotkin’s structural operational semantics, as described in my Practical Foundations for Programming Languages (Second Edition) on Cambridge University Press.

There is no reason, nor any possibility, to avoid recursion in programming.  But folk wisdom would have it otherwise.  That’s just the trouble with folk wisdom, everyone knows it’s true, even when it’s not.

Update: Dan Piponi and Andreas Rossberg called attention to a pertinent point regarding stacks and recursion.  The conventional notion of a run-time stack records two distinct things, the control state of the program (such as subroutine return addresses, or, more abstractly, pending computations, or continuations), and the data state of the program (a term I just made up because I don’t know a better one, for managing multiple simultaneous activations of a given procedure or function).  Fortran (back in the day) didn’t permit multiple activations, meaning that at most one instance of a procedure can be in play at a given time.  One consequence is that α-equivalence can be neglected: the arguments of a procedure can be placed in a statically determined spot for the call.  As a member of the Algol-60 design committee Dijkstra argued, successfully, for admitting multiple procedure activations (and hence, with a little extra arrangement, recursive/self-referential procedures).  Doing so requires that α-equivalence be implemented properly; two activations of the same procedure cannot share the same argument locations.  The data stack implements α-equivalence using de Bruijn indices (stack slots); arguments are passed on the data stack using activation records in the now-classic manner invented by Dijkstra for the purpose.  It is not self-reference that gives rise to the need for a stack, but rather re-entrancy of procedures, which can arise in several ways, not just recursion.  Moreover, recursion does not always require re-entrancy—the so-called tail call optimization is just the observation that certain recursive procedures are not, in fact, re-entrant.  (Every looping construct illustrates this principle, albeit on an ad hoc basis, rather than as a general principle.)