It Is What It Is (And Nothing Else)

A recent discussion of introductory computer science education led to the topic of teaching recursion.  I was surprised to learn that students are being taught that recursion requires understanding something called a “stack” that is nowhere in evidence in their code.  Few, if any, students master the concept, which is usually “covered” only briefly.  Worse, they are encouraged to believe that recursion is a mysterious bit of esoterica that is best ignored.

And thus is lost one of the most important and beautiful concepts in computing.

The discussion then moved on to the implementation of recursion in certain inexplicably popular languages for teaching programming.  As it turns out, the compilers mis-implement recursion, causing unwarranted space usage in common cases.  Recursion is dismissed as problematic and unimportant, and the compiler error is elevated to a “design principle” — to be snake-like is to do it wrong.

And thus is lost one of the most important and beautiful concepts in computing.

And yet, for all the stack-based resistance to the concept, recursion has nothing to do with a stack.  Teaching recursion does not need any mumbo-jumbo about “stacks”.  Implementing recursion does not require a “stack”.  The idea that the two concepts are related is simply mistaken.

What, then, is recursion?  It is nothing more than self-reference, the ability to name a computation for use within the computation itself.  Recursion is what it is, and nothing more.  No stacks, no tail calls, no proper or improper forms, no optimizations, just self-reference pure and simple.  Recursion is not tied to “procedures” or “functions” or “methods”; one can have self-referential values of all types.
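A quick illustration of that last point, sketched in OCaml (used here and below as an executable stand-in for the notation in this post): a self-referential value that is not a function at all.

```ocaml
(* A cyclic list: the value ones names itself. No procedure, no call,
   and certainly no stack -- just self-reference, tied off in the heap. *)
let rec ones = 1 :: ones

(* Every element of the cycle is 1. *)
let () = assert (List.nth ones 1000 = 1)
```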

Somehow these very simple facts, which date back to the early 1930’s, have been replaced by damaging myths that impede teaching and using recursion in programs.  It is both a conceptual and a practical loss.  For example, the most effective methods for expressing parallelism in programs rely heavily on recursive self-reference; much would be lost without it.  And the allegation that “real programmers don’t use recursion” is beyond absurd: the very concept of a digital computer is grounded in recursive self-reference (the cross-connection of gates to form a latch).  (Which, needless to say, does not involve a stack.)  Not only do real programmers use recursion, there could not even be programmers were it not for recursion.

I have no explanation for why this terrible misconception persists.  But I do know that when it comes to programming languages, attitude trumps reality every time.  Facts?  We don’t need no stinking facts around here, amigo.  You must be some kind of mathematician.

If all the textbooks are wrong, what is right?  How should one explain recursion?  It’s simple.  If you want to refer to yourself, you need to give yourself a name.  “I” will do, but so will any other name, by the miracle of α-conversion.  A computation is given a name using a fixed point (not fixpoint, dammit) operator:  fix x is e stands for the expression e named x for use within e.  Using it, the textbook example of the factorial function is written thus:

fix f is fun n : nat in case n {zero => 1 | succ(n') => n * f n'}.

Let us call this whole expression fact, for convenience.  If we wish to evaluate it, perhaps because we wish to apply it to an argument, its value is

fun n : nat in case n {zero => 1 | succ(n') => n * fact n'}.

The recursion has been unrolled one step ahead of execution.  If we reach fact again, as we will for a positive argument,  fact is evaluated again, in the same way, and the computation continues.  There are no stacks involved in this explanation.
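The unrolling account transcribes almost directly into OCaml (a sketch; OCaml’s own let rec is used to tie the knot for the fix operator itself):

```ocaml
(* fix f unrolls the self-reference by one step each time the result
   is applied, exactly as in the factorial example above. *)
let rec fix f = fun x -> f (fix f) x

(* The textbook factorial, named only through fix. *)
let fact =
  fix (fun f n ->
    match n with
    | 0 -> 1
    | n -> n * f (n - 1))

let () = assert (fact 5 = 120)
```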

Nor is there a stack involved in the implementation of fixed points.  It is only necessary to make sure that the named computation does indeed name itself.  This can be achieved by a number of means, including circular data structures (non-well-founded abstract syntax), but the most elegant method is by self-application.  Simply arrange that a self-referential computation has an implicit argument with which it refers to itself.  Any use of the computation unrolls the self-reference, ensuring that the invariant is maintained.  No storage allocation is required.
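The self-application method can also be sketched in OCaml; the wrapper type below is only there to satisfy OCaml’s type checker, an artifact of the sketch rather than of the method.

```ocaml
(* A self-referential computation takes itself as an implicit extra
   argument; unroll applies it to itself, maintaining the invariant. *)
type 'a self = Self of ('a self -> 'a)

let unroll (Self f as s) = f s

(* Factorial by self-application: no let rec, and no storage
   allocation beyond the closure itself. *)
let fact =
  unroll (Self (fun self n ->
    if n = 0 then 1 else n * unroll self (n - 1)))

let () = assert (fact 6 = 720)
```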

Consequently, a self-referential function such as

fix f is fun (n : nat, m : nat) in case n {zero => m | succ(n') => f (n', n*m)}

executes without needing any asymptotically significant space.  It is quite literally a loop, and no special arrangement is required to make sure that this is the case.  All that is required is to implement recursion properly (as self-reference), and you’re done.  There is no such thing as tail-call optimization.  It’s not a matter of optimization, but of proper implementation.  Calling it an optimization suggests it is optional, or unnecessary, or provided only as a favor, when it is more accurately described as a matter of getting it right.
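In a language that implements self-reference properly (OCaml serves as an illustration), the point is easy to observe: the accumulating version is literally a loop, and a recursion millions deep completes in constant control space.

```ocaml
(* The accumulating factorial above, transcribed: each self-reference
   replaces the current activation rather than stacking a new one. *)
let rec fact_acc (n, m) =
  match n with
  | 0 -> m
  | n -> fact_acc (n - 1, n * m)

let () = assert (fact_acc (5, 1) = 120)

(* Ten million self-references with no growing control state; a
   per-call stack would overflow long before this finished. *)
let rec count n = if n = 0 then 0 else count (n - 1)
let () = assert (count 10_000_000 = 0)
```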

So what, then, is the source of the confusion?  The problem seems to be a too-close association between compound expressions and recursive functions or procedures.  Consider the classic definition of factorial given earlier.  The body of the definition involves the expression

n * fact n'

where there is a pending multiplication to be accounted for.  Once the recursive call (to itself) completes, the multiplication can be carried out, and it is necessary to keep track of this pending obligation.  But this phenomenon has nothing whatsoever to do with recursion.  If you write

n * square n'

then it is equally necessary to record where the external call is to return its value.  In typical accounts of recursion, the two issues get confused, a regrettable tragedy of error.

Really, the need for a stack arises the moment one introduces compound expressions.  This can be explained in several ways, none of which need pictures or diagrams or any discussion of frames or pointers or any extra-linguistic concepts whatsoever.  The best way, in my opinion, is to use Plotkin’s structural operational semantics, as described in my Practical Foundations for Programming Languages (Second Edition), from Cambridge University Press.
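The point can be sketched without formal SOS machinery: in the little evaluator below (OCaml, illustrative), the pending obligations of a compound expression are recorded in the expression being rewritten, and no extra-linguistic stack is ever mentioned.

```ocaml
(* A tiny arithmetic language with a small-step transition function.
   Stepping (3 * (1 + 2)) keeps the pending multiplication in the
   expression itself -- no frames, no pointers, no diagrams. *)
type exp = Num of int | Plus of exp * exp | Times of exp * exp

let rec step = function
  | Num _ -> None
  | Plus (Num a, Num b) -> Some (Num (a + b))
  | Plus (Num a, e2) -> Option.map (fun e2' -> Plus (Num a, e2')) (step e2)
  | Plus (e1, e2) -> Option.map (fun e1' -> Plus (e1', e2)) (step e1)
  | Times (Num a, Num b) -> Some (Num (a * b))
  | Times (Num a, e2) -> Option.map (fun e2' -> Times (Num a, e2')) (step e2)
  | Times (e1, e2) -> Option.map (fun e1' -> Times (e1', e2)) (step e1)

(* Iterate the transition relation until a value is reached. *)
let rec eval e = match step e with Some e' -> eval e' | None -> e

let () = assert (eval (Times (Num 3, Plus (Num 1, Num 2))) = Num 9)
```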

There is no reason, nor any possibility, to avoid recursion in programming.  But folk wisdom would have it otherwise.  That’s just the trouble with folk wisdom, everyone knows it’s true, even when it’s not.

Update: Dan Piponi and Andreas Rossberg called attention to a pertinent point regarding stacks and recursion.  The conventional notion of a run-time stack records two distinct things: the control state of the program (such as subroutine return addresses, or, more abstractly, pending computations, or continuations), and the data state of the program (a term I just made up because I don’t know a better one, for managing multiple simultaneous activations of a given procedure or function).  Fortran (back in the day) didn’t permit multiple activations, meaning that at most one instance of a procedure could be in play at a given time.  One consequence is that α-equivalence can be neglected: the arguments of a procedure can be placed in a statically determined spot for the call.  As a member of the Algol-60 design committee, Dijkstra argued, successfully, for admitting multiple procedure activations (and hence, with a little extra arrangement, recursive/self-referential procedures).  Doing so requires that α-equivalence be implemented properly; two activations of the same procedure cannot share the same argument locations.  The data stack implements α-equivalence using de Bruijn indices (stack slots); arguments are passed on the data stack using activation records in the now-classic manner invented by Dijkstra for the purpose.  It is not self-reference that gives rise to the need for a stack, but rather re-entrancy of procedures, which can arise in several ways, not just recursion.  Moreover, recursion does not always require re-entrancy: the so-called tail call optimization is just the observation that certain recursive procedures are not, in fact, re-entrant.  (Every looping construct illustrates this, albeit on an ad hoc basis rather than as a general principle.)

44 Responses to It Is What It Is (And Nothing Else)

  1. Thanks for the post!

    The first edition of your PFPL is a continuing source of inspiration for me. (Hopefully the second edition will be printed to be more friendly to the student; though there is space to take notes on the margins, the book doesn’t bend comfortably. It’d be great if it were of the same printed quality as Pierce’s types book (the introductory one)).

    Minor typo (extra ‘a’) in “Consequently, a self-referential functions such as…”. Also the formula immediately following that sentence seems to be cut off from the right.

  2. subttle says:

    Hi, I enjoyed reading your point of view. Out of curiosity, could you formulate your argument in terms of automata? I’m interested in hearing your thoughts, thanks.

    • Well, finite state machines can be self-referential when they have cycles in their graphs. Each transition is irrevocable; there is no concept of “return”. So no stack is involved.
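      For instance (an OCaml sketch, with illustrative names): a two-state machine whose states are mutually self-referential functions. Each transition is a tail call, hence irrevocable, hence stack-free.

```ocaml
(* A parity machine: each state is a function, each transition a tail
   call. There is no "return", so no stack is involved. *)
let rec even n = if n = 0 then true else odd (n - 1)
and odd n = if n = 0 then false else even (n - 1)

(* A million transitions in constant space. *)
let () = assert (even 1_000_000)
let () = assert (odd 999_999)
```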

  3. Alain Marty says:

    It was very interesting to read your post. I’m working on a language where recursion is built without any stack: I would be happy to have your opinion.

  4. sigfpe says:

    > If you write n * square n’ then a stack is needed

    This is not true. A stack is not needed. The reason a stack is used in this case is that the stack is convenient to use because it’s already there. The reason the stack is already there is to support recursion.

    Back in the old days a Fortran compiler, say, implementing `square`, would have had a fixed location for storing any registers that needed saving. The stack was a later innovation that enabled recursion. What’s more, from a compiler writer’s perspective the stack enabled recursion and nothing else. So it makes perfect sense that recursion and stacks are closely associated, at least in the minds of older programmers. See, for example, page 94 here:

    • Yes, it’s true that Fortran did not need a stack because one can allocate the space required for the pending multiplication in the code itself, and one can arrange function calls to place arguments in a dedicated spot for that function alone. If you can ensure that functions are predefined and have only one activation at a time, you can get away with that. But I am taking for granted that higher-order functions and lambda’s are a given, and that one wishes to allow a function to have multiple activations; this could occur with concurrency, for example. Then one uses some form of a stack (perhaps heap-allocated, not necessarily the conventional “run-time stack”) to manage function calls — essentially alpha-conversion for the multiple instances. But my main point is that self-reference is not tied to functions, and that it itself does not require space (as illustrated by the latch example). If self-reference is implemented properly, there is no need for “tail call optimization”, because it’s not about calls at all.

  5. morrisett says:

    “what is needed is a cost semantics with which one can predict space usage accurately.”

    Ah, but that is the point most people have in introducing the concept of a stack. Indeed, the space cost model for Java (though not formally defined) makes the assumption that a tail call still requires space to store the appropriate call context. This is often demonstrated by writing the non-tail-recursive version of a function, such as factorial, getting a stack overflow, and then showing how, using a numeric accumulator, one does not get the stack overflow.

    The same kinds of transformations must be understood in languages that do not have the tail-call space leak, so it is still relevant.

    Related nits: It’s not tail-call “optimization”. The other languages just have a frickin leak in their cost model. And, it’s not tail-recursion optimization: you can blow the stack with enough nested function calls without resorting to recursion.

    • Neel Krishnaswami says:

      I think this story is hard to teach because it fuses multiple steps into one.

      1. We can introduce the stack by taking a program and doing a CPS translation on it. In languages without first-class control, the continuation will be used linearly, which creates the close tie between the (translated) program text and its space usage.
      2. Tail call optimization corresponds to eta-reducing the continuation whenever it’s obvious to do so.
      3. Introducing accumulating parameters corresponds to using specialized continuation representations, which are derivable via defunctionalization. (Cf Mitch Wand’s classic old paper, “Continuation-Based Program Transformation Strategies”.)

      You and I know how to do this systematically, but I think that to students it can easily be overwhelming, because each of these steps has a serious idea in it that takes a while to digest.
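      The three steps can be sketched concretely in OCaml (a sketch of the idea, not of any particular course’s development):

```ocaml
(* Step 1: CPS-translate factorial. The continuation k is the "stack",
   now an ordinary value inside the language, used linearly. *)
let rec fact_cps n k =
  if n = 0 then k 1
  else fact_cps (n - 1) (fun r -> k (n * r))

(* Step 3: defunctionalize k. Each closure shape becomes a
   constructor, and the data structure that appears is exactly a
   stack of pending multiplications. *)
type cont = Id | Mul of int * cont

let rec apply k r =
  match k with
  | Id -> r
  | Mul (n, k') -> apply k' (n * r)

let rec fact_def n k =
  if n = 0 then apply k 1
  else fact_def (n - 1) (Mul (n, k))

let () = assert (fact_cps 5 (fun r -> r) = 120)
let () = assert (fact_def 5 Id = 120)
```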

    • FWIW, I’ve been teaching the concepts for decades, as has Felleisen, without difficulty. The key is to use SOS to specify what you mean entirely in terms of the language itself. I never speak of a “stack”, or any other extra-linguistic concept, and there is never any difficulty in explaining the space usage implied in executing a program. As one commenter remarked, the source of the difficulty is confusing the implementation with the abstraction, compounded by describing a mistaken implementation. From there it’s a maze of twisty little passages from which most students never emerge.

  6. The unwarranted introduction of irrelevant implementation details is a pest. I have also written about similar issues – some of them even worse than misteaching recursion. Hungarian universities sometimes teach factually incorrect concepts.

  7. I wonder how ” n * f n’ ” inside the body of f got changed to ” n * fact n’ ” after unrolling. I realize that “fact” and “f” refer to the same thing, but one doesn’t usually replace a name by a synonym because one doesn’t need to.

    • Oh, well, I thought I was helping to make the discussion clearer.

      Officially, we have fix x is e steps to [fix x is e / x]e (substitution). So I named the fix and used that name when writing the unrolled form.

  8. Andrew Gacek says:

    “The best way, in my opinion, is to use structural operational semantics…”

    Since structural operational semantics are defined inductively, won’t students need to already understand recursion in some sense? When you construct a derivation tree in structural operational semantics, you don’t completely know what the resulting term is until you’ve fully constructed the derivation. Once you’ve finished constructing the top of the derivation, you unwind this back down to fill in the various right-hand sides of your judgments. This procedure is not part of the definition of structural operational semantics, but it’s an important one for students to understand. It’s very much like unwinding a stack. In fact, a derivation is just rules stacked on top of each other.

    • Yes, it’s quite elegant how I can use SOS from the start to get them used to recursion so that when I teach recursion there is nothing to it.

      Bear in mind that the unrolling interpretation involves no premises to the transition rule, so there is no circularity in the sense you are concerned about. My very point is that there is no stack, meta or otherwise, involved in self-reference! It’s a crime that so many people have been taught otherwise.

    • Incidentally, learning SOS is no more difficult than understanding parentheses in elementary school algebra. SOS is so natural one can use it without even explaining it.

  9. Ogro says:

    Is the link to the prior parallelism article broken? It looks like it’s trying to force a WP post. Thank you for the interesting reading!

    • Could be, let me check …. I can’t figure it out. There used to be a simple way to cross-link posts, but now there isn’t any more. If I use the link tool in the editor to link to the post, it results in a link to an “edit this post” page. I haven’t figured out what to do.

      I tried another thing, perhaps that works. It’s now very clumsy whereas it used to be very easy.

  10. Can I mention two senses in which there _might_ be a somewhat fundamental connection between stacks and recursion?

    The first is that the whole grammatical structure of programming languages is bootstrapped on innate human capabilities to process natural language. Recursion is one of the axiomatic parts of language that we all understand subconsciously (i.e., we almost certainly have dedicated mental hardware for it), and some of us graduate to understanding it consciously. It wouldn’t be crazy to speculate that there is a stack data structure in human brains to provide this implicit understanding. Beyond the fact that it’s speculation, there’s no implication that explicit understanding should be based on stacks, too, of course.

    The second is that it seems like most programmers think more readily in big-step operational semantics than small-step, at least for some important classes of programs. We know how to translate a big-step semantics systematically into an abstract machine by introducing a stack. Novice programmers, especially, may be applying that transformation mentally without realizing it, so that this kind of explanation has educational value for them! It does make sense to “charge” that feature to nested expressions, but perhaps explaining it that way to novices would be less helpful than just accepting the way they’ll tend to think by default.

    • codeinfig says:

      “The second is that it seems like most programmers think more readily in big-step operational semantics than small-step, at least for some important classes of programs.”

      i agree and i would go further: while learning to code can often be as simple a matter as vocabulary and grammar, learning to think about code is more about “thinking down” to the level of discrete computational steps.

      theres nothing unnatural about breaking everything down into steps, we do it all the time. but its unintuitive at least to the average person who is consciously more abstract. this leaves education a matter of “top-down” or “bottom-up” explaining, or preferably both.

      the question here is whether a stack is a “false bottom.” in some cases i guess it is, but is an example really false if it doesnt cover all possible contexts of the thing its an example of? in short, i suspect i agree with you.

  11. codeinfig says:

    “real programmers don’t use recursion”

    i must say, ive never heard that one. and although ive written a few recursive routines, the only connection with the stack that i know is that in the 16-bit days, any recursive routine that did “real work” (more than a few calls to self) would “run out of stack space.” i think the error message said so. the source of the myth, perhaps?

    i imagine you could implement recursion using a stack, though ive never tried.

    i admit, based on these i assumed that the recursionlimit in python was a “stack limit,” but ive never taught that recursion requires a stack. i prefer to get to the practical mechanics *first*, and make as few assumptions as possible when teaching the underlying works.

    incidentally, how/where/with what does the program keep track of each value of a self-referencing function? does it differ from one language to another? is it very much unlike a stack?

  12. I agree with your account of the fundamental character of recursion, but I think you’re setting up the stack explanation as something of a straw man. In my experience teaching the idea to beginners, this is just one of several approaches used to overcome a conceptual roadblock to accepting self-reference as something that’s even possible to use. For a great many of the students I’ve taught, seeing the way that the inductive case “unwinds” after reaching the base case is crucial to comprehending the difference between well-founded self-reference on the one hand and circularity on the other; until they get over that hurdle of a circular understanding, most or all of them have no chance of comprehending the core ideas.

    It is just one of several alternatives for intuition-building, of course, working well for some students and not at all for others. Usually, my upper division math majors in the intro CS course do much better with the initial lens of inductive proof, instead. I use the kind of substitution examples you argue for, too, and many students also find that effective (others, not so much). I also give them folds, though never by that name or in a direct form (sorry to report that we’re still a Java outfit in the intro class), because seeing common patterns of recursion and useful applications is also crucial.

    My point overall is that explanations like the call stack may distract from the fundamental character of recursion, but as a teaching tool, it works, at least for a broad enough segment of the beginner population to justify its continued use. That argument and your observation about the primacy of recursion as a concept (which I agree with) are not incompatible.

    • I don’t see how one can build intuition based on a mistaken conception of what is going on. I’ve been teaching recursion for decades, and have never once resorted to talking about stacks.

    • In what way is it a mistaken conception of what’s going on? Certainly, the call stack is not inherent to the notion of recursion, nor is it the only way to implement it. There’d be no use in disputing those facts. On the other hand, the call stack *is* the way recursion is implemented in many mainstream languages, the C/C++/Java family in particular (whether those are appropriate starting points for the beginning of the CS curriculum is a separate question, of course).

      My point is simply that concrete intuitions are often useful, especially for beginners, and that this is one of many tools that can be effective. I have also been able to teach recursion without reference to this implementation strategy, but for a certain demographic of students (generally, ones who come in to CS1 with a weak math background but the ability to improve this), it has proven very useful. For example, stack overflow errors provide a nice way of visualizing the nature of divergent computation. Being able to trace the execution of statements in simple recursive calls by hand (which includes an understanding of the call stack) is a form of conceptual bootstrapping for which I have frequently observed success.

      None of that takes away from your larger point, which is that recursion is not a specialized or optional technique, to be avoided in production code. On that matter, again, I see no use in disputing anything. But the kind of solid, mathematically mature view of this notion of computation is the most important goal here, and it is not wrong to begin students with a specialized operational view of one implementation, even if that means that you have to make clear that it’s not inherent to the semantics itself.

    • But it would be of an incorrect implementation. Why propagate ignorance?

    • Because it works in getting students past an initial conceptual hill that many of them fail to climb otherwise. It requires an iterative improvement, in order to finally reach the correct view, but so what? That’s better than having them never reach it at all.

    • You’re then confusing an abstract concept with a broken implementation. No good can come of that.

    • Recursion is an abstract, mathematical concept which must be understood at face value. It is just an accident that recursive functions on most computers are implemented using stacks, and insisting on this contingency detracts from understanding recursion.

      Two points: First, stacks are used for function calls in general, not only for recursion. Does anyone sane advocate teaching (mathematical) functions in reference to call stacks? The same applies to recursion. Second, recursion can absolutely be implemented without stacks. Check out my ICFP 2011 paper to see how this can be done.

  13. abb538 says:

    I agree with your thesis. However, let me play the devil’s advocate here.

    Suppose recursion is as easy to explain as you claim. Then surely, the dual notion of corecursion is also easy to explain.

    Corecursion is often explained as a way of defining “streams”. Or a way to “generate” elements. However, these words are part of that mumbo-jumbo that is not directly related to corecursion.

    And on top of that, a “stream” does not mean anything computationally, to someone who is not already a computer scientist. And “generate” will only confuse non-computer-scientists further.

    So is there an analogous simple explanation of corecursion, or a reason why it is fundamentally harder to explain?

    • You’re confusing fundamentally different concepts of recursion. My post is concerned with “general recursion”, which is just “self-reference”. You are referring to what is sometimes called “structural recursion”, which is also mentioned in Dan Ghica’s comment on this post. The structural form is most relevant in situations where all functions are total, not the common case for programming languages. One can still use the structuring method in partial languages, but it does not have the force of ensuring termination, undermining much of the point of distinguishing the concept. Inductive and coinductive types are discussed in some detail in my PFPL, quite apart from general recursion. In the exercises to the chapter on PCF I explore the relationship between the two settings, and the semantically significant role of laziness in the partial case.

    • abb538 says:

      Yes, that was my confusion. Thanks.

  14. Well said. Furthermore, recursion is best motivated when used on a data structure which is itself recursive — structural recursion. Comparing a recursive function over a recursive data structure (e.g. expression evaluation) with the same thing done in an imperative and iterative style is quite enlightening. On the other hand, trivial examples of recursion (such as factorial) are rather baffling, because they can be done just as easily in an imperative and iterative style. The final point is that the biggest payoff comes when you need to reason about the code as well: inductive reasoning about recursive functions is much easier than Hoare-style reasoning about imperative code.
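    The contrast can be sketched in OCaml (illustrative, not taken from the linked course): the recursive function mirrors the recursive data, case for case, and an inductive correctness proof follows the same shape.

```ocaml
(* Expression evaluation: a recursive function over a recursive
   datatype. The function's cases follow the data's constructors. *)
type exp = Num of int | Add of exp * exp | Mul of exp * exp

let rec eval = function
  | Num n -> n
  | Add (a, b) -> eval a + eval b
  | Mul (a, b) -> eval a * eval b

let () = assert (eval (Mul (Num 3, Add (Num 1, Num 2))) = 9)
```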

    In fact my entire first year programming course is structured around this very idea. Here is a link to it:

  15. I suspect the reason for the recursion-stack confusion is that in a typical implementation one does not get a stack overflow unless one uses recursion. Essentially, the need for a stack can be forgotten, as it is always there and is big enough as long as one does not use “bad” patterns. Even supposedly low-level languages like C, or their modern contenders like Rust, do not allow one to express the stack requirements in the code.

    The price for this “stack is always there” assumption is that when one does need to care about the stack, it is really hard to do. It took years for Linux to switch to 4K stacks; green-thread implementations are hard because, for performance, they must estimate the real stack usage of a typical program, leading to a mess of heuristics; and one must forget about common frameworks on memory-constrained systems.

    • It’s worth remembering that the whole concept of a stack as found in, say, the Linux kernel is but one way to go. There are many ways to manage the control state of a program that do not involve a contiguously allocated region of memory.

    • I should clarify. My point is that programmers are not aware of the stack at all. Asking a person who programs in mainstream languages about the stack usage of their code most likely brings a wild guess at best. The stack’s existence is revealed only when using recursion on poor implementations without tail-call elimination and with a contiguous, OS-limited stack.

      This unawareness of the stack means there is no pressure on implementations to improve its usage and performance (eliminating tail calls is helpful even without recursive calls), under false premises such as the claim that it destroys debugging by losing stack traces.

    • My point, in part, is that there is no such thing as tail call optimization.

      Setting that aside, what is needed is a cost semantics with which one can predict space usage accurately. See my PFPL for details on cost semantics.

    • Igor, I don’t think it’s true that programmers are not aware of the stack. Stack traces in particular are a primary debugging aid in mainstream languages, and programmers rely on them to an extent that many are violently opposed to tail call elimination (“makes the language unusable”). The problem of course is that they like to think of the call stack as an execution history, when in fact it is a representation of the continuation. Without TCE, the two happen to coincide, and thus are indistinguishable. With TCE, the difference becomes very apparent.

      And then there are those languages where you can reflect on the stack, and some people love that, but I’ll refrain from expressing my opinion about that…

    • Eli Barzilay says:

      Here’s a guess at why some people are, as Andreas puts it, violently opposed to tail call elimination. The history of the computation is indeed useful when state is involved — but what you really want is not just the “current history” as represented by a stack of call frames; you want the history of all the code that made x be a bogus 2 when you expected it to be 1. Why don’t TCE-opposers demand that? Probably because they’re vaguely aware that keeping it would make things impractical, combined with never actually having used a system where that information is available (which would make people cry when losing it). I think that *this* kind of history is much more relevant and useful than “who called this function”, but (I’m guessing) people feel that it’s important to know the caller, since it is more likely to be responsible for x having a bad value. Maybe that’s also why functional programmers adopt TCE faster, since they don’t use state as much.

    • arielbyd says:

      I suspect that people tend to use the “fortran” mode of thinking to analyze programs, and that does not work with recursion.

      Also, low-level languages allow you to allocate memory in stack frames. This is part of their semantics and can’t be eliminated, and it often requires stack-frame memory to be retained.

    • Quite true. The problem with “the stack” as if it were found in nature is that recursion need not involve it. The lesson of tail recursion is that recursive procedures need not always be reentrant: no stack required.
