In a recent post I asked whether there is any such thing as a declarative language. The main point was to argue that the standard “definitions” are, at best, not very precise, and to see whether anyone might offer a better definition. What I’m after is an explanation of why people seem to think that the phrase has meaning, even though they can’t say very clearly what they mean by it. (One commenter analogized with “love” and “happiness”, but I would counter by saying that we’re trying to do science here, and we ought to be able to define our terms with some precision.)

As I mentioned, perhaps the best “definition” that is usually offered is to say that “declarative” is synonymous with “functional-and-logic-programming”. This is pretty unsatisfactory, since it is not so easy to define these terms either, and because, contrary to conventional classifications, the two concepts have pretty much nothing in common with each other (but for one thing to be mentioned shortly). The propositions-as-types principle helps set them clearly apart: whereas functional programming is about executing proofs, logic programming is about the search for proofs. Functional programming is based on the dynamics of proof given by Gentzen’s inversion principle. Logic programming is based on the dynamics of provability given by cut elimination and focusing. The two concepts of computation could not be further apart.

Yet they *do* have one thing in common that is usefully isolated as fundamental to what we mean by “declarative”, namely *the concept of a variable*. Introduced by the ancient Hindu and Muslim mathematicians, Brahmagupta and al Kwharizmi, the variable is one of the most remarkable achievements of the human intellect. In my previous post I had secretly hoped that someone would propose variables as being central to what we mean by “declarative”, but no one did, at least not in the comments section. My unstated motive for writing that post was not so much to argue that the term “declarative” is *empty*, but to test the hypothesis that few seem to have grasp the importance of this concept for designing a civilized, and broadly applicable, programming language.

My contention is that *variables*, properly so-called, are what distinguish “declarative” languages from “imperative” languages. Although the imperative languages, including all popular object-oriented languages, are based on a concept that is *called* a variable, they lack anything that actually *is* a variable. And this is where the trouble begins, and the need for the problematic distinction arises. The declarative concept of a variable is the *mathematical* concept of an *unknown* that is given meaning by substitution. The imperative concept of a variable, arising from low-level machine models, is instead given meaning by assignment (mutation), and, by a kind of a notational pun, allowed to appear in expressions in a way that resembles that of a proper variable. But the concepts are so fundamentally different, that I argue in *PFPL* that the imperative concept be called an “assignable”, which is more descriptive, rather than “variable”, whose privileged status should be emphasized, not obscured.

The problem with purely imperative programming languages is that they have *only* the concept of an assignable, and attempt to make it serve also as a concept of variable. The results are a miserable mess of semantic and practical complications. Decades of work has gone into rescuing us from the colossal mistake of identifying variables with assignables. And what is the outcome? If you want to reason about assignables, what you do is (a) write a mathematical formulation of your algorithm (using variables, of course) and (b) show that the imperative code simulates the functional behavior so specified. Under this methodology the mathematical formulation is taken as *self-evidently* correct, the standard against which the imperative program is judged, and is not itself in need of further verification, whereas the imperative formulation is, invariably, in need of *verification*.

What an odd state of affairs! The functional “specification” is itself a perfectly good, and apparently self-evidently correct, program. So why not just write the functional (*i.e.*, mathematical) formulation, and call it a day? Why indeed! Declarative languages, being grounded in the language of mathematics, allow for the *identification* of the “desired behavior” with the “executable code”. Indeed, the propositions-as-types principle elevates this identification to a fundamental organizing principle: propositions are types, and proofs are programs. Who needs verification? Once you have a mathematical specification of the behavior of a queue, say, you already have a running program; there is no need to relegate it to a stepping stone towards writing an awkward, and invariably intricate, imperative formulation that then requires verification to ensure that it works properly.

Functional programming languages are written in the universally applicable language of mathematics as expressed by the theory of types. Such languages are therefore an integral part of science itself, inseparable from our efforts to understand and master the workings of the world. Imperative programming has no role to play in this effort, and is, in my view, doomed in the long run to obsolescence, an artifact of engineering, rather than a fundamental discovery on a par with those of mathematics and science.

This brings me to my main point, the popular concept of a *domain-specific language*. Very much in vogue, DSL’s are offered as the solution to many of our programming woes. And yet, to borrow a phrase from my colleague Guy Blelloch, the elephant in the room is the question “what is a domain?”. I’ve yet to hear anyone explain how you decide what are the boundaries of a “domain-specific” language. Isn’t the “domain” mathematics and science itself? And does it not follow that the right language must be the language of mathematics and science? How can one rule out *anything* as being irrelevant to a “domain”? I think it is impossible, or at any rate inadvisable, to make such restrictions *a priori*. Indeed, full-spectrum functional languages are already the world’s best DSL’s, precisely because they are (or come closest to being) the very language of science, the ultimate “domain”.

Dear Robert

I am interested in the second half of your second paragraph, differentiating functional programming and logical programming. Could you point me to any work (book chapters, papers, etc.) that explains this difference at more length? Are they just different traditions, or do they relate in some way? My impression is that your PFPL, and Pierce’s TAPL focus on functional programming.

Thank you

Ivan

Well, they are pretty much totally unrelated. I’m not even sure there would be a reference to cite, but maybe another reader does.

How about something that does for logic programming what PFPL does for functional programming? Does “Practical Foundations for Programming Languages” not apply to Logic Programming Languages?

I chose to exclude them because they are of limited scope and utility. The pernicious notion of “programming paradigms”, which are ill-defined, encourages a belief in a false sense of parity among things that are not comparable.

I thought your definitions of functional programming and logical programming were quite good (hence my initial comment). However, this is clearly not going anywhere.

Sorry if I’m making matters worse!

I’ll just Google for “dynamics of provability”, “cut elimination” and “focusing”. :)

All good choices. Logic programming is covered in our Constructive Logic class at Carnegie Mellon, not in the PL class (because I don’t believe in paradigms.)

Useful thanks! (https://www.andrew.cmu.edu/course/15-317/) (agree about the paradigm thing)

But why keeping “variable” for functional programming? Even for algebra, this is quite misleading as it doesn’t communicate how it varies. It feels like all the trouble is coming from that unfortunate mathematical terminology.

I think that the concept of variable in functional programming is the mathematical concept of variable; in particular, the range of variation of a variable is a type, and indeed Russell defined a type to be exactly the range of significance of a variable. It seems to me that it is common in everyday informal mathematics to be a bit loose about these things, because it’s supposed to be clear to the reader what is really going on, but I know that in my own experience it can be trouble to figure out what is intended. This is especially true in elementary mathematics, such as calculus, where variables are used in ways that can be hard to figure out precisely without a lot of experience (or with sufficiently little that you don’t notice or blame yourself!) The only place I’ve seen these things sorted out clearly is in a differential geometry book, such as Spivak’s, that takes care at the outset to clarify some confusing conventional notations.

Auke: by efficiency, I don’t mean “this is faster than that” — after all, a functional program can *at most* be a log factor slower than an imperative one, because you can always resort to implementing a RAM with a balanced binary tree.

Instead, what I mean is that we, as humans, often want to reason about the resource usage of our programs — and reasoning about efficiency means we need logical principles to deduce how efficient (or not) a program might be. However, the reasoning principles we’ve developed for functional programs are (deliberately) insensitive to resource usage. For example, viewed as functions on lists of numbers, merge sort and bubble sort are equal, because they return the same answers on all inputs. But viewed in terms of efficiency, they are not equal — merge sort is O(n log n), and bubble sort is O(n^2).

This leads to the second point: big-O notation demonstrates one of the major problems of complexity, from the type-theoretic point of view. Type theory differs from the usual machine-oriented approach of algorithms and complexity because of variables and lambda-abstraction: it lets us work meaningfully with partial programs, and not just whole machines. But big-O notation gives the complexity of the output in terms of the complexity of the input, which is a purely first-order way of looking at the world. This leaves us without any guidance on how to talk about the complexity of higher-order functions — for example, what’s the complexity of fold on lists?

So what we want, ultimately, is for resource usage to be accounted for in a type-theoretic way. That is, we want a language with (a) primitives that give us control over resources like memory, (b) a type structure such that its beta- and eta-rules preserve complexity, and (c) a notion of complexity which respects substitution. I think linear logic will be an important ingredient in this, but fundamentally we don’t have the whole picture yet.

I have been thinking about this particular problem (how to properly manage resources in a functional language) as well, but from a practical programming standpoint rather than a type-theoretic one. (Although I am sure this can be given a proper type-theoretic formulation by someone better educated than me.) So here is my take:

In a Standard ML-like language, we ditch exceptions and reference cells (which are not really functional, anyway), and add three new concepts: regions, linear types and aliasing references.

Regions are a mechanism for talking in a static language about memory allocation that happens in a dynamic runtime process. In many ways, regions are similar to types: Every value-level term has a region (in addition to a type). Regions can be either inferred or explicitly annotated. Functions can be region-polymorphic. A compile-time region-checking phase ensures that no region can go out of scope before any value-level term that has that region. In some other ways, regions are similar to value-level terms: Regions denote objects that are constructed and destructed at runtime, namely, blocks of memory that can be malloc()ed and free()d.

Linear types are enforced by means of opaque signature ascription. That is, if a module exposes an abstract linear type T, then, outside of this module, instances of T cannot be duplicated or dropped at will. Effectively, from the point of view of the rest of the program, whenever T appears in a covariant or contravariant position, we have a constructor or a destructor for T, respectively. Now we can see why we do not need mutable reference cells: all we have to do is destroy an old object and construct a new one in its place.

Aliasing references are similar to Rust’s immutable borrowed pointers: they “freeze” the referenced object, so it cannot be destructed before the reference goes out of scope.

The ML Kit Compiler, which was built about twenty years ago and is still in use today, is based on regions of the kind you describe. Your suggestion about “linear types” is probably related to Rinus Plasmeijer’s “uniqueness types”, which were invented in the Clean language 25 years ago.

But, I have to ask: the premise of my paper with Blelloch at POPL 12 is that it’s a myth that manual memory allocation is required for performance. GC wins, even in terms of cache hierarchies, if done properly. I never understood the point of “regions”; they seem only to appeal to conventional wisdom, but have no real contribution to make afaict.

MLKit: Admittedly, I have never given it a try. (I have only used MLton and Poly/ML.) However, if the MLKit compiler accepts all legal Standard ML programs as they are, and then on its own tries to figure out using who-knows-what-heuristics which objects are more profitably stored in this or that region, I do not think it is a faithful implementation of my proposal. In my proposal, the region of a value is every single bit as deterministic as its type. You can get a compile-time region error if you attempt to conflate two demonstrably different regions, just like you get compile-time type errors if you attempt to conflate two demonstrably different types (by using a term of one type where a term of the other type is expected, which should only be possible when one type is *visibly* a synonym for the other).

Clean and uniqueness types: Yes, I am aware of those. But my proposal can do something that, to the best of my knowledge, Clean cannot do: Delimit the context in which a resource must be used linearly/uniquely. Or, to be more precise, delimit the context in which a resource can be duplicated or thrown at will; elsewhere it must be used linearly/uniquely. For example, we can have a module inside of which the type file_desc (riptor) is just a type synonym for int, but outside of which file_desc cannot be duplicated or thrown away at will. If Clean can already do this, well, my bad, I should have said “uniqueness types” instead.

GC and performance: I am not really convinced that GC always wins. It is true that naïvely using deterministic cleanup (C++’s “RAII”), storing every dynamically allocated object in its own little memory chunk (what C++’s standard std::allocator does, sadly), results in worse performance than using GC. But, in my experience, with custom allocators, in either C or C++, one can get significantly better performance than with GC. And, when I need that extra performance in a ML or Haskell program, I resent having to use a FFI to call C code.

I think it is a myth that manual allocation, or any form of regions, are ever necessary, or even particularly effective, compared to properly designed garbage collection. See my POPL 12 paper with Blelloch for a case-in-point in which we show that even memory hierarchy effects, such as cacheing, are neatly handled by GC without any special provisions in the language. Or, to put the point another way, Rust is very misguided and 25 years behind the times. I suppose that’s to be expected in PL’s, the present fad is always reliving the ancient past of research.

People typically complain about garbage collector’s behaviour wrt. delays and allocation failures, rather than its cache efficiency – although, because of prefetch and constant factors, linked lists are still less efficient than arrays.

The fact that ‘variable’ is an absolutely fundamental concept (and is tied to substitution) seems to have been grasped by very few people. The first place I saw this expounded forcefully was Paul Taylor’s excellent Practical Foundations of Mathematics. Much to my chagrin, when I asked a related question on MathOverflow (about variables, symbols, indeterminates and parameters), I got rather underwhelming answers. There is some terminology, which is to me even more fundamental than

declarativeorimperative, used all the time, but with no formal definition.Neel is on to something important. Look at it another way: if mathematics was indeed the best DSL, wouldn’t Maple and/or Mathematica simply be the best programming languages? They are not, not by a very very long shot [they make python look sane]. Their problem is that they focus solely on expressivity and efficiency, leaving correctness and even clear semantics aside. Coq, on the other hand, focuses so hard on correctness that efficiency is an amazing struggle to achieve, and still an active research endeavour.

What remains amusing to me is that a lot of the current formalizations of straight-up ‘computational’ mathematics in proof assistants corresponds to algorithms that were first implemented in Computer Algebra systems in the early 70s, minus all the work on Analysis (closed-form integration, closed-form summation, closed-form DEs, etc). That does not seem to have even been started. There are much more profound philosophical questions to be asked in that direction than just “what is a variable”!

Nice points all.

Aha! is an example of a general-purpose declarative programming language: https://code.google.com/p/aha-programming-language/. Indeed, its concept of variable is different to the imperative languages. You might find it interesting.

Efficiency.

Constructive mathematics makes computability an integral part of the process of proof, rather than something bolted onto the side. However, constructive methods do not help much in integrating resource usage into proof — the cost semantics of functional programs are not stable under transformations which preserve the value semantics of a program (eg, beta/eta expansions and reductions, not to mention parametricity). In slogan form, one might say that there’s more to complexity than computability.

It’s pretty clear that this should eventually be looked into. For example, when a combinatorist says “constructive”, he or she means “without using exhaustive search”, not “computable”, and formalizing that kind of idea is of obvious interest. For simple types, bounded linear logic seems like a good start, but the full-spectrum case seems like a very hard nut to crack: my attempts have quickly lead to questions like “what are dependent bunched types?”

Good point. But first one must make the argument that functional programs are not efficient. And to make that argument, one must first have a measure of efficiency for these languages. Blelloch and I keep chipping away at that. By and large imperative programs are NOT more efficient, unless you’re one of those lost souls using Haskell.

Maybe I should stress that, being been a charter member of the “extract programs from proofs” project, I am not advocating that approach, precisely because it does not give control over efficiency. But functional languages, which encompass imperative languages, provide all the efficiency you want, particularly in the parallel case. The idea that “imperative means more efficient” is largely (wholly?) unjustified, except of course if you’re stuck with laziness.

Wait, are you saying that the constraint that your program has to be verifiable correct is what makes imperative languages faster than declarative ones? Whether you write imperatively or declaratively, once you have established a proof of the correctness of a program (in the imperative case something you do outside your programming environment entirely, and in the declarative case using an automated process that is called “type checking”), the programs that are being /run/ are exactly the same. You are right in that constructive methods do not necessarily help in managing resources, but neither are they a nuisance.

Regarding your second point: the problem you are referring to can often be solved by generalizing the problem. If I want to prove that n+k=k+n, and then make the additional assumption that k=0, then obviously I can use different proof strategies than if I hadn’t. If you have a problem for which in a specific case (such as when working over the naturals) there is an exhaustive proof/solution/witness/algorithm, then generalize to more general numerals or ordered sets to rule out this specific solution.