When it comes to controlling the complexity of developing and, more importantly, maintaining a large system, the only game in town is modularity. And as even the strongest proponents of unityped languages have come to accept, modularity is all about types (static types, of course, there being no other kind). A decomposition of a system into modules consists of an application of the structural principle of substitution (transitivity of entailment, composition of maps) that is fundamental to the very conception of a type system:
In pedestrian terms the type is the “header file” describing , and is the new code implementing specification in terms of an unspecified implementation of the functionality specified by . The interaction between and is mediated entirely by the type ; access to the source code of is denied to the client, precisely so that the dependence between the components is weakened in anticipation of future changes to , or to allow for flexibility in development ( need not even exist for the development of to proceed, as long as the type is specified).
To be most useful, it is important that the relationship between and be many-to-many, and not many-to-one or one-to-one. Think of as the API for a mapping between keys and values. It is absolutely essential that the language admit that many different modules be of type , and it is absolutely essential that a given module satisfy many distinct types , without prior arrangement. The type is purely descriptive of the extensional behavior of a module of that type, and cannot be regarded as pinning down any particular implementation of that behavior. Moreover, a given module may well support a variety of views, for example by “forgetting” certain aspects of it in particular contexts. For example, one may neglect that an implementation of mapping supports deletion in a setting where only extension and application are required.
This is all pretty basic, but what surprises me is how few languages support it cleanly and simply. One particularly awkward method is to rely on extra-linguistic “tools” that manipulate build parameters to switch choices of for a given , quickly resulting in an ad hoc language of its own just to manage the build scripts. Another technique is to manipulate or restrict inheritance so that some degree of modularity can be simulated, much as one can bang in nails using a pair of pliers. A common methodology, in fact, attempts to cut down inheritance to provide what we had in ML in the early 1980′s (functors), obviating the intervening decades of maladaptations of bad ideas.
More disappointingly, for me at least, is that even relatively enlightened languages, such as Haskell or F#, fail to support this basic machinery. In Haskell you have type classes, which are unaccountably popular (perhaps because it’s the first thing many people learn). There are two fundamental problems with type classes. The first is that they insist that a type can implement a type class in exactly one way. For example, according to the philosophy of type classes, the integers can be ordered in precisely one way (the usual ordering), but obviously there are many orderings (say, by divisibility) of interest. The second is that they confound two separate issues: specifying how a type implements a type class and specifying when such a specification should be used during type inference. As a consequence, using type classes is, in Greg Morrisett’s term, like steering the Queen Mary: you have to get this hulking mass pointed in the right direction so that the inference mechanism resolves things the way you want it to. In F# the designers started with the right thing (Caml) and eliminated the very thing that matters the most about ML, it’s module system! Instead the F# designers added a bunch of object-oriented concepts (for the sake of compatibility with .net and with the mindset of MS developers), and tricked up the language with features that are more readily, and flexibly, provided by the module system.
Why bring this up? Apart from general grousing, my point is that we had little choice in what language to use in teaching our students. Modularity matters most, and we must have a language that supports flexible modularity in the form I am describing here. When we examined our options, which we did very carefully, the only contenders are Standard ML and O’Caml. We could have gone with either, but were persuaded to use Standard ML, which has worked beautifully for our purposes. The decisive factor in choosing between the two ML’s was simply that we have a prior code base in Standard ML on which to draw, and there are two implementations of Standard ML that support parallelism (MLton and Poly/ML), albeit neither optimally for our purposes. Haskell provides better support for parallelism (by undoing its unfortunate commitment to laziness, which results in an unusable cost model for both time and, especially, space), but wasn’t suitable because of its lack of support for modularity.
As I have mentioned, our plan is to re-do the introductory curriculum in Computer Science to modernize it and to place better emphasis on principles rather than current technologies. One aspect of this is to re-do the standard Data Structures and Algorithms course to eliminate the over-emphasis on ephemeral data structures, and to treat parallel algorithms as the general case that encompasses the increasingly irrelevant case of one processor. (Yes, this is a technological trend, but it is more importantly a conceptual change that emerges from focusing on language, rather than machine, models of computation.) What is a data structure? It’s a signature, or interface, written in the language you’re programming in. What’s an algorithm? It’s a structure, or implementation, of that signature. A signature, such as that for a mapping, can be implemented in various ways, with different cost trade-offs (logarithmic lookup vs. constant lookup, for example). A given algorithm, such as a balanced tree, can implement many different data structures, such as mappings or sets. The distinction between a peristent and an ephemeral mapping is a difference of data structure, that is, of signature. The demands are different, the algorithms are different. We should be able to support both forms as easily and cleanly as the other, to be able to compare them, and to explain, for example, why the ephemeral case is of limited utility. It is not too much to ask to be able to write these examples as running code, with a minimum of fuss or bother!
We have for decades struggled with using object-oriented languages, such as Java or C++, to explain these simple ideas, and have consistently failed. And I can tell those of you who are not plugged into academics at the moment, many of my colleagues world-wide are in the same situation, and are desperate to find a way out. The awkward methodology, the “design patterns”, the “style guidelines”, all get in the way of teaching the principles. And even setting that aside, you’re still doing imperative programming on ephemeral data structures. It just does not work, because it is fundamentally the wrong thing. Just try to teach, say, binary search tree delete; it’s a horrific mess! You wind up with absurd “null pointer” nonsense, and a complex mess caused by the methodology, not the problem itself. Pretty soon you have to resort to “frameworks” and “tools” just to give the students a fighting chance to get anything done at all, distancing them from the essential ideas and giving the impression that programming is painful and ugly, an enormous tragedy.
Our solution is to abandon it entirely, pushing OO techniques to a more advanced level for students who wish to learn them, and concentrating on the basic principles of modularity, parallel and sequential cost analysis, and direct verification of running code at the introductory level. Dijkstra used to say “beauty is our business”, to which I would add that life is too short, and bright minds too precious, to waste on ugly things. And if you take this point of view seriously, the best and the brightest are drawn to, rather than repulsed by, the field. Pace some of my colleagues at a Major Institute of Technology, students absolutely do love to program and do love the beauty of code, provided that the code you ask them to write is, in fact, beautiful. There is nothing more dreary than the corporate bureaucracy of OOP, and few things more lovely than the mathematical elegance of FP.