Some thoughts on teaching FP

A number of people have asked some very good questions about the details of how we teach certain concepts in our new functional programming class for freshmen at Carnegie Mellon.  Rather than spray responses among the various comments, I’ll summarize a few major points here in hopes of helping others who may wish to teach a similar course.  So this post is not really meant for a broad audience, but rather for the specialist; feel free to skip it if it seems too focused for your interests.  I promise to write some more controversial material of more general interest soon!  Meanwhile, here are a few thoughts presented in no particular order of importance.

Moving to a more technical level, the use of structural operational semantics is indispensable for providing a rigorous foundation for understanding program execution, reasoning about program correctness, and for defining a cost model to support reasoning about asymptotic complexity.  There is no substitute for this!  Without a crisp formulation of the semantics of the language, it is impossible to discuss any of these issues in a meaningful and technically precise way.  With it you can readily resolve any questions about “what happens if …”, giving the students a tool that they can use themselves to answer such questions.  Moreover, as program verification becomes more important in industrial practice, as well as academic research, it is essential that students be educated in the tools of semantics.  Structural operational semantics is very easy to teach, and presents no problems for the students.  We just use it, and they get it without any fuss or bother.  It is a natural extension of their experience with high school algebra.  Be not afraid of using these tools to teach programming!

As I’ve explained previously, it is a very good idea to avoid Booleans as much as possible.  And, above all, don’t mention equality!  The equals sign in programming languages is not the equals sign of mathematics.  Propositions are not Booleans, and it only confuses matters to use notations that encourage this misconception.  Related to this, avoid if-then-else entirely, and instead use only case analysis for branching, even when the value to be discriminated is a Boolean.  We consistently write things like

```case Int.compare(x,y) of
LESS => ...
| GREATER => ...
| EQUAL => ...```

rather than a nested conditional branch.  It encourages students to think in terms of pattern matching, and prepares the ground for later developments, including a smooth transition to pattern matching over more complex data structures and reasoning inductively when programming recursively.

Teaching parallelism is completely straightforward, because the model of computation inherently avoids unnatural and unnecessary problems of interference, and focuses attention on the critical issue of data dependencies among computations in a program.  Students have no trouble computing the work (sequential time complexity) or span (parallel time complexity) of a program, and have no problems reading off recurrences for the respective time complexities.  Later, when we introduce sequences, the idea of computing in parallel with the entire sequence, rather than item-by-iterm (as encouraged by the dreadful iterators so beloved in the oo world), comes naturally and easily.  The key to this, of course, is that data structures in a functional language are naturally persistent; it is monstrously hard to use ephemeral data structures in a parallel computation, and is not something we could hope to teach freshmen.

A major decision for us is how to teach the expression and enforcement of abstraction in a program.  In a departure from our previous approach, we have decided against using opaque ascription (sealing) as a means of enforcing abstraction.  It has its virtues, but the problem is that it does not mesh well with other language features, in particular with substructures and type classes (views).  For example, consider the signature of a mapping whose domain is an ordered type of keys:

```signature MAPPING = sig
structure Key : ORDERED
type 'a mapping
val lookup : Key.t -> 'a mapping -> 'a
...
end```

Unfortunately, sealing a structure with this signature renders the module useless:

```structure IntMapping :> MAPPING = struct
structure Key = Int
type 'a mapping = 'a bst
...
end```

The trouble is that not only is the type ‘a IntMapping.mapping abstract, as intended, but so is IntMapping.Key.t, which is not at all intended!  To get around this we we must create a specialization of the signature MAPPING using one of several means such as

```signature INT_MAPPING = MAPPING where type Key.t=int
structure IntMapping :> INT_MAPPING = ...```

Of course one need not name the signature, but this illustrates the general problem.  As things get more complicated, you have more and more clauses that specify the types of things (sharing specifications).

The alternative, which has worked very well for us, is to eschew opaque ascription, and instead rely on the datatype mechanism to make types abstract.  So to give an implementation of the abstract type of mappings with keys being integers, we proceed as follows:

```structure IntMapping : MAPPING = struct
structure Key : ORDERED = Int
datatype 'a bst = Empty | Node of 'a bst * (Key.t * 'a) * 'a bst
type 'a mapping = 'a bst
val insert = ...
end```

The point is that since the constructors of the type ‘a bst are not exported in the interface, the type ‘a IntMapping.mapping is abstract.  Note as well that the use of transparent ascription on the structure Key ensures that keys really are integers (of type Int.int), and are not abstract, exactly as intended.  This formulation allows us to state simple rules of signature matching (every specification in the signature has a corresponding declaration in the structure), and allows us to enforce abstraction boundaries with a minimum of fuss.  The students have had absolutely no trouble with this at all, and we have had no trouble structuring our code this way.

When using functors (parameterized modules) to combine modules it is, of course, necessary to impose sharing constraints to ensure that only coherent compositions are possible.  (Rather than take the space to explain this here, I will rely on the reader’s experience to understand what is involved here.)  These sorts of sharing specifications are perfectly natural, easily explained, and have presented absolutely no difficulties for the students.  We illustrated their use in our game tree search example, in which the “referee” module is parameterized by the two “player” modules, which must of course cohere on their concept of a “game” (it’s no use pitting a chess player against a checkers player!).  The code looks like this

```functor Referee
(structure Player1 : PLAYER and Player2 : PLAYER
sharing type Player1.Game.t = Player2.Game.t) : REFEREE = ...```

The sharing specification states precisely and concisely the natural coherence constraint governing the two players.  Here again, the dog we feared never barked, and the students found it all quite intuitive and unproblematic.  This allowed them to expend their time on the actual complexities of the problem at hand, such as how to think about alpha-beta pruning in a parallel game-tree search, rather than get all tied up with the bureaucracy of structuring the code itself.

The virtue of teaching bright young students is that they are bright and they are young.  Their brilliance is, of course, a pleasure.  We have to work hard to come up with sufficiently challenging exercises, and many students challenge us with their solutions to our problems.  Their youth means that they come to us with a minimum of misconceptions and misinformation that they’ve picked up on the street, and are open to learning methods that are entirely non-standard (at least for now) with respect to what their friends are learning at other universities.  What makes Carnegie Mellon a special place is precisely that the students are pushed into thinking hard in ways that they might not be.  Personally, I hope that more universities worldwide build on what we have started, and give their students the same benefits that ours are already enjoying.

11 Responses to Some thoughts on teaching FP

1. dsannella says:

“In a departure from our previous approach, we have decided against using opaque ascription (sealing) as a means of enforcing abstraction. … The alternative, which has worked very well for us, is to eschew opaque ascription, and instead rely on the datatype mechanism to make types abstract.”

That works provided you always use datatypes, and always hide their constructors. So for instance

datatype age = mkage of int
fun birthday(mkage n) = mkage(n+1)

in a structure, with a signature including `type age` (not `eqtype`) and not including the constructor `mkage`. I’m not sure that’s really better than using opaque ascription – it seems just about as elegant as the relationship between Integer and int in Java. And you also eschew the explanation of functors as abstracted – in the sense of lambda abstraction, not data abstraction – versions of structures: the relationship between S and T in

structure S:>SIG = …
structure T = … S …

is like the relationship between S and T in

functor T(X:SIG) = … X …
… T(S) …

See my comment on “Modules Matter Most” – I still think there has to be a better way. And it seems to me that the way you are doing things – which you suggest is a better way in practice – amounts to explicitly hiding the identity of the types that you want to be abstract, rather than explicitly revealing the identity of the types (like `Key.t` in your `IntMapping` example) that you want to expose. Maybe what is missing is just a more elegant language construct to do that? Dare I suggest Java-style `private`?

• Abstract Type says:

You are free to use opaque ascription, which provides the kind of private declarations you seem to want, but for teaching beginners I find it more trouble than it is worth. We have had zero problems with this, either practically or conceptually; we had a lot of problems using the methods you favor (as did you, apparently). It just does not scale.

2. csfreshman says:

As a Freshmen in a different college I very much like the style of teaching you are using.

“Related to this, avoid if-then-else entirely, and instead use only case analysis for branching, even when the value to be discriminated is a Boolean.” is something I will be using in my own programs

I will be using this in my own programs for now on. I think it makes sense and makes it easier to reason about my code than if-then-elses

Students with prior “programming experience” are, if anything, at a disadvantage, because they think they know things that are either wrong or inapplicable.

You should have said: Students with poor prior “imperative programming experience” are, if anything, at a disadvantage, because they think they know things that are either wrong or inapplicable.

• Abstract Type says:

Point taken.

3. glorkspangle says:

You have “structure INT_MAPPING = MAPPING where …”. I think this is a typo for “signature …”.

4. I am yet to start studying, but have seen a lot of languages and styles already, and in my opinion, functional programming is vastly superior to other styles. I’m almost writing no functional code, but that is due to the fact that functional languages are not that well integrated into Linux systems in order to allow me to do what I want to do.

Here are my basic rules:

```* global state must be constant
* program by contract, verify invariants
* recursion is better than iteration
* correctness is more important than performance
* documentation is more important than features
```

The biggest advantage of a functional program is that the solution of the problem is basically the problem itself. For example, in Python, getting an array of the names of all persons in an array of persons in a functional form would be:

names = [person.name for person in persons]

This is much shorter and more understandable than the imperative version:

names = []
for person in persons:
names.append(person.name)

If person.name were knowingly pure, we could now easily run the first example in parallel. We can’t do this for the second version as we loose the notion of transforming one array to another one, but only have an iteration for which we do something in every step.

It’s like saying “start the laptop and login” vs. “open the laptop lid, press the round button, once the login screen is there enter your name in the first box and your password in the second box, then click login.” – in the first one you know what the goal is and could do it in multiple ways (you could enter the password before the username), in the second one you only know one way of doing it, preventing any further optimization by magic entities.

Functional programming is superior because we think about the problem, and not about a specific algorithm to solve it.

5. andrejbauer says:

How do you cope with the fact that in the presence of divergent programs beta-rule is an inequality rather than an equality? It seems to me that you need to make a clear distinction between an expression and its value very early on, so that you can explain substitution rules correctly. Do students really understand the difference between an expression and its value? The brightest philosophers didn’t until the 20th century.

• Abstract Type says:

By not teaching equational reasoning, but instead focusing on the operational semantics. So, for example, to prove something about a function of type int->int, we consider all possible numerals as inputs. There is no need to consider anything else.

6. One consequence is gender equity, because even with American society being what it is, the women have no particular disadvantage with respect to the men when it comes to our style of thinking and programming.

This rings true for me, certainly — my first CS course in college was a functional programming course, and as someone who’d never programmed before, I think I was not only on par with but actually at something of an advantage to my classmates who were longtime programmers, because I had no bad habits to unlearn. Also, it’s not just gender equity but socioeconomic equity — having had the chance to do programming before college is often (although counterexamples do exist!) a symptom of socioeconomic privilege.

• Abstract Type says:

What you said! Thanks!

• Chris Martens says:

Yeah, I wanted to comment correcting “gender equity” to “privilege equity”. Also, I did come in to college with a couple years of “bad” programming experience, but I definitely felt more *comfortable* in my FP class where the environment was about learning something together, not competing as to who already knows the most.

What’s confusing is that the FP community still seems to be less privilege-balanced as a field than other subfields of CS, and I personally knew a lot more women in undergrad who preferred C hacking to ML. I wish I knew why.

Follow