Don’t Mention Equality!

One thing I’ve learned in teaching CS is that what I choose not to say is as important as what I choose to say about a particular topic.  It requires a lot of care in some cases to retrain myself to avoid a certain line of thought that experience shows leads only to mistakes and misconceptions by the students.  A classic example is the vexed question of equality, which creates no end of difficulties for students, and invites all sorts of misunderstandings.  My advice: don’t mention equality!

Let me explain.  First off, in rudimentary languages, such as C, there is neither the motivation nor the possibility to avoid equality.  All data is first-order, and all expression types admit equality in constant time.  Equality of integers presents no problems, but it is often neglected that there are (according to the IEEE standard) two different meanings of equality of floats, one of which is not reflexive, other other of which is not transitive; name your poison.  Pointers must be comparable not only for equality, but also comparable to “NULL”, whatever that may mean.  (This is Tony Hoare’s self-described “billion dollar mistake”, the great mother of all iatrogenic disorders in programming languages.)  Although it’s a bit of a mess, no one thinks that equality is an issue in such languages.  Relatively speaking, it’s not.

But in abstract languages, such as ML, equality presents more fundamental problems.  Because, contrary to what everyone naively thinks, not all types even admit an equality test.  One obstacle is decidability: if you have functions, say, in your language, you can’t expect to compare them for equality.  Another is feasibility: if you have aggregates, such as trees or streams, in your language, you wouldn’t want to compare them for equality, even if you could in principle.  And most importantly, it doesn’t make sense: an equality test violates representation independence for abstract types, so you have no business relying on it in the first place.  Two equal (in the abstract sense) sets may well have distinct representations, or they may not.  The point of abstraction is to ensure that no client code can rely on whether they do or not.

So, if you want to bring up equality, you are buying into a huge mess.  But there is an alternative: just say no to equality.  More accurately, don’t say anything at all, don’t even bring it up.  And avoiding it has another benefit, namely you can avoid Booleans (the step-mother of all iatrogenic disorders in programming languages) entirely, and you will be better off.  (I’ll leave that discussion for another post.  Now, to head off pointless argument, let me say that eventually you’ll want to introduce equality, because you do sometimes need it, but only at a much, much later stage, when students have gotten safely past the real obstacles to learning to write good, clean code.  It’s no use rolling logs at them as well, let them spend their time on serious stuff instead.

Many uses of equality are better replaced by pattern matching.  If you have an inductively defined type, such as

datatype nat = Z | S of nat

then any function whose domain is nat should be defined by cases, not by comparison:

fun plus x y = case x of Z => y | S(x’) => S(plus x’ y)

Many students will want to write this instead:

fun plus x y = if x=Z then y else S (plus (pred x) y)

where pred has been defined previously, even if you never bring up equality yourself!  (It’s amazing what kids “learn” on the street.)  This has to be stamped out!  Not only does it require an additional function to obtain the predecessor, but the poor student has deprived himself of one of the most valuable tools in his repertoire, the exhaustiveness check for pattern matching.  (More subtly, and worse, the definition of pred must itself check for Z, even though we “know” that x cannot be Z at the call site.  I’ll return to this when discussing the confusion between booleans and propositions.)

But what if you’re programming with machine integers?   For example, how do we define the factorial function on the type int in ML?  A basic approach is to use the concept of an ordered type that is provided in the Standard ML library.  It comes down to this:

fun fact x = case compare (x, 0) of LSS => raise Undefined | EQL => 1 | GTR => x * fact (x-1)

You still need a predecessor here, but we can get rid of that with a little more effort:

datatype TRI = Neg of int | Zero | Pos of int
fun tri x = case compare (x,0) of LSS => Neg (x+1)  | EQL => Zero | GTR => Pos (x-1)
fun fact x = case tri x of Neg _ => raise Undefined | Zero => 1 | Pos of x’ => x * fact x’

When considered in isolation, this looks a bit heavy, but in practice it’s all a matter of designing the right interfaces to begin with, and using them to your advantage.

Applying the same principles to the option type makes clear why “null pointer analysis” is such a stupid idea.  Suppose that the variable x is of type t option for some type t.  Rather than write

if x=NONE then raise NullPtrException else …

which does not propagate the non-NONE status of x into the else branch, instead write the clearer case analysis

case x of NONE => raise NullPtrException | SOME x’ => …

wherein the second clause x’ is of type t, not t option, and hence cannot under any circumstances be NONE!


5 Responses to Don’t Mention Equality!

  1. mbourdua says:

    I’m not too familiar with SML, but shouldn’t plus be defined as: fun plus x y = case x of Z => y | S(x’) => S (plus x’ y)?

    I guess I don’t really understand what this accomplishes. In the above examples, Pattern-matching seems like syntactic sugar for equality. To me, you are tiptoeing around this fact by refusing to name it as such.

    The 3rd example above – NULL case analysis – is the only one where I can clearly see that your alternative is superior. But even in this case, I’m not really sure why you focus on the removal of equality, rather than on the design of careful abstractions (e.g. Don’t propagate unnecessary information out of a function).

    Anyway, it’s a neat idea. I’d be interested in more and larger examples. Perhaps that would clear things up for me.

  2. johnwcowan says:

    I don’t have anything to say about pedagogical strategies, but some of the underpinnings of this post seem to me more than questionable. It seems that by equality you mean identity of behavior: two things are equal if they are behaviorally indiscernible. So unless your abstract sets are so abstract that it’s impossible to enumerate their members, there is an obvious criterion for abstract set equality: sets are equal iff their members are equal. More precisely, the cardinalities must be finite (allowing the axiom of choice to be applied uncontroversially) and each member of s1 must be a member of s2 and vice versa. Of course, equality of the members must be defined: for a set of functions, you won’t be able to determine equality; for a set of integers, you will. Trees can be handled in a comparable manner, giving an implementation isomorphic to Scheme’s “equal?” predicate.

    (What is the reflexive but intransitive definition of floating-point equality? As far as I know, there is only one kind of IEEE 754 equality, which is reflexive except at NaN.)

  3. […] we’ve created—tools that do things like recover from Boolean blindness or check for null pointer errors.  To be sure, these tools are totally amazing, but then so are hot metal typesetting […]

  4. […] good example is the equality test, e=e’, of type bool.  As previously discussed, this seemingly innocent function presents serious complications, particularly in abstract […]

Leave a Reply

Please log in using one of these methods to post your comment: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: