Wednesday, September 01, 2004

Equality and Equivalence.

My previous entry on the implementation of Object.equals() has attracted some comment, but the principle response was from Adrian Sutton, who objected to my example of a buggy equals implementation.

In one respect his criticism is valid; in general a subclass should delegate to its superclass for inheritited state equality. To do otherwise is to break encapsulation, and duplicate code. On the otherhand his complaint misses the point:

Class B is incorrectly assuming that it knows how to compare class A's data, and more importantly class B is changing the behavior of A in a way that is not compatible with the original model. Essentially, class A defines equality as having the same n1 values and class B is in the wrong for changing that. [emph added]
What Adrian misses is that if B is unable to meaningfully extend A's definition of equality then A should indicate that by declaring its implementation final, in the absence of a final declaration, B is entitled to extend it. I give an example of a meaningful use of final/instanceof in the Shape/Circle/Ellipse example at the end of the post, which I chose as it conveniently demonstrates all three cases associated with appropriate use of instanceof:
  1. Abstract class implementing equals using instanceof to allow proper delegation of equality in subclasses.
  2. Extensible concrete class, uses instanceof to allow subclassing; equals declared final to enforce contract
  3. Subclassing the concrete class, overriding accessors permitting inherited (unoverridable) equals method to operate correctly (besides meeting the superclass contract ;).
On the other hand Adrian is mistaken when he suggests that the use of getClass() in equals is a bug. There are two flaws with his reasoning. The first arises from a confusion between syntax and semantics. Adrian offers a code fragment, suggesting that it would need A to use instanceof for correctness:
class C extends A {
  public getValue() {
    return n1;
  }
}
The confusion is in thinking this defines a new class. This is a discussion of semantics, hence any class offered as a counter example must needs be a new class semantically, while the above is strictly syntax. The easiest way to approach the semantics of a class is as a function constructing independent, concurrent, finite automata. Now C doesn't define any new states, and doesn't define any new behaviour (state transitions), semantically then it doesn't define a new class --- mathmatically, C is still isomorphic up to congruence with A, so in a very real sense they are the same class. What C actually defines is a function. A new, if rather clumsy, projection on A returning n1. I had better make it clear that I'm not suggesting there is a non-clumsy way of doing this, rather that the reason why it is valid to want an C to be equivalent to a A is that C doesn't define a new class, just a new function. Hence it is a completely different case to B, which is a non-congruent class. If you have a concrete class that must support subclassing, instanceof is a bug.

Moreover this is exactly what we would expect when we consider what it actually means for two objects to be equivalent. Remember, an object is an FA, and as such there is a well established body of work on the meaning of equivalence, anyone interested should read up on simulation and bisimulation. Informally however the key is recognising that the bundling of state and behaviour provided by an object is not just something we get to ignore when it becomes inconvenient, at least not without losing correctness. That if we are serious about OO, we are not just answering the question "are we currently in the same state?", but "are we in the same state, and will we remain in the same state in response to the same stimula?". In other words, it doesn't matter that B.n1 == A.n1, if A and B have different behaviours they aren't the same object.

It becomes particularly important to recognise this when you face the prospect of also implementing Comparable. The problem here is that the contract for compareTo is that of a Total Order. The best you can manage on A U B is a Pre-order, and once again you are left with the prospect of subtle and obscure bugs. From the javadoc for Comparable:

This interface imposes a total ordering on the objects of each class that implements it. This ordering is referred to as the class's natural ordering, and the class's compareTo method is referred to as its natural comparison method. ... For the mathematically inclined, the relation that defines the natural ordering on a given class C is:
       {(x, y) such that x.compareTo((Object)y) <= 0}.
The quotient for this total order is:
       {(x, y) such that x.compareTo((Object)y) == 0}.
It follows immediately from the contract for compareTo that the quotient is an equivalence relation on C, and that the natural ordering is a total order on C. When we say that a class's natural ordering is consistent with equals, we mean that the quotient for the natural ordering is the equivalence relation defined by the class's equals(Object) method:
     {(x, y) such that x.equals((Object)y)}.
Yet after all this, I do share Adrian's pain in this. The unfortunate fact is that if the author implemented A.equals properly, but forgot to include an accessor to A.n1, there is no clean way of gaining access to it. If you do find yourself in this fix, you are left with either direct bytecode hacking (the JVM doesn't know about the access permissions, so you can just access n1 directly), or package injection (if n1 doesn't have private scope). Alternatively, if the author has followed Adrian's advice, it becomes impossible to implement B correctly, and as Adrian's use of instanceof is wrong at a semantic level, no amount of bytecode hackery will save you.

So yet again we come back to my original point. Either you are implementing an abstract class; or you are writing a concrete class and can support either subclass equivalence, or polymorphic equality. "But I need both"!? Tough; Java doesn't support both, and a good engineer knows to work within the limitations of their tools. That a sizable segment of the java community continues to ignore the limitations of their language by presenting incorrect code, pretending that both are achievable, is a sad indictment on the state of that community.

No comments: