Saturday 22 November 2008

Item 8: Obey the general contract when overriding equals

Quite a big item this, with some surprising subtleties. Getting this wrong can cause some subtle and hard to track down bugs, so it's well worth looking at these points in some detail.

The author starts by describing when you need to provide an implementation for equals(), or rather when you don't: when instances of a class are inherently unique, such as active entities, or other cases when it's just not useful to compare objects. Or, when a superclass has an implementation that's sufficient. And finally, when you're sure no-one will ever compare objects of the class in question.

The author then lists the key criteria that any implementation of equals() has to comply with. It has to be: reflexive, symmetric, transitive, consistent and return false when given null.

Some of these are more problematic than others and can be easily be missed. For example, it might seem like a good idea to make it possible to compare your class with another with similar semantics, as in the CaseInsensitiveString example, so that you could do:
if (caseInsensitiveString.equals(normalString)) { ... }
The problem with this is that if you switched the order of the objects so that equals() was called on the normal string, then you'd get a different result - so you'd break the symmetry requirement.

Another example shows that the transitivity requirement can be violated by treating derived classes different from a superclass when comparing objects in equals(). This example also the highlights a thorny problem: how do you provide a sensible implementation of equals() in a class hierarchy where derived classes add members that are included in comparisons? The somewhat surprising conclusion is that you can't! Not without breaking the key requirements on equals() with the consequences demonstrated. The author also debunks one claimed solution to this: have the implementation of equals() check that the objects compared have the exact same class, as opposed to being of the same type. In other words, saying this:
@override public boolean equals(Object o) {
if (o==null || o.getClass() != getClass())
return false;
<...>
}
instead of:
@override public boolean equals(Object o) {
if (!(o instanceof ThisClass)
return false;
<...>
}
This is actually quite a controversial point that has generated many heated debates. See for example this article: http://www.javaworld.com/javaworld/jw-06-2004/jw-0614-equals.html?page=2, which claims the former is the right solution - with a typically inflammatory discussion following in the comments section below.

I'm not going to try to take sides here, I'll just say that I find Joshua Bloch argues convincingly for his case. The example he gives is a derived class (CounterPoint, extending Point) which adds a count of the number of objects created. One would clearly want this object to behave just like Point in equality comparisons as it adds no value members that ought to affect this - otherwise it breaks Liskov's substitution principle.

But that leaves the question of what to do when adding a value member to a derived class. Bloch suggests that this can be avoided by the use of composition rather than inheritance. So, don't make the class that needs to add the value member a subclass of the initial class, make it contain an instance of it, giving an example with a ColorPoint class that contains an instance of a Point. A classic case of un-asking the question, you could say.

(The asymmetry of the equals() operation in object oriented languages seems to me to be the root cause of this problem - could one imagine language mechanisms that dealt with comparison in a better way? Are there languages that handle comparisons in a different way? Anyway, that's a digression...)

The item wraps up giving a summary of what a good equals() implementation looks like. I won't repeat that here, I would suggest looking at an example instead. Strangely enough, this section doesn't contain a clear cut "this is a good equals() implementation" example, though the one in class Point on page 37 is one. I would suggest looking at exapmles such as found in the JDK itself. For example classes, java.awt.Point2D, java.awt.Eclipse2D (interestingly, the latter has the "compare-to-self optimisation", the former doesn't - I'm looking at JDK version 1.6). Or, look at java.lang.Integer.

One last point about implementing equals(): IDEs like Eclipse will happily auto-generate equals() for you - not necessarily in a nice way though. For a simple value class with three members, like this:
public class TestBean {
private String strVal;
private Integer intVal;
private Float floatVal;
}
I get this monstrosity using Eclipse 3.4 with default settings:
    @Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
TestBean other = (TestBean) obj;
if (floatVal == null) {
if (other.floatVal != null)
return false;
} else if (!floatVal.equals(other.floatVal))
return false;
if (intVal == null) {
if (other.intVal != null)
return false;
} else if (!intVal.equals(other.intVal))
return false;
if (strVal == null) {
if (other.strVal != null)
return false;
} else if (!strVal.equals(other.strVal))
return false;
return true;
}
Note it uses getClass() instead of instanceof - but it has an option to change this! :-)

This can be greatly simplified if we use a helper method that performs comparisons in a way that copes with nulls. The Google Collections library provides an implementation of this, but it's trivial to write your own if you wish, of course. Using this, and leaving out the possibly premature optimisation of checking against self, we can rewrite the above to say:
    @Override
public boolean equals(Object obj) {
if (!(obj instanceof TestBean))
return false;
TestBean other = (TestBean) obj;
return Objects.equal(strVal, other.strVal)
&& Objects.equal(intVal, other.intVal)
&& Objects.equal(floatVal, other.floatVal);
}
...which I think is nicer. Then make it more complex if your profiler ever tells you that this method is a performance bottleneck - though that has never happened to me yet.

Jan Stette

No comments: