Sunday 30 November 2008

Item 11 - Override Clone Judiciously

When I started reading this item I couldn't quite believe it. So I checked the Java documentation and sure enough Object does have a protected clone method and the Cloneable interface does not have any methods. Then I started wondering how adding an interface with no methods could stop a superclass's protected method from throwing. So I tried it:
public class AnObject
{
    public AnObject clone()
    {
        AnObject obj = null;

        try
        {
            obj = (AnObject)super.clone();
        }
        catch (CloneNotSupportedException e)
        {
            e.printStackTrace();
        }

        return obj;
    }

    public static void main(String[] args)
    {
        new AnObject().clone();
    }
}
Sure enough, it threw CloneNotSupportedException. So I added the Cloneable interface:
public class AnObject implements Cloneable
{
    ...

    public static void main(String[] args)
    {
        new AnObject().clone();
    }
}
and the exception went away. I was puzzled until I remembered Java's runtime type information: Object's protected clone method, when called, must check whether the object's class implements the Cloneable interface and throw the exception if it doesn't. Mystery over. However, it's a shame that the item didn't explain that explicitly.

The item does point out that although Object's clone method returns an Object, since Java 1.5 covariant return types allow an overriding method to return a subtype of the overridden method's return type. Based on this I wonder if the Cloneable interface ought to look more like this:
public interface Cloneable<T>
{
    T clone();
}
Then the subclass would look like this:
public class AnObject implements Cloneable<AnObject>
{
    @Override
    public AnObject clone()
    {
        AnObject obj = null;

        try
        {
            obj = (AnObject)super.clone();
        }
        catch (CloneNotSupportedException e)
        {
            e.printStackTrace();
        }

        return obj;
    }
}
This of course won't work with the current implementation, as Object's clone method still looks for the original Cloneable interface.

The things to remember are:
  • If you override the clone method in a nonfinal class, you should return the object obtained by invoking super.clone.
  • In practice, a class that implements Cloneable is expected to provide a properly functioning clone method.

The item then goes on to discuss the need for implementing deep copying when writing a clone method for a class that refers to other objects. Cloning the Stack example shown in the item, without performing a deep copy, results in two Stack objects both referring to the same elements. This can be overcome by making the clone method clone the elements as well as the Stack object.
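To make the deep-copy point concrete, here is a minimal sketch of such a clone method. This is not the book's exact Stack code, just a cut-down illustration; the field names and capacity are my own:

```java
// A sketch (not the book's exact Stack) showing why clone must copy
// the internal array, not just the Stack object itself.
class Stack implements Cloneable {
    private Object[] elements = new Object[16];
    private int size = 0;

    public void push(Object e) { elements[size++] = e; }
    public Object pop() { return elements[--size]; }

    @Override
    public Stack clone() {
        try {
            Stack result = (Stack) super.clone();
            // Without this line both stacks would share one array.
            result.elements = elements.clone();
            return result;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(); // can't happen: we implement Cloneable
        }
    }
}
```

Cloning the array itself is enough here because the elements are only referenced, not owned; cloning each element as well would be the next level of deep copy.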

Finally the item explains that it is better to provide an alternative means of object copying, or simply not to provide the capability at all. The alternatives it recommends are a copy constructor:

public AnObject(AnObject obj);

or a copy factory:

public static AnObject newInstance(AnObject obj);
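Fleshed out, the two alternatives might look like this; the AnObject class and its single field are invented here purely for illustration:

```java
// A sketch of the copy constructor and copy factory alternatives.
class AnObject {
    private final String name;

    public AnObject(String name) { this.name = name; }

    // Copy constructor
    public AnObject(AnObject obj) { this.name = obj.name; }

    // Copy factory
    public static AnObject newInstance(AnObject obj) {
        return new AnObject(obj);
    }

    public String getName() { return name; }
}
```

Both give a genuinely independent copy without any of clone's interaction with Object and Cloneable.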

I think the item presents a reasonable case for generally avoiding the Cloneable interface.

Paul Grenyer

Thursday 27 November 2008

Item 10: Always Override toString

So, on to item 10, "Always override toString()". Quite a short and simple item this one; the author's arguments for this are that it is convenient for describing objects in print statements and similar, and for displaying objects in debuggers. Another case not explicitly mentioned is in logging statements. I agree that this is good advice, even though I must admit to not doing so as often as I probably should myself.

The default implementation in Object is usually not very helpful - though it does show which instance is being referenced, something other implementations usually don't. I guess there may be cases, when debugging logs for example, when this is useful. Anyway, the author goes on to describe what makes a good implementation of toString(): ideally describing as much as possible of the object, though for some objects a descriptive summary will have to do.
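A sketch of the kind of toString the item has in mind; the PhoneNumber class and its format are my own invented example, not the book's code:

```java
// A toString that describes all the interesting state in readable form.
class PhoneNumber {
    private final int areaCode, prefix, lineNumber;

    PhoneNumber(int areaCode, int prefix, int lineNumber) {
        this.areaCode = areaCode;
        this.prefix = prefix;
        this.lineNumber = lineNumber;
    }

    @Override
    public String toString() {
        return String.format("(%03d) %03d-%04d", areaCode, prefix, lineNumber);
    }
}
```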

Finally, there's a discussion about whether the details of the format returned by toString() should be specified, so in effect becoming part of the public API of a class. Examples where this is done include the boxed primitive types. Another example is java.util.Date. I can't help thinking that this is the exception rather than the rule, and really just applies to very simple value objects. And I can't see myself adopting the author's suggested JavaDoc format for documenting toString(), that seems over the top for most cases.

Actually, this last point reminds me of an impression I've had with a few of the items in this book: they seem to be advice that makes sense for someone implementing widely used library code (as Mr. Bloch has done of course, his name is all over the base Java libraries), but makes less sense for application developers for example. Is that a fair comment to make?

Jan Stette

Saturday 22 November 2008

Item 9: Always override hashCode when overriding equals

This one is pretty straightforward - none of the joys that doing a proper equals function gives you, since you're only concerned with yourself.

So, why do it? The short answer is "Because you're told to".

http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Object.html#hashCode()

Second bullet point says two objects which are equal according to equals(Object) must have the same hash code. The default Object.hashCode() will nearly always give different answers for different objects, despite what equals() may say, so it's unsuitable.

He then goes into a little more detail on why this is. Here's my way of thinking about it. Consider a simple value object - IIRC his is a phone number. Let's make a phone book - a Hashtable keyed by this phone number object, the value being the name associated with the phone number. (OK, it's a backwards phone book.)

This phone book has an entry for 555-1234 - for Fred, who lives on USian TV. If I want to find out who's associated with 555-1234, I need to create a phone number entry and ask the hash table for the value associated with this. But if I've not overridden hashCode, there's no reason the hash table will be able to find it - it'll look in the hash bucket for this test object (here I'm assuming everybody understands how hash tables work - is this the case?), but the one for Fred could well be in a different hash bucket. Oops, it's lost.
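To make this runnable, here is a sketch of the backwards phone book using a HashMap; the class names are mine. BadNumber overrides equals only, and a lookup with a fresh but equal key will almost always miss; GoodNumber overrides both, and the lookup works:

```java
import java.util.HashMap;
import java.util.Map;

// Overriding equals without hashCode breaks hash-based lookups.
class BadNumber {
    final String digits;
    BadNumber(String digits) { this.digits = digits; }
    @Override public boolean equals(Object o) {
        return o instanceof BadNumber && ((BadNumber) o).digits.equals(digits);
    }
    // No hashCode override: equal numbers usually land in different
    // buckets, so map.get(new BadNumber("555-1234")) will almost
    // certainly fail to find Fred.
}

class GoodNumber {
    final String digits;
    GoodNumber(String digits) { this.digits = digits; }
    @Override public boolean equals(Object o) {
        return o instanceof GoodNumber && ((GoodNumber) o).digits.equals(digits);
    }
    @Override public int hashCode() { return digits.hashCode(); }
}
```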

So, we now know we need to write one. Great - but how to do it? He then describes a simple algorithm for doing so. Lots of multiplying by 17 or 37 (prime numbers) - gives a good spread.

The choice of which fields to consider when writing hashCode is important. The main rule is you mustn't let anything which isn't used in equals affect hashCode - otherwise you can end up with different hashCodes for equal objects.

You can be slack about what you use - you don't need to use all the fields. But you want to be sure your choice gives you a good range - otherwise you'll end up with all your values in not many hash buckets, which means you're effectively doing a very stupid scan on the bucket to find stuff - full table scan in database-land is the equivalent.
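A sketch of the recipe (the second edition multiplies by 31; the fields and class here are my own example, chosen so that hashCode uses exactly the fields that equals uses):

```java
// Start from a nonzero constant, then fold in each field used by equals.
class PhoneNumber {
    final short areaCode, prefix, lineNumber;

    PhoneNumber(int areaCode, int prefix, int lineNumber) {
        this.areaCode = (short) areaCode;
        this.prefix = (short) prefix;
        this.lineNumber = (short) lineNumber;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof PhoneNumber)) return false;
        PhoneNumber p = (PhoneNumber) o;
        return p.areaCode == areaCode && p.prefix == prefix
                && p.lineNumber == lineNumber;
    }

    @Override public int hashCode() {
        int result = 17;                   // any nonzero start value
        result = 31 * result + areaCode;   // same fields as equals,
        result = 31 * result + prefix;     // in the same spirit
        result = 31 * result + lineNumber;
        return result;
    }
}
```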

Hash calculations obviously want to be reasonably fast. He mentions caching them, and also using lazy evaluation in an example - but notes this is only really relevant for big complex stuff. (The cached value mustn't be used in the hash calculation :-) )

In the 1st ed, he goes on about how the built-in ones (eg for String) have had a bit of a chequered past - ie they didn't necessarily work terribly well. Not sure if he's kept that in the 2nd ed - I'm hoping these days they all just work.

There is a temptation to say "well, I'm not going to use this class in a hashed collection". If so, returning the same value for all of them would work - but you'd need to be really confident nobody was going to change their mind later, and writing the function properly is sufficiently easy that you may as well just do it and be safe.

I think I've had about one value class with equals() and hashCode(). I had strings as the primary keys, so just using the hashcode from them was sufficient for me.

Clive George

Item 8: Obey the general contract when overriding equals

Quite a big item this, with some surprising subtleties. Getting this wrong can cause some subtle and hard to track down bugs, so it's well worth looking at these points in some detail.

The author starts by describing when you need to provide an implementation for equals(), or rather when you don't: when instances of a class are inherently unique, such as active entities, or other cases when it's just not useful to compare objects. Or, when a superclass has an implementation that's sufficient. And finally, when you're sure no-one will ever compare objects of the class in question.

The author then lists the key criteria that any implementation of equals() has to comply with. It has to be: reflexive, symmetric, transitive, consistent and return false when given null.

Some of these are more problematic than others and can be easily be missed. For example, it might seem like a good idea to make it possible to compare your class with another with similar semantics, as in the CaseInsensitiveString example, so that you could do:
if (caseInsensitiveString.equals(normalString)) { ... }
The problem with this is that if you switched the order of the objects so that equals() was called on the normal string, then you'd get a different result - so you'd break the symmetry requirement.
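A sketch in the spirit of the book's CaseInsensitiveString, showing how the "helpful" one-way interoperability breaks symmetry (this is my reconstruction, not the book's exact code):

```java
// Broken: equals tries to interoperate with plain String.
class CaseInsensitiveString {
    final String s;
    CaseInsensitiveString(String s) { this.s = s; }

    @Override public boolean equals(Object o) {
        if (o instanceof CaseInsensitiveString)
            return s.equalsIgnoreCase(((CaseInsensitiveString) o).s);
        if (o instanceof String)          // one-way interoperability!
            return s.equalsIgnoreCase((String) o);
        return false;
    }
}
```

String's own equals knows nothing about CaseInsensitiveString, so the comparison only works in one direction.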

Another example shows that the transitivity requirement can be violated by treating derived classes differently from a superclass when comparing objects in equals(). This example also highlights a thorny problem: how do you provide a sensible implementation of equals() in a class hierarchy where derived classes add members that are included in comparisons? The somewhat surprising conclusion is that you can't! Not without breaking the key requirements on equals(), with the consequences demonstrated. The author also debunks one claimed solution to this: have the implementation of equals() check that the objects compared have the exact same class, as opposed to being of the same type. In other words, saying this:
@Override public boolean equals(Object o) {
    if (o == null || o.getClass() != getClass())
        return false;
    <...>
}
instead of:
@Override public boolean equals(Object o) {
    if (!(o instanceof ThisClass))
        return false;
    <...>
}
This is actually quite a controversial point that has generated many heated debates. See for example this article: http://www.javaworld.com/javaworld/jw-06-2004/jw-0614-equals.html?page=2, which claims the former is the right solution - with a typically inflammatory discussion following in the comments section below.

I'm not going to try to take sides here, I'll just say that I find Joshua Bloch argues convincingly for his case. The example he gives is a derived class (CounterPoint, extending Point) which adds a count of the number of objects created. One would clearly want this object to behave just like Point in equality comparisons as it adds no value members that ought to affect this - otherwise it breaks Liskov's substitution principle.

But that leaves the question of what to do when adding a value member to a derived class. Bloch suggests that this can be avoided by the use of composition rather than inheritance. So, don't make the class that needs to add the value member a subclass of the initial class, make it contain an instance of it, giving an example with a ColorPoint class that contains an instance of a Point. A classic case of un-asking the question, you could say.
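A minimal sketch of what that composition might look like; the field names and the asPoint() view method are my own invention, loosely following the book's ColorPoint example:

```java
// ColorPoint contains a Point rather than extending it.
class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
    @Override public boolean equals(Object o) {
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return p.x == x && p.y == y;
    }
    @Override public int hashCode() { return 31 * x + y; }
}

class ColorPoint {
    private final Point point;
    private final String color;
    ColorPoint(int x, int y, String color) {
        this.point = new Point(x, y);
        this.color = color;
    }
    public Point asPoint() { return point; }   // the Point "view"
    @Override public boolean equals(Object o) {
        if (!(o instanceof ColorPoint)) return false;
        ColorPoint cp = (ColorPoint) o;
        return cp.point.equals(point) && cp.color.equals(color);
    }
    @Override public int hashCode() {
        return 31 * point.hashCode() + color.hashCode();
    }
}
```

ColorPoint never masquerades as a Point, so neither class's equals contract is strained; callers who want point-only comparison ask for the view explicitly.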

(The asymmetry of the equals() operation in object oriented languages seems to me to be the root cause of this problem - could one imagine language mechanisms that dealt with comparison in a better way? Are there languages that handle comparisons in a different way? Anyway, that's a digression...)

The item wraps up by giving a summary of what a good equals() implementation looks like. I won't repeat that here; I would suggest looking at an example instead. Strangely enough, this section doesn't contain a clear cut "this is a good equals() implementation" example, though the one in class Point on page 37 is one. I would suggest looking at examples such as those found in the JDK itself, for example java.awt.geom.Point2D and java.awt.geom.Ellipse2D (interestingly, the latter has the "compare-to-self optimisation", the former doesn't - I'm looking at JDK version 1.6). Or, look at java.lang.Integer.

One last point about implementing equals(): IDEs like Eclipse will happily auto-generate equals() for you - not necessarily in a nice way though. For a simple value class with three members, like this:
public class TestBean {
    private String strVal;
    private Integer intVal;
    private Float floatVal;
}
I get this monstrosity using Eclipse 3.4 with default settings:
@Override
public boolean equals(Object obj) {
    if (this == obj)
        return true;
    if (obj == null)
        return false;
    if (getClass() != obj.getClass())
        return false;
    TestBean other = (TestBean) obj;
    if (floatVal == null) {
        if (other.floatVal != null)
            return false;
    } else if (!floatVal.equals(other.floatVal))
        return false;
    if (intVal == null) {
        if (other.intVal != null)
            return false;
    } else if (!intVal.equals(other.intVal))
        return false;
    if (strVal == null) {
        if (other.strVal != null)
            return false;
    } else if (!strVal.equals(other.strVal))
        return false;
    return true;
}
Note it uses getClass() instead of instanceof - but it has an option to change this! :-)

This can be greatly simplified if we use a helper method that performs comparisons in a way that copes with nulls. The Google Collections library provides an implementation of this, but it's trivial to write your own if you wish, of course. Using this, and leaving out the possibly premature optimisation of checking against self, we can rewrite the above to say:
@Override
public boolean equals(Object obj) {
    if (!(obj instanceof TestBean))
        return false;
    TestBean other = (TestBean) obj;
    return Objects.equal(strVal, other.strVal)
            && Objects.equal(intVal, other.intVal)
            && Objects.equal(floatVal, other.floatVal);
}
...which I think is nicer. Then make it more complex if your profiler ever tells you that this method is a performance bottleneck - though that has never happened to me yet.
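For completeness, here is one way the "write your own" helper could look, assuming a class named Objects with an equal method matching the calls above (the Google Collections version behaves the same way, as far as I know):

```java
// A null-safe equality helper: true if both null, false if exactly
// one is null, otherwise defers to a.equals(b).
final class Objects {
    private Objects() {}   // non-instantiable utility class

    static boolean equal(Object a, Object b) {
        return a == b || (a != null && a.equals(b));
    }
}
```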

Jan Stette

Item 7: Avoid Finalizers

This item could have been written to put me straight. I'm the C++ programmer it speaks about and I was desperate for finalizers to be Java's destructors. They aren't. In fact, unlike C++'s destructors they are unpredictable, often dangerous and generally unnecessary. OK, so if you're not careful with exceptions, destructors in C++ can be dangerous too. Java finalizers are worse.

The item explains that you should never do anything time critical in finalizers as the JVM is "tardy" at running them. I did some experiments closing database connections in finalizers and couldn't generate any evidence that they were called at all. If and when finalizers are called is JVM implementation specific. So cross platform programming using finalizers is unpredictable at best, disastrous at worst.

The item also explains that uncaught exceptions thrown in finalizers are ignored, but the finalization of the object terminates, leaving it in an unknown and potentially corrupt state, which can result in arbitrary nondeterministic behavior. Using finalizers also increases the time to terminate an object a whopping 430 times, according to an (unexplained) test example run by the author.

The item describes the alternative to using finalizers as providing an explicit termination method that can, typically, be called from a finally block. This is what I do in a lot of cases and what the Java database objects provide.
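A minimal sketch of the idiom, with an invented Resource class standing in for something like a database connection:

```java
// Explicit termination: terminate() is called from finally, so cleanup
// is guaranteed even if use() throws - unlike a finalizer.
class Resource {
    private boolean open = true;

    public void use() {
        if (!open) throw new IllegalStateException("already terminated");
    }

    public void terminate() { open = false; }

    public boolean isOpen() { return open; }
}
```

Usage follows the familiar try/finally shape: acquire, use inside try, terminate in finally.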

Finalizers could be used as a safety net for forgotten terminate method invocations, on the basis that it's better to release a resource late than never, but of course there's no guarantee it'll get released at all. Finalizers can also be used for "native peers", as long as they do not hold resources that need to be released.

The item finishes by explaining that finalizers are not chained in class hierarchies, so subclasses must call superclass finalizers explicitly, and by describing the finalizer guardian idiom.

I don't think there's much to argue with here. I also feel that Java would benefit from some sort of cleanup mechanism, such as IDisposable in C# or even C++ destructors.

Paul Grenyer

Item 6: Eliminate obsolete object references

Item 6 discusses the need to think about memory management even in a garbage-collected language. In the example of the Stack, a "memory leak" exists due to the handling of the internal array. The stack keeps track of the number of active elements and increases or decreases this count when the user pushes a new element or pops one off. But when popping, the popped element still stays referenced in the array, so the garbage collector does not know it can reclaim it. The point is that as a programmer, you can't just stop thinking about memory usage even in a garbage collected system.

It is interesting that the book calls this a memory leak. My experience with C++ is of something overwriting memory used by another object, creating corruption - memory leaking from one part of the code to another. The book's example seems to be inefficient use of memory, not clearing it when it is no longer needed. I like the other term the book uses, "unintentional object retentions", better.

In the example, the problem was that the stack was managing its own memory, containing a storage pool of elements, some of which could eventually become obsolete. So in these cases, the programmer should be alert for memory leaks. The general solution is to null out the object references once they are no longer needed. But nulling out object references should be rare, normally taken care of by the garbage collector.
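Sketched from memory, the fix looks something like this (not the book's exact code; the slot method is mine, added only so the effect can be observed):

```java
// The fix: null out the popped slot so the garbage collector can
// reclaim the element once nothing else references it.
class Stack {
    private Object[] elements = new Object[16];
    private int size = 0;

    public void push(Object e) { elements[size++] = e; }

    public Object pop() {
        Object result = elements[--size];
        elements[size] = null;   // eliminate the obsolete reference
        return result;
    }

    Object slot(int i) { return elements[i]; }   // for inspection only
}
```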

The book continues with other common sources of memory leaks like caches, listeners and callbacks. I am not that familiar with caches. As the book describes them, they seem to be storage for quick retrieval. So an object can be placed in a cache and forgotten. A solution is to store only weak references to them. Listeners and callbacks can have a similar problem, you register them but don't deregister them.
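The weak-reference approach the item alludes to can be sketched with the JDK's WeakHashMap; the Cache wrapper here is my own illustration. Entries become eligible for collection once the key is no longer strongly referenced anywhere else:

```java
import java.util.Map;
import java.util.WeakHashMap;

// A cache whose entries do not keep their keys alive: once the key is
// unreachable elsewhere, the garbage collector may drop the entry.
class Cache {
    private final Map<Object, String> cache = new WeakHashMap<>();

    void put(Object key, String value) { cache.put(key, value); }
    String get(Object key) { return cache.get(key); }
}
```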

I don't have any specific personal comments since I have not used Java, but trying to learn on my free time.

Tim Wright

Item 5: Avoid creating unnecessary objects

So this item could also be called "reuse objects as much as possible". ;-)
Joshua gives several pieces of advice on how to avoid creating unnecessary objects.

1. read and know the language (spec). (or reuse immutable objects)

I also see this statement
String s = new String("i do not know the language really well"); // don't do this
too often. As Strings are immutable objects in the Java language, the statement above is nonsense and creates unnecessary objects (even worse if it is used in a loop). So: read and know the language, and know about immutable and hence reusable objects.


2. classes should offer a "static factory method" (see Item 1)

As a hint in their API that immutable objects should be reused, classes should apply Item 1. A classical example is
Boolean.valueOf(String) // cheap: reuses an object
instead of
new Boolean(String)  // expensive: creates a new object
Also note that the javadoc of the latter version mentions that using this ctor is expensive and the "static factory method" valueOf is preferred (and using this in Eclipse you get the corresponding help immediately).


3. reuse mutable objects if you know they won't be modified

Now I do not agree 100% with the _phrasing_ of this statement, because if they are really immutable, why not make them final and immutable directly? (Most of the time this works.) Also, how do you know that they will not be modified? What about the user and maintainer of your code?

However the example with Date and Calendar clarifies the intent and the solution: do not compute the values again each time you use them in a function if they do not change. Instead make them "static final" and put their computation into a "static { }" block. Now instead of n computations you have one, at class initialization time. If the method is never invoked you have a little overhead initializing the "static final" members, but that could be eliminated with Item 71, "lazy initializing fields". However Item 71 should really only be used for very large and expensive objects as it has other drawbacks and only pays off in a few cases.
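A sketch of how I remember the book's example (the class name, dates and method are reconstructed from memory, so treat the details as approximate):

```java
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

// Compute the boundary dates once, in a static block, instead of
// recomputing them on every call.
class Person {
    private final Date birthDate;
    Person(Date birthDate) { this.birthDate = birthDate; }

    private static final Date BOOM_START;
    private static final Date BOOM_END;
    static {
        Calendar gmtCal = Calendar.getInstance(TimeZone.getTimeZone("GMT"));
        gmtCal.set(1946, Calendar.JANUARY, 1, 0, 0, 0);
        BOOM_START = gmtCal.getTime();
        gmtCal.set(1965, Calendar.JANUARY, 1, 0, 0, 0);
        BOOM_END = gmtCal.getTime();
    }

    public boolean isBabyBoomer() {
        return birthDate.compareTo(BOOM_START) >= 0
                && birthDate.compareTo(BOOM_END) < 0;
    }
}
```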


4. be aware that autoboxing (new since Java 1.5) can create unnecessary objects

e.g. a small typo (using Long instead of long) can now trigger autoboxing and unboxing. So: prefer primitives to boxed primitives (the pre-1.5 rule was also to prefer primitives to their Object counterparts) and watch out for unintentional autoboxing (e.g. the classic typo of Long instead of long).
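The classic case looks something like this (the method names are mine):

```java
// Declaring sum as Long instead of long autoboxes a new object on
// every iteration; the primitive version allocates nothing.
class AutoboxDemo {
    static long slowSum(int n) {
        Long sum = 0L;             // boxed: each += creates a new Long
        for (int i = 0; i < n; i++) sum += i;
        return sum;
    }

    static long fastSum(int n) {
        long sum = 0L;             // primitive: no boxing at all
        for (int i = 0; i < n; i++) sum += i;
        return sum;
    }
}
```

Both compute the same answer; only the boxed version pays for n throwaway objects.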

FYI, I think this is a good thing to know about: there is currently no FindBugs rule for this, but you can turn on a warning in Eclipse, "Boxing and unboxing conversions" ;-) There are other useful warnings in the Eclipse Java compiler settings for coding errors...


General Points to be aware of:

There is always a trade-off between the clarity, simplicity or power of a program and efficiency or performance optimization. That is not to say that efficient programs cannot be simple (no no ;-).

Also, modern JVM implementations do their best to optimize object creation and reclamation. However you should watch out for unnecessary objects.

I also often use a profiler to find unnecessary objects. This is not advice from the book, but in my experience I have found a lot of Item 5 violations in current code bases by using a profiler and tracking down those places.

Now a more general question is: what IS an "unnecessary" object? (If you want to start a discussion ;-) There are counter-examples where we created additional objects (maybe they are unnecessary?) intentionally.

Examples are:
  • pooled objects in an EJB container or any other container infrastructure where you do not manage the object lifecycle on your own
  • data structures where you keep some additional reserve of objects to perform efficiently in the long term, e.g. in HashMap(int initialCapacity, float loadFactor) you can specify a loadFactor, which the corresponding javadoc talks about.
  • if you look at optimizing your disk, e.g. under Windows, you also need additional resources (typically 1/3 of the disk) to run the disk optimizer in foreseeable time (but that is another story).
On the other hand there is the corresponding Item 39, "Make defensive copies when needed", which is linked to this Item 5. As there is no const ref passing like in C++, you sometimes have to make defensive copies when passing parameters to functions in Java. So from a security and bug finding point of view it is more advisable to follow Item 39 and create a few more objects ;-). However it is important to keep an eye on Item 5 and avoid the (usually) trivial cases where really unnecessary objects are created and thrown away immediately.

Item 4: Enforce non-instantiability with a private constructor

Once in a while, a Java programmer will want to create a utility class that has only static members. For example, java.lang.Math is one such class. In C++, we could simply write a set of free functions inside a namespace. Java, on the other hand, mandates that everything must be a class member, and this gives rise to this kind of class, where everything inside it is static, which seems like a bit of a hack to me.

As it wouldn't make sense to instantiate such a class, it's good practice to make the class uninstantiable. Making it abstract doesn't work: it can be subclassed, and the subclass can be instantiated.

The solution is to define a private constructor. This prevents the compiler from providing a public default constructor, and also prevents the class from being subclassed. This is, in fact, the same idiom that is often used in C++ to control instances (as in Item 1). However, there is a difference. In C++, it is usual not to provide a body for the private constructor. In Java, every method must have a body - even the private constructor, which is intended never to be called! To prohibit accidental calling of the private constructor from another method in this class, it is common to make it throw an exception:
public class UtilityClass {
    // Suppress default constructor for noninstantiability
    private UtilityClass() {
        throw new AssertionError();
    }
    ...
}
So, whereas in C++ an attempt to instantiate a non-instantiable class can be caught at link time, in Java it will only be caught during testing or, heaven forbid, in production.

This item has left me wondering what the rationale was for disallowing free functions, which would have eliminated the need for such utility classes. Is the "everything is a class member" restriction just part of the language, or is it part of the JVM? If it's the latter, I wonder how languages like Groovy implement free functions.

Klitos Kyriacou

Item 3: Enforce the singleton property with a private constructor or an enum type

Ah, the Singleton pattern. Leaving aside any reservations we may have over the wisdom of using singletons - and the item does flag up the complication to testing that the pattern introduces - the various common methods of implementing singleton classes are discussed, concluding with a recommendation to use a Java 1.5 feature; an enum type.

Before release 1.5, the two options available both involved restricting the instantiatability of the singleton class through declaration of a private constructor, then providing access to the sole instance by slightly different approaches.

The first approach is to provide a public static final INSTANCE field. This is a very simple solution, but is inflexible: the INSTANCE field is in the public interface of the class, setting its singleton status in stone. This is presented in a positive light, as it makes it clear that the class is a singleton. I remain unconvinced: this exposure of implementation detail in the interface is one of the flaws of the pattern.

This is alleviated by the second approach: make the INSTANCE field private, and add a public static factory method. Item 1 already mentioned the use of static factory methods in managing instance-controlled classes: this is an example. Because the instance is now returned from a method, the creation policy can be changed without affecting the public interface.
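Sketches of the two pre-1.5 approaches side by side (the class names are invented for the comparison):

```java
// Approach 1: public static final field - singleton status is baked
// into the public interface.
class ElvisField {
    public static final ElvisField INSTANCE = new ElvisField();
    private ElvisField() {}
}

// Approach 2: private field plus static factory - the creation policy
// can change later without touching the interface.
class ElvisFactory {
    private static final ElvisFactory INSTANCE = new ElvisFactory();
    private ElvisFactory() {}
    public static ElvisFactory getInstance() { return INSTANCE; }
}
```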

Both these solutions have an issue with serializability. Deserialization usually instantiates a new object, which rather breaks the singleton model. For a singleton to be serializable, it needs to provide a readResolve method, which can return the sole instance instead of the newly instantiated one. This technique is the subject of its own item (way down at #77), but in summary it is a bit messy, and just one more thing you need to worry about. Is there a simpler solution?

As of release 1.5, the answer is yes. One of the new features of this release was enum types. An enum type is a type whose fields consist of a fixed set of constants. But enum types are also fully-fledged classes, which are instance-controlled.

Item 30 investigates the features of enum types in depth. For now, the key point is their conceptual similarity to singletons, meaning that support for singletons is now more or less built in to the language. The third, and preferred, implementation is as follows:
public enum Elvis {
    INSTANCE;

    public void leaveTheBuilding() { ... }
}
This is an admirably simple solution. I do feel uncomfortable however that Elvis is once again committed to being a singleton for life. And what would Priscilla have to say about that?

Ewan Milne

Item 2: Consider a builder when faced with many constructor parameters

Item 2 describes the Builder Pattern and when it should be used.

The item gives an example of a class that has a number of optional properties that should be initialised via a constructor and explains how they can be initialized using the telescoping constructor pattern or the JavaBean pattern. It concludes that "the telescoping constructor pattern works, but it is hard to write client code where there are many parameters, and harder still to read it" and that "the JavaBean pattern precludes the possibility of making a class immutable." His argument in both cases is convincing.

As a solution, the item suggests a variation on the builder pattern. Basically, the class with the optional properties has an inner class that can be used to initialise the properties on construction. The resulting initialisation syntax looks like this:
NutritionFacts cocaCola = new NutritionFacts.Builder(240,8).calories(100).sodium(35).carbohydrate(27).build();
The item points out that "the builder pattern simulates named optional parameters" and that the pattern does not really become useful until you have at least four optional properties; if needed, it should be adopted as soon as possible, as refactoring to the pattern later can be problematic.
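For reference, a cut-down sketch of the builder that would support the call above; this is reconstructed from the example, with only the fields actually used (the two required parameters plus calories, sodium and carbohydrate) and an invented getCalories accessor:

```java
// The class's only constructor is private and takes the builder,
// so all construction goes through Builder.build().
class NutritionFacts {
    private final int servingSize, servings;          // required
    private final int calories, sodium, carbohydrate; // optional

    public static class Builder {
        private final int servingSize, servings;
        private int calories, sodium, carbohydrate;

        public Builder(int servingSize, int servings) {
            this.servingSize = servingSize;
            this.servings = servings;
        }
        public Builder calories(int v)     { calories = v; return this; }
        public Builder sodium(int v)       { sodium = v; return this; }
        public Builder carbohydrate(int v) { carbohydrate = v; return this; }
        public NutritionFacts build()      { return new NutritionFacts(this); }
    }

    private NutritionFacts(Builder b) {
        servingSize  = b.servingSize;
        servings     = b.servings;
        calories     = b.calories;
        sodium       = b.sodium;
        carbohydrate = b.carbohydrate;
    }

    public int getCalories() { return calories; }
}
```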

The item summarises the builder pattern as "a good choice when designing classes whose constructors or static factories would have more than a handful of parameters."

I wish this had occurred to me before. I could have used it in so many places. Again, it's not Java specific.

Item 1: Consider static factory methods instead of constructors

This item explains how to deal with situations where instances of a class have several similar ways of being constructed. Instead of adding several constructors to a class which might be difficult or error prone to use, static factory methods are written with names that clarify their purpose. This allows the class designer to disambiguate similar (or indeed identical) constructor signatures so that the class user is less likely to use the wrong function to create an instance of the class.

The item discusses several advantages of static factory methods:
  1. their self documenting nature, and their extra flexibility when construction parameter sets are ambiguous
  2. their use with object caches like the GoF Flyweight pattern and other "instance-controlled" classes (a term which is new to me, although the author implies that it is idiomatic)
  3. their use with derived class hierarchies, particularly for implementation hiding where the concrete instance returned by the factory method is of a non-public type; this allows implementation variation depending on construction arguments, or other mechanisms such as being able to return types which have been registered at runtime in a so-called "service provider framework"
  4. they may be used to simplify parameterized type construction by eliminating the duplication of type parameters by the user by letting the compiler deduce the types;
and two disadvantages:
  1. if all construction is done via static factories and all constructors are non-public then users cannot derive from the classes in their packages - although this would be an advantage if it was desired by the class designer
  2. without careful documentation or standardized naming, it isn't obvious to users that the factory methods are what they are and should be used to instantiate objects of the class
As with the GoF book, which also classifies items (in their case more formally as "Design Patterns") in a catalogue, it should be noted that choosing item 1 versus item 2 requires an analysis of the number of construction parameters and whether any of them are optional. In particular, static factories may not be suitable with optional arguments.

I disagree a little with the implied similarity between items in this book and design patterns; the first two items (which is as far as I have read) are not design patterns, just examples of idiomatic Java usage. Design patterns deal with relationships between classes in a mostly language independent way, whereas this book and similar ones for C++ rarely give any advice on how classes should collaborate with each other. Nevertheless the information is valuable and I am only querying the references to the GoF book that imply these items are design patterns.

Bill Somerville