Why do you need to override equals and hashCode methods in Java?6 min read

If you have used some data structures such as a hash map or a hash set for storing your custom object, you probably have to write your own implementation of the hashCode and equals methods to make your data structure behaves the way you want, but did you wonder why you have to do so? And why are those 2 things correlated? What if you override one of them and ignore the other? In this article, we are going to unravel why and when we need to override equals and hashCode methods, and what we can achieve from doing this.

Before diving into understanding and using equals and hashCode, it’s worth mentioning why we need to override these methods on a Map or a Set in the first place. Typically, when thinking about a Set or a Map, we usually relate them as unordered data structures, and as contrary to the list, duplicate elements shouldn’t be allowed. The idea of overwriting hashCode and equals methods ensure that your data structure only stores distinct elements and, more importantly, get efficient and proper access when writing, reading, and deleting elements.

Let’s suppose we have a class A, and what if we want to put some objects of this class into a Set without implementing hashCode and equals methods for this class A? Let’s have an example to see what will happen, here we have a simple Candy class:

class Candy {
    private String name;
    private String flavor;

    public Candy(String name, String flavor) {
        this.name = name;
        this.flavor = flavor;
    }
    // getters, setters and toString method.
}

Then I create an instance of a Set that will hold some candies on it, and I deliberately put some identical candies there:

Set<Candy> candies = new HashSet<>();
candies.add(new Candy("Oreo", "Strawberry"));
candies.add(new Candy("Kit Kat", "Chocolate"));
candies.add(new Candy("Oreo", "Strawberry"));

When we iterate through the set and print each of its elements, here is what we get:

Candy{name='Kit Kat', flavor='Chocolate'}
Candy{name='Oreo', flavor='Strawberry'}
Candy{name='Oreo', flavor='Strawberry'}

Clearly, there are 2 identical candies on the Set; this is obviously not what we expected. If you don’t override equals and hashCode methods when deciding whether or not to put an element to a set, the set itself will look at the address of the element, even 2 elements have the same data, but their addresses are district. To overcome this, we need to override the hashCode and equals methods on the Candy class:

@Override
public boolean equals(Object o) {
     if (this == o) {
        return true;
     }
     if (!(o instanceof Candy)) {
        return false;
     }
     Candy candy = (Candy) o;
     return Objects.equals(getName(), candy.getName()) &&
                Objects.equals(getFlavor(), candy.getFlavor());
}

@Override
public int hashCode() {
     return Objects.hash(getName(), getFlavor());
}

If you’re using some modern IDE such as IntelliJ or Eclipse, this code fragment can be automatically generated for you. Let’s print our set again:

Candy{name='Oreo', flavor='Strawberry'}
Candy{name='Kit Kat', flavor='Milk'}

The set is now containing unique elements, apparently hashCode and equals method do the job, here is what Joshua Bloch says on Effective Java:

You must override hashCode() in every class that overrides equals(). Failure to do so will result in a violation of the general contract for Object.hashCode(), which will prevent your class from functioning properly in conjunction with all hash-based collections, including HashMap, HashSet, and Hashtable.

How to override hashCode

There are different ways for overriding hashCode; one of them is to do something like this:

@Override public int hashCode() { 
    return 13;
}

If you’re using some old IDE such as Netbean, if you let the IDE does the insertion for you, you might end up overriding something like this. This code ensures 2 equal objects are hashed to the same bucket (described below), but it’s disastrous because every object will be hashed to the same location. Instead of O(1) for item lookup, it now require O(n).

So providing a good hashCode is crucial if you want to keep the invariant, but how can you write a good one? Some solutions are already available, such as:

  1. Objects.hash method
  2. Technique from Effective Java

The first one is dead simple, when override hashCode, everything you need to do is to pass all parameters that you need in the equals method as arguments of this Objects.hash function, for example:

Objects.hash(name, age, hobby); // returns some random hash value

This hash method accepts a varagrs list of Object type, meaning you can pass any number of parameters and each parameter can have an arbitrary type. However, it comes with a cost if there are multiple primitive values passed to the method since all of them need to wrap in an array and convert to their corresponding boxed value each time the method gets called.

The second approach I learned from a top-rated Java book “Effective Java”, in this book, Joshua Bloch guides us a simple recipe for writing a good hashCode function like this:

  • 1: Declare a variable named result, and initialize it to the hash code c for the first significant field in your object (significant field is the field you use in your equals() comparasion, as computed in step 2.a)
  • Compute int hash code for c field:
  • 2a-1: If the field is primitive, compute the Type.hashCode(f), where Type is boxed type corresponding typ f's type.
  • 2a-2: If the field is an object reference and the equals method compares by recursively invoking equals, then we recursive invoke hashCode on this field.
  • 2a-3: If the field is an array, then treat each significant element in this array as a seperate field, that is, computing hash code for each significant element by apply these rules recursively and finally when done, combine the values to step 2b-1. If all elements in array are significant, then using Arrays.hashCode function.
  • 2-b: Combine the hash code gets from the step 2.a with the result as follow:

result = 31 * result + c;

  • 3. Return the result.

For example, I have a Song class which have some properties title, duration, artists, isAvailableOnSpotify, here is how I would write the hashCode function:

    @Override
    public int hashCode() {
        int result = title.hashCode();
        result = 31 * result + Integer.hashCode(duration);
        result = 31 * result + Arrays.hashCode(artists);
        result = 31 * result + Boolean.hashCode(isAvailableOnSpotify);
        return result;
    }

How HashMap, HashSet and Hashtable work internally?

If you’re curious about how HashMap and HashSet work internally, the best way might be to open the implementation of these data structures and start pondering how they work. However, when an element is put to a Set or a Map will be stored in some bucket, and each bucket can store one or more elements. When overriding hashCode, tell the Java compiler to put your object in a correct bucket position because internally, each buckets in a Set or a Map doesn’t store just one element. It’s designed for storing multiple elements on a single bucket, which is why you also need to override your equals method. Let’s pretend 2 distinct objects have the same hash code (which, of course, usually do); now, these 2 objects are mapped to the same bucket as their hashCode are the same, to determine whether this is the identical object that has been put in before, the equals method takes the job to scan each entry of the bucket, if there is no identical object found, then a new object is put to this bucket; otherwise, it will be discarded.

When many elements are hashing to the same bucket, the HashMap might degrade each node to a TreeNode, which now becomes a TreeMap. As a result, the time complexity of basic operations such as search, insert, update now is O(log n).

Previous Article
Next Article

Buy Me a Coffee