Understanding equals and hashcode contract when using collections in Java
This tutorial explains the contract defined for overriding
Tutorial starts off with creating a
The tutorial then implements
OUTPUT of the above code
In the above code, we create 2
We then check whether
As the
OUTPUT of the above code
In the above output, you can see that the two objects we created -
Let us now run the earlier code again and the output this time, with
In the above output, the 2
OUTPUT of the above code
The above output confirms what we had deduced - hashcodes for variable
To understand the problem with hashcodes, we will first need to understand a little bit about hashing, and the role played by hashcodes in hashing.
Understanding Hashing
Hashing, as you can see in the diagram above, is a mechanism used to distribute data into multiple buckets inside a hashed collection, with each bucket identified by a hashcode. So, the hashcode returned by an object determines which bucket the object is stored in.
When searching for an object to be removed using an 'equal but different object', such as
So, let us now override
Let us now run the earlier code again, the one with all the debugging code. The output comes out to be as follows -
This time
Having seen
1. Official Java API for Object class
java.lang.Object
’s equals()
and hashcode(
) methods and its importance when working with objects in a Java collection. It is this contract which lays down exactly how the two methods need to work together to allow objects to be correctly added and removed in Java collections.Tutorial starts off with creating a
HashSet
instance and stores POJOs of Employee
class in it. It first shows how, in the absence of overridden in-agreement hashCode()
and equals()
methods, it's simply not possible to accurately remove an Object
from the HashSet
based on the uniqueness of the attribute values of the object. The tutorial then implements
equals()
and hashCode()
methods for Employee
, along with explaining the mechanism of hashing based on which hashed collections store objects in memory, to show a correctly working HashSet
instance. Lastly, it goes through the official contract for equals()
and hashCode()
methods defined in Object
class.
Example showing HashSet
being used without equals()
and hashCode()
overridden
Let us first take a look at the code for Employee
POJO with 2 attributes name
& age
, and another set of code which uses the Employee
as the type for a HashSet
.
Code showing HashSet usage without overriding equals() & hashCode()
//Employee.java POJO class
package com.javabrahman.corejava;
public class Employee {
private String name;
private Integer age;
public Employee(String name, Integer age) {
this.name = name;
this.age = age;
}
//Getters and Setters for name and age go here
public String toString() {
return "Employee Name:" + this.name
+ " Age:" + this.age;
}
}
//Class using Employee POJO in a HashSet
package com.javabrahman.corejava;
import java.util.HashSet;
public class EqualsNHashCode {
public static void main(String args[]){
HashSet<Employee> empHashSet=new HashSet<>();
empHashSet.add(new Employee("David", 32));
empHashSet.add(new Employee("John", 45));
empHashSet.forEach(System.out::println);
boolean isRemoved=empHashSet.remove(new Employee("David", 32));
System.out.println("David is removed: "+isRemoved);
empHashSet.forEach(System.out::println);
}
}
Employee Name:David Age:32 Employee Name:John Age:45 David is removed: false Employee Name:David Age:32 Employee Name:John Age:45
Employee
instances named David
and John
. We then print the 2 employees using a forEach()
method. Next, we remove David
from the HashSet
using a newly instantiated Employee
object which has the same name and age - ‘David
’, 32
.We then check whether
David
was removed. false
is returned by the HashSet.remove()
method indicating that David
wasn't removed from the HashSet
. This is confirmed when we print the employees again using forEach()
method, resulting in David and John both getting printed. So - What went wrong? Why was david not removed from the hashset?
What went wrong?
We tried to remove David
by passing another instance of Employee
with same name and age. HashSet
internally uses Object.equals()
to find the matching object it is trying to remove. Since equals()
is not overridden in Employee
class, equals()
implementation being used by the HashSet
is the default one defined in Object
class which compares the memory references of the two objects it's equating. As the
Employee
object holding David in the HashSet
and the Employee
object passed to the HashSet.remove()
method are different, their memory adddresses will be different, and hence they are bound to fail the equals()
test. Let us confirm this understanding by modifying our earlier code and adding some debugging code to it - HashSet is unable to equate Employee objects for removal
package com.javabrahman.corejava;
import java.util.HashSet;
public class EqualsNHashCode {
public static void main(String args[]){
HashSet<Employee> empHashSet=new HashSet<>();
Employee David = new Employee("David", 32);
Employee John = new Employee("John", 45);
empHashSet.add(David);
empHashSet.add(John);
empHashSet.forEach(System.out::println);
Employee DavidForRemoval=new Employee("David", 32);
System.out.println("----David's Objects are equal? "+David.equals(DavidForRemoval));
boolean isRemoved=empHashSet.remove(DavidForRemoval);
System.out.println("David is removed: "+isRemoved);
empHashSet.forEach(System.out::println);
}
}
Employee Name:David Age:32 Employee Name:John Age:45 ----David's Objects are equal? false David is removed: false Employee Name:David Age:32 Employee Name:John Age:45
In the above output, you can see that the two objects we created -
David
(which we stored in HashSet
) and DavidForRemoval
which we passed to the HashSet.remove()
method are indeed not equal. To solve this problem let us now go ahead and override equals()
method in Employee
class.equals() code added to Employee.java
@Override
public boolean equals(Object obj) {
if (obj == this) {
return true;
}
if (!(obj instanceof Employee)) {
return false;
}
Employee empObj = (Employee) obj;
return this.age == empObj.age
&& this.name.equalsIgnoreCase(empObj.name);
}
hashCode()
implemented, is as follows - Employee Name:David Age:32 Employee Name:John Age:45 ----David's Objects are equal? true David is removed: false Employee Name:David Age:32 Employee Name:John Age:45
In the above output, the 2
Employee
instances for David named David
and DavidForRemoval
are now found 'equal' using the equals()
method. However, still the remove call for David didn't work. As you might have guessed by now, the reason for non-removal of David lies in the fact that their hashcodes don't match. To verify this understanding let me quickly show you what is printed when we print the hashcodes of David
and DavidForRemoval
.Checking the default hashcode values for Employee objects
//rest of the code is same as earlier
System.out.println("----David's hashcode: "+David.hashCode());
System.out.println("----John's hashcode: "+John.hashCode());
System.out.println("----DavidForRemoval's hashcode: "+DavidForRemoval.hashCode());
----David's hashcode: 685325104 ----John's hashcode: 460141958 ----DavidForRemoval's hashcode: 1078694789
David
(685325104
) and variable DavidForRemoval
(1078694789
) are different. So, the problem at hand now can be summarized like this - HashSet
ends up searching for the stored David
object in the wrong bucket of the hashtable and so is unable to find and remove it, despite the two objects being equal (using equals()
i.e.).To understand the problem with hashcodes, we will first need to understand a little bit about hashing, and the role played by hashcodes in hashing.
When searching for an object to be removed using an 'equal but different object', such as
David
and DavidForRemoval
in our case, HashSet
needs to search for an object which is equal to DavidForRemoval
in the same bucket in which David
was stored. Otherwise, as in our case, with the hashcodes of the 2 objects being different, HashSet
ends up searching in the wrong bucket and is unable to find the object. So in our case, HashSet was searching in the bucket identified by DavidForRemoval
’s hashcode(1078694789
) while David
was stored in a bucket identified by a different hashcode(685325104
). The search was destined to fail.So, let us now override
hashCode()
method as well in Employee
in order to get the same hashcode value for equal objects i.e. objects with same value of attributes.Overriden hashCode() in Employee.java
@Override
public int hashCode() {
int hash = 1;
hash = hash * 17 + this.name.hashCode();
hash = hash * 31 + this.age;
return hash;
}
----David's hashcode: 2039983707 ----John's hashcode: 71751281 Employee Name:John Age:45 Employee Name:David Age:32 ----DavidForRemoval's hashcode: 2039983707 ----David's Objects are equal? true David is removed: true Employee Name:John Age:45
David
was removed from the HashSet
succesfully. The hashcode for David
and DavidForRemoval
was same this time - 2039983707
. As a result, HashSet
was able to search in the correct bucket for David's Employee
instance. And, given that the two objects could be equated correctly using equals()
, HashSet
was able to find David's originally stored object, which was equal to DavidForRemoval
, and remove it.hashCode()
and equals()
methods in action, and how their correct implementation together is a basic requirement when working with hashed collections, let us now take a look at the 'official' contract defined in Object
class for these methods.
Contracts of equals()
and hashCode()
as defined in Object API
Contract of equals()
method
(Note - The rules below are applicable for non-null reference values x and y.)
- It is reflexive. x.equals(x) should return true.
- It is symmetric.x.equals(y) should return true if and only if y.equals(x) returns true.
- It is transitive.If x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
- It is consistent. Multiple invocations of x.equals(y) consistently return true or consistently return false.
- For any non-null reference x, x.equals(null) should return false.
hashCode()
method
- On every invocation, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. The hashcode value need not be same for different execution environments.
- If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
- It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results.
equals()
and hashCode()
methods
- Hashcodes for equal objects(which return true when compared using equals()) should be same.
- Hashcodes for unequal objects can be same. I.e. there is a one-to-many relationship between hashcodes and objects. (This is in line with the concept of hashing we saw above- as a particular hashcode can have multiple objects in its bucket).
equals()
and hashCode()
usage in hashed collections in Java. We took an example in which equals()
and hashCode()
weren't agreeing to see how object access and removal wouldn't work properly in such cases. We then implemented the two methods in the POJO class we were storing in the HashSet
as per their defined contract, and saw that object access and removal did work fine with the change.
1. Official Java API for Object class