Some notes about TreeSet in Java5 min read
The most frequent data structure that I use in Java is the List. But sometimes if I have to store distinct elements without caring about the ordering, then I would use a Map or a Set depending on my needs. In Java, you can create a simple Set to store some distinct elements by using the HashSet
class. For example, I have a simple class called SuffixString
, this class has 3 properties:
- The
index
number property (which acts as an identifier for this suffix) - The
weight
property, I want to use this property later on for the sorting purpose - And finally the
suffix
itself, it’s a string taken from another string in which thesuffix
is a suffix of it.
public class SuffixString {
private final int index;
private final int weight;
private final String suffix;
public SuffixString(int index, int weight, String suffix) {
this.index = index;
this.weight = weight;
this.suffix = suffix;
}
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (!(o instanceof SuffixString)) return false;
SuffixString that = (SuffixString) o;
return index == that.index && weight == that.weight && Objects.equals(suffix, that.suffix);
}
@Override
public int hashCode() {
int result = Integer.hashCode(index);
result = 31 * result + Integer.hashCode(weight);
result = 31 * result + suffix.hashCode();
return result;
}
}
So now I can create a HashSet to put some elements there:
Set<SuffixString> suffixs = new HashSet<>();
As already mentioned in this post, when you use the Set data structure, you should always override equals()
when overriding hashCode
. The Set instance you create is actually backed by a Map, then when putting an element inside a Set, you actually put this element into a Map with a key without the value. And suppose you want to add some elements to the Set, these two methods will sequentially invoke, in this manner:
- First the
hashCode
method will be invoked to compute thehash
of an element you want to put in the set, if 2 objects have different hash codes, then they are different, and no need to callequals
method to further verify if there are 2 identical objects. - In case 2 objects return the same hash code, this could mean there is only one object, but there can also be 2 objects mapped to the same hash code. From this point, the
equals()
method will be invoked to check if these 2 objects are actually the same one.
Now if I put 2 different suffix strings with identical properties, both hashCode
and equals
will do the job to avoid putting a duplicate entry to the set:
Set<SuffixString> suffixSet = new HashSet<>();
suffixSet.add(new SuffixString(1, 100, "bcdef");
suffixSet.add(new SuffixString(1, 100, "bcdef");
System.out.println(suffixSet.size()); // 1
But what if we use a TreeSet
instead of a HashSet
? Unlike a regular HashSet, the TreeSet implementation helps us maintain the sorted order of the elements we put inside the Set, it can be the natural order such as 1,2,3 or ‘a’, ‘b’, ‘c’, or you can customize the sorting order as you need, these are 2 constructors that are frequently used:
Set<Integer> sortedIntegerSet = new TreeSet<>(); // this will use the sorting strategy of the type that you put into this set
Set<Laptop> sortedStrSet = new TreeSet<>(new Comparator<Laptop>() {
@Override
public int compare(Laptop o1, Laptop o2) {
// provide the impl for this
}
}); // create the custom comparator as you need, the thing you put into this set needn't to implement the Comparable interface
For the first one, the type of object that you put into the Set must implement the Comparable
interface, if not, then the code will not compile. For the second one, the Comparator
you use in the constructor will be used to compare elements, and it will compile just fine even Laptop
is not comparable.
The caveat here is that the hashCode
and equals
has no effect to determine where an element will be placed in the set, but instead, the comparing order for the object you put in will determine that matter. For example, let’s do a simple example like this:
public class Movie implements Comparable<Movie> {
private final String title;
private final int yearReleased;
private float averageRating;
public Movie(String title, int yearReleased, float averageRating) {
this.title = title;
this.yearReleased = yearReleased;
this.averageRating = averageRating;
}
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (!(o instanceof Movie movie)) return false;
return yearReleased == movie.yearReleased
&& Float.compare(movie.averageRating, averageRating) == 0
&& Objects.equals(title, movie.title);
}
@Override
public int hashCode() {
return Objects.hash(title, yearReleased, averageRating);
}
@Override
public int compareTo(Movie o) {
return this.title.compareTo(o.title);
}
// toString...
public static void main(String[] args) {
Set<Movie> movies = new HashSet<>();
Movie firstMovie = new Movie("Twilight", 1998, 4.3f);
Movie secondMovie = new Movie("Twilight", 2008, 4.7f);
movies.add(firstMovie);
movies.add(secondMovie);
System.out.println("List movies from the hash set:");
System.out.println(movies);
Set<Movie> setMovies = new TreeSet<>();
setMovies.add(firstMovie);
setMovies.add(secondMovie);
System.out.println("List movies from the tree set:");
System.out.println(setMovies);
}
}
And this is the actual output of this program:
List movies from the hash set:
[Movie{title='Twilight', yearReleased=1998, averageRating=4.3}, Movie{title='Twilight', yearReleased=2008, averageRating=4.7}]
List movies from the tree set:
[Movie{title='Twilight', yearReleased=1998, averageRating=4.3}]
Here, it comes as no surprise that the HashSet gives us 2 elements since there are 2 different movies having the same title but different yearReleased
and averageRating
. However, with the TreeSet
below, since an element is added to the set by using the comparable result of that element, notice when we implement the Comparable
for the Movie
, we only compare the title
of the movie, hence there is only one element added to the set because 2 movies have the same title.
The TreeSet is a red-black tree implementation that maintains the insertion order when you put an element to a set, but instead of O(1) as a regular set to put an element in, it requires O(logn) operations to put the element in the right spot, the deletion also requires the same amount of operations. And depends on your use case, for example, if you want a collection of unique elements and they are in sorted order, and you want to search and write to this collection frequently, returning a subset of the collection, the TreeSet might be a good choice.