I am trying to implement a hash cons in java, comparable to what String.intern does for strings. I.e., I want a class to store all distinct values of a data type T in a set and provide an T intern(T t) method that checks whether t is already in the set. If so, the instance in the set is returned, otherwise t is added to the set and returned. The reason is that the resulting values can be compared using reference equality since two equal values returned from intern will for sure also be the same instance.
Of course, the most obvious candidate data structure for a hash cons is java.util.HashSet<T>. However, it seems that its interface is flawed and does not allow efficient insertion, because there is no method to retrieve an element that is already in the set or insert one if it is not in there.
An algorithm using HashSet would look like this:
class HashCons<T>{
HashSet<T> set = new HashSet<>();
public T intern(T t){
if(set.contains(t)) {
return ???; // <----- PROBLEM
} else {
set.add(t); // <--- Inefficient, second hash lookup
return t;
}
}
As you see, the problem is twofold:
- This solution would be inefficient since I would access the hash table twice, once for
containsand once foradd. But okay, this may not be a too big performance hit since the correct bucket will be in the cache after thecontains, soaddwill not trigger a cache miss and thus be quite fast. - I cannot retrieve an element already in the set (see line flagged
PROBLEM). There is just no method to retrieve the element in the set. So it is just not possible to implement this.
Am I missing something here? Or is it really impossible to build a usual hash cons with java.util.HashSet?