I have a large number of lists of objects, and each object has a unique ID. It looks something like this:
List a = {obj1, obj2, obj3}
List b = {obj3, obj4, obj5}
List c = {obj1, obj2, obj3}
// up to 100 million of them
Now I'd like to remove "List c" since it has the same content as "List a" in order to save memory.
For this purpose I simply add them all to a HashMap and check whether the key already exists. The objects are actually references in a large network graph, and if even one of them is wrong the whole application crashes. Because it is very important that two different lists never end up with the same key, I don't use the default List.hashCode() function but do this instead:
StringBuilder sb = new StringBuilder();
for (MyObject obj : myList)               // each object exposes its unique ID
    sb.append(obj.getId()).append(',');   // delimiter, so that e.g. IDs (1, 23) and (12, 3) can't produce the same string
return Hashing.sha256().hashString(sb.toString(), Charsets.US_ASCII).toString();
This works perfectly fine, but it is very slow. Is there any way to achieve the same result in less time?
List's hashCode() implementation does not serve your purpose. If you generate int hash codes for 100 million distinct objects, then you are consuming around 2% of all the available hash codes (an int offers only 2^32 values). By the birthday problem, that technique has a reasonably high probability of producing a few hash collisions in that case.
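One way around both problems (collision risk and SHA-256 cost) is to avoid relying on the hash alone: use the sequence of IDs itself as the HashMap key, since HashMap falls back to equals() after a hash match, so two different ID sequences can never be merged even if their hash codes collide. A minimal sketch, assuming a hypothetical Node type standing in for your graph objects:

```java
import java.util.*;

public class DedupSketch {
    // Hypothetical stand-in for the poster's graph objects.
    static final class Node {
        final long id;
        Node(long id) { this.id = id; }
        long getId() { return id; }
    }

    // Build a key from the ID sequence. Two lists with the same IDs in the
    // same order produce equal keys; HashMap verifies keys with equals(),
    // so a hashCode() collision alone can never merge different lists.
    static List<Long> keyOf(List<Node> list) {
        List<Long> ids = new ArrayList<>(list.size());
        for (Node n : list) ids.add(n.getId());
        return ids;
    }

    public static void main(String[] args) {
        List<Node> a = Arrays.asList(new Node(1), new Node(2), new Node(3));
        List<Node> b = Arrays.asList(new Node(3), new Node(4), new Node(5));
        List<Node> c = Arrays.asList(new Node(1), new Node(2), new Node(3));

        // Keep only the first list seen for each distinct ID sequence.
        Map<List<Long>, List<Node>> canonical = new HashMap<>();
        for (List<Node> list : Arrays.asList(a, b, c)) {
            canonical.putIfAbsent(keyOf(list), list);
        }
        System.out.println(canonical.size()); // prints 2 (a and c collapsed)
    }
}
```

This skips the string building and SHA-256 entirely; the cost per list is one pass to copy the IDs plus the cheap List.hashCode() over longs, and correctness no longer depends on the hash being collision-free.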