
While testing String.hashCode(), I noticed that it does not exhibit an avalanche effect. I know there's a Java port of the Jenkins hash, but I was wondering whether there's a hash function, perhaps in an Apache library, that has this property.

Edit: I'm looking for a function that exhibits this property and returns a 32-bit (or 64-bit) integer (like the Jenkins hash, for example). I'm not using it for cryptography, and I don't intend to replace String.hashCode() in general. I assumed hashCode() had this property, it turns out it doesn't, and I'm wondering whether anything in Java's standard libraries, or perhaps an Apache library, satisfies my need.
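To make the observation concrete, here is a minimal sketch (the class and method names are hypothetical, chosen only for illustration) that counts how many of the 32 output bits change when the input changes slightly. A function with a good avalanche effect would flip roughly half of them on average; String.hashCode() flips far fewer for this pair.

```java
public class AvalancheCheck {
    // Number of output bits that differ between the hash codes of two strings.
    static int differingBits(String a, String b) {
        return Integer.bitCount(a.hashCode() ^ b.hashCode());
    }

    public static void main(String[] args) {
        // "k1" and "k2" differ only in their last character.
        // Their hash codes are 3366 and 3367, which differ in a single bit.
        System.out.println("k1".hashCode());           // 3366
        System.out.println("k2".hashCode());           // 3367
        System.out.println(differingBits("k1", "k2")); // 1
    }
}
```

This only inspects one pair of inputs; a proper avalanche test would average over many inputs and many single-bit changes.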

  • The hashCode function just deals with equal/not equal. There is no need for the property you think of here. Commented Feb 15, 2011 at 19:39
  • I'm not quite sure if this will help you, but Apache Hadoop has Jenkins and Murmur hash implementations; just google the classes. Commented Feb 15, 2011 at 19:43
  • hashCode() is designed for simplicity and speed for use in Hashtable/HashMap etc. It is also optimised for ASCII text (i.e. it uses a prime multiplier of 31, which is more than the number of characters). Since most hash maps are small, not all 32 bits need to flip to produce a good distribution of hash values. Commented Feb 15, 2011 at 20:31
  • @Peter Lawrey: the performance of all data structures is good when they're small. The selling point of HashMap is that performance is average-case O(1) when it's large. Why would (or did) the Java designers optimize for the case that doesn't seem to matter? Commented Mar 2, 2011 at 16:28
  • Because big-O considers neither the constant factor nor the fact that this operation may occur many times. E.g. say you want to copy a HashMap: you want each copy to be efficient. Commented Mar 3, 2011 at 7:43

1 Answer

The avalanche effect, as described in the Wikipedia page you linked to, is an important property of cryptographic hash functions. String.hashCode() is not a cryptographic hash function. Its only goal is to generate sufficiently well-distributed hash codes for different strings so that HashMap, HashSet, and all other hash-based collections are efficient when holding strings.

For cryptographic hash functions, look at the JCA (Java Cryptography Architecture), which can generate SHA-1, MD5, and other cryptographic digests, all of which have the effect you're looking for.
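As a rough sketch of that suggestion, the JCA's MessageDigest class can compute a SHA-1 digest, and the first four bytes of the digest can be read as a 32-bit integer. The class name DigestHash32, the method name hash32, the choice of UTF-8, and the decision to keep only the first four bytes are all assumptions made for illustration, not something the answer prescribes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestHash32 {
    // Hashes the string with SHA-1 and keeps the first 4 bytes as an int.
    static int hash32(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            // ByteBuffer reads big-endian by default; getInt() consumes 4 bytes.
            return ByteBuffer.wrap(digest).getInt();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is expected to be available on standard Java platforms.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int h1 = hash32("k1");
        int h2 = hash32("k2");
        System.out.println(Integer.toHexString(h1));
        System.out.println(Integer.toHexString(h2));
        // Number of output bits that differ between the two similar inputs.
        System.out.println(Integer.bitCount(h1 ^ h2));
    }
}
```

This is much slower than String.hashCode(), so it is probably only worth it when the avalanche property matters more than speed.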


4 Comments

I guess I should have been clearer. I want a hash function for strings that returns a 32-bit integer and exhibits an avalanche effect. I'm not using it for a hash table.
Use an existing cryptographic function, convert the result to a String, and call hashCode() on the string.
I thought avalanche was an important property of any good hash function. For example, if I'm storing values in a HashMap and many keys are similar ("k1", "k2", "k3", "k4"), they'll end up being stored unevenly (contiguously, actually). The result is that collisions are more likely, and that degrades performance. Because of this, I'm surprised that Java doesn't provide a better default hashCode method. Or am I missing something? I'm sure the designers of Java had a good rationale, but I just don't understand it.
@Kevin: No, the avalanche property is not an important property of any good hash function, and no, collisions are not more likely if similar keys are stored adjacently.
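The point about adjacent storage can be checked with a small sketch. It uses a simplified model of a hash table, assumed here for illustration: a power-of-two table size and a bucket index taken from the low bits of hashCode(). The real HashMap applies some extra bit mixing before indexing, so this is not its actual implementation, and the names BucketSpread and bucket are hypothetical.

```java
public class BucketSpread {
    // Simplified bucket index: low bits of the hash code.
    // Assumes tableSize is a power of two.
    static int bucket(String key, int tableSize) {
        return key.hashCode() & (tableSize - 1);
    }

    public static void main(String[] args) {
        String[] keys = {"k1", "k2", "k3", "k4"};
        for (String key : keys) {
            // Hash codes 3366..3369 map to buckets 6, 7, 8, 9 in a 16-slot table.
            System.out.println(key + " -> bucket " + bucket(key, 16));
        }
    }
}
```

The four similar keys land in four neighbouring but distinct buckets, so in this model they do not collide with each other.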
