7

I am trying to test out how String hashcode works by adding a new character and do the reverse to subtract it out, however it didn't give me the right answer when i did the calculation.

For example

int PRIME = 31;

//this will print 1687846330 
System.out.println("mlkjihgfedcb".hashCode());   

//this will print 783628775 which equals to "mlkjihgfedcba".hashCode();
System.out.println(("mlkjihgfedcb".hashCode() * PRIME + (int) 'a')); 

//this will print 25278344 which doesn't equals to "mlkjihgfedcb".hashCode()
System.out.println(("mlkjihgfedcba".hashCode() - (int) 'a') / PRIME); 

I am wondering did i do the math right on my last step above ? overflow shouldn't matter for the calculation right ?

4
  • 1
    It's not reversible. e.g. stackoverflow.com/questions/9406775/… Commented Dec 27, 2013 at 22:34
  • 2
    this is wrong on so many levels Commented Dec 27, 2013 at 22:36
  • @Bill I don't see people explain about why is it not reversible in that thread...where do you see it ? Commented Dec 27, 2013 at 22:40
  • 1
    If two different strings can produce the same result, you could not know which one of those strings produced that hash code, so you could not reverse the calculation. Commented Dec 27, 2013 at 22:42

3 Answers 3

11

Hash functions are not reversible functions. As proof, consider the Pigeonhole principle, you can only store values in the integer range in a hashCode but there are an infinite number of Strings. Therefore, there must be multiple strings with the same hashCode (and there are).

Sign up to request clarification or add additional context in comments.

Comments

5

Multiplying a big number by 31 leads to integer overflow, which cannot be reversed using division by 31.

But hold on though: arithmetic using int in Java is equivalent to arithmetic modulo 232. Since 31 is odd it has a multiplicative inverse modulo 232. Therefore multiplication by 31 can be reversed with multiplication by the said "inverse." That is:

int inverse = -1108378657;
// This will print the hashCode of mlkjihgfedcb
System.out.println(("mlkjihgfedcba".hashCode() - (int) 'a') * inverse); 

3 Comments

how do we get or calculate the inverse number ?
You can use the extended Euclidean algorithm, or use Hensel's lemma. The latter is easiest to implement: first set x = a = 31 and then repeat x = x*(2-a*x) until it converges, up to five times.
But you don't have to calculate it, it's -1108378657
2

The specific reason this fails is

1687846330 * 31 + 97 = 52323236327

The number 52,323,236,327 is much larger than what an int can hold, so information is lost off the left-hand end. That information cannot be recovered by inverting the hashCode() algorithm.

If you do the arithmetic you'll see that the hash code you got for the second string is exactly 52323236327 mod 231

More important, the mapping from objects to hash codes is intended to be opaque and non-reversible. Just because it's implemented in a specific way today does not mean the next version of the library has to do it the same way. The only guarantee is that the possibility of collision is extremely low (but non-zero).

1 Comment

I don't know any way the behavior of the string's hash function could change without breaking any switch statements that use a string argument, since the compiled code includes hardcoded hash values for the strings.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.