2

Is there a known Java String with hashCode exactly equal to Integer.MIN_VALUE ? It would be helpful for writing a test for a hash table to help avoid a common mistake of running Math.Abs on the hashcode before performing the remainder operation.

Ideally the string would include only ASCII characters, but I'm not sure if it woul dbe feasible.

3
  • 5
    if an underscore is allowed: "HZcxf_".hashCode() == Integer.MIN_VALUE Commented Nov 14, 2022 at 18:29
  • Wow @user16320675: that was fast. If you submit it as an answer, I will accept. Commented Nov 14, 2022 at 18:36
  • 1
    I'm curious how @user16320675 found that. I wrote a little program that checks random strings of printable ASCII characters (all strings length 6). It ran for about 3 billion strings without a match before I just killed it. Commented Nov 14, 2022 at 19:15

2 Answers 2

9

Based on the formula for hash code (from StringLatin1):

public static int hashCode(byte[] value) {
    int h = 0;
    for (byte v : value) {
        h = 31 * h + (v & 0xff);
    }
    return h;
}

It depends linearly on the characters, the longer the string and greater the characters, the greater the hash code will be, until it overflows. Also note that the first characters have a greater impact on the resulting hash code (more often multiplied by 31).

The basic idea of the two first algorithm is to increment the characters until the hash code turns negative, starting with the left-most character since it has a greater weight. The searched string must have the character previous to the one that caused it to overflow on each position but the last one.

The code starts testing the strings "A", "AA", "AAA", ... until one starts returning a negative value - the previous string is used as starting value.
Now it starts incrementing the first character up to Z or until a string with a negative hash is found. The same is done for every next character. Since the hash code of such strings was not yet reaching Integer.MIN_VALUE, an additional pass is done, to also test lowercase characters. This should have been integrated in the previous loop...
Now the last character is adjusted to exactly get to Integer.MIN_VALUE - simple since the last character is just added, without multiplication to calculate the hash code.

Here the code:

var string = "A";
while ((string+"A").hashCode() > 0) {
    string += "A";
}

var array = string.toCharArray();
var i = 0;
while (i < array.length) {
    array[i] += 1;
    if (array[i] > 'z' || new String(array).hashCode() < 0) {
        array[i] -= 1;
        i += 1;
        continue;
    }
}

i = 1;
while (i < array.length) {
    if (array[i] == 'Z') {
        array[i] = 'a';
    }else {
        array[i] += 1;
    }
    if (array[i] > 'Z' || new String(array).hashCode() < 0) {
        if (array[i] == 'a')
            array[i] = 'Z';
        else
            array[i] -= 1;
        i += 1;
        continue;
    }
}
int hash = new String(array).hashCode();
if (hash > 0) {
    array[array.length-1] += Integer.MAX_VALUE - hash + 1; 
}
System.out.printf("%s = %d%n", new String(array), new String(array).hashCode());

This results in:

HZcxf_ = -2147483648


Merging the two incrementing loops of previous code, we have:

var string = "A";
while ((string+"A").hashCode() > 0) {
    string += "A";
}

var array = string.toCharArray();
var i = 0;
while (i < array.length) {
    var prev = array[i];
    if (prev == 'Z') {
        array[i] = 'a';
    } else {
        array[i] += 1;
    }
    if (array[i] > 'z' || new String(array).hashCode() < 0) {
        array[i] = prev;
        i += 1;
        continue;
    }
}
int hash = new String(array).hashCode();
if (hash > 0) {
    array[array.length-1] += Integer.MAX_VALUE - hash + 1; 
}
System.out.printf("%s = %d%n", new String(array), new String(array).hashCode());

Resulting in (slightly different than previous):

HZdZG_ = -2147483648


Another method would be more strongly based on the hash calculation, basically undoing it.
Since I did not want to work with negative number, it starts with Integer.MAX_VALUE, which is one less than Integer.MIN_VALUE (considering over/underflow).
First it finds out how often it must be divided by 31 until the result is less than 128 (ASCII), kind of determining the string length. Next it loops and finds out each character with some special handling to avoid characters less than ' '.
At the end, the last character is incremented by one to move the hash code from MAX_VALUE to MIN_VALUE by overflowing.

var string = "";
var remain = Integer.MAX_VALUE;
var i = 0;
var multiplier = 1;
while (remain > 127) {
    remain /= 31;
    multiplier *= 31;
    i += 1;
}
remain = Integer.MAX_VALUE;
while (i >= 0) {
    var ch = (char)(remain / multiplier);
    remain -= ch * multiplier;
    multiplier /= 31;
    if (i > 0) {
        // correct if next ch will be less than ' '
        var correct = (' ' - (remain / multiplier) + 30) / 31;  // old fashion rounding
        if (correct > 0) {
            ch -= correct;
            remain += correct * 31 * multiplier;
        }
    } else {
        ch += 1;
    }
    string += ch;
    i -= 1;
}
System.out.printf("%s = %d%n", string, string.hashCode());

And its results:

I='<*! = -2147483648


Note: the last code will definitively fail if the hash code algorithm of String is changed! The previous two may fail, depending on how the hash calculation is changed.

Sign up to request clarification or add additional context in comments.

Comments

1

String#hashCode() is defined as:

Returns a hash code for this string. The hash code for a String object is computed as

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)

Now you just need to solve for -2147483648 (perhaps with the restriction of only printable ASCII chars: 32–127) :)

Or you brute-force (this will take a while):

public class HashFinder {
    private static final int SIZE = 7;
    private static long hashesCalculated = 0L;

    public static void main(String[] args) {
        hashesCalculated = 0L;
        final long start = System.nanoTime();
        findHash(SIZE);
        final long duration = System.nanoTime() - start;

        System.err.println("Checked strings of size " + SIZE);
        System.err.println(hashesCalculated + " hashes in " + TimeUnit.NANOSECONDS.toSeconds(duration) + "s");
    }

    public static void findHash(final int size) {
        findHash("", size);
    }

    public static void findHash(final String prefix, final int size) {
        if (size <= 0) {
            return;
        }

        final StringBuilder sb = new StringBuilder(prefix).append(' ');
        for (char c = ' '; c < '~'; ++c) {
            sb.setCharAt(prefix.length(), c);
            final String s = sb.toString();
            ++hashesCalculated;
            if (s.hashCode() == Integer.MIN_VALUE) {
                System.out.printf("Found string with min hashCode! '%s'%n", s);
            }

            findHash(s, size - 1);
        }
    }
}

But allocating all those strings and string builders is expensive. Brute-forcing becomes feasible when we calculate the hash code manually from a char array:

public class HashFinderBytes {
    public static void main(String[] args) {
        final char start = ' ', end = '~';
        for (int size = 1; size <= 9; size++) {
            char[] chars = new char[size];
            Arrays.fill(chars, start);

            final long startNano = System.nanoTime();
            final long combinations = BigInteger.valueOf(end - start).pow(size).longValue();
            System.err.println("Checking " + combinations + " strings of size " + size);
            for (long i = 0; i < combinations; ++i) {
                if (hashCode(chars) == Integer.MIN_VALUE) {
                    System.out.printf("Found string with min hashCode! \"%s\"%n", new String(chars));
                    System.out.println("Sanity check: " + (new String(chars).hashCode() == Integer.MIN_VALUE));
                }
                for (int j = 0; j < chars.length; ++j) {
                    ++chars[j];
                    if (chars[j] <= end) {
                        break;
                    }
                    chars[j] = (byte) start;
                }
            }
            final long duration = System.nanoTime() - startNano;

            final long millis = TimeUnit.NANOSECONDS.toMillis(duration);
            System.err.println("in " + millis + "ms (" + (combinations / millis) + " ops/ms)");
        }
    }

    public static int hashCode(char[] value) {
        int h = 0;
        for (char v : value) {
            h = 31 * h + (v & 0xff);
        }
        return h;
    }
}

In fact, there are lots of strings with a hash code identical to Integer.MIN_VALUE.

Length 6:

I='<*!
H\'<*!
G{'<*!
I<F<*!
H[F<*!
GzF<*!
I;e<*!
HZe<*!
Gye<*!
I=&[*!
H\&[*!
G{&[*!
I<E[*!
H[E[*!
GzE[*!
I;d[*!
HZd[*!
Gyd[*!
I=%z*!
H\%z*!
G{%z*!
I<Dz*!
H[Dz*!
GzDz*!
I;cz*!
HZcz*!
Gycz*!
I=';I!
H\';I!
G{';I!
I<F;I!
H[F;I!
GzF;I!
I;e;I!
HZe;I!
Gye;I!
I=&ZI!
H\&ZI!
G{&ZI!
I<EZI!
H[EZI!
GzEZI!
I;dZI!
HZdZI!
GydZI!
I=%yI!
H\%yI!
G{%yI!
I<DyI!
H[DyI!
GzDyI!
I;cyI!
HZcyI!
GycyI!
I=':h!
H\':h!
G{':h!
I<F:h!
H[F:h!
GzF:h!
I;e:h!
HZe:h!
Gye:h!
I=&Yh!
H\&Yh!
G{&Yh!
I<EYh!
H[EYh!
GzEYh!
I;dYh!
HZdYh!
GydYh!
I=%xh!
H\%xh!
G{%xh!
I<Dxh!
H[Dxh!
GzDxh!
I;cxh!
HZcxh!
Gycxh!

Length 7 (all of the strings below end with a space character); not all shown:

p4*|{e 
oS*|{e 
nr*|{e 
p3I|{e 
oRI|{e 
nqI|{e 
p2h|{e 
oQh|{e 
nph|{e 

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.