You asked "Or, am I missing something? Please enlighten me."
Yes, you are missing something.
The HashMap implementation itself protects against poor-quality hash functions:
/**
 * Applies a supplemental hash function to a given hashCode, which
 * defends against poor quality hash functions. This is critical
 * because HashMap uses power-of-two length hash tables, that
 * otherwise encounter collisions for hashCodes that do not differ
 * in lower bits. Note: Null keys always map to hash 0, thus index 0.
 */
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
So your resulting hashcodes in your example are:
k1 - Before: 3366 After: 3566
k2 - Before: 3367 After: 3567
k3 - Before: 3368 After: 3552
So even in your small sample of three elements, one of them was knocked out of sequence by the rehash. Now, this doesn't guard against aggressively evil hashCodes (return randomInt(); or return 4; simply can't be protected against), but it does guard against poorly written hashCodes.
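If you want to reproduce these numbers yourself, a small standalone sketch like the following will do it (the hash method is copied from the JDK source quoted above; the rest is mine):

public class SupplementalHashDemo {
    // Copied from the JDK HashMap source quoted above.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static void main(String[] args) {
        // Swap in the "...longer" keys used below to reproduce the second list.
        for (String key : new String[] { "k1", "k2", "k3" }) {
            int before = key.hashCode();
            System.out.println(key + " - Before: " + before + " After: " + hash(before));
        }
    }
}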
I should also point out that you can change things a great deal by using non-trivial inputs. Consider, for example, the following strings:
k1longer - Before: 1237990607 After: 1304548342
k2longer - Before: 2125494288 After: 2040627866
k3longer - Before: -1281969327 After: -1178377711
Notice how different the lower bits are: those are the only bits that matter for choosing a bucket, because the size of the backing table is always a power of two. It's actually documented that way in the code:
/**
* The table, resized as necessary. Length MUST Always be a power of two.
*/
transient Entry[] table;
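Because the length is always a power of two, the bucket index is just the hash ANDed with (length - 1), so only the low bits survive. A minimal sketch of that step (the JDK 6/7 source has an equivalent one-line helper named indexFor):

static int indexFor(int h, int length) {
    // length is a power of two, so (length - 1) is a mask of its low bits;
    // e.g. length 16 -> mask 0b1111, so only bits 0-3 of h pick the bucket.
    return h & (length - 1);
}

With the default 16-bucket table, your three short keys would land in buckets 3566 & 15 = 14, 3567 & 15 = 15, and 3552 & 15 = 0.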
The rehashing does a fairly decent job of making sure that the upper bits (which are otherwise ignored by the hash table) still have an impact on the lower bits. Here's the mapping of original hashCode bit positions and the bits they affect:
00: 00000000000000000000000000000001
01: 00000000000000000000000000000010
02: 00000000000000000000000000000100
03: 00000000000000000000000000001000
04: 00000000000000000000000000010001
05: 00000000000000000000000000100010
06: 00000000000000000000000001000100
07: 00000000000000000000000010001001
08: 00000000000000000000000100010010
09: 00000000000000000000001000100100
10: 00000000000000000000010001001000
11: 00000000000000000000100010010000
12: 00000000000000000001000100100001
13: 00000000000000000010001001000010
14: 00000000000000000100010010000100
15: 00000000000000001000100100001000
16: 00000000000000010001001000010001
17: 00000000000000100010010000100010
18: 00000000000001000100100001000100
19: 00000000000010001001000010001001
20: 00000000000100010010000100010011
21: 00000000001000100100001000100110
22: 00000000010001001000010001001100
23: 00000000100010010000100010011000 # means a 1 in the 23rd bit position will
24: 00000001000100100001000100110001 # cause positions 4, 5, 8, 12, and 20 to
25: 00000010001001000010001001100010 # also be altered
26: 00000100010010000100010011000100
27: 00001000100100001000100110001001
28: 00010001001000010001001100010010
29: 00100010010000100010011000100100
30: 01000100100001000100110001001000
31: 10001001000010001001100010010000
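You can regenerate that table by feeding hashCodes with a single bit set through hash() and printing the results in binary; a sketch (my code, not from the JDK):

public class BitInfluenceTable {
    // Copied from the JDK HashMap source quoted above.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static void main(String[] args) {
        for (int bit = 0; bit < 32; bit++) {
            int result = hash(1 << bit);   // a hashCode with only this one bit set
            // Left-pad the binary representation to 32 characters.
            String bits = String.format("%32s", Integer.toBinaryString(result)).replace(' ', '0');
            System.out.printf("%02d: %s%n", bit, bits);
        }
    }
}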
So, your worries about "degrading big-O lookup time from O(1) to be...who knows how bad...maybe worse than O(log n)" and "Wouldn't Sun provide a default hashCode function that will work well for large data sets?" may be put to rest - they have safeguards in place to prevent that from happening.
If it gives you any peace of mind at all, here are the author tags for this class. They're literally all stars in the Java world. (Comments with # are mine.)
* @author Doug Lea # Formerly a Java Community Process Executive Committee member
* @author Josh Bloch # Chief Java architect at Google, amongst other things
* @author Arthur van Hoff # Done too many hardcore Java things to list...
* @author Neal Gafter # Now a lead on the C# team at Microsoft, used to be team lead on javac