How does Java implement hash tables?

Question

Does anyone know how Java implements its hash tables (HashSet or HashMap)? Given the various types of objects that one may want to put in a hash table, it seems very difficult to come up with a hash function that would work well for all cases.

If only there was some way of looking at the Java source code, goddamit!
– oxbow_lakes, Commented Oct 29, 2009 at 23:59
Not everyone realises it's available, unfortunately. That's why this site is here. If you have the link, post it (as JG did below).
– Brian Agnew, Commented Oct 30, 2009 at 0:09

sinuhepop · Accepted Answer · 2009-10-30 00:45:32Z

22

HashMap and HashSet are very similar. In fact, the second contains an instance of the first.
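To make the "second contains an instance of the first" point concrete, here is a minimal sketch of a set built on top of a HashMap. The class name MapBackedSet and the PRESENT placeholder are made up for illustration; this is a simplified model of the idea, not the actual HashSet source.

```java
import java.util.HashMap;

// Hypothetical, simplified set that stores its elements as the keys
// of an internal HashMap. Only a few operations are shown.
class MapBackedSet<E> {
    // Shared placeholder stored as the value for every key.
    private static final Object PRESENT = new Object();
    private final HashMap<E, Object> map = new HashMap<>();

    // Returns true if the element was not already in the set.
    boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    boolean contains(Object o) {
        return map.containsKey(o);
    }

    int size() {
        return map.size();
    }
}
```

All the hashing work is delegated to the map, so the set inherits its bucket layout and resizing behaviour.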

A HashMap contains an array of buckets that hold its entries. The array size is always a power of 2. If you don't specify another value, there are initially 16 buckets.

When you put an entry (key and value) into it, it decides which bucket the entry goes into by calculating an index from the key's hash code (the hash code is not its memory address, and the hash is not a modulus). Different entries can collide in the same bucket, so they are put in a list.
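As a rough sketch of how an index can be computed without a modulus: when the table length is a power of 2, masking the hash with (length - 1) keeps only the low bits. The class and method names below are made up, and the particular bit-mixing in spread is just one possible choice (similar in spirit to what newer OpenJDK versions do); other versions mix the bits differently.

```java
class BucketIndexDemo {
    // Mixes the high bits of the hash code into the low bits, so that
    // keys differing only in their upper bits do not all collide.
    // The exact mixing function here is an assumption for illustration.
    static int spread(int hashCode) {
        return hashCode ^ (hashCode >>> 16);
    }

    // With a power-of-two table length, masking with (length - 1)
    // keeps the low bits and always yields an index in [0, length).
    static int indexFor(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    public static void main(String[] args) {
        String key = "example";
        int index = indexFor(spread(key.hashCode()), 16);
        System.out.println("bucket index: " + index);
    }
}
```

The mask also works for negative hash codes, which is one reason a power-of-two size is convenient.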

Entries will be inserted until they reach the load factor. This factor is 0.75 by default, and changing it is not recommended unless you are very sure of what you're doing. A load factor of 0.75 means that a HashMap of 16 buckets can only contain 12 entries (16*0.75). Then a new array of buckets will be created, double the size of the previous one. All entries will be put again into the new array. This process is known as rehashing, and it can be expensive.
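The arithmetic can be sketched like this. The threshold helper is a made-up name; it only reproduces the capacity * load factor calculation described above and does not touch a real HashMap.

```java
class ResizeThresholdDemo {
    // Number of entries a table of the given capacity holds before it grows.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        int capacity = 16;
        for (int i = 0; i < 4; i++) {
            System.out.println(capacity + " buckets -> threshold "
                    + threshold(capacity, 0.75f) + " entries");
            capacity *= 2; // the table doubles on each resize
        }
    }
}
```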

Therefore, if you know how many entries will be inserted, a best practice is to construct the HashMap with an initial capacity large enough to hold them:

new HashMap(finalSize);
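Since the constructor argument is a capacity (number of buckets) rather than a number of entries, one way to pick it is to divide the expected entry count by the load factor and round up. The helper names below are made up for illustration, and the default load factor of 0.75 is assumed.

```java
import java.util.HashMap;
import java.util.Map;

class PresizedMapDemo {
    // Smallest capacity for which expectedEntries stays within the load factor.
    static int capacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    // Builds a map that should not need to rehash while it holds up to
    // expectedEntries entries, assuming the default load factor of 0.75.
    static <K, V> Map<K, V> newPresizedMap(int expectedEntries) {
        return new HashMap<>(capacityFor(expectedEntries, 0.75f));
    }
}
```

For example, capacityFor(12, 0.75f) gives 16, matching the 16-bucket, 12-entry case above. Whether the extra arithmetic is worth it depends on how large the map gets.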

answered Oct 30, 2009 at 0:45

sinuhepop


2 Comments

Tom Hawtin - tackline Over a year ago

Technically depends upon implementation. It used to be popular to attempt a prime size of table. (Also, IIRC, the constructor to HashMap takes the capacity, so you should divide by the load factor and round up, but I think specifying the capacity is almost always more effort than it is worth.)

Nemin Over a year ago

Also, when there is more than one element in the same bucket, Java then uses the equals() method to determine whether the object in question is present in the hash table. This is the reason why it's recommended to override equals() and hashCode() together.
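A small sketch of a key class that overrides both methods consistently. GridPoint is a made-up class used only for illustration.

```java
import java.util.Objects;

// Hypothetical key class used only for illustration.
final class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GridPoint)) return false;
        GridPoint other = (GridPoint) o;
        return x == other.x && y == other.y;
    }

    // Equal points must produce equal hash codes, otherwise a lookup
    // may search a different bucket than the one used on insertion.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}
```

With both methods in place, a value stored under new GridPoint(1, 2) can be retrieved with a different but equal GridPoint instance. If only equals() were overridden, the two instances would usually land in different buckets and the lookup would miss.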

João Silva · Accepted Answer · 2009-10-30 00:00:53Z

8

You can check the source of HashMap, for example.

answered Oct 30, 2009 at 0:00

João Silva


Jim Garrison · Accepted Answer · 2009-10-30 15:47:40Z

7

Java depends on each class's implementation of the hashCode() method to distribute the objects evenly. Obviously, a bad hashCode() method will result in performance problems for large hash tables. If a class does not provide a hashCode() method, the default in the current implementation is to return some function (i.e. a hash) of the object's address in memory. Quoting from the API doc:

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)
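You can poke at the default behaviour yourself with a short program. This is only a sketch; the class name is made up, and the actual numbers printed by hashCode() vary between runs and JVMs, so the program prints comparisons rather than the raw values.

```java
class DefaultHashCodeDemo {
    public static void main(String[] args) {
        Object a = new Object();
        Object b = new Object();

        // For a class that does not override hashCode(), the value matches
        // System.identityHashCode and does not change between calls.
        System.out.println(a.hashCode() == System.identityHashCode(a));
        System.out.println(a.hashCode() == a.hashCode());

        // Distinct objects usually, but not necessarily, get distinct values.
        System.out.println(a.hashCode() != b.hashCode());
    }
}
```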

edited Oct 30, 2009 at 15:47

answered Oct 29, 2009 at 23:47

Jim Garrison


6 Comments

escalon Over a year ago

hashcode() can return totally different ranges of values (eg 1 to 100 vs 1 to 100,000). So how does Java know how many buckets it needs to create for a hash table?

newacct Over a year ago

@escalon: hashcode() returns an int (which always has range -2^31 to 2^31-1). You modulo the hash code by the current number of buckets for the table. This is how these hash tables usually work. The number of buckets can change as the hash table grows.
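A sketch of the modulo approach for a table with an arbitrary number of buckets. One detail to watch: hash codes can be negative, and Java's % operator keeps the sign of the dividend, so Math.floorMod is used here instead. The class and method names are made up for illustration.

```java
class ModuloIndexDemo {
    // Math.floorMod never returns a negative result for a positive divisor,
    // unlike the % operator, which keeps the sign of the dividend.
    static int indexFor(int hashCode, int bucketCount) {
        return Math.floorMod(hashCode, bucketCount);
    }

    public static void main(String[] args) {
        int bucketCount = 5;
        int[] hashCodes = {7, -7, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int h : hashCodes) {
            System.out.println(h + " -> bucket " + indexFor(h, bucketCount));
        }
    }
}
```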

Tano Over a year ago

escalon, your original question was how Java implements hashCode, while your comment here is more on the theory of hash table design. You may want to check out the Wikipedia articles on hashing.

Tom Hawtin - tackline Over a year ago

If you try Object.hashCode you should see that it clearly does not return the memory address. (On reasonable implementations, the value is stored in the object header. Sun's implementation initialises the value on first use, using a slight rehash of the memory address at the time of initialisation.)

Jim Garrison Over a year ago

@tackline You are of course correct. I've edited the answer.


brianegge · Accepted Answer · 2009-10-30 00:15:29Z

2

There are two general ways to implement a HashMap. The difference is how one deals with collisions.

The first method, which is the one Java uses, makes every bucket in the HashMap contain a singly linked list. To accomplish this, each bucket contains an Entry type, which caches the hashCode and has a pointer to the key, a pointer to the value, and a pointer to the next entry. When a collision occurs in Java, another entry is added to the list.
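Here is a minimal sketch of that chaining layout. ChainedMap is a made-up class, not the real HashMap: it has a fixed size, never rehashes, and does not accept null keys, but the Entry fields match the description above.

```java
// Hypothetical, fixed-size chained hash map; resizing is omitted for brevity.
class ChainedMap<K, V> {
    // One node of a bucket's singly linked list.
    private static final class Entry<K, V> {
        final int hash;    // cached hash code of the key
        final K key;
        V value;
        Entry<K, V> next;  // next entry in the same bucket, or null

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Entry<K, V>[] table;

    // The capacity must be a power of two for the index mask to work.
    @SuppressWarnings("unchecked")
    ChainedMap(int powerOfTwoCapacity) {
        table = (Entry<K, V>[]) new Entry[powerOfTwoCapacity];
    }

    private int indexFor(int hash) {
        return hash & (table.length - 1);
    }

    // Returns the previous value for the key, or null if there was none.
    V put(K key, V value) {
        int hash = key.hashCode();
        int i = indexFor(hash);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && key.equals(e.key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        // Not found: link a new entry at the head of the bucket's list.
        table[i] = new Entry<>(hash, key, value, table[i]);
        return null;
    }

    V get(K key) {
        int hash = key.hashCode();
        for (Entry<K, V> e = table[indexFor(hash)]; e != null; e = e.next) {
            if (e.hash == hash && key.equals(e.key)) {
                return e.value;
            }
        }
        return null;
    }
}
```

With a capacity of 4, the Integer keys 1 and 5 land in the same bucket and end up chained through the next field, yet both remain retrievable.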

The other method for handling collisions is to simply put the item into the next empty bucket. The advantage of this method is that it requires less space. However, it complicates removals: if the bucket following the removed item is not empty, one has to check whether that item is in the right or wrong bucket, and shift it if it originally collided with the item being removed.
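A rough sketch of this second approach (linear probing), including the awkward removal case. ProbingSet is a made-up class with a fixed capacity and no resizing. Note that instead of shifting items in place, this version re-inserts every item in the run that follows the removed slot, which is a simpler way to get the same effect.

```java
// Hypothetical fixed-capacity open-addressing set of non-null keys, using
// linear probing. It never resizes and always keeps one slot empty.
class ProbingSet<K> {
    private final Object[] slots;
    private int size;

    ProbingSet(int capacity) {
        slots = new Object[capacity];
    }

    // The slot a key would occupy if there were no collisions.
    private int home(Object key) {
        return Math.floorMod(key.hashCode(), slots.length);
    }

    boolean contains(Object key) {
        for (int i = home(key); slots[i] != null; i = (i + 1) % slots.length) {
            if (slots[i].equals(key)) return true;
        }
        return false;
    }

    boolean add(K key) {
        if (contains(key)) return false;
        if (size == slots.length - 1) throw new IllegalStateException("table full");
        int i = home(key);
        while (slots[i] != null) i = (i + 1) % slots.length; // next empty slot
        slots[i] = key;
        size++;
        return true;
    }

    boolean remove(Object key) {
        int i = home(key);
        while (slots[i] != null && !slots[i].equals(key)) i = (i + 1) % slots.length;
        if (slots[i] == null) return false;
        slots[i] = null;
        size--;
        // Re-insert every key in the run that follows the freed slot, so that
        // keys which probed past the removed one remain reachable.
        for (i = (i + 1) % slots.length; slots[i] != null; i = (i + 1) % slots.length) {
            @SuppressWarnings("unchecked")
            K moved = (K) slots[i];
            slots[i] = null;
            size--;
            add(moved);
        }
        return true;
    }
}
```

Without the re-insertion loop, removing the first of several colliding keys would leave a gap, and lookups for the later keys would stop at that gap and wrongly report them as missing.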

answered Oct 30, 2009 at 0:15

brianegge


1 Comment

Tom Hawtin - tackline Over a year ago

IdentityHashMap and the hash map used in ThreadLocal use the probing algorithm (in Sun's JRE).

Community · Accepted Answer · 2017-05-23 12:08:45Z

In my own words:

An Entry object is created to hold the reference of the Key and Value.

The HashMap has an array of Entry objects.

The index for a given entry is derived from the hash returned by key.hashCode().

If there is a collision (two keys gave the same index), the entry is stored in the .next attribute of the existing entry.

That's how two objects with the same hash could be stored into the collection.

From this answer we get:

   public V get(Object key) {
       if (key == null)
           return getForNullKey();
       int hash = hash(key.hashCode());
       for (Entry<K,V> e = table[indexFor(hash, table.length)];
            e != null;
            e = e.next) {
           Object k;
           if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
               return e.value;
       }
       return null;
   }

Let me know if I got something wrong.
