Hashing algorithm and HashMap

Question

There are several topics on this in stackoverflow as well as in web, I could not understand them clearly. I found this answer in stackoverflow by Nicklas and his representation which gave me some meaningful insight over the topic.

Some ASCII art

key.hashCode()
     |
     | 32-bit value
     |                          hash table
     V                        +------------+    +----------------------+
HashMap.hash() ----+          | reference  | -> | key1 | value1 | null |
                   |          |------------|    +----------------------+
                   |          | null       |
                   | offset   |------------|    +---------------------+
                   +--------> | reference  | -> | key1 | value1 | ref |
                              |------------|    +---------------------+
                              |    ....    |                       |
                                                  +----------------+
                                                  V
                                                +----------------------+
                                                | key2 | value2 | null |
                                                +----------------------+

What hashing function does Java use to implement Hashtable class?

Map aMap = new HashMap();
aMap.put(key,value);

Can anyone explain me `what is an offset and its value?. Where is the offset value getting mapped?.
what's that hash-table?. Is it an Array of Objects?, If so does each object have an key, value1 and a reference propery?.

Can anyone re-explain the above diagram step-by-step. I am not able to understand the Hashing mechanism behind HashMap.

For (2), just look at the Java source, the 4th member of HashMap is Entry<K,V>[] table. — Bernhard Barker
– Bernhard Barker, Commented Sep 24, 2013 at 22:11
"2. The 32-bit value is then transformed by a second hash function (...) into an offset inside the hash table" - is that unclear? — Bernhard Barker
– Bernhard Barker, Commented Sep 24, 2013 at 22:12
Is Hashtable behaving like an array of objects, where does the offset value go?. I am just confused from when the hash function generates the offset value. — Shane
– Shane, Commented Sep 24, 2013 at 22:15
In put, the key value is hashed. The hash value is divided by the size of the hash table, and the remainder from that division (the offset in the above diagram) is used as the index to index into the actual hash table array. In the simplest case, where the hash table slot is empty, a new small object is created containing the key and the value. (ref is null.) The reference (pointer) to that object is placed in the selected hash table slot. — Hot Licks
– Hot Licks, Commented Sep 24, 2013 at 22:43

olegarch · Accepted Answer · 2013-09-24 23:10:15Z

7

First, lets I explain idea of hash search:

Lets you have some key for search. For example,key is just text string, like "hello".
Lets we compute some "checksum" for this string - for simple example, just by adding together ASCII codes of symbols. Lets (for example, I did not counted actually) this sum is 1009.
Lets we have array of fixed size, for example, SZ=10. Name this array as "hash_table". Each array element contains BUCKET -- this is set of possible keys.
So, we can compute OFFSET - index in the array, i.e. hash-table, by:

offset = sum % SZ = 1009 % 10 = 9;
Lets we deposit our word "hello" into busker #9 by:

hash_table[9].add_to_list("hello");

There, we have just single key "hello", stored in the cell hash_table[9];

Imagine, need insert 2nd key, for example, "world". Lets we compute sum, and it will be 37. Again, compute offset = 37 % 10 = 7. So, by this algorithm, we deposit key "world" into hashtable[7].
Collision. Imagine, you decided add 3rd word-key, "collision" Lets sum for this word will be 3007. When you compute offset, it would be 7. So, word "collision" must be deposited into the bucket, where is already exist word "world". This is OK. You just add element with world "collision" into linklist (head or tail). So:

hashtable[7] -> "world" -> "collistion" -> NULL; hashtable[9] -> "hello" ->NULL;
When you need search for a key "hello", you again compute checksum, and it again would be 1009. Thereafter, you compute offset = 9, and go directly to bucket #9. And, iterate linklist, and there you will found your word "hello". Analogous situation with "world" or "collision".

When you search for word, omitted in the table, you go to empty bucket, or to bucket, which does not contains your key. So, search unsuccessful.

if you make hash table too wide (for example, size as same as your dictionary), average bucket size would be ~1 element (if hash function has good spreading). So, search for a key wil be quick - just compute hashsum, go to bucket, and iterate 1-2 elements in the bucket.

Second - I will answer to your questions:

Offset - just index in the array. Array is a "hash table". Each array element contains pointer to linked list, contains keys, stored in the same bucket.
hashtable in the bucket scheme - just array, contains pointers to heads of buckets. With another hashing scheme (for example, multiple hashing, etc) - it can contains objects theyself, without linklists.

answered Sep 24, 2013 at 23:10

olegarch

3,9011 gold badge22 silver badges22 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

Shane Over a year ago

Thanks for this explanation. From my understanding the HashCode calculated is nothing but the index of an array.

olegarch Over a year ago

Yes, you are correct. But, algorithm for compute hash sum isn't trivial. For example, just sum of ASCII-codes is not good, because of independent against letter position changing, i.e. sum("int") == sum("tin"). I designed my own hashing algorithm, see olegh.cc.st/hashing.pdf page #94.

olegarch Over a year ago

Sorry, no.. Maybe, sometime I will rewrite in English.. But I stopped write it ~3 years ago, because I do not see any significant interest to this theme...

Shane Over a year ago

I am sorry for troubling you more, i.sstatic.net/zHvEe.jpg in this image what does hash value map too... Does the hash value contain the index of array or something else... stackoverflow.com/questions/11596549/…

olegarch Over a year ago

I do not programming on Java, so do not know details of HashMap implementation in that language. In my book, I described in the 5.1.4. algorithm with secondary hash key, hash2. This key used for quick comparison within iterate bucket. What saved in the value "hash" in the picture - I do not know. Perhaps it is secondary hash, just for purpose, as I wrote above. If this is copy of original hash value - this is not smart idea, since there is few "new information" for secondary comparison.

Collectives™ on Stack Overflow

Hashing algorithm and HashMap

Some ASCII art

1 Answer 1

5 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

Some ASCII art

1 Answer 1

5 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related