Why does Hashmap internally use a LinkedList instead of an Arraylist when two objects are placed in the same bucket in the hash table?
-
5Consider the cost of removal.Oliver Charlesworth– Oliver Charlesworth2015-06-20 18:54:18 +00:00Commented Jun 20, 2015 at 18:54
-
5Removing a specific item from an AL is more complex in both time and space than from a LL. Other than that, the benefits of ALs are not used at all, like finding an item by its indexDleep– Dleep2015-06-20 18:55:38 +00:00Commented Jun 20, 2015 at 18:55
-
6What are your other two questions?Glorfindel– Glorfindel2015-06-20 20:06:21 +00:00Commented Jun 20, 2015 at 20:06
-
3Note that it uses a linked list, not a java.util.LinkedList. the entries simply have a pointer to the next one in the bucket. This list should generally contain only one element, or at least very few, so the traversal cost is not a problem. And accessing by index isn't useful, so an array isn't needed.JB Nizet– JB Nizet2015-06-20 20:08:05 +00:00Commented Jun 20, 2015 at 20:08
-
3Using an ArrayList would either consume too much memory (if the array is too large), or need frequent copies to larger arrays (if the array is too small). The memory used by a linked list, OTOH, is directly proportional to the number of elements, and adding a new entry is always O(1).JB Nizet– JB Nizet2015-06-20 20:13:47 +00:00Commented Jun 20, 2015 at 20:13
3 Answers
Why does
HashMapinternally use sLinkedListinstead of anArraylist, when two objects are placed into the same bucket in the hash table?
Actually, it doesn't use either (!).
It actually uses a singly linked list implemented by chaining the hash table entries. (By contrast, a LinkedList is doubly linked, and it requires a separate Node object for each element in the list.)
So why am I nitpicking here? Because it is actually important ... because it means that the normal trade-off between LinkedList and ArrayList does not apply.
The normal trade-off is:
ArrayListuses less space, but insertion and removal of a selected element isO(N)in the worst case.LinkedListuses more space, but insertion and removal of a selected element1 isO(1).
However, in the case of the private singly linked list formed by chaining together HashMap entry nodes, the space overhead is one reference (same as ArrayList), the cost of inserting a node is O(1) (same as LinkedList), and the cost of removing a selected node is also O(1) (same as LinkedList).
Relying solely on "big O" for this analysis is dubious, but when you look at the actual code, it is clear that what HashMap does beat ArrayList on performance for deletion and insertion, and is comparable for lookup. (This ignores memory locality effects.) And it also uses less memory for the chaining than either ArrayList or LinkedList was used ... considering that there are already internal entry objects to hold the key / value pairs.
But it gets even more complicated. In Java 8, they overhauled the HashMap internal data structures. In the current implementation, once a hash chain exceeds a certain length threshold, the implementation switches to using a binary tree representation if the key type implements Comparable.
1 - That is the insertion / deletion is O(1) if you have found the insertion / removal point. For example, if you are using the insert and remove methods on a LinkedList object's ListIterator.
Comments
This basically boils down to complexities of ArrayList and LinkedList. Insertion in LinkedList (when order is not important) is O(1), just append to start. Insertion in ArrayList (when order is not important) is O(N) ,traverse to end and there is also resizing overhead.
Removal is O(n) in LinkedList, traverse and adjust pointers. Removal in arraylist could be O(n^2) , traverse to element and shift elements or resize the Arraylist.
Contains will be O(n) in either cases.
When using a HashMap we will expect O(1) operations for add, remove and contains. Using ArrayList we will incur higher cost for the add, remove operations in buckets
3 Comments
ArrayList overallocates, for large hashtables with a low load factor this becomes really significant.Short Answer : Java uses either LinkedList or ArrayList (whichever it finds appropriate for the data).
Long Answer
Although sorted ArrayList looks like the obvious way to go, there are some practical benefits of using LinkedList instead. We need to remember that LinkedList chain is used only when there is collision of keys. But as a definition of Hash function : Collisions should be rare
In rare cases of collisions we have to choose between Sorted ArrayList or LinkedList. If we compare sorted ArrayList and LinkedList there are some clear trade-offs
- Insertion and Deletion : Sorted ArrayList takes O(n), but LinkedList takes constant O(1)
- Retrieval : Sorted ArrayList takes O(logn) and LinkedList takes 0(n).
Now, its clear that LinkedList are better than sorted ArrayList during insertion and deletion, but they are bad while retrieval.
In there are fewer collisions, sorted ArrayList brings less value (but more over head). But when the collisions are more frequent and the collided elements list become large(over certain threshold) Java changes the collision data structure from LinkedList to ArrayList.