LECT 10, 11-DSALGO(Hashing).pdf

Lecture 10 & 11
HASHING
Course Supervisor: Syeda Nazia Ashraf
Data Structures & Algorithm
CSC-102

MOTIVATION
Linear Search
• Simplest Algorithm to search for a specific
target key in a data collection.
• Examines each element
• Takes O(n) time: searching an array of 100
elements takes about 10 times longer than
searching a 10-element array.

MOTIVATION
Binary Search
• Requires the elements to be in sorted order.
• Search time depends on the logarithm of the
collection size O(log n).
• Takes twice as long on average to search for
an element in an array of 100 elements as
compared to the 10 element array.

MOTIVATION
Conclusion
• The time taken for a search using each of
these methods depends on the size of the
collection.
• Hash data structures – Allow the storage and
retrieval of data in an average time which
does not depend at all on the collection size.

HASHING
• Hashing is the transformation of a string
of characters into a usually shorter fixed-
length value or key that represents the
original string.
• Hashing is used to index and retrieve items in
a database because it is faster to find the item
using the shorter hashed key than to find it
using the original value.

HASHING
Hash Tables(Hash Map)
• Simplest data structure.
• Hash Function – Basis of Hash Tables.
Hash Functions
• A hash function is any function that can be
used to map data of arbitrary size to data of
fixed size.

HASHING
Hashes
• The values returned by a hash function are
called hash values, hash codes, hash sums, or
simply hashes.
• Hash values are used to determine the
location in the table for the given element.

HASHING
IS THERE ANY PARAMETER FOR A GOOD
HASH FUNCTION?
• A good hash function is one that
distributes the keys fairly evenly across the
hash table.

POPULAR HASH FUNCTIONS
1. Division Method
• A key (given element) is mapped into one of m
slots using the function
h(k) = k mod m
where m is the size of the table, usually
chosen to be a prime number, and k is the key.
Different types of hash functions are used for the mapping
of keys into tables.
(a) Division Method
(b) Mid-square Method
(c) Folding Method

1. Division Method
• Choose a number m larger than the number n of keys
in K (the set of keys)
• The number m is usually chosen to be a prime number or
a number without small divisors
• The hash function H is defined as
H(k) = k (mod m) or H(k) = k (mod m) + 1
• Here k (mod m) denotes the remainder when k is divided by m
• The 2nd formula is used when the address range is from 1 to m.

• Example:
Elements are: 3205, 7148, 2345
Table addresses: 0 – 99
m = 97 (prime number close to 99)
H(k) = k (mod m), e.g. 3205 mod 97 = 4
H(3205) = 4, H(7148) = 67, H(2345) = 17
• For 2nd formula add 1 into the remainders.
• H(k)=k(mod m)+1 to obtain:
• H(3205)= 4+1=5, H(7148)=67+1=68,
H(2345)=17+1=18
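The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch, not part of the lecture: the function name `division_hash` and the `one_based` flag are illustrative names chosen here.

```python
def division_hash(k, m, one_based=False):
    """Division method: k mod m, or k mod m + 1 when addresses run from 1 to m."""
    h = k % m
    return h + 1 if one_based else h


keys = [3205, 7148, 2345]
m = 97  # prime close to 99, as in the example

plain = [division_hash(k, m) for k in keys]
shifted = [division_hash(k, m, one_based=True) for k in keys]

print(plain)    # [4, 67, 17]
print(shifted)  # [5, 68, 18]
```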
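DIVISION METHOD
[Figure: two hash tables with addresses 0 to 99. With H(k) = k mod m, keys 3205, 2345 and 7148 are stored at addresses 4, 17 and 67. With H(k) = k mod m + 1, they are stored at addresses 5, 18 and 68.]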

POPULAR HASH FUNCTIONS Contd…
2. Folding Method
• The key k is partitioned into a number of parts
k1, k2, k3, …, kn,
• where each part, except possibly the last,
has the same number of digits as the required
hash address.
• Then the parts are added together, ignoring the
last carry. That is,
h(k) = k1 + k2 + k3 + … + kn
• Sometimes the even-numbered parts (k2, k4, …)
are reversed before adding.

Folding Method
• Here we are dealing with a hash table with
indices from 00 to 99, i.e., two-digit addresses
• So we divide the key K into parts of two digits each
H(7148): 71 + 48 = 119; here we eliminate the
leading carry (i.e., 1), so H(7148) = 19

Folding Method
• Sometimes, for extra "milling", the even-numbered
parts, k2, k4, …, are each reversed
before the addition
• H(7148): 71 + 84 = 155; here we eliminate the
leading carry (i.e., 1), so H(7148) = 55

FOLDING METHOD
Example
• Create a hash table for the Keys 3205, 7148,
2345 by using Folding Method
Solution
• Partition K into a number of parts.
• Each part has the same number of digits as
the required address.
• Add parts together ignoring the last carry.
• h(3205) = 32 + 05 = 37
• h(7148) = 71 + 48 = 119 (discard leading digit 1) = 19
• h(2345) = 23 + 45 = 68

FOLDING METHOD Contd…
• Alternatively , one may want to reverse the
second part before adding.
• h(3205) = 32 + 50 = 82
• h(7148) = 71 + 84 = 155 (Discard 1) = 55
• h(2345) = 23 + 54 = 77
• Creation of the hash table on board.
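Both variants of the folding method can be sketched in Python as follows. The name `folding_hash` and its parameters `part_digits` and `reverse_even` are assumptions made for this illustration. The carry is dropped by reducing the sum modulo 10^part_digits.

```python
def folding_hash(k, part_digits=2, reverse_even=False):
    """Folding method: split the key's digits into parts, add them, drop the carry."""
    digits = str(k)
    parts = [digits[i:i + part_digits] for i in range(0, len(digits), part_digits)]
    if reverse_even:
        # parts[0] is k1, parts[1] is k2, ...; reverse k2, k4, ... before adding
        parts = [p[::-1] if i % 2 == 1 else p for i, p in enumerate(parts)]
    total = sum(int(p) for p in parts)
    return total % 10 ** part_digits  # ignore the leading carry


keys = [3205, 7148, 2345]
simple = [folding_hash(k) for k in keys]
reversed_variant = [folding_hash(k, reverse_even=True) for k in keys]

print(simple)            # [37, 19, 68]
print(reversed_variant)  # [82, 55, 77]
```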

POPULAR HASH FUNCTIONS Contd…
3. Midsquare Method
• The key is squared. The hash function is
defined by
• h(k) = l, where l is obtained by deleting
digits from both ends of k².

Mid-Square Method
• The key is squared and the address selected from the
middle of the squared number
• The hash function H is defined by:
h(k) = l
• where l is obtained by deleting digits from both ends of k²
• The most obvious limitation of this method is the size of
the key
• Given a key of 6 digits, the product will be 12 digits, which
may be beyond the maximum integer size of many
computers
• Same number of digits must be used for all of the keys

Mid-Square Method - Example
• Consider the following keys in the table and their hash
indices:
[Figure: Hash Table with Mid-Square Division]

MID-SQUARE METHOD
Example
• Create a hash table for the keys 3205, 7148,
2345 by using the Mid-square Method
Solution:
• Square k.
• Strip predetermined digits from front and rear.
• e.g., use the thousands and ten-thousands places
   k:     3205      7148      2345
   k²:    10272025  51093904  5499025
   h(k):  72        93        99
• The 4th and 5th digits, counting from the right, are
chosen for the hash address.
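Taking the thousands and ten-thousands places of k² is the same as dropping the last three digits and keeping the next two. A minimal Python sketch of that choice (the name `mid_square_hash` is illustrative, and the digit positions are fixed to match this example only):

```python
def mid_square_hash(k):
    """Mid-square method: keep the 4th and 5th digits of k*k, counted from the right."""
    squared = k * k
    return (squared // 1000) % 100  # drop 3 low digits, keep the next 2


keys = [3205, 7148, 2345]
hashed = [mid_square_hash(k) for k in keys]
print(hashed)  # [72, 93, 99]
```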

Hashing a string key
• Table size [0..99]
• A..Z ---> 1, 2, ..., 26
• 0..9 ---> 27, ..., 36
• Key: CS1 ---> 3, 19, 28 (concatenate) = 31,928
• (31,928)² = 1,019,397,184 → 10 digits
• Extract the middle 2 digits (5th and 6th), as the table size
is 0..99.
• Get 39, so H(CS1) = 39.
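The string-key computation above can be reproduced with the sketch below. The helper names `char_code` and `string_key_hash` are assumptions. How to pick the "middle" two digits for squares of other lengths is also an assumption made here: the code takes the two digits around the centre of the decimal string.

```python
def char_code(ch):
    """Map A..Z to 1..26 and 0..9 to 27..36, as on the slide."""
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 1
    if "0" <= ch <= "9":
        return int(ch) + 27
    raise ValueError(f"unsupported character: {ch!r}")


def string_key_hash(key):
    """Concatenate the character codes, square, and take the middle two digits."""
    number = int("".join(str(char_code(c)) for c in key))  # "CS1" -> 31928
    squared = str(number * number)
    mid = len(squared) // 2
    return int(squared[mid - 1:mid + 1])  # 5th and 6th digits of a 10-digit square


print(string_key_hash("CS1"))  # 39
```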

Hash Function Examples
Let h(k) = k % 15. Then,
if k    =   25  129   35 2501   47   36
   h(k) =   10    9    5   11    2    6
Storing the keys in the array is straightforward:
    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
    _    _   47    _    _   35   36    _    _  129   25 2501    _    _    _
Thus, delete and find can be done in O(1), and
also insert, except…

Hash Function
What happens when you try to insert k = 65?
k = 65
h(k) = 5
    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
    _    _   47    _    _   35   36    _    _  129   25 2501    _    _    _
                         65(?)
Slot 5 already holds 35. This is called a collision.
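The situation can be reproduced with a plain Python list used as the table. This is a sketch only: it stores a key when its slot is free and records a collision otherwise, without resolving it.

```python
def h(k):
    return k % 15


table = [None] * 15
collisions = []  # (key, slot, key already stored there)

for k in [25, 129, 35, 2501, 47, 36, 65]:
    slot = h(k)
    if table[slot] is None:
        table[slot] = k
    else:
        collisions.append((k, slot, table[slot]))

print(table[5])    # 35
print(collisions)  # [(65, 5, 35)]
```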

COLLISION
• If two keys map to the same hash table
index, then we have a collision.
• As the number of elements in the table
increases, the likelihood of a collision
increases, so make the table as large as
practical.
• Collisions may still happen, so we need a
collision resolution strategy.

COLLISION
• When a hash function maps two different keys to the same
table address, a collision is said to occur.
• Two elements can not be stored at the same location in the
hash table.
• Two approaches are used to resolve collisions.
• Open Hashing : Means that collisions are resolved by
storing the colliding object in a separate area.
• Separate chaining
• Closed Hashing (Open Addressing) : In closed hashing, all
keys are stored in the hash table itself.
• Linear Probing
• Quadratic Probing
• Double Hashing
What is Probing?
If the table position given by the hashed key is already
occupied, increase the position by some amount, until an
empty position is found

CLOSED HASHING METHODS (COLLISION
RESOLUTION TECHNIQUES)
Linear Probing
• Here we place the elements by using the hash
function
h_i(x) = (h(x) + i) mod TableSize, for i = 0, 1, 2, …
• One of the methods for dealing with collisions.
• If a data element hashes to a location in the table
which is already occupied, the table is searched
consecutively from that location until an empty
location is found.
• The key is then stored in the empty location.
• Wrap around from the last to the first bucket array
location if necessary.

LINEAR PROBING
Exercise Question
• h(K) = K mod 7
• Insert keys: 76 93 40 47 10 55
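One way to work the exercise is to simulate the insertions. The sketch below assumes a table of size 7 (to match K mod 7) with `None` marking an empty cell. The function name `linear_probe_insert` is illustrative.

```python
def linear_probe_insert(table, key):
    """Insert key at the first free slot among (h + i) mod m, i = 0, 1, 2, ..."""
    m = len(table)
    home = key % m
    for i in range(m):
        slot = (home + i) % m  # wraps around past the last cell
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")


table = [None] * 7
placed = [linear_probe_insert(table, k) for k in [76, 93, 40, 47, 10, 55]]

print(placed)  # [6, 2, 5, 0, 3, 1]
print(table)   # [47, 55, 93, 10, None, 40, 76]
```

Keys 47 and 55 are the ones that collide: 47 hashes to 5 and ends up in cell 0, while 55 hashes to 6 and ends up in cell 1.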

[Figure: Linear probing hash table after each insertion]

LINEAR PROBING Contd…
Disadvantage
• Clustering: elements end up stored next to one
another in long runs, which increases search time.
Searching/ lookup
• To search for a given key x, the cells of T are
examined, beginning with the cell at
index h(x) (where h is the hash function) and
continuing to the adjacent cells h(x) + 1, h(x) + 2,
..., until finding either an empty cell or a cell
whose stored key is x.

LINEAR PROBING Contd…
Deletion
• It is also possible to remove a key–value pair from
the dictionary. However, it is not sufficient to do
so by simply emptying its cell. This would affect
searches for other keys that have a hash value
earlier than the emptied cell, but that are stored
in a position later than the emptied cell. The
emptied cell would cause those searches to
incorrectly report that the key is not present.
• Instead, mark deleted cells with tombstones (special markers).
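The difference between emptying a cell and leaving a tombstone can be seen in a small sketch. It assumes h(K) = K mod 7 and a table filled by linear probing with the keys 76, 93, 40, 47, 10, 55. The names `EMPTY`, `TOMBSTONE`, `find_slot` and `delete` are illustrative.

```python
EMPTY = None
TOMBSTONE = object()  # marks a cell whose key was deleted


def find_slot(table, key):
    """Linear-probing lookup: stop at an empty cell, skip over tombstones."""
    m = len(table)
    home = key % m
    for i in range(m):
        slot = (home + i) % m
        if table[slot] is EMPTY:
            return None  # key is not present
        if table[slot] is not TOMBSTONE and table[slot] == key:
            return slot
    return None


def delete(table, key):
    """Replace the key with a tombstone so that later probes keep going."""
    slot = find_slot(table, key)
    if slot is None:
        return False
    table[slot] = TOMBSTONE
    return True


# 55 hashes to cell 6 but was stored in cell 1 because 6 and 0 were occupied.
with_tombstone = [47, 55, 93, 10, EMPTY, 40, 76]
delete(with_tombstone, 76)
print(find_slot(with_tombstone, 55))  # 1

# Simply emptying cell 6 makes the search for 55 stop too early.
naively_emptied = [47, 55, 93, 10, EMPTY, 40, EMPTY]
print(find_slot(naively_emptied, 55))  # None
```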

2. Quadratic Probing
• Here we place the elements by using the
hash function
• h_i(x) = (h(x) + i²) mod TableSize.
• Fast searching as compared to linear
probing.
• Suffers from secondary clustering, since keys that have
the same hash value also have the same
probe sequence.

2. Quadratic Probing
• Quadratic probing is a solution to the clustering
problem
– Linear probing adds 1, 2, 3, etc. to the original
hashed key
– Quadratic probing adds 1², 2², 3², etc. to the original
hashed key
• However, whereas linear probing guarantees that all
empty positions will be examined if necessary,
quadratic probing does not

• If the table size is prime, this will try approximately
half the table slots.
• More generally, with quadratic probing, insertion may
be impossible if the table is more than half-full!
Probe sequence: h, h+1, h+4, h+9, h+16, …, h+i² (each taken mod the table size)
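The "approximately half the table" behaviour can be observed directly by collecting the distinct cells that the quadratic sequence reaches. This is a small sketch: the function name `quadratic_slots` and the table sizes 7 and 11 are chosen here for illustration.

```python
def quadratic_slots(home, m):
    """Distinct cells visited by (home + i*i) mod m for i = 0 .. m-1."""
    return sorted({(home + i * i) % m for i in range(m)})


slots_7 = quadratic_slots(0, 7)
slots_11 = quadratic_slots(0, 11)

print(slots_7)   # [0, 1, 2, 4]
print(slots_11)  # [0, 1, 3, 4, 5, 9]
```

With 7 cells only 4 are ever probed, and with 11 cells only 6, so a free cell outside that set is never found.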

Quadratic Probing
• Quadratic Probing eliminates primary clustering
problem of linear probing.
• Collision function is quadratic.
– The popular choice is f(i) = i².
• If the hash function evaluates to h and a search in
cell h is inconclusive, we try cells h + 1², h + 2², …,
h + i².
– i.e., it examines cells 1, 4, 9 and so on away from the
original probe.
• Remember that subsequent probe points are a
quadratic number of positions from the original
probe point.

QUADRATIC PROBING Cont…
Example
• h(K) = K mod 7
• Insert keys: 76 93 40 47 10 55
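The example can be traced with a short simulation. The sketch assumes a table of size 7 (to match K mod 7), `None` for empty cells, and at most 7 probes per key. The name `quadratic_probe_insert` is illustrative.

```python
def quadratic_probe_insert(table, key):
    """Insert key at the first free slot among (h + i*i) mod m, i = 0, 1, 2, ..."""
    m = len(table)
    home = key % m
    for i in range(m):
        slot = (home + i * i) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no free slot on the probe sequence")


table = [None] * 7
placed = [quadratic_probe_insert(table, k) for k in [76, 93, 40, 47, 10, 55]]

print(placed)  # [6, 2, 5, 0, 3, 1]
print(table)   # [47, 55, 93, 10, None, 40, 76]
```

Key 47 (home cell 5) tries cells 5, 6, 2 and then 0. Key 55 (home cell 6) tries cells 6, 0, 3 and then 1.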

[Figure: A quadratic probing hash table after each insertion (note that the table size was poorly chosen because it is not a prime number).]

3. Double Hashing
• Uses a secondary hash function h’(k) and
places the colliding item in the first
available cell of the series.
• The value calculated by the second hash
function acts as an offset.

3. Double Hashing
• A 2nd hash function H’ is used to resolve the collision.
• Suppose a record R with key k has hash address H(k) = h
and H’(k) = h’ ≠ m
• Then we can search the locations with addresses
h, h+h’, h+2h’, h+3h’, … (each taken mod m)
• If m is prime, then this sequence accesses all the
locations.

DOUBLE HASHING Cont…
[Figure: table with columns "index" and "count"]

Let the keys be 76, 93, 40, 47, 10, 55 and the table size be 7; apply the double
hashing technique for each insertion.
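The exercise does not state the second hash function, so the sketch below assumes one for illustration only: H(k) = k mod 7 and H’(k) = 5 - (k mod 5), which always gives a step between 1 and 5. With a different H’ the final table will differ.

```python
def h1(key, m):
    return key % m


def h2(key):
    # Assumed secondary hash (not given in the exercise); never returns 0.
    return 5 - (key % 5)


def double_hash_insert(table, key):
    """Insert key at the first free slot among (h + i*h') mod m, i = 0, 1, 2, ..."""
    m = len(table)
    home, step = h1(key, m), h2(key)
    for i in range(m):
        slot = (home + i * step) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no free slot on the probe sequence")


table = [None] * 7
placed = [double_hash_insert(table, k) for k in [76, 93, 40, 47, 10, 55]]

print(placed)  # [6, 2, 5, 1, 3, 4]
print(table)   # [None, 47, 93, 10, 55, 40, 76]
```

Under this assumed H’, key 47 collides at cell 5 and moves by 3 to cell 1, and key 55 collides at cell 6 and moves by 5 to cell 4.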

Open addressing: store the key/entry in a different position.
Separate Chaining
• Chain together several keys/entries in each
position.
• Instead of storing the data item directly in the
hash table, each hash table entry contains a
reference to a data structure, e.g. a linked list.
• In the worst-case scenario, all items hash to the
same value, so they are all stored in the same data
structure (linked list).

Separate Chaining
• The idea is to keep a list of all elements that hash to
the same value.
– The array elements are pointers to the first nodes of the
lists.
– A new item is inserted at the front of the list.
• Advantages:
– Better space utilization for large items.
– Simple collision handling: searching linked list.
– Overflow: we can store more items than the hash table
size.
– Deletion is quick and easy: deletion from the linked list.

Disadvantages of Separate Chaining
• Parts of the array might never be used.
• As chains get longer, search time increases
to O(n) in the worst case.
• Constructing new chain nodes is relatively
expensive.
• Is there a way to use the “unused” space
in the array instead of using chains to
make more space?

Keys: 0, 1, 4, 9, 16, 25, 36, 49, 64, 81
hash(key) = key % 10.
0: 0
1: 81 → 1
2:
3:
4: 64 → 4
5: 25
6: 36 → 16
7:
8:
9: 49 → 9
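The chains for these keys can be built with a short sketch. Python lists stand in for the linked lists here (an assumption made for brevity), and each new key is inserted at the front of its chain. The name `build_chains` is illustrative.

```python
def build_chains(keys, m):
    """Separate chaining: one chain per table cell, new keys go to the front."""
    table = [[] for _ in range(m)]
    for key in keys:
        table[key % m].insert(0, key)
    return table


keys = [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
table = build_chains(keys, 10)

for index, chain in enumerate(table):
    print(index, chain)

# Searching only scans the chain of the key's home cell.
print(36 in table[36 % 10])  # True
```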

SEPARATE CHAINING
• In our example, we use a linked list:
• keys: 5, 17, 37, 20, 42, 3, 11

Applications of Hashing
• Compilers use hash tables to keep track of declared
variables
• A hash table can be used for on-line spelling checkers — if
misspelling detection (rather than correction) is important,
an entire dictionary can be hashed and words checked in
constant time
• Game playing programs use hash tables to store seen
positions, thereby saving computation time if the position
is encountered again
• Hash functions can be used to quickly check for inequality
— if two elements hash to different values they must be
different
