hashing in data structures and its applications

Introduction
• Hashing is a technique for storing and finding data in an array
efficiently. For example, given a list of 20000 numbers and a number to
search for, a plain scan compares the target with each number in the
list until it finds a match.
• Hashing is mainly used to overcome the drawbacks of linear search
and binary search.
• Linear search: O(n) time
• Binary search: O(log n) time (requires sorted data)
• Hashing: O(1) time on average
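
To make the comparison concrete, here is a minimal Python sketch of the three lookup strategies. The function names and the sample data are illustrative assumptions; Python's built-in `dict` stands in for a hash table.

```python
from bisect import bisect_left

def linear_search(items, target):
    """O(n): compare the target with each element in turn."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1

def binary_search(sorted_items, target):
    """O(log n): repeatedly halve a sorted list."""
    i = bisect_left(sorted_items, target)
    if i < len(sorted_items) and sorted_items[i] == target:
        return i
    return -1

# 20000 sorted even numbers (sample data chosen for this sketch)
numbers = list(range(0, 40000, 2))
# Hash table mapping each number to its position in the list
positions = {n: i for i, n in enumerate(numbers)}

target = 31998
print(linear_search(numbers, target))   # 15999
print(binary_search(numbers, target))   # 15999
print(positions.get(target, -1))        # 15999, O(1) on average
```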

What is hashing?
• Hashing is a popular technique in computer science that maps data of arbitrary size
to fixed-length values.
• In cryptography, such a function is also known as a message digest function.
• The hash value identifies an item within a collection of similar items; distinct items
can occasionally share the same value (a collision).
• With hashing, a search operation can be performed in constant time on average.

Static hashing
1. Hash tables
2. Hash functions

Hash tables
• A hash table is a data structure (an array) with a fixed number of
cells, where each cell holds a key and its value.
• Each cell can be thought of as a bucket with its own index in the
array. If the index is known, accessing the data is fast.

Hash function
• A hash function is a mathematical function applied to keys to obtain
the indexes of their corresponding values in the hash table.
• It can be defined as an algorithm that maps data of larger size or
length to a small, fixed-size index or hash value.
• An array index is usually computed in two steps:
• hash = hashfunc(key)
• index = hash % array_size
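
The two steps can be sketched in Python as follows. The polynomial string hash used for `hashfunc` and the table size of 10 are assumptions made for illustration; the slides do not prescribe a particular function.

```python
def hashfunc(key: str) -> int:
    """Illustrative polynomial string hash (an assumption, not from the slides)."""
    h = 0
    for ch in key:
        h = h * 31 + ord(ch)
    return h

array_size = 10  # assumed table size

for key in ("Sudha", "Venkat", "Jeevani"):
    hash_value = hashfunc(key)        # step 1: key -> integer hash
    index = hash_value % array_size   # step 2: hash -> valid array index
    print(key, hash_value, index)
```

Whatever hash function is chosen, the modulo step guarantees that the index falls in the range 0 to array_size - 1.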

Hash functions
1. Division Method
2. Mid Square Method
3. Folding Method (pairing)
4. Multiplication Method

Division Method
• A record is inserted into the hash table by applying the modulo
operation to its key.
• Say that we have a hash table of size S, and we want to
store a (key, value) pair in it. The hash function, according
to the division method, would be: H(key) = key mod M
• Here M is an integer used for calculating the hash value. M must
not exceed S, so that every result is a valid index. Usually S itself
is used as M, and a prime value tends to spread keys more evenly.

Size of the hash table = 5 (M = S = 5)
Key: value pairs: {10: "Sudha", 11: "Venkat", 12: "Jeevani"}
For every pair:
{10: "Sudha"}
Key mod M = 10 mod 5 = 0
{11: "Venkat"}
Key mod M = 11 mod 5 = 1
{12: "Jeevani"}
Key mod M = 12 mod 5 = 2
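
The worked example can be reproduced with a short Python sketch. It assumes M equals the table size (5) and stores each (key, value) pair directly in a list; the name `division_hash` is illustrative.

```python
def division_hash(key: int, m: int) -> int:
    """Division method: H(key) = key mod M."""
    return key % m

table_size = 5                      # M = S = 5, as in the example
table = [None] * table_size
pairs = {10: "Sudha", 11: "Venkat", 12: "Jeevani"}

for key, value in pairs.items():
    table[division_hash(key, table_size)] = (key, value)

print(table)
# [(10, 'Sudha'), (11, 'Venkat'), (12, 'Jeevani'), None, None]
```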

Mid square method
• It is a two-step process of computing the hash value. Given a {key:
value} pair, the hash function is calculated as follows:
1. Square the key -> key * key
2. Choose some digits from the middle of the squared number to obtain
the hash value.
• Suppose the size of the hash table is 10 and the key: value pairs are:
{10: "Sudha", 11: "Venkat", 12: "Jeevani"}
Number of digits to be selected: the indexes are 0 - 9, so 1 digit
H(10): 10 * 10 = 100, middle digit = 0
H(11): 11 * 11 = 121, middle digit = 2
H(12): 12 * 12 = 144, middle digit = 4
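
A minimal Python sketch of the mid-square method follows. The slides do not say how to pick the middle when the squared number has an even number of digits, so the rule used here (start at `(length - digits) // 2`) is an assumption.

```python
def mid_square_hash(key: int, digits: int = 1) -> int:
    """Square the key and take `digits` digits from the middle of the result."""
    squared = str(key * key)
    # Assumed convention: centre the window, rounding the start position down.
    start = max((len(squared) - digits) // 2, 0)
    return int(squared[start:start + digits])

for key in (10, 11, 12):
    print(key, key * key, mid_square_hash(key))
# 10 100 0
# 11 121 2
# 12 144 4
```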

Folding Method
• Given a {key: value} pair and a table of size 100 (indexes
0 - 99), the key is broken down into segments of 2 digits
each. The last segment may have fewer digits. The hash
function is then:
• H(x) = (sum of equal-sized segments) mod (size of the hash table)
• Suppose k is a 10-digit key and the size of the table is
100 (0 - 99). k is divided into:
sum = (k1k2) + (k3k4) + (k5k6) + (k7k8) + (k9k10)
Now, H(x) = sum % 100

Let us now take an example:
The {key: value} pairs: {1234: "Sudha", 5678: "Venkat"}
Size of the table: 100 (0 - 99)
For {1234: "Sudha"}:
1234 -> 12 + 34 = 46
46 % 100 = 46
For {5678: "Venkat"}:
5678 -> 56 + 78 = 134
134 % 100 = 34
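
The folding steps can be sketched in Python as below. The function name and the `segment_len` parameter are illustrative; the key is assumed to be a non-negative integer.

```python
def folding_hash(key: int, table_size: int = 100, segment_len: int = 2) -> int:
    """Split the key's digits into fixed-size segments, add them, take the modulus."""
    digits = str(key)
    # The last segment may be shorter than segment_len.
    segments = [int(digits[i:i + segment_len])
                for i in range(0, len(digits), segment_len)]
    return sum(segments) % table_size

print(folding_hash(1234))   # 12 + 34 = 46  -> 46
print(folding_hash(5678))   # 56 + 78 = 134 -> 34
print(folding_hash(12345))  # 12 + 34 + 5 = 51 -> 51
```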

Multiplication method
1. Choose a constant A between 0 and 1.
2. Multiply the key by the chosen A.
3. Take the fractional part of the product and multiply it by the
table size.
4. The hash is the floor (only the integer part) of the above result.
• So, the hash function under this method will be:
H(x) = floor(size * (key * A mod 1))

For example:
{Key: value} pairs: {1234: "Sudha", 5678: "Venkat"}
Size of the table: 100
A = 0.56
For {1234: "Sudha"}:
H(1234) = floor(100 * (1234 * 0.56 mod 1))
= floor(100 * (691.04 mod 1))
= floor(100 * 0.04)
= floor(4) = 4
For {5678: "Venkat"}:
H(5678) = floor(100 * (5678 * 0.56 mod 1))
= floor(100 * (3179.68 mod 1))
= floor(100 * 0.68)
= floor(68) = 68
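
The four steps can be sketched in Python as follows, using a table size of 100 for both keys. One assumption in this sketch: A is represented as an exact `Fraction` rather than a float, because binary floating point cannot store 0.56 exactly and the rounding error can change the floored result.

```python
from fractions import Fraction
from math import floor

def multiplication_hash(key: int, table_size: int, a: Fraction) -> int:
    """H(x) = floor(size * (key * A mod 1)), computed with exact arithmetic."""
    fractional_part = (key * a) % 1          # steps 2 and 3a: fractional part of key * A
    return floor(table_size * fractional_part)  # steps 3b and 4

A = Fraction(56, 100)  # A = 0.56, stored exactly

print(multiplication_hash(1234, 100, A))  # 691.04  -> 0.04 * 100 -> 4
print(multiplication_hash(5678, 100, A))  # 3179.68 -> 0.68 * 100 -> 68
```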

Collision in hashing
1. The hash function is used to compute an index into the array.
2. That hash value is the index at which the key is stored in the hash table.
3. The hash function can return the same hash value for two or more keys.
4. When two or more keys are given the same hash value, it is called
a collision. To handle collisions, we use collision resolution techniques.
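
A short Python sketch shows how collisions arise. It assumes the division method with a table of size 5; the helper name `find_collisions` and the sample keys are illustrative.

```python
from collections import defaultdict

def find_collisions(keys, table_size):
    """Group keys by index (key mod table_size) and keep indexes hit more than once."""
    by_index = defaultdict(list)
    for key in keys:
        by_index[key % table_size].append(key)
    return {index: ks for index, ks in by_index.items() if len(ks) > 1}

print(find_collisions([10, 11, 12, 15, 22], 5))
# {0: [10, 15], 2: [12, 22]}
```

Here 10 and 15 both map to index 0, and 12 and 22 both map to index 2, so each pair collides.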

Collision resolution techniques
1. Separate chaining (open hashing)
2. Open addressing (closed hashing), which includes:
   1. Linear probing
   2. Quadratic probing
   3. Double hashing

Separate chaining technique
• Separate chaining maintains a linked list of records for each bucket.
• Each bucket in the hash table is associated with a linked list or some
other data structure that can store multiple elements.
• When a collision occurs, the colliding elements are added to the linked list
associated with that bucket.
• Each key and its corresponding value are stored as a node in the list.
Advantages
• The table never overflows: colliding records are simply added to the chain.
Disadvantages
• It requires extra memory to store the chains.

Open Addressing
• In open addressing, when a collision occurs, the hash table’s slots
themselves are used to store the colliding elements. If a slot is already
occupied, the algorithm probes or searches for the next available slot
in a predetermined manner.

Linear probing
• In linear probing, if a collision occurs at a specific slot, the algorithm sequentially
searches for the next available (empty) slot by incrementing the index one by one
until it finds an empty slot.
• The probing sequence is defined by the linear function:
• (hash(key) + i) % table_size
• where hash(key) is the original hash value and i is the probe number (0, 1, 2, ...).
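
A minimal Python sketch of insertion with linear probing follows. It assumes integer keys hashed by `key % table_size`, wraps around at the end of the array, and uses `None` to mark an empty slot; the function name is illustrative.

```python
def linear_probe_insert(table, key):
    """Insert key at the first free slot along (hash(key) + i) % size."""
    size = len(table)
    for i in range(size):                 # i is the probe number
        index = (key % size + i) % size
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")

table = [None] * 10
for key in (25, 35, 45, 6):
    linear_probe_insert(table, key)

print(table)
# [None, None, None, None, None, 25, 35, 45, 6, None]
```

Keys 25, 35 and 45 all hash to slot 5, so they end up in slots 5, 6 and 7. Key 6 hashes to slot 6 but is pushed to slot 8, which illustrates how occupied runs grow.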

Drawbacks of linear probing
1. The main problem is primary clustering: records pile up in long runs of
adjacent occupied slots.
2. As clusters grow, it takes longer to find an empty slot.

Quadratic probing
• When a collision occurs, the i-th iteration probes the slot i^2 positions past
the original hash value, that is (hash(key) + i^2) % table_size, and probing
continues until an empty slot is found.
• Cache performance in quadratic probing is lower than in linear probing,
but quadratic probing reduces primary clustering.
• Drawbacks
Secondary clustering: keys with the same initial hash value follow the same
probe sequence. In addition, once more than half of the hash table is full, it
can be difficult to find a location for a new element.
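
Quadratic probing can be sketched in Python as below. The sketch assumes integer keys, the probe sequence (hash(key) + i^2) % size, and a table of prime size 11; the function name is illustrative.

```python
def quadratic_probe_insert(table, key):
    """Insert key at the first free slot along (hash(key) + i*i) % size."""
    size = len(table)
    for i in range(size):
        index = (key % size + i * i) % size
        if table[index] is None:
            table[index] = key
            return index
    # Unlike linear probing, the sequence may miss free slots entirely.
    raise RuntimeError("no free slot found along the probe sequence")

table = [None] * 11
slots = [quadratic_probe_insert(table, key) for key in (22, 33, 44, 55)]
print(slots)  # [0, 1, 4, 9]
```

All four keys hash to slot 0, and the offsets 0, 1, 4 and 9 spread them across the table instead of packing them into adjacent slots.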

Double hashing
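• Double hashing uses two hash functions; the second one sets the step size
between probes.
• H1(K) = K % N, for example H1(K) = K % 10 for a table of size N = 10.
• H2(K) = P - (K % P), where P is a prime number smaller than N.
• The i-th probe examines slot (H1(K) + i * H2(K)) % N.
• Computing two hash functions takes longer, and cache performance is
poor, but double hashing largely avoids the clustering problem.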
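
A minimal Python sketch of double hashing follows, using a first hash of K % N with N = 10 and a second hash of P - (K % P) with P = 7 (the value of P is an assumption for this example). The probe sequence (H1 + i * H2) % N is the standard double-hashing rule.

```python
def double_hash_insert(table, key, p=7):
    """Insert key using the probe sequence (H1(key) + i * H2(key)) % N."""
    n = len(table)
    h1 = key % n          # first hash: starting slot
    h2 = p - (key % p)    # second hash: step size, always between 1 and p
    for i in range(n):
        index = (h1 + i * h2) % n
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no free slot found along the probe sequence")

table = [None] * 10
slots = [double_hash_insert(table, key) for key in (25, 35, 45)]
print(slots)  # [5, 2, 9]
```

All three keys start at slot 5, but their step sizes differ (35 steps by 7, 45 steps by 4), so they do not follow the same probe path. With a non-prime table size such as 10, a step that shares a factor with N does not visit every slot, which is one reason prime table sizes are often preferred.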

Dynamic hashing
• The drawback of static hashing is that it does not expand or shrink
dynamically as the size of the database grows or shrinks.
• In dynamic hashing, data buckets are added or removed dynamically
as the number of records increases or decreases.
• A widely used dynamic hashing scheme is extendible hashing (also written
as extended hashing).

Extendible hashing
• Extendible hashing is a dynamic hashing method in which directories and
buckets are used to hash the data.
• It is a highly flexible method in which the hash function itself
changes dynamically.
• Directories store the addresses of the buckets as pointers. Each directory
entry is assigned an id, which may change each time directory expansion
takes place.
• Buckets store the actual hashed data.

Frequently used terms in Extendible Hashing:
• Directories: These containers store pointers to buckets. Each directory entry is given a unique id, which may
change each time expansion takes place.
• Buckets: They store the hashed keys. Directories point to buckets. A bucket may have more than one pointer
to it if its local depth is less than the global depth.
• Global Depth: It is associated with the directory. It denotes the number of bits used by the hash
function to categorize the keys. Global Depth = number of bits in the directory id.
• Local Depth: It is the same as Global Depth, except that Local Depth is associated with the
buckets and not the directory. Local depth, compared with the global depth, decides the action
to be performed when an overflow occurs. Local Depth is always less than or equal to the Global Depth.

• Bucket Splitting: When the number of elements in a bucket exceeds its capacity, the
bucket is split into two.
• Directory Expansion: Directory expansion takes place when a bucket overflows and the
local depth of the overflowing bucket is equal to the global depth.
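
The terms above can be tied together in a compact Python sketch. It is a simplified illustration under stated assumptions: keys are non-negative integers, the directory id is taken from the lowest `global_depth` bits of the key, and all class and method names are hypothetical.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.items = {}                      # key -> value

class ExtendibleHashTable:
    def __init__(self, bucket_capacity=2):
        self.bucket_capacity = bucket_capacity
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]   # ids 0 and 1

    def _dir_index(self, key):
        # Directory id = lowest `global_depth` bits of the key (assumed scheme)
        return key & ((1 << self.global_depth) - 1)

    def get(self, key, default=None):
        return self.directory[self._dir_index(key)].items.get(key, default)

    def put(self, key, value):
        while True:
            bucket = self.directory[self._dir_index(key)]
            if key in bucket.items or len(bucket.items) < self.bucket_capacity:
                bucket.items[key] = value
                return
            self._split(bucket)              # overflow: split, then retry

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Directory expansion: double the directory; each new entry
            # initially points to the same bucket as its mirror entry.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Bucket splitting: one more bit now distinguishes the two halves.
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        new_bit = 1 << (bucket.local_depth - 1)
        for k in list(bucket.items):
            if k & new_bit:
                sibling.items[k] = bucket.items.pop(k)
        for i, b in enumerate(self.directory):
            if b is bucket and i & new_bit:
                self.directory[i] = sibling

table = ExtendibleHashTable(bucket_capacity=2)
for key in range(16):
    table.put(key, f"v{key}")

print(table.global_depth)   # 3
print(table.get(13))        # v13
```

With 16 keys and a capacity of 2 per bucket, at least 8 buckets are needed, so the directory grows to a global depth of 3 (8 entries). Buckets are only split when they overflow, and the directory doubles only when the overflowing bucket's local depth already equals the global depth.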
