2. HASHING
Hashing is the transformation of a string of
characters into a usually shorter fixed-length value
or key that represents the original string. Hashing is
used to index and retrieve items in a database
because it is faster to find the item using the shorter
hashed key than to find it using the original value.
Hashing is nothing but by implementing a hash
table.
Hashing is a technique used for performing
insertions, deletions and finds in constant average
time
Meenakshi(M.tech)
3. Hash table
• Hash table ADT which supports only a subset of the operations
allowed by binary search trees.
• Hash table structure is merely an array of some fixed size,
containing the items.
Key could be an integer, a string, etc.
e.g. a name or Id that is a part of a large employee structure
• The size of the array is TableSize.
• The items that are stored in the hash table are indexed by values
from 0 to TableSize – 1.
• Each key is mapped into some number in the range 0 to TableSize
– 1.
• The mapping is called a hash function.
Meenakshi(M.Tech)
4. Hash Function
A hash function is used to map a dictionaries pairs
into the positions of a hash table.
A bucket array is used for a hash table.
The hash function:
must be simple to compute.
must distribute the keys evenly among the cells.
If we know which keys will occur in advance we can
write perfect hash functions
Meenakshi(M.Tech)
5. Hash functions
If the input keys are integers then simply Key mod
TableSize is a general strategy.
Unless key happens to have some undesirable
properties.
e.g. all keys end in 0 and we use mod 10
If the keys are strings, hash function needs more care.
First convert it into a numeric value.
Meenakshi(M.Tech)
6. Hash functions
Some of the methods are used for creating a
hash function:
Division method
Multiplication method
Truncation method
Folding method
Extraction method
Meenakshi(M.Tech)
7. Hash Function Methods
Division method:
Map a key k into one of m slots by taking the
remainder of k divided by m.
h(k)= k mod m
Ex: If table size m=12, key k=100 then
h(100) = 100 mod 12
=4
Truncation method:
Ignore some digits of the key and use the rest
as the array index.
Ex: Student ID’s are key 123456789 then select 9
digit number select just the 4th and 8th position digits
as the index is i.e. 48 as the index.
Meenakshi(M.Tech)
8. Hash Function Methods
Folding method:
Partition the key into several pieces, i.e. two or
three or more pieces. Each of the individual parts is
combined using any of the basic operations such as
addition or multiplication
Ex: consider a number 123456. partition the number
12|34|56. Add these three parts 12+34+56=112 index of the
hash table to store the number 123456.
Extraction method:
In this method computing the address uses only a
part of the key. From a student id 928324312 the method
may use first four digits, 9283, or the last four digits, 4312,
or the combination of the first two and last two digits 9212
Meenakshi(M.Tech)
9. 9
Hash function for strings:
a l ikey
KeySize = 3;
98 108 105
hash(“ali”) = (105 * 1 + 108*37 + 98*372) % 10,007 = 8172
0 1 2 i
key[i]
hash
function
ali
……
……
0
1
2
8172
10,006 (TableSize)
“ali”
Meenakshi(M.Tech)
10. Collision Resolution
• If, when an element is inserted, it hashes to the
same value as an already inserted element, then
we have a collision and need to resolve it.
• There are several methods for dealing with
this:
– Separate chaining
– Open addressing
• Linear Probing
• Quadratic Probing
• Double Hashing
Meenakshi(M.Tech)
12. Collision Resolution
The most popular techniques for collision resolution are:
Separate Chaining
Open addressing
Separate Chaining:
The hash table is implemented as an array of linked
list.
The idea is to keep a list of all elements that hash to
the same value.
The array elements are pointers to the first nodes of the
lists.
A new item is inserted to the front of the list.
Meenakshi(M.Tech)
13. Collision Resolution
Advantages of Separate Chaining:
• Better space utilization for large items.
• Simple collision handling: searching linked list.
• Overflow: we can store more items than the hash
table size.
• Deletion is quick and easy: deletion from the linked
list.
• It Is a simple and efficient method
• Table size need not be a prime number
Meenakshi(M.Tech)
14. Operations
• Initialization: all entries are set to NULL
• Find:
– locate the cell using hash function.
– sequential search on the linked list in that cell.
• Insertion:
– Locate the cell using hash function.
– (If the item does not exist) insert it as the first item in
the list.
• Deletion:
– Locate the cell using hash function.
– Delete the item from the linked list.
Meenakshi(M.Tech)
15. Example
Let us take some keys 23, 13, 21, 14, 7, 8, 15 in the
same order, in a hash table of size 7 using separate
chaining with the hash function.
h(23)=23%7=2
h(13)=13%7=6
h(21)=21%7=0
h(14)=14%7=0(collision)
h(7) =7%7 =0(collision)
h(8) =8%7 =1
h(15)=15%7=1(collision)
Meenakshi(M.Tech)
16. Open Addressing
In an open addressing hashing system, all the data go inside
the table.
Thus, a bigger table is needed.
Generally the load factor should be below 0.5.
If a collision occurs, alternative cells are tried until an empty
cell is found.
There are three common collision resolution strategies:
Linear Probing
Quadratic probing
Double hashing
Meenakshi(M.Tech)
17. Linear probing is s technique for resolving hash collisions of
values of hash function.
In which an interval between probes will be fixed to 1
h(x)= x mod 10
Quadratic probing: In which an interval between probes is
increased by adding the successive outputs of a quadratic
polynomial to the starting value given by the hash function.
Quadratic Probing eliminates primary clustering problem of
linear probing.
h(key)=(h(key)+1)mod 10
Collision Resolution
Meenakshi(M.Tech)