This document summarizes the seminar on hashing presented to Ms. Manisha by Ruchika and Ritika. It covers what hashing is and how records are inserted, deleted, and searched using hashing techniques. It explains hash functions and three methods for constructing them: division, mid-square, and folding. It describes the collisions that can occur during insertion and the two approaches to resolving them: open addressing (using linear probing) and chaining. It provides examples to illustrate the key concepts.
1. SEMINAR ON HASHING
PRESENTED TO: MS. MANISHA
PRESENTED BY: RUCHIKA (6), RITIKA (11)
MCA 1
S.S.D WOMEN’S INSTITUTE OF TECHNOLOGY, BATHINDA
(AFFILIATED TO PUNJABI UNIVERSITY,PATIALA)
2. CONTENTS
What is hashing?
Inserting, deleting and searching a record
Hash functions and their methods
Collision
Two ways of collision resolution
1. Open addressing
2. Chaining
3. Hashing
Hashing is a common approach to the storing/searching problem.
A collection of data is stored, and each data item has a key associated with it.
4. What is a Hash Table ?
The simplest kind of hash
table is an array of records.
This example has 701
records.
[Figure: an array of records, with slots [0], [1], [2], [3], [4], [5], ..., [700].]
5. What is a Hash Table ?
Each record has a special
field, called its key.
In this example, the key is a
long integer field called
Number.
[Figure: the array of records, slots [0] to [700]; the record in slot [4] has Number 506643548.]
6. What is a Hash Table ?
The number might be a
person's identification
number, and the rest of the
record has information
about the person.
[Figure: the array of records, slots [0] to [700]; the record in slot [4] has Number 506643548.]
7. What is a Hash Table ?
When a hash table is in use,
some spots contain valid
records, and other spots are
"empty".
[Figure: the array, slots [0] to [700]. Some slots hold records (Number 506643548, Number 233667136, Number 281942902, Number 155778322); the others are empty.]
8. The general idea of using the key to determine the address of a record is an excellent one, but it must be modified so that a great deal of space is not wasted.
This modification takes the form of a function H from the set K of keys into the set L of memory addresses:
H : K → L (hash function)
Chopping is a technique in which pieces of the key k are combined to form the hash address H(k).
[Figure: a key is split into key1 and key2; key1 is split further into key1.1 and key1.2.]
9. Methods of Hash Functions
Division method: Choose a number m larger than the number n of keys in K.
The hash function H is defined by
H(k) = k (mod m) or H(k) = k (mod m) + 1
Here k (mod m) denotes the remainder when k is divided by m.
The second formula is used when we want the hash addresses to range from 1 to m rather than from 0 to m-1.
10. Example of division method
Suppose each of 68 employees is assigned a unique 4-digit employee number, and L consists of 100 two-digit addresses: 00, 01, 02, ..., 99. We apply the division method to the employee numbers 3205, 7148, and 2345.
1. Choose a prime number m close to 99, such as m = 97. Then
H(3205) = 4, H(7148) = 67, H(2345) = 17.
2. If the addresses are to start at 1 instead of 0, use H(k) = k (mod m) + 1:
H(3205) = 4 + 1 = 5
H(7148) = 67 + 1 = 68
H(2345) = 17 + 1 = 18
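The division example can be reproduced with a few lines of Python. This is only a sketch: the function name division_hash and its one_based flag are illustrative assumptions, not part of the seminar.

```python
def division_hash(key: int, m: int, one_based: bool = False) -> int:
    """Division-method hash: key mod m, optionally shifted to the range 1..m."""
    address = key % m
    return address + 1 if one_based else address

employee_numbers = [3205, 7148, 2345]

# First formula: addresses in 0..m-1
print([division_hash(k, 97) for k in employee_numbers])                  # [4, 67, 17]
# Second formula: addresses in 1..m
print([division_hash(k, 97, one_based=True) for k in employee_numbers])  # [5, 68, 18]
```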
11. Mid square method
The key k is squared. Then the hash function H is defined by
H(k) = l
where l is obtained by deleting digits from both ends of k*k.
Example:
k:    3205        7148        2345
k*k:  10 272 025  51 093 904  5 499 025
H(k): 72          93          99
In each case H(k) is made up of the fourth and fifth digits of k*k, counting from the right.
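A small Python sketch of the mid-square method follows. It assumes, as the example values suggest, that the three rightmost digits of k*k are dropped and the next two digits are kept; the parameter names drop_right and width are hypothetical.

```python
def mid_square_hash(key: int, drop_right: int = 3, width: int = 2) -> int:
    """Square the key, drop `drop_right` digits from the right, keep `width` digits."""
    squared = key * key
    return (squared // 10 ** drop_right) % 10 ** width

for k in (3205, 7148, 2345):
    print(k, k * k, mid_square_hash(k))
# 3205 10272025 72
# 7148 51093904 93
# 2345 5499025 99
```

The same digit positions are used for every key, so the result depends only on the middle of the squared value.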
12. Folding method
The key k is partitioned into a number of parts k1, k2, ..., kr.
The parts are added together, ignoring the last carry:
H(k) = k1 + k2 + ... + kr
Example:
H(3205) = 32 + 05 = 37
H(7148) = 71 + 48 = 119, which gives 19 (the leading digit 1 is ignored)
H(2345) = 23 + 45 = 68
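The folding example can be sketched in Python as below. The function name folding_hash and the parameter part_digits are assumptions made for illustration; the key is split into two-digit parts starting from the right, which matches the slide for 4-digit keys.

```python
def folding_hash(key: int, part_digits: int = 2) -> int:
    """Split the key into parts of `part_digits` digits, add them, drop the last carry."""
    base = 10 ** part_digits
    total = 0
    while key > 0:
        total += key % base   # take the rightmost part
        key //= base          # and remove it from the key
    return total % base       # ignoring the carry keeps only `part_digits` digits

print(folding_hash(3205))  # 37
print(folding_hash(7148))  # 19  (71 + 48 = 119, leading 1 ignored)
print(folding_hash(2345))  # 68
```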
13. Inserting a New Record
In order to insert a new
record, the key must
somehow be converted to an
array index.
The index is called the hash
value of the key.
[Figure: the array, slots [0] to [700], holding the four existing records; a new record with Number 580625685 is waiting to be inserted.]
14. Inserting a New Record
A typical way to create a hash value:
(Number mod 701)
What is (580625685 mod 701)?
3
[Figure: the array, slots [0] to [700], with the new record Number 580625685 waiting to be inserted.]
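The arithmetic on this slide can be checked directly. A minimal Python sketch (the name hash_value is an illustrative assumption, not part of the seminar):

```python
TABLE_SIZE = 701  # number of slots in the example array

def hash_value(number: int) -> int:
    """Return the array index for a key, using key mod table size."""
    return number % TABLE_SIZE

print(hash_value(580625685))  # 3
```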
16. Inserting a New Record
The hash value is used for
the location of the new
record.
[Figure: the new record Number 580625685 is moved toward slot [3] of the array.]
17. Inserting a New Record
The hash value is used for
the location of the new
record.
[Figure: the array with Number 580625685 now stored in slot [3].]
18. Collisions
Here is another new record
to insert, with a hash value
of 2.
[Figure: the array with Number 580625685 in place; a new record, Number 701466868, says "My hash value is [2]."]
19. Collisions
This is called a collision,
because there is already
another valid record at [2].
When a collision occurs, move forward until you find an empty spot.
[Figure: the new record Number 701466868 starts at the occupied slot [2] and moves forward through the array.]
22. Collisions
This is called a collision,
because there is already
another valid record at [2].
The new record goes in the empty spot.
[Figure: the array with Number 701466868 stored in the first empty slot it reached.]
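The insertion steps on these slides can be sketched in Python. Several things here are assumptions rather than part of the seminar: the names insert, TABLE_SIZE and EMPTY, the use of 0 as the "empty" key (which requires all real keys to be positive), and wrapping around from slot [700] to slot [0]. The slot of each record is computed by the code from key mod 701, not read from the slide figures.

```python
TABLE_SIZE = 701
EMPTY = 0  # assumed sentinel: valid only if all real keys are positive

def insert(table: list, key: int) -> int:
    """Insert key using linear probing and return the slot that was used."""
    index = key % TABLE_SIZE
    for _ in range(TABLE_SIZE):
        if table[index] == EMPTY:
            table[index] = key
            return index
        index = (index + 1) % TABLE_SIZE  # collision: move forward (wrap is an assumption)
    raise RuntimeError("hash table is full")

table = [EMPTY] * TABLE_SIZE
for key in (506643548, 233667136, 281942902, 155778322, 580625685):
    insert(table, key)

# 701466868 hashes to 2; slots 2, 3 and 4 are taken, so it lands in 5.
print(insert(table, 701466868))  # 5
```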
23. Searching for a Key
The data that's attached to a
key can be found fairly
quickly.
[Figure: the array of records; a search is about to begin for Number 701466868.]
24. Searching for a Key
Calculate the hash value.
Check that location of the array
for the key.
[Figure: the search key Number 701466868 says "My hash value is [2]." The record in slot [2] replies "Not me."]
25. Searching for a Key
Keep moving forward until you
find the key, or you reach an
empty spot.
[Figure: the search moves forward to the next slot, whose record also replies "Not me."]
27. Searching for a Key
Keep moving forward until you
find the key, or you reach an
empty spot.
[Figure: the search reaches the slot holding Number 701466868, which replies "Yes!"]
28. Searching for a Key
When the item is found, the
information can be copied to
the necessary location.
[Figure: the record found for Number 701466868 is copied out of the array.]
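The search procedure described on these slides might look like this in Python. It is a sketch under assumptions: the names insert and search, the 0 sentinel for empty slots, and wrap-around at the end of the array are not taken from the seminar.

```python
TABLE_SIZE = 701
EMPTY = 0  # assumed sentinel for an empty slot (all real keys positive)

def insert(table: list, key: int) -> int:
    """Linear-probing insert, used here only to build the example table."""
    index = key % TABLE_SIZE
    while table[index] != EMPTY:          # assumes the table is never full
        index = (index + 1) % TABLE_SIZE
    table[index] = key
    return index

def search(table: list, key: int):
    """Return the slot holding key, or None if an empty slot is reached first."""
    index = key % TABLE_SIZE
    for _ in range(TABLE_SIZE):
        if table[index] == EMPTY:
            return None                   # empty spot: the key is not in the table
        if table[index] == key:
            return index                  # "Yes!"
        index = (index + 1) % TABLE_SIZE  # "Not me." Keep moving forward.
    return None

table = [EMPTY] * TABLE_SIZE
for key in (506643548, 233667136, 281942902, 155778322, 580625685, 701466868):
    insert(table, key)

print(search(table, 701466868))  # 5
print(search(table, 999))        # None
```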
29. Deleting a Record
Records may also be deleted from a hash table.
[Figure: the array of records; the record Number 506643548 says "Please delete me."]
30. Deleting a Record
Records may also be deleted from a hash table.
But the location must not be left as an ordinary
"empty spot" since that could interfere with searches.
[Figure: the array after Number 506643548 has been removed from its slot.]
31. Deleting a Record
Records may also be deleted from a hash table.
But the location must not be left as an ordinary "empty spot".
The location must be marked in some special way so that a search can tell that the spot used to have something in it.
[Figure: the array after Number 506643548 has been deleted.]
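One common way to mark a deleted location is a second special value, sometimes called a tombstone. The Python sketch below assumes the marker -1 (named DELETED here), the 0 sentinel for empty slots, wrap-around probing, and that a key is never inserted twice; none of these details come from the seminar.

```python
TABLE_SIZE = 701
EMPTY = 0      # assumed marker: slot has never held a record
DELETED = -1   # assumed marker: slot used to hold a record

def insert(table: list, key: int) -> int:
    """Linear-probing insert; a DELETED slot may be reused (key assumed not present)."""
    index = key % TABLE_SIZE
    for _ in range(TABLE_SIZE):
        if table[index] in (EMPTY, DELETED):
            table[index] = key
            return index
        index = (index + 1) % TABLE_SIZE
    raise RuntimeError("hash table is full")

def search(table: list, key: int):
    """Stop only at a true empty slot; step over DELETED slots."""
    index = key % TABLE_SIZE
    for _ in range(TABLE_SIZE):
        if table[index] == EMPTY:
            return None
        if table[index] == key:
            return index
        index = (index + 1) % TABLE_SIZE
    return None

def delete(table: list, key: int) -> bool:
    """Replace the record with the DELETED marker instead of EMPTY."""
    index = search(table, key)
    if index is None:
        return False
    table[index] = DELETED
    return True

table = [EMPTY] * TABLE_SIZE
for key in (506643548, 233667136, 281942902, 155778322, 580625685, 701466868):
    insert(table, key)

delete(table, 506643548)         # frees slot 4, which lies on the probe path 2, 3, 4, 5
print(search(table, 701466868))  # 5  (still found, because slot 4 is not "empty")
print(search(table, 506643548))  # None
```

If slot 4 had been reset to EMPTY instead, the search for 701466868 would have stopped there and wrongly reported the key as missing.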
32. COLLISION RESOLUTION
There are two general ways of resolving collisions.
The particular procedure that one chooses depends on many factors. One important factor is the ratio of the number n of keys in K (which is the number of records in F) to the number m of hash addresses in L. This ratio, λ = n/m, is called the load factor.
n = number of filled records.
m = total number of memory locations.
33. The efficiency of a hash function with a collision resolution procedure is measured by the average number of probes (key comparisons) needed to find the location of the record with a given key k. The efficiency depends mainly on the load factor λ = n/m. The following two quantities are of interest:
S(λ) = average number of probes for a successful search.
U(λ) = average number of probes for an unsuccessful search.
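The dependence of the probe count on the load factor can be observed with a small experiment. The Python sketch below is an illustration under assumptions, not a result from the seminar: it fills a table with random keys using division hashing and linear probing, then measures the average number of probes for a successful search. The function name, the seed, and the key range are all hypothetical choices.

```python
import random

def average_successful_probes(n: int, m: int, seed: int = 0) -> float:
    """Fill a table of m slots with n random keys, then average the search cost."""
    if not 0 < n < m:
        raise ValueError("need 0 < n < m so that an empty slot always exists")
    rng = random.Random(seed)
    keys = rng.sample(range(1, 10**9), n)  # n distinct positive keys
    table = [None] * m
    for key in keys:
        i = key % m
        while table[i] is not None:        # linear probing on collision
            i = (i + 1) % m
        table[i] = key
    total_probes = 0
    for key in keys:
        i = key % m
        probes = 1                         # the home slot counts as the first probe
        while table[i] != key:
            i = (i + 1) % m
            probes += 1
        total_probes += probes
    return total_probes / n

for n in (70, 350, 630):
    print(f"load factor {n / 701:.2f}: {average_successful_probes(n, 701):.2f} probes")
```

The exact figures depend on the random keys, but the average should grow as the table fills.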
34. Open addressing
If the home slot for the record that is being inserted is already occupied, then simply choose a different location within the table.
But how do we choose this alternate location?
The technique must be reproducible and, on average, cheap.
35. Linear Probing
Linear probing involves simply walking down the table until an empty slot is found.
36. Drawbacks of linear probing
The major drawback of linear probing is that, as the table becomes about half full, there is a tendency toward clustering.
The sequential searches needed to find an empty position become longer and longer.
Example: suppose location a is occupied and the next location b is empty. If a new insertion hashes to location b, then it will go there, but if it hashes to location a, then it will also go into b.
[Figure: consecutive table locations a, b, c, d, e.]
37. The problem of clustering is essentially one of instability.
If a few keys happen randomly to be near each other, then it becomes more and more likely that other keys will join them, and the distribution will become progressively more unbalanced.
38. Avoiding the problem of clustering
Quadratic probing: Suppose a record R with key k has the hash address H(k) = h. Then, instead of searching the locations with addresses h, h+1, h+2, ..., we search, in order, the locations with addresses
h, h+1, h+4, h+9, h+16, ..., h+i^2, ...
If the number m of locations in the table T is a prime number, then the above sequence will access half of the locations in T.
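The quadratic probe sequence, and the claim that it reaches about half of a prime-sized table, can be checked with a short Python sketch. The function name is an illustrative assumption, and addresses are reduced mod m so that they stay inside the table.

```python
def quadratic_probe_sequence(h: int, m: int) -> list:
    """Addresses h, h+1, h+4, h+9, ... (mod m) for i = 0 .. m-1."""
    return [(h + i * i) % m for i in range(m)]

sequence = quadratic_probe_sequence(3, 11)
print(sequence[:5])        # [3, 4, 7, 1, 8]
print(len(set(sequence)))  # 6 distinct slots out of 11

# For the 701-slot table used earlier (701 is prime):
print(len(set(quadratic_probe_sequence(2, 701))))  # 351
```

For a prime m the sequence visits (m+1)/2 distinct locations, so an insertion can fail to find an empty slot even though the table is not full.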
39. Double Hashing
A second hash function H’ is used for resolving a collision. Suppose a record R with key k has the hash addresses H(k) = h and H’(k) = h’ ≠ m. Then we linearly search the locations with addresses
h, h+h’, h+2h’, h+3h’, ...
If m is a prime number, then the above sequence will access all the locations in the table T.
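A Python sketch of the double hashing probe sequence follows. The second hash function used here, 1 + (k mod (m-1)), is an assumption chosen only because it always gives a step between 1 and m-1; the seminar does not say which second function to use.

```python
def double_hash_sequence(key: int, m: int) -> list:
    """Addresses h, h+h', h+2h', ... (mod m) for one key."""
    h = key % m                # first hash function (division method)
    step = 1 + key % (m - 1)   # hypothetical second hash function, in 1..m-1
    return [(h + i * step) % m for i in range(m)]

sequence = double_hash_sequence(3205, 97)
print(sequence[:4])        # [4, 42, 80, 21]
print(len(set(sequence)))  # 97: every location is visited because 97 is prime
```

Two keys with the same home address h usually get different steps, so they follow different probe paths.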
40. Chaining
Design the table so that each slot is actually a container that can hold multiple records.
Here, the "chains" are linked lists which could hold any number of colliding records.
Alternatively, each table slot could be large enough to store several records directly. In that case the slot may overflow, requiring a fallback…
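A minimal Python sketch of chaining is given below. It uses a Python list for each chain instead of a hand-written linked list, and the class and method names are assumptions made for illustration.

```python
class ChainedHashTable:
    """Hash table where each slot holds a list (chain) of (key, data) pairs."""

    def __init__(self, size: int = 701):
        self.slots = [[] for _ in range(size)]

    def _chain(self, key: int) -> list:
        return self.slots[key % len(self.slots)]

    def insert(self, key: int, data: str) -> None:
        chain = self._chain(key)
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, data)  # key already present: replace its data
                return
        chain.append((key, data))       # colliding keys simply share the chain

    def search(self, key: int):
        for existing_key, data in self._chain(key):
            if existing_key == key:
                return data
        return None

    def delete(self, key: int) -> bool:
        chain = self._chain(key)
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                del chain[i]            # no special "deleted" marker is needed
                return True
        return False

table = ChainedHashTable()
table.insert(233667136, "first record")
table.insert(701466868, "second record")  # both keys hash to slot 2
print(len(table.slots[2]))                # 2
print(table.search(701466868))            # second record
```

Because colliding records never spill into neighbouring slots, deletion does not interfere with later searches.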
This lecture illustrates hash tables, using open addressing.
Before this lecture, students should have seen other forms of a Dictionary, where a collection of data is stored, and each data item has a key associated with it.
This lecture introduces hash tables, which are an array-based method for implementing a Dictionary. You should recall that we have seen dictionaries implemented in other ways, for example with a binary search tree. The abstract properties of a dictionary remain the same: We can insert items in the dictionary, and each item has a key associated with it. When we want to retrieve an item, we specify only the key, and the retrieval process finds the associated data.
What we do now is use an array to implement the dictionary. The array is an array of records. In this example, we could store up to 701 records in the array.
Each record in the array contains two parts. The first part is a number that we'll use for the key of the item. We could use something else for the keys, such as a string. But for a hash table, numbers make the most convenient keys.
The numbers might be identification numbers of some sort, and the rest of the record contains information about a person. So the pattern that you see here is the same pattern that you've seen in other dictionaries: Each entry in the dictionary has a key (in this case an identifying number) and some associated data.
When a hash table is being used as a dictionary, some of the array locations are in use, and other spots are "empty", waiting for a new entry to come along.
Oftentimes, the empty spots are identified by a special key. For example, if all our identification numbers are positive, then we could use 0 as the Number that indicates an empty spot.
With this drawing, locations [0], [4], [6], and maybe some others would all have Number=0.
In order to insert a new entry, the key of the entry must somehow be converted to an index in the array. For our example, we must convert the key number into an index between 0 and 700. The conversion process is called hashing and the index is called the hash value of the key.
There are many ways to create hash values. Here is a typical approach.
a. Take the key mod 701 (which could be anywhere from 0 to 700).
So, quick, what is (580,625,685 mod 701) ?
Three.
So, this new item will be placed at location [3] of the array.
The hash value is always used to find the location for the record.
Sometimes, two different records might end up with the same hash value.
This is called a collision.
When a collision occurs, the insertion process will move forward through the array until an empty spot is found. Sometimes you will have a second collision...
...and a third collision...
But if there are any empty spots, eventually you will reach an empty spot, and the new item is inserted here.
The new record is always placed in the first available empty spot, after the hash value.
It is fairly easy to search for a particular item based on its key.
Start by computing the hash value, which is 2 in this case. Then check location 2. If location 2 has a different key than the one you are looking for, then move forward...
...if the next location is not the one we are looking for, then keep moving forward...
Keep moving forward until you find the sought-after key...
In this case we find the key at location [5].
The data from location [5] can then be copied to provide the result of the search function.
What happens if a search reaches an empty spot? In that case, it can halt and indicate that the key was not in the hash table.
Records can be deleted from a hash table...
But the spot of the deleted record cannot be left as an ordinary empty spot, since that would interfere with searches. (Remember that a search can stop when it reaches an empty spot.)
Instead we must somehow mark the location as "a location that used to have something here, but no longer does."
We might do this by using some other special value for the Number field of the record.
In any case, a search can not stop when it reaches "a location that used to have something here". A search can only stop when it reaches a true empty spot.