The document summarizes three string matching algorithms: Knuth-Morris-Pratt algorithm, Boyer-Moore string search algorithm, and Bitap algorithm. It provides details on each algorithm, including an overview, inventors, pseudocode, examples, and explanations of how they work. The Knuth-Morris-Pratt algorithm uses information about the pattern string to skip previously examined characters when a mismatch occurs. The Boyer-Moore algorithm uses preprocessing of the pattern to calculate shift amounts to skip alignments. The Bitap algorithm uses a bit array and bitwise operations to efficiently compare characters.
2. What is String Matching?
• Checking whether two or more strings are
the same.
• Finding a string (pattern) within another string
(text), that is, looking for a substring.
Text ATGCTTATCG
Pattern ATC
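As a baseline for the algorithms that follow, the substring problem above can be sketched with a naive scan that tries every starting position. The function name `find_pattern` is hypothetical and chosen only for this illustration.

```python
def find_pattern(text: str, pattern: str) -> int:
    """Return the 0-based index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    # Try every start position at which the pattern still fits inside the text.
    for start in range(n - m + 1):
        if text[start:start + m] == pattern:
            return start
    return -1


print(find_pattern("ATGCTTATCG", "ATC"))  # 6
```

The algorithms in the following slides aim to avoid re-checking characters that this naive scan compares repeatedly.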
6. Knuth–Morris–Pratt algorithm
Outline of the Algorithm
• The Knuth–Morris–Pratt string searching
algorithm (or KMP algorithm) searches for
occurrences of a "word" W within a main "text
string" S. It relies on an observation about
what happens when a mismatch occurs:
7. Knuth–Morris–Pratt algorithm
Outline of the Algorithm
• The word itself embodies sufficient
information to determine where the next
match could begin.
• This lets the algorithm bypass re-examination
of previously matched characters.
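The idea in the outline can be sketched as follows. This is a minimal illustration, not the deck's own code: the names `build_failure_table` and `kmp_search` are hypothetical, and the table here uses the common "longest proper prefix that is also a suffix" convention, which may differ by an offset from the table used elsewhere in the deck.

```python
def build_failure_table(word: str) -> list:
    """fail[i] = length of the longest proper prefix of word[:i+1] that is also its suffix."""
    fail = [0] * len(word)
    k = 0  # length of the prefix currently matched
    for i in range(1, len(word)):
        while k > 0 and word[i] != word[k]:
            k = fail[k - 1]  # fall back to the next shorter candidate prefix
        if word[i] == word[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text: str, word: str) -> int:
    """Return the 0-based index of the first occurrence of word in text, or -1."""
    if not word:
        return 0
    fail = build_failure_table(word)
    k = 0  # number of characters of word matched so far
    for pos, ch in enumerate(text):
        while k > 0 and ch != word[k]:
            k = fail[k - 1]  # reuse what the word tells us; never move back in text
        if ch == word[k]:
            k += 1
        if k == len(word):
            return pos - len(word) + 1
    return -1
```

Note that the loop over `text` only moves forward: on a mismatch, only the position inside the word is adjusted, which is how previously matched characters avoid re-examination.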
8. Knuth–Morris–Pratt algorithm
Worked example
• Let W = "ABCDABD" and
S = "ABC ABCDAB ABCDABCDABDE".
• At any given time, the algorithm is in a state
determined by two integers:
– m, denoting the position within S where the
prospective match for W begins,
– i, denoting the index of the currently considered
character in W.
10. Knuth–Morris–Pratt algorithm
Worked example
• We proceed by comparing successive
characters of W to "parallel" characters of S,
moving from one to the next if they match.
• In the fourth step, we get S[3] = ' ' and W[3] =
'D', a mismatch.
12. Knuth–Morris–Pratt algorithm
Worked example
• Hence, having checked all those characters
previously, we know that there is no chance of
finding the beginning of a match if we check
them again.
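The first steps of the worked example can be reproduced with a short sketch. The helper `first_mismatch` is a hypothetical name used only here; it compares W against S from a given start position m and reports the index i at which the comparison stops.

```python
S = "ABC ABCDAB ABCDABCDABDE"
W = "ABCDABD"


def first_mismatch(text: str, word: str, m: int) -> int:
    """Compare word against text starting at text position m.

    Returns the index i in word of the first mismatch, or len(word) if
    every character matched.
    """
    i = 0
    while i < len(word) and m + i < len(text) and text[m + i] == word[i]:
        i += 1
    return i


i = first_mismatch(S, W, 0)
print(i, repr(S[i]), repr(W[i]))  # 3 ' ' 'D'

# None of the characters already matched after position 0 is an 'A',
# so no match can begin at S[1] or S[2].
print(W[0] in S[1:i])  # False
```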
25. Boyer–Moore string search
Algorithm
Some Definitions Required
• S[i] refers to the character at index i of
string S, counting from 1.
• S[i..j] refers to the substring of string S starting
at index i and ending at j, inclusive.
• A prefix of S is a substring S[1..i] for some i in
range [1, n], where n is the length of S.
26. Boyer–Moore string search
Algorithm
Some Definitions Required
• A suffix of S is a substring S[i..n] for some i in
range [1, n], where n is the length of S.
• The string to be searched for is called
the pattern and is referred to with symbol P.
• The string being searched in is called
the text and is referred to with symbol T.
27. Boyer–Moore string search
Algorithm
Some Definitions Required
• The length of P is n.
• The length of T is m.
• An alignment of P to T is an index k in T such
that the last character of P is aligned with
index k of T.
• A match or occurrence of P occurs at an
alignment if P is equivalent to T[(k-n+1)..k].
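Because these definitions count from 1 while most programming languages count from 0, a small sketch of the index translation may help. The function `occurs_at` is a hypothetical helper, and the text and pattern are the ones from the introductory slide.

```python
def occurs_at(P: str, T: str, k: int) -> bool:
    """True if P occurs at alignment k, i.e. P == T[(k-n+1)..k] (1-based, inclusive)."""
    n = len(P)
    if k < n or k > len(T):
        return False  # P would stick out past one end of T
    # The 1-based inclusive range (k-n+1)..k is the Python slice [k-n:k].
    return T[k - n:k] == P


T = "ATGCTTATCG"
P = "ATC"
alignments = range(len(P), len(T) + 1)  # k = n, n+1, ..., m

print(len(alignments))                                # 8
print([k for k in alignments if occurs_at(P, T, k)])  # [9]
```

With n = 3 and m = 10 there are m - n + 1 = 8 alignments, and the only occurrence ends at index 9 of T.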
28. Boyer–Moore string search
Algorithm
Explanation
The Boyer-Moore algorithm searches for
occurrences of P in T by performing explicit
character comparisons at different
alignments. Instead of a brute-force search of
all alignments (of which there are m - n + 1),
Boyer-Moore uses information gained by
preprocessing P to skip as many alignments as
possible.
29. Boyer–Moore string search
Algorithm
Explanation
The algorithm begins at alignment k = n,
so the start of P is aligned with the start of T.
Characters in P and T are then compared
starting at index n in P and k in T, moving
backward: the strings are matched from the
end of P to the start of P.
30. Boyer–Moore string search
Algorithm
Explanation
The comparisons continue until either the
beginning of P is reached (which means there
is a match),
or a mismatch occurs, upon which the
alignment is shifted to the right by
the maximum value permitted by a number
of rules.
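The right-to-left comparison at a single alignment can be sketched as below. The function name `compare_at_alignment` and its return convention (0 for a match, otherwise the 1-based mismatch index in P) are assumptions made for this illustration. The strings are the ones used in the good suffix slides later in the deck.

```python
def compare_at_alignment(P: str, T: str, k: int) -> int:
    """Compare P against T at alignment k (1-based), from the end of P backward.

    Returns 0 if the beginning of P is reached (a match), otherwise the
    1-based index i in P at which the mismatch occurs.
    Assumes len(P) <= k <= len(T).
    """
    i = len(P)  # 1-based index into P
    h = k       # 1-based index into T
    while i >= 1 and P[i - 1] == T[h - 1]:
        i -= 1
        h -= 1
    return i


T = "ATGGCAATTGGAAAGAATTGAT"
P = "GAAAGAA"
print(compare_at_alignment(P, T, 7))   # 5
print(compare_at_alignment(P, T, 17))  # 0
```

At alignment 7 the last two characters match and the comparison stops at P[5]; at alignment 17 the whole pattern matches.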
31. Boyer–Moore string search
Algorithm
Explanation
The comparisons are performed again at
the new alignment, and the process repeats
until the alignment is shifted past the end
of T, which means no further matches will be
found.
The shift rules are implemented as
constant-time table lookups, using tables
generated during the preprocessing of P.
32. Boyer–Moore string search
Algorithm
Explanation
Shift Rules
A shift is calculated by applying two rules:
the bad character rule and the good suffix
rule. The actual shifting offset is the maximum
of the shifts calculated by these rules.
33. Boyer–Moore string search
Algorithm
Explanation
Shift Rules: The Bad Character Rule
The idea of the bad character rule is to shift P
by more than one character when possible.
For each character x, let R(x) be the position
of the right-most occurrence of character x in
P.
34. Boyer–Moore string search
Algorithm
Explanation
Shift Rules: The Bad Character Rule
R(x) is defined to be zero if x does not occur in
P.
Time to construct table R: O(n) – length of P.
Space used by R: O(|∑|)
Access time of R: O(1)
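A minimal sketch of the table R, assuming a Python dictionary in place of an array indexed by the alphabet. The names `build_R` and `R_of` are hypothetical. A dictionary stores only the characters that occur in P, so its space is bounded by the number of distinct characters in P rather than by the full alphabet size.

```python
def build_R(P: str) -> dict:
    """Map each character of P to the 1-based position of its right-most occurrence."""
    R = {}
    for pos, ch in enumerate(P, start=1):
        R[ch] = pos  # a later occurrence overwrites an earlier one
    return R


def R_of(R: dict, x: str) -> int:
    """R(x), defined to be zero if x does not occur in P."""
    return R.get(x, 0)


R = build_R("GAAAGAA")
print(R_of(R, "G"), R_of(R, "A"), R_of(R, "C"))  # 5 7 0
```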
36. Boyer–Moore string search
Algorithm
Explanation
Shift Rules: The Bad Character Rule
Consider a particular alignment of P against T in which
the rightmost n-i characters of P match the
corresponding characters in T, but the character
P[i] does not match the text character T[k] aligned
with it. Let j = R(T[k]) be the rightmost
position of the character T[k] in P.
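The slide stops before stating the shift itself. In the usual textbook form of the rule, which is assumed here rather than taken from the deck, P is shifted right by max(1, i - j). The sketch below runs a complete search using only this rule. The name `bad_character_search` is hypothetical, and after a full match the sketch simply shifts by one place.

```python
def bad_character_search(P: str, T: str) -> list:
    """Return the 1-based start positions of all occurrences of P in T,
    shifting with the bad character rule only."""
    n, m = len(P), len(T)
    if n == 0 or n > m:
        return []
    # R[x]: 1-based position of the right-most occurrence of x in P.
    R = {ch: pos for pos, ch in enumerate(P, start=1)}
    matches = []
    k = n  # alignment: index in T under the last character of P
    while k <= m:
        i, h = n, k  # 1-based indices into P and T
        while i >= 1 and P[i - 1] == T[h - 1]:
            i -= 1
            h -= 1
        if i == 0:
            matches.append(k - n + 1)
            k += 1  # assumption: minimal shift after a match
        else:
            j = R.get(T[h - 1], 0)  # right-most position of the bad character
            k += max(1, i - j)      # never shift backward or stay in place
    return matches


print(bad_character_search("GAAAGAA", "ATGGCAATTGGAAAGAATTGAT"))  # [11]
```

When j is larger than i the rule alone would suggest a non-positive shift, which is why the shift is clamped to at least 1.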
40. Boyer–Moore string search
Algorithm
Explanation
Shift Rules: The Good Suffix Rule
Description: Suppose for a given alignment
of P and T, a substring t of T matches a suffix
of P, but a mismatch occurs at the next
comparison to the left.
T = A T G G C A A T T G G A A A G A A T T G A T
P = G A A A G A A
[Figure: P aligned under T, with the matched substring t marked]
41. Boyer–Moore string search
Algorithm
Explanation
Shift Rules: The Good Suffix Rule
Description: Then find, if it exists, the right-most
copy t' of t in P such that t' is not a suffix of P and the
character to the left of t' in P differs from the
character to the left of t in P.
T = A T G G C A A T T G G A A A G A A T T G A T
P = G A A A G A A
[Figure: P aligned under T, with t' and t marked]
42. Boyer–Moore string search
Algorithm
Explanation
Shift Rules: The Good Suffix Rule
Description: Shift P to the right so that
substring t' in P aligns with substring t in T.
T = A T G G C A A T T G G A A A G A A T T G A T
P = G A A A G A A
[Figure: P shifted so that t' in P lies under t in T]
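The good suffix shift can be sketched directly from the description, using a brute-force scan over P instead of the preprocessed tables a real implementation would use. The function name `good_suffix_shift` is hypothetical. The two fallback cases at the end follow the usual textbook statement of the rule and are an assumption here, as is the choice to return 1 when nothing has matched yet.

```python
def good_suffix_shift(P: str, i: int) -> int:
    """Shift for a mismatch at 1-based position i in P, after the suffix
    t = P[i+1..n] has matched the text."""
    n = len(P)
    t = P[i:]  # the matched suffix t, as a 0-based slice
    if not t:
        return 1  # assumption: nothing matched, so the rule gives no information
    # Right-most copy t' of t that is not a suffix of P and whose preceding
    # character differs from the character preceding t (the mismatched P[i]).
    for s in range(i - 1, -1, -1):  # s: 0-based start of the candidate t'
        if P[s:s + len(t)] == t and (s == 0 or P[s - 1] != P[i - 1]):
            return i - s  # aligns t' in P with t in T
    # No such t': longest prefix of P that equals a proper suffix of t.
    for length in range(len(t) - 1, 0, -1):
        if P[:length] == P[n - length:]:
            return n - length
    return n  # nothing can be reused: shift P past t entirely


print(good_suffix_shift("GAAAGAA", 5))  # 3
```

For P = GAAAGAA with a mismatch at position 5 (an assumed position, since the figure's alignment is not reproduced here), t = AA, the right-most qualifying copy t' is P[3..4], and the shift is 3.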
44. Boyer–Moore string search
Algorithm
Explanation
Shift Rules: The Good Suffix Rule
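Description: If no such t' exists, shift the left end
of P past the left end of t by the least amount such
that a prefix of the shifted P matches a suffix of t.
If no such shift is possible, then
shift P by n places to the right.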
(Example with different text and pattern)
T = A T G G C A T G A A G A A A G A A T T G A T
P = A G A A G A A
45. Boyer–Moore string search
Algorithm
Explanation
Shift Rules: The Good Suffix Rule
Description: If no such shift is possible, then
shift P by n places to the right.
T = A T G G C A A T T G G A A A G A A T T G A T
P = G A A A G A A
46. Boyer–Moore string search
Algorithm
Explanation
Shift Rules: The Good Suffix Rule
Description: If an occurrence of P is found, then
shift P by the least amount so that a proper prefix of
the shifted P matches a suffix of the occurrence
of P in T.
T = A T G G C A A T T G G A A A G A A T T G A T
P = G A A A G A A
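The shift after a full occurrence depends only on P: it is n minus the length of the longest proper prefix of P that is also a suffix of P. A brute-force sketch follows, with the hypothetical name `shift_after_match`.

```python
def shift_after_match(P: str) -> int:
    """Least shift after an occurrence of P such that a proper prefix of the
    shifted P matches a suffix of the occurrence; n if there is none."""
    n = len(P)
    # Trying the longest proper prefix first yields the least shift.
    for length in range(n - 1, 0, -1):
        if P[:length] == P[n - length:]:
            return n - length
    return n  # no prefix of P is also a suffix: shift past the occurrence


print(shift_after_match("GAAAGAA"))  # 4
print(shift_after_match("ATGC"))     # 4
```

For GAAAGAA the prefix GAA is also a suffix, so the shift is 7 - 3 = 4. For a pattern such as ATGC no prefix reappears as a suffix, so the shift is the full length n = 4.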
49. Boyer–Moore string search
Algorithm
Explanation
Shift Rules: The Good Suffix Rule
Description: If no such shift is possible, then
shift P by n places, that is, shift P past t.
(Example with different text and pattern)
T = A T G G C A A T G C G A A A G A A T T G A T
P = A T G C
52. Bitap Algorithm
(for exact string searching)
Inventors
• The bitap algorithm for exact string searching
was invented by Bálint Dömölki in 1964 and
extended by R. K. Shyamasundar in 1977.
53. Bitap Algorithm
(for exact string searching)
Pseudo code
bitap_search(text : string, pattern : string)
    m := length(pattern)
    if m == 0 return -1
    /* Initialize the bit array R. */
    R := new array[m+1] of bit, initially all 0
    R[0] = 1
54. Bitap Algorithm
(for exact string searching)
Pseudo code
bitap_search(text : string, pattern : string)
    for i = 0; i < length(text); i += 1:
        /* Update the bit array. */
        for k = m; k >= 1; k -= 1:
            R[k] = R[k-1] & (text[i] == pattern[k-1])
        if R[m]: return i - m + 1
    return -1
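The pseudocode translates almost line for line into a runnable sketch. Python booleans stand in for the bits, and returning -1 for an empty pattern mirrors the pseudocode's choice rather than any fixed convention.

```python
def bitap_search(text: str, pattern: str) -> int:
    """Return the 0-based index of the first occurrence of pattern in text, or -1."""
    m = len(pattern)
    if m == 0:
        return -1  # same convention as the pseudocode
    # R[k] is True when pattern[:k] matches the text ending at the current character.
    R = [False] * (m + 1)
    R[0] = True  # the empty prefix always matches
    for i, ch in enumerate(text):
        # Update from the end so that R[k-1] still holds the previous step's value.
        for k in range(m, 0, -1):
            R[k] = R[k - 1] and (ch == pattern[k - 1])
        if R[m]:
            return i - m + 1
    return -1


print(bitap_search("ATTGCAC", "TGCA"))  # 2
```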
55. Bitap Algorithm
(for exact string searching)
Explanation of the Algorithm
The algorithm begins by pre-computing a set
of bitmasks (bit array) containing one bit for
each element of the pattern and an extra bit.
Then it is able to do most of the work
with bitwise operations, which are extremely
fast.
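The bit array can also be packed into a single integer, with one precomputed bitmask per character, so that the whole inner loop becomes one shift and one AND. The sketch below uses this formulation, often called shift-and, which is a swapped-in variant rather than the deck's pseudocode. The name `bitap_search_bitmask` is hypothetical. Python integers are unbounded, so the word-size limit of a fixed-width implementation does not show up here.

```python
def bitap_search_bitmask(text: str, pattern: str) -> int:
    """Return the 0-based index of the first occurrence of pattern in text, or -1."""
    m = len(pattern)
    if m == 0:
        return -1
    # masks[c] has bit k set exactly when pattern[k] == c.
    masks = {}
    for k, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << k)
    # Bit k of state is set when pattern[:k+1] matches the text ending here.
    state = 0
    for i, c in enumerate(text):
        # Shift every partial match one step forward, start a new one at bit 0,
        # then keep only those whose next pattern character equals c.
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & (1 << (m - 1)):
            return i - m + 1
    return -1


print(bitap_search_bitmask("ATTGCAC", "TGCA"))  # 2
```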
56. Bitap Algorithm
(for exact string searching)
Explanation of the Algorithm
Initially, the first position of the bit array contains 1
and all the remaining positions contain 0.
Then, for every character of the text from start to end,
update the bit array from the last
position down to the first position (1st, not 0th).
57. Bitap Algorithm
(for exact string searching)
Explanation of the Algorithm
The current bit array position is set to 1
if the previous bit array position is 1 and the
text character and the pattern character at the
previous bit array position are the same.
58. Bitap Algorithm
(for exact string searching)
Explanation of the Algorithm
Bit_array[current_position] = Bit_array[previous_position] & (text[i] == pattern[previous_position])
for(i = 0; i < text.size(); i += 1)
    for(k = m; k >= 1; k -= 1)
        r[k] = r[k-1] & (text[i] == pattern[k-1]);
59. Bitap Algorithm
(for exact string searching)
Explanation of the Algorithm
A match is found when the content of the
last position of the bit array becomes 1.
if(Bit_array[last_position])
    found a match!
60. Bitap Algorithm
(for exact string searching)
Explanation with an example
The text is: ATTGCAC
The pattern is: TGCA
m = 4 (pattern length)
i= index of the text
r= bit array
Initial bit array is: 1 0 0 0 0
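The evolution of the bit array for this example can be reproduced with a short sketch. The helper `bitap_trace` is a hypothetical name. Unlike the search pseudocode, it does not stop at the first match, so the state after every text character can be inspected.

```python
def bitap_trace(text: str, pattern: str) -> list:
    """Return the bit array r after each text character has been processed."""
    m = len(pattern)
    r = [0] * (m + 1)
    r[0] = 1  # initial bit array: 1 0 0 ... 0
    states = []
    for ch in text:
        # Same update as the pseudocode, from the last position down to the first.
        for k in range(m, 0, -1):
            r[k] = r[k - 1] & int(ch == pattern[k - 1])
        states.append(r.copy())
    return states


text, pattern = "ATTGCAC", "TGCA"
for i, (ch, r) in enumerate(zip(text, bitap_trace(text, pattern))):
    print(i, ch, *r)
```

The last position of the array becomes 1 at i = 5, so the match starts at i - m + 1 = 2.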
67. Bitap Algorithm
(for exact string searching)
Properties
Due to the data structures required by the
algorithm, it performs best on patterns shorter than
a constant length (typically the machine word size),
and it also prefers inputs over a small
alphabet (suitable for DNA strings).
It runs in O(mn) operations, where m is the pattern
length and n is the text length, no matter the
structure of the text or the pattern.