1. Semi-local string comparison:
Algorithmic techniques and applications
Alexander Tiskin
Department of Computer Science
University of Warwick
http://go.warwick.ac.uk/alextiskin
Alexander Tiskin (Warwick) Semi-local string comparison 1 / 132
2. Outline
1 Introduction
2 Matrix distance multiplication
3 Semi-local string comparison
4 The seaweed method
5 Periodic string comparison
6 The transposition network method
7 Sparse string comparison
8 Compressed string comparison
9 Beyond semi-locality
10 Conclusions and future work
5. Introduction
String matching: finding an exact pattern in a string
String comparison: finding similar patterns in two strings
Applications: computational biology, image recognition, . . .
Standard types of string comparison:
global: whole string vs whole string
local: substrings vs substrings
Main focus of this work:
semi-local: whole string vs substrings; prefixes vs suffixes
Closely related to approximate string matching (no relation to
approximation algorithms!)
Main tool: implicit unit-Monge matrices (a.k.a. seaweed matrices)
6. Introduction
Terminology and notation
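$x^- = x - \tfrac{1}{2}$, $x^+ = x + \tfrac{1}{2}$
Integers: $\{\ldots, -2, -1, 0, 1, 2, \ldots\}$
Half-integers: $\{\ldots, -\tfrac{3}{2}, -\tfrac{1}{2}, \tfrac{1}{2}, \tfrac{3}{2}, \tfrac{5}{2}, \ldots\} = \{\ldots, (-2)^+, (-1)^+, 0^+, 1^+, 2^+, \ldots\}$
$(i, j) \ll (i', j')$ iff $i < i'$ and $j < j'$
$(i, j) \gtrless (i', j')$ iff $i > i'$ and $j < j'$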
A permutation matrix is a 0/1 matrix with exactly one nonzero per row
and per column
0 1 0
1 0 0
0 0 1
10. Introduction
Terminology and notation
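Given matrix $D$, its distribution matrix is made up of $\gtrless$-dominance sums:
$D^\Sigma(i, j) = \sum_{\hat\imath > i,\ \hat\jmath < j} D(\hat\imath, \hat\jmath)$
Given matrix $E$, its density matrix is made up of quadrangle differences:
$E^\square(\hat\imath, \hat\jmath) = E(\hat\imath^-, \hat\jmath^+) - E(\hat\imath^-, \hat\jmath^-) - E(\hat\imath^+, \hat\jmath^+) + E(\hat\imath^+, \hat\jmath^-)$
where $D^\Sigma$, $E$ over integers; $D$, $E^\square$ over half-integers
$\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}^\Sigma = \begin{pmatrix} 0 & 1 & 2 & 3 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$
$\begin{pmatrix} 0 & 1 & 2 & 3 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}^\square = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$
$(D^\Sigma)^\square = D$ for all $D$
Matrix $E$ is simple, if $(E^\square)^\Sigma = E$; equivalently, if it has all zeros in the left column and bottom row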
12. Introduction
Terminology and notation
Matrix $E$ is Monge, if $E^\square$ is nonnegative
Intuition: boundary-to-boundary distances in a (weighted) planar graph
Matrix $E$ is unit-Monge, if $E^\square$ is a permutation matrix
Intuition: boundary-to-boundary distances in a grid-like graph
Simple unit-Monge matrix: $P^\Sigma$, where $P$ is a permutation matrix
Seaweed matrix: $P$ used as an implicit representation of $P^\Sigma$
$\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}^\Sigma = \begin{pmatrix} 0 & 1 & 2 & 3 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$
13. Introduction
Implicit unit-Monge matrices
Efficient $P^\Sigma$ queries: range tree on nonzeros of $P$ [Bentley: 1980]
binary search tree by i-coordinate
under every node, binary search tree by j-coordinate
[Figure: range tree on the nonzeros of P, split first by i-coordinate and then by j-coordinate]
15. Introduction
Implicit unit-Monge matrices
Efficient $P^\Sigma$ queries: (contd.)
Every node of the range tree represents a canonical range (rectangular region), and stores its nonzero count
Overall, $\le n \log n$ canonical ranges are non-empty
A $P^\Sigma$ query is equivalent to $\gtrless$-dominance counting: how many nonzeros are $\gtrless$-dominated by the query point?
Answer: sum up nonzero counts in $\le \log^2 n$ disjoint canonical ranges
Total size $O(n \log n)$, query time $O(\log^2 n)$
There are asymptotically more efficient (but less practical) data structures:
Total size $O(n)$, query time $O\bigl(\frac{\log n}{\log\log n}\bigr)$ [JáJá+: 2004] [Chan, Pătraşcu: 2010]
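The dominance-counting query can be illustrated with a minimal Python sketch. It is not the exact range tree of the slides: it swaps in a merge-sort tree (a segment tree over rows whose nodes hold sorted column lists), which has the same O(n log n) size and O(log² n) query time. The class name `DominanceCounter` and the 0-based encoding (nonzero in row r, column perm[r]) are assumptions made for illustration.

```python
from bisect import bisect_left


class DominanceCounter:
    """Merge-sort tree over the nonzeros of a permutation matrix P.

    perm[r] = c encodes a nonzero at half-integer point (r + 1/2, c + 1/2).
    (Hypothetical helper; encoding chosen for this sketch.)
    """

    def __init__(self, perm):
        self.n = len(perm)
        self.size = 1
        while self.size < max(self.n, 1):
            self.size *= 2
        # tree[v] holds the sorted columns of all rows in the range of node v
        self.tree = [[] for _ in range(2 * self.size)]
        for r, c in enumerate(perm):
            self.tree[self.size + r] = [c]
        for v in range(self.size - 1, 0, -1):
            self.tree[v] = sorted(self.tree[2 * v] + self.tree[2 * v + 1])

    def query(self, i, j):
        """P^Sigma(i, j): number of nonzeros in rows >= i and columns < j."""
        lo, hi = i + self.size, self.n + self.size
        total = 0
        while lo < hi:  # O(log n) canonical row ranges
            if lo & 1:
                total += bisect_left(self.tree[lo], j)  # O(log n) each
                lo += 1
            if hi & 1:
                hi -= 1
                total += bisect_left(self.tree[hi], j)
            lo >>= 1
            hi >>= 1
        return total


# The 3x3 permutation matrix from the notation slides: rows map to columns 1, 0, 2.
dc = DominanceCounter([1, 0, 2])
print([[dc.query(i, j) for j in range(4)] for i in range(4)])
# [[0, 1, 2, 3], [0, 1, 1, 2], [0, 0, 0, 1], [0, 0, 0, 0]]
```

The printed matrix is the distribution matrix of that example permutation matrix, recovered entry by entry from queries rather than stored explicitly.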
17. Matrix distance multiplication
Seaweed braids
Distance algebra (a.k.a. (min, +) or tropical algebra):
addition $\oplus$ given by min
multiplication $\odot$ given by +
Matrix $\odot$-multiplication
$A \odot B = C$: $C(i, k) = \bigoplus_j A(i, j) \odot B(j, k) = \min_j \bigl(A(i, j) + B(j, k)\bigr)$
Matrix classes closed under $\odot$-multiplication (for given n):
general numerical (integer, real) matrices
Monge matrices
simple unit-Monge matrices (!)
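As a small illustration of the definition above, here is a naive cubic-time sketch of the (min, +) product in Python. The function name `distance_product` and the list-of-lists matrix encoding are assumptions for this example only.

```python
def distance_product(A, B):
    """Naive (min, +) matrix product: C[i][k] = min over j of A[i][j] + B[j][k]."""
    inner = len(B)
    if any(len(row) != inner for row in A):
        raise ValueError("inner dimensions do not match")
    return [
        [min(A[i][j] + B[j][k] for j in range(inner)) for k in range(len(B[0]))]
        for i in range(len(A))
    ]


A = [[0, 3], [2, 0]]
B = [[0, 1], [5, 0]]
print(distance_product(A, B))  # [[0, 1], [2, 0]]
```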
18. Matrix distance multiplication
Seaweed braids
Recall that simple unit-Monge matrices are represented implicitly by
permutation (seaweed) matrices
Define $P_A \boxdot P_B = P_C$ as $P_A^\Sigma \odot P_B^\Sigma = P_C^\Sigma$
The seaweed monoid $T_n$:
simple unit-Monge matrices under $\odot$
equivalently, permutation (seaweed) matrices under $\boxdot$
Also known as the 0-Hecke monoid of the symmetric group $H_0(S_n)$
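The definition can be checked directly on small matrices. The sketch below is a brute-force illustration, not an efficient algorithm: it expands both permutation matrices into explicit distribution matrices, multiplies them in the (min, +) algebra, and takes the density matrix of the result. The function names and the 0-based list-of-lists encoding are assumptions for this example.

```python
def distribution(P):
    """S[i][j] = number of nonzeros P[r][c] with r >= i and c < j."""
    n = len(P)
    S = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(1, n + 1):
            S[i][j] = S[i + 1][j] + S[i][j - 1] - S[i + 1][j - 1] + P[i][j - 1]
    return S


def density(E):
    """Quadrangle differences of an (n+1) x (n+1) matrix, giving an n x n matrix."""
    n = len(E) - 1
    return [
        [E[i][j + 1] - E[i][j] - E[i + 1][j + 1] + E[i + 1][j] for j in range(n)]
        for i in range(n)
    ]


def seaweed_product(PA, PB):
    """P_C such that dist(P_A) (min,+) dist(P_B) = dist(P_C); cubic time."""
    A, B = distribution(PA), distribution(PB)
    size = len(A)
    C = [
        [min(A[i][j] + B[j][k] for j in range(size)) for k in range(size)]
        for i in range(size)
    ]
    return density(C)


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
g1 = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]  # crossing of the first two strands
g2 = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]  # crossing of the last two strands

print(seaweed_product(g1, g1) == g1)  # True: a double crossing is combed to a single one
print(seaweed_product(g1, g2))        # [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
```

The first line shows that the product is not ordinary permutation composition: composing `g1` with itself as permutations would give the identity, whereas here the result is `g1` again.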
22. Matrix distance multiplication
Seaweed braids
$P_A \boxdot P_B = P_C$ can be seen as combing of seaweed braids
[Figure: permutation matrices $P_A$, $P_B$, $P_C$; the braid of $P_A$ stacked on the braid of $P_B$ is combed into the braid of $P_C$]
23. Matrix distance multiplication
Seaweed braids
The seaweed monoid $T_n$:
$n!$ elements (permutations of size n)
$n - 1$ generators $g_1, g_2, \ldots, g_{n-1}$ (elementary crossings)
Idempotence: $g_i^2 = g_i$ for all $i$
Far commutativity: $g_i g_j = g_j g_i$ for $j - i > 1$
Braid relations: $g_i g_j g_i = g_j g_i g_j$ for $j - i = 1$
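The three families of relations can be checked by brute force for small n. The sketch below models each generator by its action on permutations (cross two adjacent strands unless they are already crossed); this operator model, the 0-based generator indices and all function names are assumptions made for illustration, not the representation used in the slides.

```python
from itertools import permutations


def cross(w, i):
    """Apply generator i (0-based): cross the strands at positions i, i+1
    unless they are already crossed."""
    if w[i] < w[i + 1]:
        return w[:i] + (w[i + 1], w[i]) + w[i + 2:]
    return w


def apply_word(w, word):
    for i in word:
        w = cross(w, i)
    return w


def relations_hold(n):
    """Check idempotence, far commutativity and braid relations on all of S_n."""
    for w in permutations(range(n)):
        for i in range(n - 1):
            if apply_word(w, (i, i)) != cross(w, i):
                return False
            for j in range(i + 1, n - 1):
                if j - i > 1:
                    if apply_word(w, (i, j)) != apply_word(w, (j, i)):
                        return False
                elif apply_word(w, (i, j, i)) != apply_word(w, (j, i, j)):
                    return False
    return True


def reachable(n):
    """All permutations obtainable from the identity by applying generators."""
    start = tuple(range(n))
    seen, frontier = {start}, [start]
    while frontier:
        w = frontier.pop()
        for i in range(n - 1):
            u = cross(w, i)
            if u not in seen:
                seen.add(u)
                frontier.append(u)
    return seen


print(relations_hold(4))   # True
print(len(reachable(4)))   # 24, i.e. 4! elements
```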
25. Matrix distance multiplication
Seaweed braids
Related structures:
positive braids: far commutativity; braid relations
braids: $g_i g_i^{-1} = 1$; far commutativity; braid relations
Coxeter’s presentation of $S_n$: $g_i^2 = 1$; far commutativity; braid relations
locally free idempotent monoid: idempotence; far commutativity [Vershik+: 2000]
Generalisations:
general 0-Hecke monoids [Fomin, Greene: 1998; Buch+: 2008]
Coxeter monoids [Tsaranov: 1990; Richardson, Springer: 1990]
$\mathcal{J}$-trivial monoids [Denton+: 2011]
29. Matrix distance multiplication
Seaweed braids
Computation in the seaweed monoid: a confluent rewriting system can be obtained by software (Semigroupe, GAP)
$T_3$: 1, $a = g_1$, $b = g_2$; ab, ba, aba = 0
aa → a; bb → b; bab → 0; aba → 0
$T_4$: 1, $a = g_1$, $b = g_2$, $c = g_3$; ab, ac, ba, bc, cb, aba, abc, acb, bac, bcb, cba, abac, abcb, acba, bacb, bcba, abacb, abcba, bacba, abacba = 0
aa → a; ca → ac; bab → aba; cbac → bcba
bb → b; cc → c; cbc → bcb; abacba → 0
Easy to use, but not an efficient algorithm
31. Matrix distance multiplication
Seaweed matrix multiplication
The implicit unit-Monge matrix $\odot$-multiplication problem
Given permutation matrices $P_A$, $P_B$, compute $P_C$, such that $P_A^\Sigma \odot P_B^\Sigma = P_C^\Sigma$ (equivalently, $P_A \boxdot P_B = P_C$)
Matrix $\odot$-multiplication: running time
type                   time
general                $O(n^3)$   standard
                       $O\bigl(\frac{n^3 (\log\log n)^3}{\log^2 n}\bigr)$   [Chan: 2007]
Monge                  $O(n^2)$   via [Aggarwal+: 1987]
implicit unit-Monge    $O(n^{1.5})$   [T: 2006]
                       $O(n \log n)$   [T: 2010]
40. Matrix distance multiplication
Seaweed matrix multiplication
Implicit unit-Monge matrix $\odot$-multiplication: the algorithm
$P_C^\Sigma(i, k) = \min_j \bigl(P_A^\Sigma(i, j) + P_B^\Sigma(j, k)\bigr)$
Divide-and-conquer on the range of $j$
Divide $P_A$ horizontally, $P_B$ vertically: two subproblems of effective size $n/2$
$P_{A,lo}^\Sigma \odot P_{B,lo}^\Sigma = P_{C,lo}^\Sigma$
$P_{A,hi}^\Sigma \odot P_{B,hi}^\Sigma = P_{C,hi}^\Sigma$
Conquer:
$\gtrless$-low nonzeros of $P_{C,lo}$ and $\gtrless$-high nonzeros of $P_{C,hi}$ appear in $P_C$
The remaining nonzeros of $P_{C,lo}$ and $P_{C,hi}$ are “wrong”, and need to be corrected to obtain the remaining nonzeros of $P_C$
Correction can be done in time $O(n)$ using the unit-Monge property
Overall time $O(n \log n)$
41. Matrix distance multiplication
Bruhat order
Comparing permutations by the “degree of sortedness”
Bruhat order
Permutation A is lower (“more sorted”) than permutation B in the Bruhat order ($A \preceq B$), if B can be transformed to A by successive pairwise sorting between arbitrary pairs of elements.
Permutation matrices: $P_A \preceq P_B$, if $P_B$ can be transformed to $P_A$ by successive 2 × 2 submatrix substitution: $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \to \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$
42. Matrix distance multiplication
Bruhat order
Bruhat comparability: running time
$O(n^2)$   folklore
$O(n \log n)$   [T: NEW]
$P_A \preceq P_B$ iff $P_A^\Sigma \le P_B^\Sigma$ elementwise, time $O(n^2)$ (folklore)
$P_A \preceq P_B$ iff $P_A^R \boxdot P_B = Id^R$, time $O(n \log n)$ [T: NEW]
where $P^R$ denotes clockwise rotation of matrix $P$
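The folklore quadratic-time test can be sketched in a few lines of Python. The encoding is an assumption for this example: a permutation is a list `perm` with a nonzero in row r, column perm[r], and the distribution matrix counts nonzeros in rows ≥ i and columns < j, as in the notation slides. The O(n log n) test via the seaweed product is not shown.

```python
def distribution(perm):
    """S[i][j] = number of rows r >= i with perm[r] < j."""
    n = len(perm)
    S = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(n + 1):
            S[i][j] = S[i + 1][j] + (1 if perm[i] < j else 0)
    return S


def bruhat_leq(a, b):
    """True if a is lower than or equal to b in the Bruhat order (O(n^2) test)."""
    if len(a) != len(b):
        raise ValueError("permutations must have the same size")
    SA, SB = distribution(a), distribution(b)
    return all(x <= y for ra, rb in zip(SA, SB) for x, y in zip(ra, rb))


print(bruhat_leq([0, 1, 2], [1, 0, 2]))  # True: the sorted permutation is lowest
print(bruhat_leq([1, 0, 2], [1, 2, 0]))  # True: sorting the last two entries of [1, 2, 0]
print(bruhat_leq([1, 0, 2], [0, 2, 1]))  # False
print(bruhat_leq([0, 2, 1], [1, 0, 2]))  # False: the two are incomparable
```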
45. Semi-local string comparison
Semi-local LCS and edit distance
Consider strings (= sequences) over an alphabet of size σ
Distinguish contiguous substrings and not necessarily contiguous
subsequences
Special cases of substring: prefix, suffix
Notation: strings a, b of length m, n respectively
Assume where necessary: m ≤ n; m, n reasonably close
The longest common subsequence (LCS) score:
length of longest string that is a subsequence of both a and b
equivalently, alignment score, where score(match) = 1 and
score(mismatch) = 0
In biological terms, “loss-free alignment” (unlike “lossy” BLAST)
47. Semi-local string comparison
Semi-local LCS and edit distance
The LCS problem
Give the LCS score for a vs b
LCS: running time
$O(mn)$   [Wagner, Fischer: 1974]
$O\bigl(\frac{mn}{\log^2 n}\bigr)$, $\sigma = O(1)$   [Masek, Paterson: 1980] [Crochemore+: 2003]
$O\bigl(\frac{mn (\log\log n)^2}{\log^2 n}\bigr)$   [Paterson, Dančík: 1994] [Bille, Farach-Colton: 2008]
Running time varies depending on the RAM model version
We assume word-RAM with word size log n (where it matters)
48. Semi-local string comparison
Semi-local LCS and edit distance
LCS on the alignment graph (directed, acyclic)
[Figure: alignment graph of a = “BAABCBCA” (rows) vs b = “BAABCABCABACA” (columns); blue edges have score 0, red edges have score 1]
score(“BAABCBCA”, “BAABCABCABACA”) = len(“BAABCBCA”) = 8
LCS = highest-score path from top-left to bottom-right
49. Semi-local string comparison
Semi-local LCS and edit distance
LCS: dynamic programming [WF: 1974]
Sweep cells in any $\ll$-compatible order
Cell update: time O(1)
Overall time O(mn)
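A minimal Python sketch of this dynamic programming follows. It sweeps cells row by row (one valid compatible order) and keeps only two rows of the table; the function name `lcs_score` is an assumption for illustration.

```python
def lcs_score(a, b):
    """LCS score of a vs b by row-by-row dynamic programming, O(mn) time."""
    prev = [0] * (len(b) + 1)  # scores for the current prefix of a vs all prefixes of b
    for ch in a:
        cur = [0]
        for j, c in enumerate(b, 1):
            if ch == c:
                cur.append(prev[j - 1] + 1)          # match cell: diagonal edge of score 1
            else:
                cur.append(max(prev[j], cur[j - 1]))  # mismatch cell: best of the score-0 edges
        prev = cur
    return prev[-1]


print(lcs_score("BAABCBCA", "BAABCABCABACA"))  # 8
print(lcs_score("BAABCBCA", "CABCABA"))        # 5
```

Both values agree with the examples given on the alignment graph slides.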
50. Semi-local string comparison
Semi-local LCS and edit distance
‘Begin at the beginning,’ the King said
gravely, ‘and go on till you come to the end:
then stop.’
L. Carroll, Alice in Wonderland
(The standard approach in dynamic
programming)
52. Semi-local string comparison
Semi-local LCS and edit distance
Sometimes dynamic programming can be run from both ends for extra
flexibility
Is there a better, fully flexible alternative (e.g. for comparing compressed
strings, comparing strings dynamically or in parallel, etc.)?
53. Semi-local string comparison
Semi-local LCS and edit distance
LCS: micro-block dynamic programming [MP: 1980; BF: 2008]
Sweep cells in micro-blocks, in any $\ll$-compatible order
Micro-block size:
$t = O(\log n)$ when $\sigma = O(1)$
$t = O\bigl(\frac{\log n}{\log\log n}\bigr)$ otherwise
Micro-block interface:
O(t) characters, each O(log σ) bits, can be reduced to O(log t) bits
O(t) small integers, each O(1) bits
Micro-block update: time O(1), by precomputing all possible interfaces
Overall time $O\bigl(\frac{mn}{\log^2 n}\bigr)$ when $\sigma = O(1)$, $O\bigl(\frac{mn (\log\log n)^2}{\log^2 n}\bigr)$ otherwise
55. Semi-local string comparison
Semi-local LCS and edit distance
The semi-local LCS problem
Give the (implicit) matrix of $O\bigl((m + n)^2\bigr)$ LCS scores:
string-substring LCS: string a vs every substring of b
prefix-suffix LCS: every prefix of a vs every suffix of b
suffix-prefix LCS: every suffix of a vs every prefix of b
substring-string LCS: every substring of a vs string b
Cf.: dynamic programming gives prefix-prefix LCS
56. Semi-local string comparison
Semi-local LCS and edit distance
Semi-local LCS on the alignment graph
[Figure: alignment graph of a = “BAABCBCA” (rows) vs b = “BAABCABCABACA” (columns); blue edges have score 0, red edges have score 1]
score(“BAABCBCA”, “CABCABA”) = len(“ABCBA”) = 5
String-substring LCS: all highest-score top-to-bottom paths
Semi-local LCS: all highest-score boundary-to-boundary paths
58. Semi-local string comparison
Score matrices and seaweed matrices
Semi-local LCS: output representation and running time
size                 query time
$O(n^2)$             $O(1)$            trivial
$O(m^{1/2} n)$       $O(\log n)$       string-substring [Alves+: 2003]
$O(n)$               $O(n)$            string-substring [Alves+: 2005]
$O(n \log n)$        $O(\log^2 n)$     [T: 2006]
. . . or any 2D orthogonal range counting data structure
running time
$O(mn^2)$   naive
$O(mn)$   string-substring [Schmidt: 1998; Alves+: 2005]
$O(mn)$   [T: 2006]
$O\bigl(\frac{mn}{\log^{0.5} n}\bigr)$   [T: 2006]
$O\bigl(\frac{mn (\log\log n)^2}{\log^2 n}\bigr)$   [T: 2007]
59. Semi-local string comparison
Score matrices and seaweed matrices
The score matrix H and the seaweed matrix P
$H(i, j)$: the number of matched characters for a vs substring $b\langle i : j \rangle$
$j - i - H(i, j)$: the number of unmatched characters
Properties of matrix $j - i - H(i, j)$:
simple unit-Monge
therefore, $= P^\Sigma$, where $P = -H^\square$ is a permutation matrix
P is the seaweed matrix, giving an implicit representation of H
Range tree for P: memory $O(n \log n)$, query time $O(\log^2 n)$
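The claim that $j - i - H(i, j)$ is simple unit-Monge can be checked by brute force on small inputs. The Python sketch below relies on an assumption that is not spelled out on this slide: to cover all four semi-local quadrants, b is padded on both sides with m wildcard characters that match anything, i ranges over [−m : n], j over [0 : m + n], and H(i, j) = j − i when i > j. The wildcard symbol `?` and all function names are hypothetical and chosen for illustration.

```python
def lcs_prefix_scores(a, s):
    """L[k] = LCS(a, s[:k]) for every k; '?' in s matches any character."""
    prev = [0] * (len(s) + 1)
    for ch in a:
        cur = [0]
        for k, c in enumerate(s, 1):
            if c == "?" or c == ch:
                cur.append(prev[k - 1] + 1)
            else:
                cur.append(max(prev[k], cur[k - 1]))
        prev = cur
    return prev


def unmatched_matrix(a, b):
    """E[i + m][j] = j - i - H(i, j) for i in [-m, n], j in [0, m + n]."""
    m, n = len(a), len(b)
    padded = "?" * m + b + "?" * m  # assumes '?' does not occur in a or b
    size = m + n + 1
    E = [[0] * size for _ in range(size)]
    for row in range(size):
        i = row - m
        scores = lcs_prefix_scores(a, padded[row:])
        for j in range(size):
            if j >= i:  # for j < i, H(i, j) = j - i, so the entry stays 0
                E[row][j] = (j - i) - scores[j - i]
    return E


def density(E):
    n = len(E) - 1
    return [
        [E[i][j + 1] - E[i][j] - E[i + 1][j + 1] + E[i + 1][j] for j in range(n)]
        for i in range(n)
    ]


a, b = "BAABCBCA", "BAABCABCABACA"
E = unmatched_matrix(a, b)
P = density(E)
is_permutation = (
    all(sorted(row) == [0] * (len(P) - 1) + [1] for row in P)
    and all(sum(row[c] for row in P) == 1 for c in range(len(P)))
)
print(is_permutation)              # True
print(11 - 4 - E[4 + len(a)][11])  # H(4, 11) = 5
```

This costs far more than the algorithms listed on the slides; its only purpose is to make the definitions concrete.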
63. Semi-local string comparison
Score matrices and seaweed matrices
The score matrix H and the seaweed matrix P
a = “BAABCBCA”
b = “BAABCABCABACA”
$H(4, 11) = 11 - 4 - P^\Sigma(4, 11) = 11 - 4 - 2 = 5$
[Figure: nonzeros of the seaweed matrix P for a vs b]
65. Semi-local string comparison
Score matrices and seaweed matrices
The seaweed braid in the alignment graph
[Figure: seaweed braid drawn in the alignment graph of a = “BAABCBCA” (rows) vs b = “BAABCABCABACA” (columns)]
$H(4, 11) = 11 - 4 - P^\Sigma(4, 11) = 11 - 4 - 2 = 5$
$P(i, j) = 1$ corresponds to seaweed top $i$ ⇝ bottom $j$
Also define top ⇝ right, left ⇝ right, left ⇝ bottom seaweeds
Gives bijection between top-left and bottom-right graph boundaries
66. Semi-local string comparison
Score matrices and seaweed matrices
Seaweed braid: a highly symmetric object (element of the 0-Hecke monoid
of the symmetric group)
Can be built recursively by assembling subbraids from separate parts
Highly flexible: local alignment, compression, parallel computation. . .
69. Semi-local string comparison
Weighted alignment
The LCS problem is a special case of the weighted alignment score
problem with weighted matches (wM ), mismatches (wX ) and gaps (wG )
LCS score: wM = 1, wX = wG = 0
Levenshtein score: wM = 2, wX = 1, wG = 0
Alignment score is rational, if wM , wX , wG are rational numbers
Equivalent to LCS score on blown-up strings
Edit distance: minimum cost to transform a into b by weighted character
edits (insertion, deletion, substitution)
Corresponds to weighted alignment score with wM = 0, insertion/deletion
weight −wG , substitution weight −wX
70. Semi-local string comparison
Weighted alignment
Weighted alignment graph
[Figure: weighted alignment graph of a = “BAABCBCA” (rows) vs b = “BAABCABCABACA” (columns); blue edges have score 0, solid red edges have score 2, dotted red edges have score 1]
Levenshtein score(“BAABCBCA”, “CABCABA”) = 11
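The weighted alignment score can be computed by the same kind of dynamic programming as the LCS score. The sketch below is a plain quadratic-time illustration; the function name and parameter names (`w_match`, `w_mismatch`, `w_gap`, standing for $w_M$, $w_X$, $w_G$) are assumptions for this example, and the gap weight is charged once per unaligned character.

```python
def alignment_score(a, b, w_match, w_mismatch, w_gap):
    """Maximum weighted alignment score of a vs b, O(mn) time."""
    prev = [j * w_gap for j in range(len(b) + 1)]
    for i, ch in enumerate(a, 1):
        cur = [i * w_gap]
        for j, c in enumerate(b, 1):
            diagonal = prev[j - 1] + (w_match if ch == c else w_mismatch)
            cur.append(max(diagonal, prev[j] + w_gap, cur[j - 1] + w_gap))
        prev = cur
    return prev[-1]


a, s = "BAABCBCA", "CABCABA"
print(alignment_score(a, s, 1, 0, 0))     # LCS score: 5
print(alignment_score(a, s, 2, 1, 0))     # Levenshtein score: 11
print(-alignment_score(a, s, 0, -1, -1))  # unit-cost edit distance: 4
```

For these weights the Levenshtein score equals m + n minus the unit-cost edit distance (8 + 7 − 4 = 11), since every aligned pair accounts for two characters and every gap for one.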
75. The seaweed method
Seaweed combing
[Figure: seaweed combing on the alignment graph of a = “BAABCBCA” (rows) vs b = “BAABCABCABACA” (columns), cell by cell]
114. The seaweed method
Seaweed combing
Semi-local LCS: seaweed combing [T: 2006]
Initialise seaweed braid: crossings in all mismatch cells
Sweep cells in any $\ll$-compatible order
Match cell: two seaweeds uncrossed; skip
Mismatch cell: two seaweeds cross
if the same seaweeds crossed before, uncross them
otherwise skip, keep seaweeds crossed
Cell update: time O(1)
Overall time O(mn)
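The combing procedure can be sketched in Python as follows. The boundary numbering is an assumption made for this example: seaweeds are labelled by their starting point along the left boundary (bottom to top) and then the top boundary (left to right), so that two seaweeds meeting in a cell have crossed before exactly when the one arriving from the left carries the larger label. Function names are hypothetical, and the query below counts dominated nonzeros naively instead of using a range tree.

```python
def seaweed_combing(a, b):
    """Return perm, where perm[start] = end for every seaweed.

    Starts: left boundary row i -> m - 1 - i, top boundary column j -> m + j.
    Ends: bottom boundary column j -> j, right boundary row i -> n + m - 1 - i.
    (Numbering chosen for this sketch.)
    """
    m, n = len(a), len(b)
    horizontal = [m - 1 - i for i in range(m)]  # seaweed currently travelling along row i
    vertical = [m + j for j in range(n)]        # seaweed currently travelling down column j
    for i in range(m):
        for j in range(n):
            h, v = horizontal[i], vertical[j]
            # Match cell, or the two seaweeds have crossed before: leave them uncrossed,
            # so the one from the left turns down and the one from the top turns right.
            if a[i] == b[j] or h > v:
                horizontal[i], vertical[j] = v, h
            # Otherwise (mismatch, first meeting): they cross and both go straight on.
    perm = [0] * (m + n)
    for j in range(n):
        perm[vertical[j]] = j
    for i in range(m):
        perm[horizontal[i]] = n + m - 1 - i
    return perm


def string_substring_lcs(perm, m, i, j):
    """H(i, j) = j - i - P^Sigma(i, j) for 0 <= i <= j <= n, by naive counting."""
    dominated = sum(1 for start in range(i + m, len(perm)) if perm[start] < j)
    return (j - i) - dominated


a, b = "BAABCBCA", "BAABCABCABACA"
perm = seaweed_combing(a, b)
print(string_substring_lcs(perm, len(a), 4, 11))  # 5, as in the H(4, 11) example
print(string_substring_lcs(perm, len(a), 0, 13))  # 8, the global LCS score
```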
115. The seaweed method
Micro-block seaweed combing
[Figure: micro-block seaweed combing on the alignment graph of a = “BAABCBCA” (rows) vs b = “BAABCABCABACA” (columns), block by block]
130. The seaweed method
Micro-block seaweed combing
Semi-local LCS: micro-block seaweed combing [T: 2007]
Initialise seaweed braid: crossings in all mismatch cells
Sweep cells in micro-blocks, in any -compatible order
log n
Micro-block size: t = O log log n
Micro-block interface:
O(t) characters, each O(log σ) bits, can be reduced to O(log t) bits
O(t) integers, each O(log n) bits, can be reduced to O(log t) bits
Micro-block update: time O(1), by precomputing all possible interfaces
Overall time O(mn (log log n)^2 / log^2 n)
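A minimal sketch of seaweed combing, assuming the plain cell-by-cell sweep in O(mn) time rather than the micro-block speed-up. The function names, the seaweed numbering (left-edge seaweeds 0..m−1 from the bottom row up, top-edge seaweeds m..m+n−1 from left to right) and the linear-time query are illustrative assumptions, not taken from the slides.

```python
def comb(a, b):
    """Cell-by-cell seaweed combing of the alignment grid of a vs b.

    Returns (right, bottom): ids of the seaweeds leaving through the right
    edge (one per row) and through the bottom edge (one per column).
    """
    m, n = len(a), len(b)
    right = [m - 1 - i for i in range(m)]   # seaweed currently in row i
    bottom = [m + j for j in range(n)]      # seaweed currently in column j
    for i in range(m):                      # row-major order is one valid sweep
        for j in range(n):
            h, v = right[i], bottom[j]
            # Match cell, or the pair has already crossed (h > v): the two
            # seaweeds do not cross here, so they trade row for column.
            if a[i] == b[j] or h > v:
                right[i], bottom[j] = v, h
            # Otherwise (mismatch, first meeting) they cross; nothing changes.
    return right, bottom


def string_substring_lcs(m, bottom, i, j):
    """LCS(a, b[i:j]) for |a| = m, read off the combed bottom edge."""
    # Each seaweed that enters the top at column >= i and leaves the bottom
    # at column < j costs one unit of score.
    crossing = sum(1 for c in range(i, j) if bottom[c] >= m + i)
    return (j - i) - crossing
```

A call such as `string_substring_lcs(len(a), comb(a, b)[1], 0, len(b))` gives the global LCS score. Each query here scans the window in linear time; this sketch makes no attempt at the faster dominance-counting queries.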
132. The seaweed method
Cyclic LCS
The cyclic LCS problem
Give the maximum LCS score for a vs all cyclic rotations of b
Cyclic LCS: running time
O(mn^2 / log n)                  naive
O(mn log m)                      [Maes: 1990]
O(mn)                            [Bunke, Bühler: 1993; Landau+: 1998; Schmidt: 1998]
O(mn (log log n)^2 / log^2 n)    [T: 2007]
133. The seaweed method
Cyclic LCS
Cyclic LCS: the algorithm
Micro-block seaweed combing on a vs bb, time O(mn (log log n)^2 / log^2 n)
Make n string-substring LCS queries, time negligible
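The reduction to a vs bb can be sketched as follows. This assumes plain O(mn) cell-by-cell combing in place of micro-blocks and a linear scan per query, so the n queries cost O(n^2) here rather than being negligible. All names and the seaweed numbering are illustrative.

```python
def comb_bottom(a, b):
    """Comb the grid of a vs b; return the seaweed ids on the bottom edge."""
    m, n = len(a), len(b)
    right = [m - 1 - i for i in range(m)]   # left-edge seaweeds: ids 0..m-1
    bottom = [m + j for j in range(n)]      # top-edge seaweeds: ids m..m+n-1
    for i in range(m):
        for j in range(n):
            h, v = right[i], bottom[j]
            if a[i] == b[j] or h > v:       # match, or pair already crossed
                right[i], bottom[j] = v, h
    return bottom


def cyclic_lcs(a, b):
    """Maximum LCS score of a against all cyclic rotations of b."""
    m, n = len(a), len(b)
    if n == 0:
        return 0
    bottom = comb_bottom(a, b + b)          # one combing pass on a vs bb
    best = 0
    for i in range(n):                      # rotation i is the window (bb)[i:i+n]
        crossing = sum(1 for c in range(i, i + n) if bottom[c] >= m + i)
        best = max(best, n - crossing)
    return best
```

For example, `cyclic_lcs("abc", "cab")` considers the rotation "abc" of b among the windows of "cabcab".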
135. The seaweed method
Longest repeating subsequence
The longest repeating subsequence problem
Find the longest subsequence of a that is a square (a repetition of two
identical strings)
Longest repeating subsequence: running time
O(m^3)                           naive
O(m^2)                           [Kosowski: 2004]
O(m^2 (log log m)^2 / log^2 m)   [T: 2007]
136. The seaweed method
Longest repeating subsequence
Longest repeating subsequence: the algorithm
Micro-block seaweed combing on a vs a, time O(m^2 (log log m)^2 / log^2 m)
Make m − 1 suffix-prefix LCS queries, time negligible
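For reference, the naive O(m^3) baseline from the running-time table can be sketched as below: try every split point of a and take the LCS of the two parts. It deliberately recomputes each LCS from scratch instead of answering the m − 1 queries from one combing pass; the function names are illustrative.

```python
def lcs(x, y):
    """Standard dynamic-programming LCS score, one row at a time."""
    prev = [0] * (len(y) + 1)
    for cx in x:
        cur = [0]
        for j, cy in enumerate(y, 1):
            cur.append(prev[j - 1] + 1 if cx == cy else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def longest_square_subsequence(a):
    """Length of the longest subsequence of a of the form ww."""
    # A square ww splits across some cut k: w is a common subsequence
    # of the prefix a[:k] and the suffix a[k:].
    return max((2 * lcs(a[:k], a[k:]) for k in range(1, len(a))), default=0)
```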
137. The seaweed method
Approximate matching
The approximate pattern matching problem
Give the substring closest to a by alignment score, starting at each
position in b
Assume rational alignment score
Approximate pattern matching: running time
O(mn)                            [Sellers: 1980]
O(mn / log n)                    σ = O(1), via [Masek, Paterson: 1980]
O(mn (log log n)^2 / log^2 n)    via [Bille, Farach-Colton: 2008]
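A sketch of the O(mn) dynamic-programming baseline in the style of [Sellers: 1980], assuming unit-cost edit distance as the alignment score (the slide allows any rational score). The classical recurrence reports the best substring ending at each position, so this sketch runs it on the reversed strings to obtain the best substring starting at each position. The function name is illustrative.

```python
def closest_substring_scores(a, b):
    """For each start s in 0..len(b): min edit distance between a and
    any substring of b that starts at position s."""
    ra, rb = a[::-1], b[::-1]
    m, n = len(ra), len(rb)
    # Row 0 is all zeros: the empty pattern matches the empty substring
    # at every position for free (this is what makes the search semi-local).
    prev = [0] * (n + 1)
    for i in range(1, m + 1):
        cur = [i]                            # i pattern characters vs nothing
        for j in range(1, n + 1):
            cur.append(min(prev[j] + 1,      # pattern character left unmatched
                           cur[j - 1] + 1,   # extra text character
                           prev[j - 1] + (ra[i - 1] != rb[j - 1])))
        prev = cur
    # prev[j] refers to substrings of rb ending at j, i.e. substrings of b
    # starting at len(b) - j; reversing the row re-indexes by start position.
    return prev[::-1]
```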
138. The seaweed method
Approximate matching
Approximate pattern matching: the algorithm
Micro-block seaweed combing on a vs b (with blow-up), time O(mn (log log n)^2 / log^2 n)
The implicit semi-local edit score matrix:
an anti-Monge matrix
approximate pattern matching ∼ row minima
Row minima in O(n) element queries [Aggarwal+: 1987]
Each query in time O(log^2 n) using the range tree representation; combined query time negligible
Overall running time O(mn (log log n)^2 / log^2 n), same as [Bille, Farach-Colton: 2008]
141. Periodic string comparison
Wraparound seaweed combing
The periodic string-substring LCS problem
Give (implicit) LCS scores for a vs each substring of b = ...uuu... = u^(±∞)
Let u be of length p
May assume that every character of a occurs in u
Only substrings of b of length at most mp need be considered (otherwise LCS score is m)
142. Periodic string comparison
Wraparound seaweed combing
[Figure: alignment grid of a = BAABCBCA (rows) against b = BAABCABCABACA (columns)]
184. Periodic string comparison
Wraparound seaweed combing
Periodic string-substring LCS: Wraparound seaweed combing
Initialise seaweed braid: crossings in all mismatch cells
Sweep cells row-by-row: each row starts at match cell, wraps at boundary
Match cell: two seaweeds uncrossed; skip
Mismatch cell: two seaweeds cross
if the same seaweeds crossed before (with wrapping), uncross them
otherwise skip, keep seaweeds crossed
Cell update: time O(1)
Overall time O(mn)
String-substring LCS score: count seaweeds with multiplicities
185. Periodic string comparison
Wraparound seaweed combing
The tandem LCS problem
Give LCS score for a vs b = u^k
We have n = kp; may assume k ≤ m
Tandem LCS: running time
O(mkp)        naive
O(m(k + p))   [Landau, Ziv-Ukelson: 2001]
O(mp)         [T: 2009]
Direct application of wraparound seaweed combing
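A sketch of the naive O(mkp) baseline, which also illustrates why k ≤ m may be assumed: an LCS has at most m characters, so it touches at most m copies of u, and any further copies cannot raise the score. The function names are illustrative; this is not the wraparound combing algorithm.

```python
def lcs(x, y):
    """Standard dynamic-programming LCS score, one row at a time."""
    prev = [0] * (len(y) + 1)
    for cx in x:
        cur = [0]
        for j, cy in enumerate(y, 1):
            cur.append(prev[j - 1] + 1 if cx == cy else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def tandem_lcs(a, u, k):
    """LCS score of a against u repeated k times (naive baseline)."""
    k = min(k, len(a))      # copies beyond m = len(a) cannot contribute
    return lcs(a, u * k)
```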
186. Periodic string comparison
Wraparound seaweed combing
The tandem alignment problem
Give the substring closest to a by alignment score among certain
substrings of b = u^(±∞):
global: substrings u^k of length kp across all k
cyclic: substrings of length kp across all k
local: substrings of any length
Tandem alignment: running time
O(m^2 p)      all      naive
O(mp)         global   [Myers, Miller: 1989]
O(mp log p)   cyclic   [Benson: 2005]
O(mp)         cyclic   [T: 2009]
O(mp)         local    [Myers, Miller: 1989]
187. Periodic string comparison
Wraparound seaweed combing
Cyclic tandem alignment: the algorithm
Periodic seaweed combing for a vs b (with blow-up), time O(mp)
For each k ∈ [1 : m]:
solve tandem LCS (under given alignment score) for a vs u^k
obtain scores for a vs p successive substrings of b of length kp by LCS batch query: time O(1) per substring
Running time O(mp)
191. The transposition network method
Transposition networks
Comparison network: a circuit of comparators
A comparator sorts two inputs and outputs them in prescribed order
Comparison networks traditionally used for non-branching merging/sorting
Classical comparison networks (number of comparators)
merging   O(n log n)     [Batcher: 1968]
sorting   O(n log^2 n)   [Batcher: 1968]
sorting   O(n log n)     [Ajtai+: 1983]
Comparison networks are visualised by wire diagrams
Transposition network: all comparisons are between adjacent wires
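As a concrete instance of a transposition network, the sketch below builds the classical odd-even transposition sorting network (n rounds, n(n − 1)/2 comparators, all between adjacent wires) and runs it without data-dependent branching on the network structure. The choice of this particular network and the function names are illustrative; the slides do not single it out.

```python
def odd_even_transposition_network(n):
    """Comparators (i, i + 1) of the n-round odd-even transposition network."""
    # Even rounds compare wires (0,1), (2,3), ...; odd rounds (1,2), (3,4), ...
    return [(i, i + 1) for r in range(n) for i in range(r % 2, n - 1, 2)]


def run_network(network, values):
    """Apply each comparator in order: smaller value to the lower-numbered wire."""
    wires = list(values)
    for i, j in network:
        if wires[i] > wires[j]:
            wires[i], wires[j] = wires[j], wires[i]
    return wires
```

The comparator list is fixed in advance and does not depend on the input values, which is what distinguishes a network from an ordinary sorting routine.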
192. The transposition network method
Transposition networks
Seaweed combing as a transposition network
[Figure: wire diagram of the seaweed braid for a = ACBC (rows) against b = ABCA (columns); wire values −7, −5, −3, −1, +1, +3, +5, +7 on the input side and −7, −1, +3, −3, −5, +7, +5, +1 on the output side]
Character mismatches correspond to comparators
Inputs anti-sorted (sorted in reverse); each value traces a seaweed
193. The transposition network method
Transposition networks
Global LCS: transposition network with binary input
[Figure: wire diagram for a = ACBC (rows) against b = ABCA (columns) with binary values; 0, 0, 0, 0, 1, 1, 1, 1 on the input side and 0, 0, 1, 0, 0, 1, 1, 1 on the output side]
Inputs still anti-sorted, but may not be distinct
Comparison between equal values is indeterminate
194. The transposition network method
Parameterised string comparison
Parameterised string comparison
String comparison sensitive e.g. to
low similarity: small λ = LCS(a, b)
high similarity: small κ = dist_LCS(a, b) = m + n − 2λ
Can also use weighted alignment score or edit distance
Assume m = n, therefore κ = 2(n − λ)
195. The transposition network method
Parameterised string comparison
Low-similarity comparison: small λ
sparse set of matches, may need to look at them all
preprocess matches for fast searching, time O(n log σ)
High-similarity comparison: small κ
set of matches may be dense, but only need to look at small subset
no need to preprocess, linear search is OK
Flexible comparison: sensitive to both high and low similarity, e.g. by running both comparison types alongside each other
196. The transposition network method
Parameterised string comparison
Parameterised string comparison: running time
Low-similarity, after preprocessing in O(n log σ)
O(nλ)              [Hirschberg: 1977; Apostolico, Guerra: 1985; Apostolico+: 1992]
High-similarity, no preprocessing
O(n · κ)           [Ukkonen: 1985; Myers: 1986]
Flexible
O(λ · κ · log n)   no preprocessing [Myers: 1986; Wu+: 1990]
O(λ · κ)           after preprocessing [Rick: 1995]
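The high-similarity bound can be illustrated with a sketch of the greedy diagonal scheme usually attributed to [Myers: 1986]. It computes κ = dist_LCS(a, b) directly, doing work proportional to (m + n) · κ and needing no preprocessing. The function name and the dictionary-based bookkeeping are illustrative choices.

```python
def lcs_distance(a, b):
    """kappa = len(a) + len(b) - 2 * LCS(a, b), found by growing the
    number d of insertions/deletions until the end of both strings is hit."""
    n, m = len(a), len(b)
    # furthest[k] = furthest position x in a reached on diagonal k = x - y
    furthest = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                x = furthest[k + 1]          # step down: skip a character of b
            else:
                x = furthest[k - 1] + 1      # step right: skip a character of a
            y = x - k
            while x < n and y < m and a[x] == b[y]:
                x, y = x + 1, y + 1          # slide along matches for free
            furthest[k] = x
            if x >= n and y >= m:
                return d
    return n + m                             # not reached: the loop always returns
```

The LCS score follows as λ = (m + n − κ) / 2.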
198. The transposition network method
Dynamic string comparison
The dynamic LCS problem
Maintain current LCS score under updates to one or both input strings
Both input strings are streams, updated on-line:
appending characters at left or right
deleting characters at left or right
Assume for simplicity m ≈ n, i.e. m = Θ(n)
Goal: linear time per update
O(n) per update of a (n = |b|)
O(m) per update of b (m = |a|)
199. The transposition network method
Dynamic string comparison
Dynamic LCS in linear time: update models
left      right
–         app+del   standard DP [Wagner, Fischer: 1974]
app       app       a fixed [Landau+: 1998], [Kim, Park: 2004]
app       app       [Ishida+: 2005]
app+del   app+del   [T: NEW]
Main idea:
for append only, maintain seaweed matrix P_{a,b}
for append+delete, maintain partial seaweed layout by tracing a transposition network
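The first row of the table (no updates on the left, append and delete on the right) is the easy case and can be sketched with the standard DP alone: keep one DP column per character of b, so that appending costs O(m) and deleting pops a column. The class name is hypothetical, a is assumed fixed, and the O(mn) memory is a simplification. The harder models in the later rows are not covered by this sketch.

```python
class RightDynamicLCS:
    """LCS(a, b) under appending/deleting characters at the right end of b."""

    def __init__(self, a):
        self.a = a
        # columns[j][i] = LCS(a[:i], b[:j]); start with the column for empty b
        self.columns = [[0] * (len(a) + 1)]

    def append(self, ch):
        """Append ch to b: one new DP column, time O(len(a))."""
        prev = self.columns[-1]
        cur = [0]
        for i, ca in enumerate(self.a, 1):
            cur.append(prev[i - 1] + 1 if ca == ch else max(prev[i], cur[i - 1]))
        self.columns.append(cur)

    def delete(self):
        """Delete the last character of b (no-op if b is empty)."""
        if len(self.columns) > 1:
            self.columns.pop()

    def score(self):
        return self.columns[-1][-1]
```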
200. The transposition network method
Bit-parallel string comparison
Bit-parallel string comparison
String comparison using standard instructions on words of size w
Bit-parallel string comparison: running time
O(mn/w) [Allison, Dix: 1986; Myers: 1999; Crochemore+: 2001]
202. The transposition network method
Bit-parallel string comparison
Bit-parallel string comparison: binary transposition network
In every cell: input bits s, c; output bits s', c'; match/mismatch flag µ
Cell of the network (comparator on mismatch µ = 0, swap on match µ = 1):
s    0 1 0 1 0 1 0 1
c    0 0 1 1 0 0 1 1
µ    0 0 0 0 1 1 1 1
s'   0 1 1 1 0 0 1 1
c'   0 0 0 1 0 1 0 1
Adder s + (s ∧ µ) + c (differs from the cell only at s = c = 1, µ = 0):
s    0 1 0 1 0 1 0 1
c    0 0 1 1 0 0 1 1
µ    0 0 0 0 1 1 1 1
s'   0 1 1 0 0 0 1 1
c'   0 0 0 1 0 1 0 1
2c' + s' ← (s + (s ∧ µ) + c) ∨ (s ∧ ¬µ)
S ← (S + (S ∧ M)) ∨ (S ∧ ¬M), where S, M are words of bits s, µ
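The word-level update can be tried out with the sketch below, which uses a Python integer as one long word of m bits. Two points are assumptions borrowed from the cited bit-vector literature rather than stated on the slide: S starts as all ones, and the LCS score is read off as the number of zero bits left in S. The function name is illustrative.

```python
def lcs_bit_parallel(a, b):
    """LCS score via S <- (S + (S & M)) | (S & ~M), one update per character of b."""
    m = len(a)
    full = (1 << m) - 1
    # match[ch] has bit i set iff a[i] == ch (the word M for that character)
    match = {}
    for i, ch in enumerate(a):
        match[ch] = match.get(ch, 0) | (1 << i)
    s = full                                  # assumed initial state: all ones
    for ch in b:
        mm = match.get(ch, 0)
        # The addition propagates carries through runs of ones, playing the
        # role of the carry bit c in the cell; the mask keeps the word at m bits.
        s = ((s + (s & mm)) | (s & ~mm)) & full
    return m - bin(s).count("1")              # zeros in S count the LCS score
```

With machine words of size w, the same update would be applied to ⌈m/w⌉ words with explicit carry propagation; Python integers hide that detail.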
205. Sparse string comparison
Semi-local LCS between permutations
The LCS problem on permutation strings
Give LCS score for a vs b
In each of a, b all characters distinct: total m = n matches
Equivalent to
longest increasing subsequence (LIS) in a string
maximum clique in a permutation graph
maximum planar matching in an embedded bipartite graph
LCS on permutation strings: running time
O(n log n)       implicit in [Erdős, Szekeres: 1935]; [Robinson: 1938; Knuth: 1970; Dijkstra: 1980]
O(n log log n)   unit-RAM [Chang, Wang: 1992]; [Bespamyatnikh, Segal: 2000]
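The equivalence with LIS gives a short O(n log n) sketch: rename each character of b by its position in a, then find the longest increasing subsequence by binary search over the smallest possible tail values. The function name is illustrative, and both inputs are assumed to have distinct characters.

```python
from bisect import bisect_left


def permutation_lcs(a, b):
    """LCS score of two strings, each with all characters distinct."""
    pos = {ch: i for i, ch in enumerate(a)}
    # tails[k] = smallest possible last value of an increasing
    # subsequence of length k + 1 seen so far
    tails = []
    for ch in b:
        if ch not in pos:                # character absent from a: no match
            continue
        p = pos[ch]
        k = bisect_left(tails, p)
        if k == len(tails):
            tails.append(p)              # extends the longest subsequence
        else:
            tails[k] = p                 # improves an existing tail
    return len(tails)
```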
206. Sparse string comparison
Semi-local LCS between permutations
The semi-local LCS problem on permutation strings
Give semi-local LCS scores of a vs b
In each of a, b all characters distinct: total m = n matches
Equivalent to
longest increasing subsequence (LIS) in every substring of a string
Semi-local LCS on permutation strings: running time
O(n^2 log n)     naive
O(n^2)           restricted [Albert+: 2003; Chen+: 2005]
O(n^1.5 log n)   randomised, restricted [Albert+: 2007]
O(n^1.5)         [T: 2006]
O(n log^2 n)     [T: NEW]
207. Sparse string comparison
Semi-local LCS between permutations
[Figure: alignment grid of the permutation strings a = CFAEDHGB (rows) and b = DEHCBAFG (columns)]
213. Sparse string comparison
Semi-local LCS between permutations
Semi-local LCS on permutation strings: the algorithm
Divide-and-conquer on the alignment graph
Divide graph (say) horizontally; two subproblems of effective size n/2
Conquer: seaweed matrix multiplication, time O(n log n)
Overall time O(n log^2 n)
214. Sparse string comparison
Longest piecewise monotone subsequences
A k-increasing sequence: a concatenation of k increasing sequences
A k-modal sequence: a concatenation of k alternating increasing and
decreasing sequences
The longest k-increasing (k-modal) subsequence problem
Give the longest k-increasing (k-modal) subsequence of string b
Longest k-increasing (k-modal) subsequence: running time
O(nk log n)    k-modal [Demange+: 2007]
O(nk log n)    via [Hunt, Szymanski: 1977]
O(n log^2 n)   [T: NEW]
Main idea: LCS for id^k (respectively, (id id^R)^(k/2), with id^R the reversal of id) vs b
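The main idea can be checked on small inputs with the sketch below, which builds the comparison string explicitly and uses the plain quadratic LCS recurrence (so it illustrates the reduction only, not the O(n log^2 n) bound). It assumes that id is the sorted list of distinct values of b and that a k-modal sequence starts with an increasing piece; the function names are illustrative.

```python
def lcs(x, y):
    """Standard dynamic-programming LCS score, one row at a time."""
    prev = [0] * (len(y) + 1)
    for cx in x:
        cur = [0]
        for j, cy in enumerate(y, 1):
            cur.append(prev[j - 1] + 1 if cx == cy else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def longest_k_increasing(b, k):
    """Length of the longest subsequence of b made of k increasing pieces."""
    ident = sorted(set(b))
    return lcs(ident * k, list(b))


def longest_k_modal(b, k):
    """Same, with pieces alternating increasing, decreasing, increasing, ..."""
    ident = sorted(set(b))
    pattern = []
    for piece in range(k):
        pattern += ident if piece % 2 == 0 else ident[::-1]
    return lcs(pattern, list(b))
```

For instance, with b = [3, 1, 2] a single increasing piece covers at most two values, while two pieces ("3" followed by "1, 2") cover all three.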