6. The Text
C G A C G C T
Suffix Tree
(Figure: suffix tree of the text, with edges labelled by substrings)
Suffix Array: Sorted List of Suffixes
3 1 4 6 2 5 7
7. The Text
C G A C G C T
Burrows-Wheeler Index (an array)
Suffix Array
3 1 4 6 2 5 7
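A minimal sketch of how a Burrows-Wheeler array can be read off the suffix array. The slide does not say which convention it uses, so this assumes a common one: for each suffix in sorted order, take the character just before it, wrapping around to the last character for the suffix that starts at position 1. The function name is hypothetical. Without an end-of-text marker this may differ from a transform built by sorting rotations.

```python
def bwt_from_suffix_array(text, sa):
    """Return the character preceding each suffix, in suffix-array order.

    `sa` holds 1-based suffix start positions, as on the slide.
    Assumption: the suffix starting at position 1 wraps to the last character.
    """
    # 1-based start i -> preceding character at 0-based index i - 2;
    # for i == 1 this is index -1, which Python reads as the last character.
    return "".join(text[i - 2] for i in sa)


bwt = bwt_from_suffix_array("CGACGCT", [3, 1, 4, 6, 2, 5, 7])  # "GTAGCCC"
```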
8. How can one compute the Suffix Array in Linear Time?
9. Task
Input: a string of length n with characters in the range 1..n
Sort its suffixes lexicographically
Obtain two arrays:
f[i]: sorted order of the ith suffix
g[i]: which suffix is ith highest
O(n log n) comparisons, each taking up to n time
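For reference, the direct approach can be sketched as below. This is an illustration of the slow baseline, not the linear-time method. It assumes 1-based suffix positions as on the earlier slides, and reads "ith highest" as "ith in sorted order, smallest first". The function name is hypothetical.

```python
def naive_suffix_arrays(text):
    """Sort all suffixes by direct comparison.

    Returns (f, g) with 1-based positions stored in 0-based Python lists:
      g[i-1] = start of the suffix that is ith in sorted order
      f[i-1] = rank of the suffix that starts at position i
    """
    n = len(text)
    # O(n log n) comparisons, each of which may scan up to n characters.
    g = sorted(range(1, n + 1), key=lambda i: text[i - 1:])
    f = [0] * n
    for rank, start in enumerate(g, start=1):
        f[start - 1] = rank  # f is the inverse permutation of g
    return f, g


f, g = naive_suffix_arrays("CGACGCT")
# g == [3, 1, 4, 6, 2, 5, 7], matching the suffix array on the earlier slides
```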
11. Sorting Even Suffixes
A1 A2
A3 A4
Sort these n/2 pairs and map them to single chars in the range 1..n/2
New text of half the length; sort suffixes recursively
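The pairing step can be sketched as follows. Assumptions made for this illustration: positions are 0-based, so "even suffixes" start at positions 0, 2, 4, ...; an odd-length text is padded with an empty string that sorts before every character; and Python's built-in `sorted` stands in for both the linear-time pair sort and the recursive call. The function name is hypothetical.

```python
def pair_ranks(text):
    """Replace each pair (text[2k], text[2k+1]) by its rank among all pairs."""
    chars = list(text)
    if len(chars) % 2:
        chars.append("")  # padding; "" compares smaller than any character
    pairs = [(chars[i], chars[i + 1]) for i in range(0, len(chars), 2)]
    # Equal pairs get equal ranks; ranks lie in 1..n/2 and preserve pair order.
    rank = {p: r for r, p in enumerate(sorted(set(pairs)), start=1)}
    return [rank[p] for p in pairs]


reduced = pair_ranks("CGACGCT")  # pairs CG, AC, GC, T -> [2, 1, 3, 4]

# Sorting the suffixes of the half-length text orders the even suffixes.
# (A plain sort is used here in place of the recursive call.)
order = sorted(range(len(reduced)), key=lambda k: reduced[k:])
even_suffix_order = [2 * k for k in order]  # [2, 0, 4, 6]
```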
12. Sorting Odd Suffixes
O1 O2 O3 O4
A1,E1 A2,E2 A3,E3 A4,E4
Sort these n/2 pairs; the E’s are the even suffixes, whose order we know
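A small sketch of this step, assuming 0-based positions (odd suffixes start at 1, 3, 5, ...) and that the ranks of the even suffixes are already available. Each odd suffix is one character followed by an even suffix, so the pair (character, rank of the following even suffix) decides its order. The function name is hypothetical, and the even ranks are computed here by a plain sort only to keep the example self-contained.

```python
def sort_odd_suffixes(text, even_rank):
    """Order odd-position suffixes by (first char, rank of next even suffix)."""
    def key(i):
        # Rank 0 stands for the empty suffix past the end of the text.
        return (text[i], even_rank.get(i + 1, 0))
    return sorted(range(1, len(text), 2), key=key)


text = "CGACGCT"
# Stand-in for the recursive result: ranks of suffixes at even positions.
even_order = sorted(range(0, len(text), 2), key=lambda i: text[i:])
even_rank = {pos: r for r, pos in enumerate(even_order, start=1)}

odd_order = sort_odd_suffixes(text, even_rank)  # [3, 5, 1]: CGCT, CT, GACGCT
```

The pairs have keys of bounded size, so a radix sort could replace `sorted` to keep this step linear.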
20. Generalization
Set D of indices mod v
v 2v 3v
This string has size |D|n/v
Time taken to create this string is O(n |D|)
Sorting suffixes of this string gives the sorted order of all suffixes which begin at indices j such that j mod v is in D
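To make the size |D|n/v concrete, the sampled indices can be listed directly. This is only a counting sketch under assumed 0-based indices; the function name and the example values of n, v and D are hypothetical.

```python
def sample_positions(n, v, D):
    """Indices j in 0..n-1 with j mod v in D."""
    return [j for j in range(n) if j % v in D]


positions = sample_positions(14, 7, {1, 2, 4})  # [1, 2, 4, 8, 9, 11]
# 6 sampled positions, which equals |D| * n / v = 3 * 14 / 7
```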
21. Key Property of D
For any 2 indices i and j,
(i-j) mod v is the distance between some two beads in D,
so some shift x<v puts both i+x and j+x on beads of D (mod v)
D is a Difference Cover if the distances between beads in D (taken mod v) generate 0, 1, …, v-1
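The definition can be checked by brute force. The sketch below tests whether a set is a difference cover mod v and, for two indices, searches for a shift smaller than v that moves both onto elements of D. Function names are hypothetical, and the example set {1, 2, 4} mod 7 is chosen for illustration, not taken from the slides.

```python
def is_difference_cover(D, v):
    """True if the differences (a - b) mod v over a, b in D cover 0..v-1."""
    diffs = {(a - b) % v for a in D for b in D}
    return diffs == set(range(v))


def common_shift(i, j, D, v):
    """Smallest x < v with (i + x) mod v and (j + x) mod v both in D, else None."""
    for x in range(v):
        if (i + x) % v in D and (j + x) % v in D:
            return x
    return None


ok = is_difference_cover({1, 2, 4}, 7)   # True
bad = is_difference_cover({0, 1}, 4)     # False: difference 2 is missing
shift = common_shift(0, 3, {1, 2, 4}, 7)  # 1, since 0+1 and 3+1 are in D
```

When D is a difference cover, `common_shift` finds a shift for every pair of indices. That shift lets two arbitrary suffixes be compared by looking at fewer than v characters and then at two sampled suffixes whose order is already known.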
22. Size of D
sqrt(v)
sqrt(v)
There exists a Difference Cover of size 1.5*sqrt(v)!
23. Time Complexity
T(n) = O(n|D|) + T(|D|n/v) + O(nv)
For |D| = 2.5 sqrt(v):
T(n) = O(n sqrt(v)) + T(2.5 n/sqrt(v)) + O(nv)
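A short worked bound shows why this recurrence is linear for constant v. It is a sketch under stated assumptions: c stands for the constant in |D| = c sqrt(v), and K is a hypothetical constant hiding the O(.) terms. Neither symbol appears on the slide.

```latex
\begin{align*}
T(n) &\le T\!\left(\frac{c\,n}{\sqrt{v}}\right) + K\,n\,v
      && \text{($O(n\sqrt{v})$ is absorbed into $O(nv)$)} \\
     &\le K\,n\,v \sum_{k \ge 0} \left(\frac{c}{\sqrt{v}}\right)^{k}
      = \frac{K\,n\,v}{1 - c/\sqrt{v}}
      && \text{(valid when $\sqrt{v} > c$)} \\
     &= O(n) && \text{for any fixed $v > c^{2}$, e.g. $c = 2.5$, $v \ge 7$.}
\end{align*}
```

The geometric series converges because each recursive call works on a text shorter by the factor c/sqrt(v) < 1.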