SlideShare uma empresa Scribd logo
1 de 36
Baixar para ler offline
.
                        Algorithm of Suffix Tree
                           by farseerfc@gmail.com
.

                                   Jiachen Yang1
                           1 Research   Student in Osaka University


                               November 12, 2011




                                                                .     .   .        .      .       .

    jc-yang (Osaka-U)                       SuffixTree                         November 12, 2011       1 / 25
Part Outline


.
1   What is Suffix Tree

.
2   History & Naïve Algorithm

.
3   Optimization of Naïve Algorithm

.
4   Examples & Analysis




                                            .   .   .        .      .       .

      jc-yang (Osaka-U)         SuffixTree               November 12, 2011       2 / 25
What can we do with suffix tree?
.
                                           Find properties of the strings
Linear algorithms for exact string           . Find the longest common substrings of the
                                             1
matching
                                               string Si and Sj in θ (ni + nj ) time .
    like KMP                                 . Find all maximal pairs, maximal repeats or
                                             2

Search for strings                             supermaximal repeats in θ (n + z) time, if there
  . Check if a string P of length m is
  1                                            are z such repeats .
                                             . Find the Lempel-Ziv decomposition in θ (n)
                                             3
    a substring in O(m) time.
  . Find all z occurrences of the
  2                                            time .
                                             . Find the longest repeated substrings in θ (n)
    patterns P1 , · · · , Pq of total        4


    length m as substrings in                  time.
                                             . Find the most frequently occurring substrings
                                             5
    O(m + z) time.
  . Search for a regular expression P
  3                                            of a minimum length in θ (n) time.
                                             . Find the shortest strings from ∑ that do not
                                             6
    in time expected sublinear in n .
  . Find for each suffix of a pattern
  4                                            occur in D, in O(n + z) time, if there are z such
    P, the length of the longest               strings.
                                             . Find the shortest substrings occurring only
                                             7
    match between a prefix of
    P[i . . . m] and a substring in D in       once in θ (n) time.
                                             . Find, for each i, the shortest substrings of Si
    θ (m) time .                             8

  . ...
  5                                            not occurring elsewhere in D in θ (n) time.
                                             . ...
                                             9


                                                               .     .     .        .      .       .

      jc-yang (Osaka-U)                     SuffixTree                          November 12, 2011       3 / 25
Trie, Radix Tree, Suffix Trie & Suffix Tree

         trie1                    is a dictionary tree.
                   (AKA prefix tree)

                             stores a set of words.
                             each node represents a character except that root is empty string.
                             words with common prefix share same parent nodes.
                             minimal deterministic finite automaton that accepts all words.
  radix tree                            is a trie with compressed chain of nodes.
                   (AKA patricia trie or radix trie)

                             Each internal node has at least 2 children.
  suffix trie is a trie which stores all suffix of a given string.
  suffix tree is a suffix radix tree.
                   that enables linear time construction and fast
                   algorithms of other problems on a string.
   1 pronounced as in word retrieval by its inventor, /tri:/ “tree”, but
pronounced /traI/ “try” by other authors                           .   .    .        .      .       .

      jc-yang (Osaka-U)                                SuffixTree                November 12, 2011       4 / 25
Trie & Radix Tree

.
Trie of “A to tea ted ten i
in inn”
.
                                 .
                                 Radix tree example
                                 .




                                 .

.
                                          .   .   .        .      .       .

    jc-yang (Osaka-U)         SuffixTree               November 12, 2011       5 / 25
Suffix Trie & Suffix Tree of “banana”
                   ^




     b        a          n       $
                                                                                0:6


 a       n         $         a                                  banana$ a             na         $


                                                      1:0             8:6                 6:6            10:6
 n       a                   n       $

                                                                na     $                   na$       $
 a       n         $         a
                                                      4:6             9:6                 3:2            7:6

 n       a                   $
                                                      na$   $


 a       $
                                                2:1         5:6



 $


                                                                  .         .         .          .         .    .

     jc-yang (Osaka-U)                   SuffixTree                                         November 12, 2011        6 / 25
Suffix Tree of “mississippi”
                                                    mississippi$

                                                            0:11


                            (1,2)                     (0,...) (8,9)            (11,...)      (2,3)
                              i                     mississi... p                $...          s


                  12:8                        1:0          15:10                  18:11              4:4


            (8,...) (2,5)          (11,...)                   (10,...)         (9,...)
            ppi$... ssi              $...                       i$...          pi$...

                                                                                             (3,5)           (4,5)
  13:8           6:8                 17:11                 16:10                  14:8
                                                                                               si              i

                (8,...)       (5,...)
                ppi$...     ssippi$...


          7:8                2:1                                         8:8                                 10:8


                                                               (8,...) (5,...)                           (8,...)   (5,...)
                                                               ppi$... ssippi$...                        ppi$... ssippi$...


                                                     9:8                 3:2              11:8                  5:4



                                                                                                     .           .            .        .      .       .

     jc-yang (Osaka-U)                                             SuffixTree                                                      November 12, 2011       7 / 25
Part Outline


.
1   What is Suffix Tree

.
2   History & Naïve Algorithm

.
3   Optimization of Naïve Algorithm

.
4   Examples & Analysis




                                            .   .   .        .      .       .

      jc-yang (Osaka-U)         SuffixTree               November 12, 2011       8 / 25
History of Suffix Tree Algorithms
.
                                                           M. Farach. Optimal suffix tree
    First linear algorithm was introduced by Weiner            construction with large
                                                               alphabets. In focs, page
    1973 as position tree. Awarded by Donald Knuth             137. Published by the IEEE
                                                               Computer Society, 1997.
    as “Algorithm of the year 1973”.                       E. M. McCreight. A
                                                               space-economical suffix
    Greatly simplified by McCreight 1976.                       tree construction algorithm.
                                                               J. ACM, 23:262–272, April
Above two algorithms are processing string backward.           1976. ISSN 0004-5411.
                                                               doi: http://doi.acm.org/10.
                                                               1145/321941.321946. URL
    First online construction by Ukkonen 1995, which           http:
                                                               //doi.acm.org/10.
    is easier to understand.                                   1145/321941.321946.
                                                           E. Ukkonen. On-line
                                                               construction of suffix trees.
Above algorithms assume size of alphabet as fixed               Algorithmica, 14:249–260,
                                                               1995. ISSN 0178-4617.
constant .                                                     URL
                                                               http://dx.doi.org/10.
    Limitation was break by Farach 1997, optimal for           1007/BF01206331.
                                                               10.1007/BF01206331.
    all alphabets.                                         P. Weiner. Linear pattern
                                                               matching algorithms. In
                                                               Switching and Automata
Further study are continued to scale to scenarios when         Theory, 1973. SWAT ’08.
                                                               IEEE Conference Record of
the whole suffix tree or even input string cannot fit into       14th Annual Symposium on,
                                                               pages 1 –11, oct. 1973. doi:
memory.                                           .   .    .
                                                               10.1109/SWAT.1973.13.
                                                                    .        .        .

     jc-yang (Osaka-U)             SuffixTree                   November 12, 2011          9 / 25
Backward Construction of Suffix Tree
   1                       2                        3                                4


  banana$            banana$ anana$      banana$ anana$       nana$        banana$ ana nana$



                                                                                    na$ $




                 7                              6                               5


         banana$ a na          $      banana$       a    na           banana$ ana                na



              na $         na$ $         na $           na$ $             na$ $                na$ $


        na$ $                         na$ $


                                                                      .     .            .         .       .     .

       jc-yang (Osaka-U)                        SuffixTree                                    November 12, 2011   10 / 25
Construct Suffix Tree by Sorting Suffix
Suffix :                  Sorted suffix :          Tree of sorted suffix :
mississippi              i                       | -i - >| -
ississippi               ippi                    |       | - ppi
ssissippi                issippi                 |       | - ssi - >| - ppi
sissippi                 ississippi              |                   | - ssippi
issippi                  mississippi             | - mississippi
ssippi                   pi                      | -p - >| - i
sippi                    ppi                     |       | - pi
ippi                     sippi                   | -s - >| -i - - >| - ppi
ppi                      sissippi                        |         | - ssippi
pi                       ssippi                          | - si - >| - ppi
i                        ssissippi                                 | - ssippi
Time complexity will be O(N 2 log N) .
Space complexity will be O(N 2 ) .
                                                        .   .    .         .       .     .

     jc-yang (Osaka-U)               SuffixTree                       November 12, 2011   11 / 25
Naïve Algorithm

S UFFIX T REE(string)
1 for i ← 1 to length(string)
2       do U PDATE(treei )    £ Phrase i

U PDATE(treei )
1 for j ← 1 to i
2       do node ← treei .F IND(suffix[j to i − 1])
3          E XTEND(node, string[i])       £ Extension j

Time complexity will be O(N 3 ) .
Space complexity will be O(N 2 ) .
The challenge is to make sure treei is updated to treei+1 efficiently.
                                                .   .    .         .       .     .

     jc-yang (Osaka-U)           SuffixTree                   November 12, 2011   12 / 25
Suffix Extend Cases of Naïve Algorithm
.
     Case I If path Sj...i ends at leaf, append a char Si+1 to end of edge into leaf.
         Sj...i = . . . na                                       Sj...i+1 = . . . nan

                    na                                                  nan

    Case II If path Sj...i ends in the middle of an edge , and next char Si+1 is not equal to
              the next char in the edge, split that edge, create a internal node, add a new
              edge to a new leaf.
         Sj...i = . . . na                                        Sj...i+1 = . . . nay
                                                                                         n
                    nan                                                 na
                                                                                         y
    Case III If path Sj...i ends in the middle of an edge , and next char Si+1 is equal to the
               next char in the edge, do nothing, extenstion has done.
          Sj...i = . . . na                                      Sj...i+1 = . . . nan
                    nan                                                 nan

                                                              .     .     .         .        .    .

       jc-yang (Osaka-U)                   SuffixTree                          November 12, 2011   13 / 25
Naïve Online Construction of Suffix Tree
   1                   2            3                                  4


       b           ba      a     ban an n                       bana ana na



               7                     6                             5


       banana$ a na        $   banana anana     nana           banan anan nan



           na $        na$ $



       na$ $


                                                       .   .       .         .       .     .

   jc-yang (Osaka-U)                SuffixTree                          November 12, 2011   14 / 25
Properties of Suffix Tree
.
    . Each update will add exactly 1
    1

      leaf node .
                                                                                          0:6
              nr_leaf = N
    . Suffix tree is full tree.
    2
                                                                           banana$ a              na          $
              Each internal node has at least 2
              children.                                      1:0                    8:6             6:6               10:6
              nr_internal < N
              nr_node < 2N                                                 na        $                  na$       $
    . Worst case Fabonacci word
    3
                                                             4:6                    9:6             3:2               7:6
              abaababaabaab
    . Suffix is either explicit or implicit.
    4
                                                             na$       $

              Explicit when it ends at a node.         2:1             5:6
              Implicit when it ends in the
              middle of an edge.

                                                                   .            .         .         .         .        .

        jc-yang (Osaka-U)                  SuffixTree                                          November 12, 2011         15 / 25
Part Outline


.
1   What is Suffix Tree

.
2   History & Naïve Algorithm

.
3   Optimization of Naïve Algorithm

.
4   Examples & Analysis




                                            .   .   .         .       .     .

      jc-yang (Osaka-U)         SuffixTree               November 12, 2011   16 / 25
Optimization of Naïve Algorithm
.
     . Substrings can be represented as (start, end) pair of their index in
     1

       orignal string.
                    Reduce space complexity to O(N) if size of alphabet is fixed constant.
     . Once a leaf, Always a leaf
     2

                    Represent edge that links to a leaf as (start, · · · ).
                    Extend leaf nodes for free. We do not need Extend Case I.
                               0:6
                                                                                                                   0:6


                    banana$ a        na         $                                                    (1,2)        (0,...)  (6,...)           (2,4)
                                                                                                       a        banana$... $...               na

          1:0            8:6          6:6               10:6                      8:6                 1:0                   10:6                    6:6


                                                                                 (6,...)   (2,4)                                                   (6,...)   (4,...)
                    na    $               na$       $                             $...      na                                                      $...     na$...


          4:6            9:6          3:2               7:6                9:6                 4:6                                     7:6                    3:2


                                                                                           (6,...)    (4,...)
          na$   $                                                                           $...      na$...


                                                                                   5:6                  2:1
    2:1         5:6                                                                        .            .            .             .           .              .

           jc-yang (Osaka-U)                                   SuffixTree                                                 November 12, 2011                    17 / 25
Active Point

                                     During a phrase, if we meet Extend Case III, that is if we found
                                     S[i + 1] already exists in suffix[j . . . i] then S[i + 1] will exists in
                                     ∀suffix[k . . . i], k ∈ j . . . i.
                                     Thus Case III is a sign that means update of this phrase is finished.
                                     During phrase i if we stopped at suffix[k . . . i] by Case III, then in
                                     next phrase we can start from suffix[k . . . i + 1] because all suffix
                                     start with 1 . . . k − 1 will end at Case I.
                                     We called this point(current internal node, current position k in
                                     string) as Active Point.
a                    ab                       aba                      abaa                   abaab                    abaaba                                  abaabab                                              abaababa                                              abaababaa                                           abaababaab                                          abaababaaba                                                         abaababaabaa                                                                       abaababaaba ab


0:0               0:1                       0:2                      0:3                      0:4                      0:5                                        0:6                                                   0:7                                                 0:8                                                0:9                                                 0:10                                                                             0:11                                                                          0:12


 (0,...)         (0,...)   (1,...)         (0,...)   (1,...)         (0,1)    (1,...)     (0,1)        (1,...)     (0,1)         (1,...)                       (0,1)      (1,3)                                     (0,1)      (1,3)                                       (0,1)    (1,3)                                     (0,1)     (1,3)                                     (0,1)     (1,3)                                                                   (0,1)       (1,3)                                                            (0,1)         (1,3)
  a...            ab...     b...           aba...     ba...            a      baa...        a         baab...        a          baaba...                         a         ba                                         a         ba                                           a       ba                                         a        ba                                         a        ba                                                                       a          ba                                                                a            ba


1:0        1:0              2:1      1:0              2:1      3:3              2:1     3:3             2:1      3:3             2:1                    3:3               7:6                                 3:3              7:6                                  3:3             7:6                                 3:3             7:6                                 3:3             7:6                                                              3:3                  7:6                                                   3:3                          7:6


                                                                (3,...)       (1,...)    (3,...)       (1,...)    (3,...)     (1,...)             (3,...) (1,3)              (3,...)   (6,...)          (3,...) (1,3)             (3,...)   (6,...)           (3,...) (1,3)             (3,...)   (6,...)       (3,...)  (1,3)            (3,...)     (6,...)       (3,...)  (1,3)             (3,...)     (6,...)                                    (3,6) (1,3)                  (3,6)       (6,...)                             (3,6) (1,3)                        (3,6)         (6,...)
                                                                 a...         baa...      ab...       baab...     aba...     baaba...            abab... ba                 abab...     b...           ababa... ba               ababa...    ba...          ababaa... ba              ababaa...   baa...      ababaab... ba             ababaab...   baab...     ababaaba... ba             ababaaba...   baaba...                                     aba   ba                     aba       baabaa...                             aba ba                             aba        baabaab...


                                                               4:3              1:0     4:3             1:0      4:3             1:0       4:3          5:6               2:1          8:6       4:3          5:6              2:1          8:6       4:3           5:6              2:1          8:6       4:3          5:6                2:1       8:6       4:3         5:6                 2:1       8:6                        13:11                   5:6                11:11           8:6                   13:11             5:6                         11:11           8:6


                                                                                                                                                     (3,...)    (6,...)                                    (3,...) (6,...)                                       (3,...) (6,...)                                     (3,...)   (6,...)                                    (3,...)   (6,...)                                      (11,...) (6,...)         (3,6)      (6,...)       (11,...)     (6,...)            (11,...) (6,...)  (3,6)               (6,...)       (11,...)      (6,...)
                                                                                                                                                    abab...      b...                                     ababa... ba...                                       ababaa... baa...                                    ababaab... baab...                                  ababaaba... baaba...                                        a... baabaa...          aba      baabaa...        a...      baabaa...            ab... baabaab... aba               baabaab...       ab...      baabaab...


                                                                                                                                                 1:0              6:6                                  1:0              6:6                                   1:0             6:6                                 1:0                 6:6                             1:0                 6:6                            14:11        4:3            9:11            6:6           12:11           2:1     14:11         4:3           9:11              6:6           12:11           2:1


                                                                                                                                                                                                                                                                                                                                                                                                                                                (11,...)      (6,...)                                                              (11,...)     (6,...)
                                                                                                                                                                                                                                                                                                                                                                                                                                                  a...       baabaa...                                                              ab...     baabaab...


                                                                                                                                                                                                                                                                                                                                                                                                                                             10:11            1:0                                                              10:11             1:0




                                                                                                                                                                                                                                                                                                                                                                            .                                        .                                .                                        .                               .                                            .

                                     jc-yang (Osaka-U)                                                                                                                                                                                                       SuffixTree                                                                                                                                                                                      November 12, 2011                                                                                                 18 / 25
Ukk’s Update using Active Point

U PDATE(treei )
 1 current_suffix ← active_point
 2 next_char ← string[i]
 3 while True
 4      do if there exists edge start with next_char
 5              then break       £ Case III
 6              else
 7                   split current edge if implicit
 8                   create new leaf with new edge labelled next_char
 9          if current_suffix is empty
10              then break
11              else current_suffix ← next shorter suffix
12 active_point ← current_suffix
                                              .   .   .         .       .     .

     jc-yang (Osaka-U)          SuffixTree                 November 12, 2011   19 / 25
Suffix Link to find next shorter suffix
            Suffix link
                             Internal node of suffix X α has a link to node α .
                             If α is empty, suffix link points to root.
            How to create suffix link
                             Link together every internal nodes that are created by splitting in
                             same phrase.
                            banana$                                                     banana$

                             0:6                                                        0:6


            (1,2) (0,...)  (6,...)               (2,4)                  (0,...)  (1,2)             (6,...)     (2,4)
              a banana$... $...                   na                  banana$...   a                $...        na


 8:6                 1:0               10:6         6:6         1:0              8:6               10:6            6:6


  (6,...)         (2,4)                       (6,...) (4,...)            (6,...) (2,4)                       (6,...) (4,...)
   $...            na                          $... na$...                $... na                             $... na$...


 9:6                 4:6               7:6          3:2         9:6              4:6               7:6             3:2


                  (6,...)    (4,...)                                          (6,...)    (4,...)
                   $...      na$...                                            $...      na$...


            5:6               2:1                                       5:6               2:1
                                                                                                                               .   .   .         .       .     .

             jc-yang (Osaka-U)                                                                SuffixTree                                    November 12, 2011   20 / 25
Fast jump using Suffix Link


   Assume we are at Suffix X αβ , whose parent internal node
   represent X α .
      1. Go back to parent internal node,
      2. Jumping follow the node’s suffix link to the node represent Suffix
         α
      3. Go down to Suffix αβ .

   Even jump down in step 3 because we already know length of β .
   (Skip/Count trick)
   Combining all these tricks we can Extend a character in O(1)
   time.


                                                .    .   .         .       .     .

   jc-yang (Osaka-U)             SuffixTree                   November 12, 2011   21 / 25
Part Outline


.
1   What is Suffix Tree

.
2   History & Naïve Algorithm

.
3   Optimization of Naïve Algorithm

.
4   Examples & Analysis




                                            .   .   .         .       .     .

      jc-yang (Osaka-U)         SuffixTree               November 12, 2011   22 / 25
Experiment – mississippi
.

                        m

                        0:0


                         (0,...)
                          m...


                        1:0




                                               .   .   .         .       .     .

    jc-yang (Osaka-U)              SuffixTree               November 12, 2011   23 / 25
Experiment – mississippi
.
                             mi

                         0:1


                        (1,...)   (0,...)
                          i...     mi...


               2:1                    1:0




                                               .   .   .         .       .     .

    jc-yang (Osaka-U)              SuffixTree               November 12, 2011   23 / 25
Experiment – mississippi
.
                             mis

                              0:2


                        (1,...) (2,...)        (0,...)
                         is... s...            mis...


        2:1                   3:2                     1:0




                                                            .   .   .         .       .     .

    jc-yang (Osaka-U)                     SuffixTree                     November 12, 2011   23 / 25
Experiment – mississippi
.
                            miss

                              0:3


                        (1,...) (2,...)        (0,...)
                        iss... ss...           miss...


        2:1                   3:2                     1:0




                                                            .   .   .         .       .     .

    jc-yang (Osaka-U)                     SuffixTree                     November 12, 2011   23 / 25
Experiment – mississippi
.
                             miss i

                              0:4


                        (1,...) (0,...)        (2,3)
                        issi... missi...         s


         2:1                  1:0                      4:4


                                               (4,...) (3,...)
                                                 i...   si...


                             5:4                       3:2




                                                                 .   .   .         .       .     .

    jc-yang (Osaka-U)                      SuffixTree                         November 12, 2011   23 / 25
Experiment – mississippi
.
                              miss is

                              0:5


                         (1,...) (0,...)          (2,3)
                        issis... missis...          s


         2:1                   1:0                        4:4


                                                 (4,...) (3,...)
                                                  is... sis...


                              5:4                         3:2




                                                                   .   .   .         .       .     .

    jc-yang (Osaka-U)                        SuffixTree                         November 12, 2011   23 / 25
Experiment – mississippi
.
                               miss iss

                             0:6


                         (1,...)   (0,...)          (2,3)
                        ississ... mississ...          s


         2:1                       1:0                      4:4


                                                  (4,...) (3,...)
                                                  iss... siss...


                                5:4                        3:2




                                                                    .   .   .         .       .     .

    jc-yang (Osaka-U)                          SuffixTree                        November 12, 2011   23 / 25
Experiment – mississippi
.
                                mississi

                              0:7


                          (1,...)    (0,...)           (2,3)
                        ississi... mississi...           s


         2:1                        1:0                       4:4


                                                    (4,...) (3,...)
                                                    issi... sissi...


                                 5:4                         3:2




                                                                       .   .   .         .       .     .

    jc-yang (Osaka-U)                            SuffixTree                         November 12, 2011   23 / 25
Experiment – mississippi
.
                                               mississip

                                       0:8


                        (8,...) (1,2)                   (0,...)             (2,3)
                         p...     i                   mississi...             s


    14:8                     12:8                           1:0                      4:4


                   (8,...)     (2,5)
                    p...        ssi

                                                                            (3,5)             (4,5)
    13:8                     6:8
                                                                              si                i

                   (8,...)     (5,...)
                    p...       ssip...


     7:8                     2:1                            8:8                               10:8


                                          (8,...)     (5,...)                       (8,...)      (5,...)
                                           p...       ssip...                        p...        ssip...


           9:8                          3:2                         11:8                       5:4




                                                                                                           .   .   .         .       .     .

    jc-yang (Osaka-U)                                                  SuffixTree                                       November 12, 2011   23 / 25
Experiment – mississippi
.
                                              mississip p

                                       0:9


                        (8,...) (1,2)                   (0,...)             (2,3)
                        pp...     i                   mississi...             s


    14:8                     12:8                           1:0                      4:4


                   (8,...)     (2,5)
                   pp...        ssi

                                                                            (3,5)             (4,5)
    13:8                     6:8
                                                                              si                i

                   (8,...)      (5,...)
                   pp...       ssipp...


     7:8                     2:1                            8:8                               10:8


                                          (8,...)      (5,...)                      (8,...)       (5,...)
                                          pp...       ssipp...                      pp...        ssipp...


           9:8                          3:2                         11:8                       5:4




                                                                                                            .   .   .         .       .     .

    jc-yang (Osaka-U)                                                  SuffixTree                                        November 12, 2011   23 / 25
Experiment – mississippi
.
                                                       mississippi

                                                        0:10


                                              (1,2)       (0,...)         (8,9)             (2,3)
                                                i       mississi...         p                 s


                             12:8              1:0                       15:10                    4:4


               (2,5)                (8,...)                           (9,...) (10,...)
                ssi                 ppi...                             pi...    i...

                                                                                         (3,5)          (4,5)
    6:8                            13:8               14:8                   16:10
                                                                                           si             i

     (8,...)            (5,...)
     ppi...            ssippi...


    7:8                     2:1                                        8:8                                 10:8


                                                                (8,...) (5,...)                         (8,...)    (5,...)
                                                                ppi... ssippi...                        ppi...    ssippi...


                                              9:8                      3:2                 11:8                   5:4




                                                                                                                        .     .   .         .       .     .

      jc-yang (Osaka-U)                                                              SuffixTree                                        November 12, 2011   23 / 25
Experiment – mississippi
.
                                                        mississippi$

                                                                0:11


                                (1,2)                     (0,...) (8,9)            (11,...)      (2,3)
                                  i                     mississi... p                $...          s


                      12:8                        1:0          15:10                  18:11              4:4


                (8,...) (2,5)          (11,...)                   (10,...)         (9,...)
                ppi$... ssi              $...                       i$...          pi$...

                                                                                                 (3,5)         (4,5)
    13:8             6:8                 17:11                 16:10                  14:8
                                                                                                   si            i

                    (8,...)       (5,...)
                    ppi$...     ssippi$...


              7:8                2:1                                         8:8                               10:8


                                                                   (8,...) (5,...)                       (8,...)   (5,...)
                                                                   ppi$... ssippi$...                    ppi$... ssippi$...


                                                         9:8                 3:2              11:8                5:4




                                                                                                                        .     .   .         .       .     .

           jc-yang (Osaka-U)                                                         SuffixTree                                        November 12, 2011   23 / 25
Time Complexity Analysis
$
i
p
p
i
s
s
i
s
s
i
m
 .   m        i          s   s   i   s     s         i   p       p       i          $
Time complexity is 2N = O(N) .
                                                             .       .       .          .      .     .

     jc-yang (Osaka-U)                   SuffixTree                               November 12, 2011   24 / 25
Experiment – English text
.

                           16
                                                                "data" using 1:3
                           14
Construction Time (sec.)




                           12

                           10

                           8

                           6

                           4

                           2

                           0
                                0      20000 40000   60000 80000 100000 120000 140000
                                                     File size (byte)
                                                                             .     .   .         .       .     .

                           jc-yang (Osaka-U)               SuffixTree                       November 12, 2011   25 / 25

Mais conteúdo relacionado

Mais de Jiachen Yang

データモデルの更新を効率よく検証するの並列可能性
データモデルの更新を効率よく検証するの並列可能性データモデルの更新を効率よく検証するの並列可能性
データモデルの更新を効率よく検証するの並列可能性Jiachen Yang
 
Slides for Semantic Versioning versus Breaking Changes: A Study of the Maven ...
Slides for Semantic Versioning versus Breaking Changes: A Study of the Maven ...Slides for Semantic Versioning versus Breaking Changes: A Study of the Maven ...
Slides for Semantic Versioning versus Breaking Changes: A Study of the Maven ...Jiachen Yang
 
チェックリストと分割に基づく 網羅と使用テスト
チェックリストと分割に基づく  網羅と使用テストチェックリストと分割に基づく  網羅と使用テスト
チェックリストと分割に基づく 網羅と使用テストJiachen Yang
 
Active Refinement of Clone Anomaly Reports
Active Refinement of Clone Anomaly ReportsActive Refinement of Clone Anomaly Reports
Active Refinement of Clone Anomaly ReportsJiachen Yang
 
Inference and Checking of Object Ownership
Inference  and  Checking  of  Object OwnershipInference  and  Checking  of  Object Ownership
Inference and Checking of Object OwnershipJiachen Yang
 
基于OpenNEbula的虚拟化服务器集群中节能 的研究
基于OpenNEbula的虚拟化服务器集群中节能 的研究基于OpenNEbula的虚拟化服务器集群中节能 的研究
基于OpenNEbula的虚拟化服务器集群中节能 的研究Jiachen Yang
 
Output fica.beamer.43
Output fica.beamer.43Output fica.beamer.43
Output fica.beamer.43Jiachen Yang
 

Mais de Jiachen Yang (8)

データモデルの更新を効率よく検証するの並列可能性
データモデルの更新を効率よく検証するの並列可能性データモデルの更新を効率よく検証するの並列可能性
データモデルの更新を効率よく検証するの並列可能性
 
Slides for Semantic Versioning versus Breaking Changes: A Study of the Maven ...
Slides for Semantic Versioning versus Breaking Changes: A Study of the Maven ...Slides for Semantic Versioning versus Breaking Changes: A Study of the Maven ...
Slides for Semantic Versioning versus Breaking Changes: A Study of the Maven ...
 
チェックリストと分割に基づく 網羅と使用テスト
チェックリストと分割に基づく  網羅と使用テストチェックリストと分割に基づく  網羅と使用テスト
チェックリストと分割に基づく 網羅と使用テスト
 
Active Refinement of Clone Anomaly Reports
Active Refinement of Clone Anomaly ReportsActive Refinement of Clone Anomaly Reports
Active Refinement of Clone Anomaly Reports
 
Inference and Checking of Object Ownership
Inference  and  Checking  of  Object OwnershipInference  and  Checking  of  Object Ownership
Inference and Checking of Object Ownership
 
基于OpenNEbula的虚拟化服务器集群中节能 的研究
基于OpenNEbula的虚拟化服务器集群中节能 的研究基于OpenNEbula的虚拟化服务器集群中节能 的研究
基于OpenNEbula的虚拟化服务器集群中节能 的研究
 
Output fica.beamer.43
Output fica.beamer.43Output fica.beamer.43
Output fica.beamer.43
 
Cloud sim report
Cloud sim reportCloud sim report
Cloud sim report
 

Último

VIP Russian Call Girls in Saharanpur Deepika 8250192130 Independent Escort Se...
VIP Russian Call Girls in Saharanpur Deepika 8250192130 Independent Escort Se...VIP Russian Call Girls in Saharanpur Deepika 8250192130 Independent Escort Se...
VIP Russian Call Girls in Saharanpur Deepika 8250192130 Independent Escort Se...Suhani Kapoor
 
Fashion trends before and after covid.pptx
Fashion trends before and after covid.pptxFashion trends before and after covid.pptx
Fashion trends before and after covid.pptxVanshNarang19
 
(AISHA) Ambegaon Khurd Call Girls Just Call 7001035870 [ Cash on Delivery ] P...
(AISHA) Ambegaon Khurd Call Girls Just Call 7001035870 [ Cash on Delivery ] P...(AISHA) Ambegaon Khurd Call Girls Just Call 7001035870 [ Cash on Delivery ] P...
(AISHA) Ambegaon Khurd Call Girls Just Call 7001035870 [ Cash on Delivery ] P...ranjana rawat
 
CBD Belapur Individual Call Girls In 08976425520 Panvel Only Genuine Call Girls
CBD Belapur Individual Call Girls In 08976425520 Panvel Only Genuine Call GirlsCBD Belapur Individual Call Girls In 08976425520 Panvel Only Genuine Call Girls
CBD Belapur Individual Call Girls In 08976425520 Panvel Only Genuine Call Girlsmodelanjalisharma4
 
VIP Call Girls Service Mehdipatnam Hyderabad Call +91-8250192130
VIP Call Girls Service Mehdipatnam Hyderabad Call +91-8250192130VIP Call Girls Service Mehdipatnam Hyderabad Call +91-8250192130
VIP Call Girls Service Mehdipatnam Hyderabad Call +91-8250192130Suhani Kapoor
 
Booking open Available Pune Call Girls Nanded City 6297143586 Call Hot India...
Booking open Available Pune Call Girls Nanded City  6297143586 Call Hot India...Booking open Available Pune Call Girls Nanded City  6297143586 Call Hot India...
Booking open Available Pune Call Girls Nanded City 6297143586 Call Hot India...Call Girls in Nagpur High Profile
 
CALL ON ➥8923113531 🔝Call Girls Aminabad Lucknow best Night Fun service
CALL ON ➥8923113531 🔝Call Girls Aminabad Lucknow best Night Fun serviceCALL ON ➥8923113531 🔝Call Girls Aminabad Lucknow best Night Fun service
CALL ON ➥8923113531 🔝Call Girls Aminabad Lucknow best Night Fun serviceanilsa9823
 
VIP Call Girls Bhiwandi Ananya 8250192130 Independent Escort Service Bhiwandi
VIP Call Girls Bhiwandi Ananya 8250192130 Independent Escort Service BhiwandiVIP Call Girls Bhiwandi Ananya 8250192130 Independent Escort Service Bhiwandi
VIP Call Girls Bhiwandi Ananya 8250192130 Independent Escort Service BhiwandiSuhani Kapoor
 
Stark Industries Marketing Plan (1).pptx
Stark Industries Marketing Plan (1).pptxStark Industries Marketing Plan (1).pptx
Stark Industries Marketing Plan (1).pptxjeswinjees
 
The_Canvas_of_Creative_Mastery_Newsletter_April_2024_Version.pdf
The_Canvas_of_Creative_Mastery_Newsletter_April_2024_Version.pdfThe_Canvas_of_Creative_Mastery_Newsletter_April_2024_Version.pdf
The_Canvas_of_Creative_Mastery_Newsletter_April_2024_Version.pdfAmirYakdi
 
Verified Trusted Call Girls Adugodi💘 9352852248 Good Looking standard Profil...
Verified Trusted Call Girls Adugodi💘 9352852248  Good Looking standard Profil...Verified Trusted Call Girls Adugodi💘 9352852248  Good Looking standard Profil...
Verified Trusted Call Girls Adugodi💘 9352852248 Good Looking standard Profil...kumaririma588
 
Petrosains Drama Competition (PSDC).pptx
Petrosains Drama Competition (PSDC).pptxPetrosains Drama Competition (PSDC).pptx
Petrosains Drama Competition (PSDC).pptxIgnatiusAbrahamBalin
 
Best VIP Call Girls Noida Sector 44 Call Me: 8448380779
Best VIP Call Girls Noida Sector 44 Call Me: 8448380779Best VIP Call Girls Noida Sector 44 Call Me: 8448380779
Best VIP Call Girls Noida Sector 44 Call Me: 8448380779Delhi Call girls
 
Top Rated Pune Call Girls Koregaon Park ⟟ 6297143586 ⟟ Call Me For Genuine S...
Top Rated  Pune Call Girls Koregaon Park ⟟ 6297143586 ⟟ Call Me For Genuine S...Top Rated  Pune Call Girls Koregaon Park ⟟ 6297143586 ⟟ Call Me For Genuine S...
Top Rated Pune Call Girls Koregaon Park ⟟ 6297143586 ⟟ Call Me For Genuine S...Call Girls in Nagpur High Profile
 
VIP Kolkata Call Girl Gariahat 👉 8250192130 Available With Room
VIP Kolkata Call Girl Gariahat 👉 8250192130  Available With RoomVIP Kolkata Call Girl Gariahat 👉 8250192130  Available With Room
VIP Kolkata Call Girl Gariahat 👉 8250192130 Available With Roomdivyansh0kumar0
 
SD_The MATATAG Curriculum Training Design.pptx
SD_The MATATAG Curriculum Training Design.pptxSD_The MATATAG Curriculum Training Design.pptx
SD_The MATATAG Curriculum Training Design.pptxjanettecruzeiro1
 
Kurla Call Girls Pooja Nehwal📞 9892124323 ✅ Vashi Call Service Available Nea...
Kurla Call Girls Pooja Nehwal📞 9892124323 ✅  Vashi Call Service Available Nea...Kurla Call Girls Pooja Nehwal📞 9892124323 ✅  Vashi Call Service Available Nea...
Kurla Call Girls Pooja Nehwal📞 9892124323 ✅ Vashi Call Service Available Nea...Pooja Nehwal
 

Último (20)

VIP Russian Call Girls in Saharanpur Deepika 8250192130 Independent Escort Se...
VIP Russian Call Girls in Saharanpur Deepika 8250192130 Independent Escort Se...VIP Russian Call Girls in Saharanpur Deepika 8250192130 Independent Escort Se...
VIP Russian Call Girls in Saharanpur Deepika 8250192130 Independent Escort Se...
 
Fashion trends before and after covid.pptx
Fashion trends before and after covid.pptxFashion trends before and after covid.pptx
Fashion trends before and after covid.pptx
 
(AISHA) Ambegaon Khurd Call Girls Just Call 7001035870 [ Cash on Delivery ] P...
(AISHA) Ambegaon Khurd Call Girls Just Call 7001035870 [ Cash on Delivery ] P...(AISHA) Ambegaon Khurd Call Girls Just Call 7001035870 [ Cash on Delivery ] P...
(AISHA) Ambegaon Khurd Call Girls Just Call 7001035870 [ Cash on Delivery ] P...
 
CBD Belapur Individual Call Girls In 08976425520 Panvel Only Genuine Call Girls
CBD Belapur Individual Call Girls In 08976425520 Panvel Only Genuine Call GirlsCBD Belapur Individual Call Girls In 08976425520 Panvel Only Genuine Call Girls
CBD Belapur Individual Call Girls In 08976425520 Panvel Only Genuine Call Girls
 
VIP Call Girls Service Mehdipatnam Hyderabad Call +91-8250192130
VIP Call Girls Service Mehdipatnam Hyderabad Call +91-8250192130VIP Call Girls Service Mehdipatnam Hyderabad Call +91-8250192130
VIP Call Girls Service Mehdipatnam Hyderabad Call +91-8250192130
 
Booking open Available Pune Call Girls Nanded City 6297143586 Call Hot India...
Booking open Available Pune Call Girls Nanded City  6297143586 Call Hot India...Booking open Available Pune Call Girls Nanded City  6297143586 Call Hot India...
Booking open Available Pune Call Girls Nanded City 6297143586 Call Hot India...
 
CALL ON ➥8923113531 🔝Call Girls Aminabad Lucknow best Night Fun service
CALL ON ➥8923113531 🔝Call Girls Aminabad Lucknow best Night Fun serviceCALL ON ➥8923113531 🔝Call Girls Aminabad Lucknow best Night Fun service
CALL ON ➥8923113531 🔝Call Girls Aminabad Lucknow best Night Fun service
 
young call girls in Pandav nagar 🔝 9953056974 🔝 Delhi escort Service
young call girls in Pandav nagar 🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Pandav nagar 🔝 9953056974 🔝 Delhi escort Service
young call girls in Pandav nagar 🔝 9953056974 🔝 Delhi escort Service
 
VIP Call Girls Bhiwandi Ananya 8250192130 Independent Escort Service Bhiwandi
VIP Call Girls Bhiwandi Ananya 8250192130 Independent Escort Service BhiwandiVIP Call Girls Bhiwandi Ananya 8250192130 Independent Escort Service Bhiwandi
VIP Call Girls Bhiwandi Ananya 8250192130 Independent Escort Service Bhiwandi
 
Stark Industries Marketing Plan (1).pptx
Stark Industries Marketing Plan (1).pptxStark Industries Marketing Plan (1).pptx
Stark Industries Marketing Plan (1).pptx
 
The_Canvas_of_Creative_Mastery_Newsletter_April_2024_Version.pdf
The_Canvas_of_Creative_Mastery_Newsletter_April_2024_Version.pdfThe_Canvas_of_Creative_Mastery_Newsletter_April_2024_Version.pdf
The_Canvas_of_Creative_Mastery_Newsletter_April_2024_Version.pdf
 
Verified Trusted Call Girls Adugodi💘 9352852248 Good Looking standard Profil...
Verified Trusted Call Girls Adugodi💘 9352852248  Good Looking standard Profil...Verified Trusted Call Girls Adugodi💘 9352852248  Good Looking standard Profil...
Verified Trusted Call Girls Adugodi💘 9352852248 Good Looking standard Profil...
 
Petrosains Drama Competition (PSDC).pptx
Petrosains Drama Competition (PSDC).pptxPetrosains Drama Competition (PSDC).pptx
Petrosains Drama Competition (PSDC).pptx
 
Best VIP Call Girls Noida Sector 44 Call Me: 8448380779
Best VIP Call Girls Noida Sector 44 Call Me: 8448380779Best VIP Call Girls Noida Sector 44 Call Me: 8448380779
Best VIP Call Girls Noida Sector 44 Call Me: 8448380779
 
Top Rated Pune Call Girls Koregaon Park ⟟ 6297143586 ⟟ Call Me For Genuine S...
Top Rated  Pune Call Girls Koregaon Park ⟟ 6297143586 ⟟ Call Me For Genuine S...Top Rated  Pune Call Girls Koregaon Park ⟟ 6297143586 ⟟ Call Me For Genuine S...
Top Rated Pune Call Girls Koregaon Park ⟟ 6297143586 ⟟ Call Me For Genuine S...
 
VIP Kolkata Call Girl Gariahat 👉 8250192130 Available With Room
VIP Kolkata Call Girl Gariahat 👉 8250192130  Available With RoomVIP Kolkata Call Girl Gariahat 👉 8250192130  Available With Room
VIP Kolkata Call Girl Gariahat 👉 8250192130 Available With Room
 
SD_The MATATAG Curriculum Training Design.pptx
SD_The MATATAG Curriculum Training Design.pptxSD_The MATATAG Curriculum Training Design.pptx
SD_The MATATAG Curriculum Training Design.pptx
 
Call Girls Service Mukherjee Nagar @9999965857 Delhi 🫦 No Advance VVIP 🍎 SER...
Call Girls Service Mukherjee Nagar @9999965857 Delhi 🫦 No Advance  VVIP 🍎 SER...Call Girls Service Mukherjee Nagar @9999965857 Delhi 🫦 No Advance  VVIP 🍎 SER...
Call Girls Service Mukherjee Nagar @9999965857 Delhi 🫦 No Advance VVIP 🍎 SER...
 
B. Smith. (Architectural Portfolio.).pdf
B. Smith. (Architectural Portfolio.).pdfB. Smith. (Architectural Portfolio.).pdf
B. Smith. (Architectural Portfolio.).pdf
 
Kurla Call Girls Pooja Nehwal📞 9892124323 ✅ Vashi Call Service Available Nea...
Kurla Call Girls Pooja Nehwal📞 9892124323 ✅  Vashi Call Service Available Nea...Kurla Call Girls Pooja Nehwal📞 9892124323 ✅  Vashi Call Service Available Nea...
Kurla Call Girls Pooja Nehwal📞 9892124323 ✅ Vashi Call Service Available Nea...
 

Ukk's Algorithm of Suffix Tree

  • 1. . Algorithm of Suffix Tree by farseerfc@gmail.com . Jiachen Yang1 1 Research Student in Osaka University November 12, 2011 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 1 / 25
  • 2. Part Outline . 1 What is Suffix Tree . 2 History & Naïve Algorithm . 3 Optimization of Naïve Algorithm . 4 Examples & Analysis . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 2 / 25
  • 3. What can we do with suffix tree? . Find properties of the strings Linear algorithms for exact string . Find the longest common substrings of the 1 matching string Si and Sj in θ (ni + nj ) time . like KMP . Find all maximal pairs, maximal repeats or 2 Search for strings supermaximal repeats in θ (n + z) time, if there . Check if a string P of length m is 1 are z such repeats . . Find the Lempel-Ziv decomposition in θ (n) 3 a substring in O(m) time. . Find all z occurrences of the 2 time . . Find the longest repeated substrings in θ (n) patterns P1 , · · · , Pq of total 4 length m as substrings in time. . Find the most frequently occurring substrings 5 O(m + z) time. . Search for a regular expression P 3 of a minimum length in θ (n) time. . Find the shortest strings from ∑ that do not 6 in time expected sublinear in n . . Find for each suffix of a pattern 4 occur in D, in O(n + z) time, if there are z such P, the length of the longest strings. . Find the shortest substrings occurring only 7 match between a prefix of P[i . . . m] and a substring in D in once in θ (n) time. . Find, for each i, the shortest substrings of Si θ (m) time . 8 . ... 5 not occurring elsewhere in D in θ (n) time. . ... 9 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 3 / 25
  • 4. Trie, Radix Tree, Suffix Trie & Suffix Tree trie1 is a dictionary tree. (AKA prefix tree) stores a set of words. each node represents a character except that root is empty string. words with common prefix share same parent nodes. minimal deterministic finite automaton that accepts all words. radix tree is a trie with compressed chain of nodes. (AKA patricia trie or radix trie) Each internal node has at least 2 children. suffix trie is a trie which stores all suffix of a given string. suffix tree is a suffix radix tree. that enables linear time construction and fast algorithms of other problems on a string. 1 pronounced as in word retrieval by its inventor, /tri:/ “tree”, but pronounced /traI/ “try” by other authors . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 4 / 25
  • 5. Trie & Radix Tree . Trie of “A to tea ted ten i in inn” . . Radix tree example . . . . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 5 / 25
  • 6. Suffix Trie & Suffix Tree of “banana” ^ b a n $ 0:6 a n $ a banana$ a na $ 1:0 8:6 6:6 10:6 n a n $ na $ na$ $ a n $ a 4:6 9:6 3:2 7:6 n a $ na$ $ a $ 2:1 5:6 $ . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 6 / 25
  • 7. Suffix Tree of “mississippi” mississippi$ 0:11 (1,2) (0,...) (8,9) (11,...) (2,3) i mississi... p $... s 12:8 1:0 15:10 18:11 4:4 (8,...) (2,5) (11,...) (10,...) (9,...) ppi$... ssi $... i$... pi$... (3,5) (4,5) 13:8 6:8 17:11 16:10 14:8 si i (8,...) (5,...) ppi$... ssippi$... 7:8 2:1 8:8 10:8 (8,...) (5,...) (8,...) (5,...) ppi$... ssippi$... ppi$... ssippi$... 9:8 3:2 11:8 5:4 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 7 / 25
  • 8. Part Outline . 1 What is Suffix Tree . 2 History & Naïve Algorithm . 3 Optimization of Naïve Algorithm . 4 Examples & Analysis . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 8 / 25
  • 9. History of Suffix Tree Algorithms . M. Farach. Optimal suffix tree First linear algorithm was introduced by Weiner construction with large alphabets. In focs, page 1973 as position tree. Awarded by Donald Knuth 137. Published by the IEEE Computer Society, 1997. as “Algorithm of the year 1973”. E. M. McCreight. A space-economical suffix Greatly simplified by McCreight 1976. tree construction algorithm. J. ACM, 23:262–272, April Above two algorithms are processing string backward. 1976. ISSN 0004-5411. doi: http://doi.acm.org/10. 1145/321941.321946. URL First online construction by Ukkonen 1995, which http: //doi.acm.org/10. is easier to understand. 1145/321941.321946. E. Ukkonen. On-line construction of suffix trees. Above algorithms assume size of alphabet as fixed Algorithmica, 14:249–260, 1995. ISSN 0178-4617. constant . URL http://dx.doi.org/10. Limitation was break by Farach 1997, optimal for 1007/BF01206331. 10.1007/BF01206331. all alphabets. P. Weiner. Linear pattern matching algorithms. In Switching and Automata Further study are continued to scale to scenarios when Theory, 1973. SWAT ’08. IEEE Conference Record of the whole suffix tree or even input string cannot fit into 14th Annual Symposium on, pages 1 –11, oct. 1973. doi: memory. . . . 10.1109/SWAT.1973.13. . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 9 / 25
  • 10. Backward Construction of Suffix Tree 1 2 3 4 banana$ banana$ anana$ banana$ anana$ nana$ banana$ ana nana$ na$ $ 7 6 5 banana$ a na $ banana$ a na banana$ ana na na $ na$ $ na $ na$ $ na$ $ na$ $ na$ $ na$ $ . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 10 / 25
  • 11. Construct Suffix Tree by Sorting Suffix Suffix : Sorted suffix : Tree of sorted suffix : mississippi i | -i - >| - ississippi ippi | | - ppi ssissippi issippi | | - ssi - >| - ppi sissippi ississippi | | - ssippi issippi mississippi | - mississippi ssippi pi | -p - >| - i sippi ppi | | - pi ippi sippi | -s - >| -i - - >| - ppi ppi sissippi | | - ssippi pi ssippi | - si - >| - ppi i ssissippi | - ssippi Time complexity will be O(N 2 log N) . Space complexity will be O(N 2 ) . . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 11 / 25
  • 12. Naïve Algorithm S UFFIX T REE(string) 1 for i ← 1 to length(string) 2 do U PDATE(treei ) £ Phrase i U PDATE(treei ) 1 for j ← 1 to i 2 do node ← treei .F IND(suffix[j to i − 1]) 3 E XTEND(node, string[i]) £ Extension j Time complexity will be O(N 3 ) . Space complexity will be O(N 2 ) . The challenge is to make sure treei is updated to treei+1 efficiently. . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 12 / 25
  • 13. Suffix Extend Cases of Naïve Algorithm . Case I If path Sj...i ends at leaf, append a char Si+1 to end of edge into leaf. Sj...i = . . . na Sj...i+1 = . . . nan na nan Case II If path Sj...i ends in the middle of an edge , and next char Si+1 is not equal to the next char in the edge, split that edge, create a internal node, add a new edge to a new leaf. Sj...i = . . . na Sj...i+1 = . . . nay n nan na y Case III If path Sj...i ends in the middle of an edge , and next char Si+1 is equal to the next char in the edge, do nothing, extenstion has done. Sj...i = . . . na Sj...i+1 = . . . nan nan nan . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 13 / 25
  • 14. Naïve Online Construction of Suffix Tree 1 2 3 4 b ba a ban an n bana ana na 7 6 5 banana$ a na $ banana anana nana banan anan nan na $ na$ $ na$ $ . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 14 / 25
  • 15. Properties of Suffix Tree . . Each update will add exactly 1 1 leaf node . 0:6 nr_leaf = N . Suffix tree is full tree. 2 banana$ a na $ Each internal node has at least 2 children. 1:0 8:6 6:6 10:6 nr_internal < N nr_node < 2N na $ na$ $ . Worst case Fabonacci word 3 4:6 9:6 3:2 7:6 abaababaabaab . Suffix is either explicit or implicit. 4 na$ $ Explicit when it ends at a node. 2:1 5:6 Implicit when it ends in the middle of an edge. . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 15 / 25
  • 16. Part Outline . 1 What is Suffix Tree . 2 History & Naïve Algorithm . 3 Optimization of Naïve Algorithm . 4 Examples & Analysis . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 16 / 25
  • 17. Optimization of Naïve Algorithm . . Substrings can be represented as (start, end) pair of their index in 1 orignal string. Reduce space complexity to O(N) if size of alphabet is fixed constant. . Once a leaf, Always a leaf 2 Represent edge that links to a leaf as (start, · · · ). Extend leaf nodes for free. We do not need Extend Case I. 0:6 0:6 banana$ a na $ (1,2) (0,...) (6,...) (2,4) a banana$... $... na 1:0 8:6 6:6 10:6 8:6 1:0 10:6 6:6 (6,...) (2,4) (6,...) (4,...) na $ na$ $ $... na $... na$... 4:6 9:6 3:2 7:6 9:6 4:6 7:6 3:2 (6,...) (4,...) na$ $ $... na$... 5:6 2:1 2:1 5:6 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 17 / 25
  • 18. Active Point During a phrase, if we meet Extend Case III, that is if we found S[i + 1] already exists in suffix[j . . . i] then S[i + 1] will exists in ∀suffix[k . . . i], k ∈ j . . . i. Thus Case III is a sign that means update of this phrase is finished. During phrase i if we stopped at suffix[k . . . i] by Case III, then in next phrase we can start from suffix[k . . . i + 1] because all suffix start with 1 . . . k − 1 will end at Case I. We called this point(current internal node, current position k in string) as Active Point. a ab aba abaa abaab abaaba abaabab abaababa abaababaa abaababaab abaababaaba abaababaabaa abaababaaba ab 0:0 0:1 0:2 0:3 0:4 0:5 0:6 0:7 0:8 0:9 0:10 0:11 0:12 (0,...) (0,...) (1,...) (0,...) (1,...) (0,1) (1,...) (0,1) (1,...) (0,1) (1,...) (0,1) (1,3) (0,1) (1,3) (0,1) (1,3) (0,1) (1,3) (0,1) (1,3) (0,1) (1,3) (0,1) (1,3) a... ab... b... aba... ba... a baa... a baab... a baaba... a ba a ba a ba a ba a ba a ba a ba 1:0 1:0 2:1 1:0 2:1 3:3 2:1 3:3 2:1 3:3 2:1 3:3 7:6 3:3 7:6 3:3 7:6 3:3 7:6 3:3 7:6 3:3 7:6 3:3 7:6 (3,...) (1,...) (3,...) (1,...) (3,...) (1,...) (3,...) (1,3) (3,...) (6,...) (3,...) (1,3) (3,...) (6,...) (3,...) (1,3) (3,...) (6,...) (3,...) (1,3) (3,...) (6,...) (3,...) (1,3) (3,...) (6,...) (3,6) (1,3) (3,6) (6,...) (3,6) (1,3) (3,6) (6,...) a... baa... ab... baab... aba... baaba... abab... ba abab... b... ababa... ba ababa... ba... ababaa... ba ababaa... baa... ababaab... ba ababaab... baab... ababaaba... ba ababaaba... baaba... aba ba aba baabaa... aba ba aba baabaab... 4:3 1:0 4:3 1:0 4:3 1:0 4:3 5:6 2:1 8:6 4:3 5:6 2:1 8:6 4:3 5:6 2:1 8:6 4:3 5:6 2:1 8:6 4:3 5:6 2:1 8:6 13:11 5:6 11:11 8:6 13:11 5:6 11:11 8:6 (3,...) (6,...) (3,...) (6,...) (3,...) (6,...) (3,...) (6,...) (3,...) (6,...) (11,...) (6,...) (3,6) (6,...) (11,...) (6,...) (11,...) (6,...) (3,6) (6,...) (11,...) (6,...) abab... b... ababa... ba... ababaa... baa... ababaab... baab... ababaaba... baaba... a... baabaa... aba baabaa... a... baabaa... ab... baabaab... aba baabaab... ab... baabaab... 1:0 6:6 1:0 6:6 1:0 6:6 1:0 6:6 1:0 6:6 14:11 4:3 9:11 6:6 12:11 2:1 14:11 4:3 9:11 6:6 12:11 2:1 (11,...) (6,...) (11,...) (6,...) a... baabaa... ab... baabaab... 10:11 1:0 10:11 1:0 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 18 / 25
  • 19. Ukk’s Update using Active Point U PDATE(treei ) 1 current_suffix ← active_point 2 next_char ← string[i] 3 while True 4 do if there exists edge start with next_char 5 then break £ Case III 6 else 7 split current edge if implicit 8 create new leaf with new edge labelled next_char 9 if current_suffix is empty 10 then break 11 else current_suffix ← next shorter suffix 12 active_point ← current_suffix . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 19 / 25
  • 20. Suffix Link to find next shorter suffix Suffix link Internal node of suffix X α has a link to node α . If α is empty, suffix link points to root. How to create suffix link Link together every internal nodes that are created by splitting in same phrase. banana$ banana$ 0:6 0:6 (1,2) (0,...) (6,...) (2,4) (0,...) (1,2) (6,...) (2,4) a banana$... $... na banana$... a $... na 8:6 1:0 10:6 6:6 1:0 8:6 10:6 6:6 (6,...) (2,4) (6,...) (4,...) (6,...) (2,4) (6,...) (4,...) $... na $... na$... $... na $... na$... 9:6 4:6 7:6 3:2 9:6 4:6 7:6 3:2 (6,...) (4,...) (6,...) (4,...) $... na$... $... na$... 5:6 2:1 5:6 2:1 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 20 / 25
  • 21. Fast jump using Suffix Link Assume we are at Suffix X αβ , whose parent internal node represent X α . 1. Go back to parent internal node, 2. Jumping follow the node’s suffix link to the node represent Suffix α 3. Go down to Suffix αβ . Even jump down in step 3 because we already know length of β . (Skip/Count trick) Combining all these tricks we can Extend a character in O(1) time. . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 21 / 25
  • 22. Part Outline . 1 What is Suffix Tree . 2 History & Naïve Algorithm . 3 Optimization of Naïve Algorithm . 4 Examples & Analysis . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 22 / 25
  • 23. Experiment – mississippi . m 0:0 (0,...) m... 1:0 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 23 / 25
  • 24. Experiment – mississippi . mi 0:1 (1,...) (0,...) i... mi... 2:1 1:0 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 23 / 25
  • 25. Experiment – mississippi . mis 0:2 (1,...) (2,...) (0,...) is... s... mis... 2:1 3:2 1:0 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 23 / 25
  • 26. Experiment – mississippi . miss 0:3 (1,...) (2,...) (0,...) iss... ss... miss... 2:1 3:2 1:0 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 23 / 25
  • 27. Experiment – mississippi . miss i 0:4 (1,...) (0,...) (2,3) issi... missi... s 2:1 1:0 4:4 (4,...) (3,...) i... si... 5:4 3:2 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 23 / 25
  • 28. Experiment – mississippi . miss is 0:5 (1,...) (0,...) (2,3) issis... missis... s 2:1 1:0 4:4 (4,...) (3,...) is... sis... 5:4 3:2 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 23 / 25
  • 29. Experiment – mississippi . miss iss 0:6 (1,...) (0,...) (2,3) ississ... mississ... s 2:1 1:0 4:4 (4,...) (3,...) iss... siss... 5:4 3:2 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 23 / 25
  • 30. Experiment – mississippi . mississi 0:7 (1,...) (0,...) (2,3) ississi... mississi... s 2:1 1:0 4:4 (4,...) (3,...) issi... sissi... 5:4 3:2 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 23 / 25
  • 31. Experiment – mississippi . mississip 0:8 (8,...) (1,2) (0,...) (2,3) p... i mississi... s 14:8 12:8 1:0 4:4 (8,...) (2,5) p... ssi (3,5) (4,5) 13:8 6:8 si i (8,...) (5,...) p... ssip... 7:8 2:1 8:8 10:8 (8,...) (5,...) (8,...) (5,...) p... ssip... p... ssip... 9:8 3:2 11:8 5:4 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 23 / 25
  • 32. Experiment – mississippi . mississip p 0:9 (8,...) (1,2) (0,...) (2,3) pp... i mississi... s 14:8 12:8 1:0 4:4 (8,...) (2,5) pp... ssi (3,5) (4,5) 13:8 6:8 si i (8,...) (5,...) pp... ssipp... 7:8 2:1 8:8 10:8 (8,...) (5,...) (8,...) (5,...) pp... ssipp... pp... ssipp... 9:8 3:2 11:8 5:4 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 23 / 25
  • 33. Experiment – mississippi . mississippi 0:10 (1,2) (0,...) (8,9) (2,3) i mississi... p s 12:8 1:0 15:10 4:4 (2,5) (8,...) (9,...) (10,...) ssi ppi... pi... i... (3,5) (4,5) 6:8 13:8 14:8 16:10 si i (8,...) (5,...) ppi... ssippi... 7:8 2:1 8:8 10:8 (8,...) (5,...) (8,...) (5,...) ppi... ssippi... ppi... ssippi... 9:8 3:2 11:8 5:4 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 23 / 25
  • 34. Experiment – mississippi . mississippi$ 0:11 (1,2) (0,...) (8,9) (11,...) (2,3) i mississi... p $... s 12:8 1:0 15:10 18:11 4:4 (8,...) (2,5) (11,...) (10,...) (9,...) ppi$... ssi $... i$... pi$... (3,5) (4,5) 13:8 6:8 17:11 16:10 14:8 si i (8,...) (5,...) ppi$... ssippi$... 7:8 2:1 8:8 10:8 (8,...) (5,...) (8,...) (5,...) ppi$... ssippi$... ppi$... ssippi$... 9:8 3:2 11:8 5:4 . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 23 / 25
  • 35. Time Complexity Analysis $ i p p i s s i s s i m . m i s s i s s i p p i $ Time complexity is 2N = O(N) . . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 24 / 25
  • 36. Experiment – English text . 16 "data" using 1:3 14 Construction Time (sec.) 12 10 8 6 4 2 0 0 20000 40000 60000 80000 100000 120000 140000 File size (byte) . . . . . . jc-yang (Osaka-U) SuffixTree November 12, 2011 25 / 25