Suffix Arrays in Linear Time
Index text so that substring queries can be answered fast
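The Text: C G A C G C T
Suffix Tree
[Figure: suffix tree of the text. Labels at the first level: A, C, G, T; second level: G, T, A, C; third level: A, C.]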
The Text: C G A C G C T
[Figure: the same suffix tree, shown with the substring query C G C.]
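
The substring query can be sketched in a few lines of Python. This is only an illustration: it builds an uncompressed suffix trie from nested dicts (quadratic space), not the compact suffix tree drawn on the slide, and the names build_suffix_trie and contains are made up for this sketch.

```python
def build_suffix_trie(text):
    """Build an uncompressed suffix trie as nested dicts (illustration only)."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            # Follow the edge for ch, creating it if it does not exist yet.
            node = node.setdefault(ch, {})
    return root


def contains(trie, query):
    """Return True if query is a substring of the indexed text."""
    node = trie
    for ch in query:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("CGACGCT")
print(contains(trie, "CGC"))  # True
print(contains(trie, "CGT"))  # False
```

A query walks at most len(query) edges, independent of the text length, which is what makes the index fast.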
Trees take too much space.
Are there smaller indices?
The Text: C G A C G C T
Suffix Tree
[Figure: the same suffix tree as above.]
Suffix Array (sorted list of suffixes): 3 1 4 6 2 5 7
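
The suffix array above can be reproduced with a naive sort. This is a minimal sketch, assuming 1-based suffix positions as on the slide; naive_suffix_array is a name chosen for the example, and the method is the slow baseline, not the linear-time algorithm.

```python
def naive_suffix_array(text):
    """Return the 1-based start positions of the suffixes of text, in sorted order."""
    # Sorting the suffix strings directly: O(n log n) comparisons of up to n chars each.
    return sorted(range(1, len(text) + 1), key=lambda i: text[i - 1:])


print(naive_suffix_array("CGACGCT"))  # [3, 1, 4, 6, 2, 5, 7]
```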
The Text: C G A C G C T
Burrows-Wheeler Index (an array)
Suffix Array: 3 1 4 6 2 5 7
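
The slide does not spell out how the Burrows-Wheeler array relates to the suffix array. Under the usual textbook definition (an assumption here, including the "$" sentinel appended to the text), it can be read off the suffix array by taking the character that precedes each sorted suffix. A small sketch with a hypothetical helper name:

```python
def bwt_from_suffix_array(text, sentinel="$"):
    """Burrows-Wheeler transform of text + sentinel, read off a naively built suffix array."""
    # Assumption: the sentinel sorts before every character of text.
    s = text + sentinel
    sa = sorted(range(len(s)), key=lambda i: s[i:])  # 0-based suffix array
    # The BWT character of a suffix is the character just before it (cyclically).
    return "".join(s[i - 1] for i in sa)


print(bwt_from_suffix_array("CGACGCT"))  # TG$AGCCC
```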
How can one compute the
Suffix Array in Linear Time?
Task
Given a string of length n with characters in the range 1..n, sort its suffixes lexicographically.
Obtain two arrays: f[i], the sorted order (rank) of the ith suffix, and g[i], which suffix is ith in sorted order.
A comparison sort needs O(n log n) comparisons, each taking up to n time.
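
The two arrays are inverses of each other. A minimal sketch, assuming 1-based positions and that "ith highest" means ith in sorted order; rank_and_order is a name invented for the example and the sort is the naive baseline.

```python
def rank_and_order(text):
    """Return (f, g): f[i-1] is the rank of suffix i, g[r-1] is the suffix with rank r."""
    n = len(text)
    g = sorted(range(1, n + 1), key=lambda i: text[i - 1:])
    f = [0] * n
    for rank, start in enumerate(g, start=1):
        f[start - 1] = rank  # invert the permutation
    return f, g


f, g = rank_and_order("CGACGCT")
print(g)  # [3, 1, 4, 6, 2, 5, 7]
print(f)  # [2, 5, 1, 3, 6, 4, 7]
```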
Divide and Conquer
Separate odd and even suffixes; sort each recursively, then combine.
Sorting Even Suffixes
[Figure: the text split into character pairs (A_1 A_2), (A_3 A_4), ...]
Sort these n/2 pairs and map them to single chars in the range 1..n/2.
This gives a new text of half the length; sort its suffixes recursively.
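
The pair-renaming step might look like the sketch below. Assumptions: characters are positive integers (here A=1, C=2, G=3, T=4), 0 is used as a padding symbol, and a comparison sort stands in for the radix sort a linear-time version would use. rename_pairs is a hypothetical name.

```python
def rename_pairs(chars):
    """Map consecutive character pairs to ranks 1..n/2, preserving the order of the pairs."""
    # Assumption: 0 is a padding symbol smaller than any real character.
    padded = chars + [0] * (len(chars) % 2)
    pairs = [(padded[i], padded[i + 1]) for i in range(0, len(padded), 2)]
    # Equal pairs get equal names; smaller pairs get smaller names.
    rank = {pair: r for r, pair in enumerate(sorted(set(pairs)), start=1)}
    return [rank[p] for p in pairs]


# C G A C G C T  ->  pairs CG, AC, GC, T(pad)
print(rename_pairs([2, 3, 1, 2, 3, 2, 4]))  # [2, 1, 3, 4]
```

Sorting the suffixes of the renamed text [2, 1, 3, 4] orders the original suffixes that start on a pair boundary.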
Sorting Odd Suffixes
[Figure: odd suffixes O_1, O_2, O_3, O_4 written as pairs (A_1,E_1), (A_2,E_2), (A_3,E_3), (A_4,E_4).]
Sort these n/2 pairs; the E's are the even suffixes, whose order we know.
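
A sketch of this step, assuming 0-based positions (so the already-sorted suffixes start at even positions) and integer characters; sort_odd_from_even and even_rank are names invented for the example. Each odd suffix is its first character followed by an even suffix of known rank.

```python
def sort_odd_from_even(chars, even_rank):
    """Sort the suffixes at odd 0-based positions.

    even_rank maps each even position to the rank (>= 1) of the suffix starting there.
    """
    def key(i):
        # Suffix i = chars[i] followed by suffix i + 1, which starts at an even position.
        # Rank 0 stands for the empty suffix past the end of the text.
        return (chars[i], even_rank.get(i + 1, 0))

    return sorted(range(1, len(chars), 2), key=key)


chars = [2, 3, 1, 2, 3, 2, 4]          # C G A C G C T with A=1, C=2, G=3, T=4
even_rank = {0: 2, 2: 1, 4: 3, 6: 4}   # ACGCT < CGACGCT < GCT < T
print(sort_odd_from_even(chars, even_rank))  # [3, 5, 1]
```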
Time Complexity
T(n) = O(n) + T(n/2) + time for merging even and odd suffixes
O(n)
Merging
[Figure: an odd suffix O written as (A, E) next to an even suffix E written as (B, O).]
Do we have any info to determine the relative order of an odd suffix and an even one?
The Trick (Sanders, Kärkkäinen)
[Figure: three groups labelled 0, 1, 2.]
Split the suffixes into 3 groups instead of 2: 0 mod 3, 1 mod 3 and 2 mod 3.
Sorting 0 and 1 Together
[Figure: the text A B C D E F G H I J K L split into triplets.]
Sort these 2n/3 triplets and map them to single chars.
This gives a new text of length 2n/3; sort its suffixes recursively.
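
One way to build the shorter text is sketched below. Assumptions not stated on the slide: positions are 0-based, characters are positive integers, 0 is a padding symbol, and the mod-0 triplets are written first, followed by the mod-1 triplets. The dummy all-zero triplet added when n is a multiple of 3 keeps the two blocks from running into each other. A comparison sort replaces the radix sort of a truly linear version, and reduced_text is a hypothetical name.

```python
def reduced_text(chars):
    """Return (positions, names) for the suffixes starting at positions 0 or 1 mod 3."""
    n = len(chars)
    padded = chars + [0, 0, 0]  # 0 pads the last triplets
    # range(0, n + 1, 3) includes the dummy position n exactly when n % 3 == 0,
    # so the mod-0 block always ends in a unique terminator triplet.
    positions = list(range(0, n + 1, 3)) + list(range(1, n, 3))
    triples = [tuple(padded[p:p + 3]) for p in positions]
    name = {t: r for r, t in enumerate(sorted(set(triples)), start=1)}
    return positions, [name[t] for t in triples]


chars = [2, 3, 1, 2, 3, 2, 4]  # C G A C G C T with A=1, C=2, G=3, T=4
positions, reduced = reduced_text(chars)
print(reduced)  # [1, 2, 5, 3, 4]

# Sorting the suffixes of the reduced text (naively here, recursively in the
# real algorithm) orders the original suffixes at positions 0 or 1 mod 3.
order = sorted(range(len(reduced)), key=lambda k: reduced[k:])
print([positions[k] for k in order])  # [0, 3, 1, 4, 6]
```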
Sorting Suffixes in 2
[Figure: mod 2 suffixes 2_1, 2_2, 2_3, 2_4 written as pairs (A_1,0_1), (A_2,0_2), (A_3,0_3), (A_4,0_4).]
Sort these n/3 pairs; the 0's are the mod 0 suffixes, whose order we know.
Merging
[Figure: a mod 1 suffix written as (AB, 0) next to a mod 2 suffix written as (CD, 1).]
We know the order of all 0,1 suffixes!
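
The merge can be sketched as follows. Assumptions: 0-based positions, positive integer characters, and a dict rank01 holding the known ranks of the mod 0 and mod 1 suffixes; all names are invented for the example. A mod 2 suffix is compared with a mod 0 suffix after peeling one character, and with a mod 1 suffix after peeling two, because in both cases what remains on each side is a suffix of known rank.

```python
def merge_suffix_groups(chars, rank01):
    """Sort all suffixes, given rank01[p] >= 1 for every position p with p % 3 in (0, 1)."""
    n = len(chars)

    def c(p):  # character at p, or 0 past the end
        return chars[p] if p < n else 0

    def r(p):  # known rank at p, or 0 past the end (the empty suffix sorts first)
        return rank01[p] if p < n else 0

    sorted01 = sorted(rank01, key=rank01.get)
    # A mod 2 suffix is one character followed by a mod 0 suffix of known rank.
    sorted2 = sorted(range(2, n, 3), key=lambda p: (c(p), r(p + 1)))

    def two_is_smaller(i, j):
        """Is the mod 2 suffix i smaller than the mod 0 or mod 1 suffix j?"""
        if j % 3 == 0:  # peel one character: i + 1 is mod 0, j + 1 is mod 1
            return (c(i), r(i + 1)) < (c(j), r(j + 1))
        # j is mod 1; peel two characters: i + 2 is mod 1, j + 2 is mod 0
        return (c(i), c(i + 1), r(i + 2)) < (c(j), c(j + 1), r(j + 2))

    merged, a, b = [], 0, 0
    while a < len(sorted2) and b < len(sorted01):
        if two_is_smaller(sorted2[a], sorted01[b]):
            merged.append(sorted2[a])
            a += 1
        else:
            merged.append(sorted01[b])
            b += 1
    return merged + sorted2[a:] + sorted01[b:]


chars = [2, 3, 1, 2, 3, 2, 4]             # C G A C G C T with A=1, C=2, G=3, T=4
rank01 = {0: 1, 3: 2, 1: 3, 4: 4, 6: 5}   # order of the mod 0 and mod 1 suffixes
print(merge_suffix_groups(chars, rank01))  # [2, 0, 3, 5, 1, 4, 6]
```

Adding 1 to each position gives 3 1 4 6 2 5 7, the suffix array shown for this text.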
Time Complexity
T(n) = O(n) + T(2n/3) + O(n)
which solves to T(n) = O(n)
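
A short worked derivation of why this recurrence is linear, assuming the non-recursive work at each level is at most cn for some constant c and that T(1) is constant:

```latex
\begin{align*}
T(n) &\le T\!\left(\tfrac{2n}{3}\right) + cn \\
     &\le cn \sum_{k \ge 0} \left(\tfrac{2}{3}\right)^{k} \\
     &= 3cn = O(n).
\end{align*}
```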
Generalization
Set D of indices mod v
[Figure: the text marked at positions v, 2v, 3v, with a new string built from the positions in D.]
This string has size |D|n/v.
Time taken to create this string is O(n |D|).
Sorting suffixes of this string gives the sorted order of all suffixes which begin at indices j such that j mod v is in D.
Key Property of D
[Figure labels: x<v, x<v]
For any 2 indices i and j, (i-j) mod v is the distance between some two beads in D.
D is a Difference Cover if distances between beads in D generate 0,1,…,v-1.
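
The property is easy to check by brute force. In this sketch the function names are invented, and common_shift reflects one reading of the "x<v" labels: for any two indices there is a shift x < v that moves both onto beads of D, so both shifted suffixes have known ranks.

```python
def is_difference_cover(D, v):
    """True if every residue 0..v-1 is a difference (a - b) mod v of two elements of D."""
    differences = {(a - b) % v for a in D for b in D}
    return differences == set(range(v))


def common_shift(i, j, D, v):
    """Smallest x < v such that (i + x) % v and (j + x) % v both lie in D.

    Such an x exists whenever D is a difference cover mod v.
    """
    return next(x for x in range(v) if (i + x) % v in D and (j + x) % v in D)


print(is_difference_cover({0, 1}, 3))     # True
print(is_difference_cover({1, 2, 4}, 7))  # True
print(is_difference_cover({0, 1}, 4))     # False
print(common_shift(2, 0, {0, 1}, 3))      # 1
print(common_shift(2, 1, {0, 1}, 3))      # 2
```

With D = {0, 1} and v = 3 the shifts 1 and 2 are the "peel one character" and "peel two characters" cases of the three-group merge.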
Size of D
[Figure labels: sqrt(v), sqrt(v)]
There exists a Difference Cover of size 1.5*sqrt(v)!
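
The two sqrt(v) labels suggest a simple construction: about sqrt(v) consecutive beads plus about sqrt(v) beads spaced sqrt(v) apart. The sketch below is an assumption about what the figure shows, with an invented function name. It yields a cover of size at most 2*ceil(sqrt(v)), not the tighter 1.5*sqrt(v) bound quoted on the slide.

```python
import math


def simple_difference_cover(v):
    """A difference cover mod v with at most 2 * ceil(sqrt(v)) elements (v >= 1)."""
    r = math.isqrt(v - 1) + 1  # ceil(sqrt(v))
    small = {s % v for s in range(r)}                 # 0, 1, ..., r-1
    spaced = {(k * r) % v for k in range(1, r + 1)}   # r, 2r, ..., r*r
    # Every d in 1..r*r equals k*r - s for some k in 1..r and s in 0..r-1,
    # and r*r >= v, so all residues mod v appear as differences.
    return small | spaced


print(sorted(simple_difference_cover(7)))  # [0, 1, 2, 3, 6]
```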
Time Complexity
T(n) = O(n|D|) + T(|D|n/v) + O(nv)
For |D| = 2.5 sqrt(v): T(n) = O(n sqrt(v)) + T(n/sqrt(v)) + O(nv)
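
A hedged sketch of how the general recurrence resolves, assuming v is a fixed constant, |D| < v, and the O(n|D|) and O(nv) terms together are at most c nv for some constant c:

```latex
\begin{align*}
T(n) &\le T(\rho n) + c\,nv, \qquad \rho = \frac{|D|}{v} < 1 \\
     &\le c\,nv \sum_{k \ge 0} \rho^{k} = \frac{c\,nv}{1-\rho} = O(nv).
\end{align*}
```

For v = 3 and |D| = 2 this gives rho = 2/3, which is the T(2n/3) recurrence of the three-group algorithm.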
