Suffix Arrays in Linear Time
Index text, so substring queries can be answered fast

The Text
C G A C G C T
Suffix Tree
[Figure: suffix tree of the text. Node labels by level: A, C, G, T; then G, T, A, C; then A, C.]

The Text
C G A C G C T
[Figure: the same suffix tree. Node labels by level: A, C, G, T; then G, T, A, C; then A, C.]
Substring Query: C G C
Trees take too much space.
Are there smaller indices?

The Text
C G A C G C T
Suffix Tree
[Figure: the same suffix tree. Node labels by level: A, C, G, T; then G, T, A, C; then A, C.]
Suffix Array (Sorted List of Suffixes)
3 1 4 6 2 5 7
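As a minimal sketch of what the suffix array stores and how it can answer the substring query from the earlier slide: the helper names `naive_suffix_array` and `contains` are illustrative assumptions, positions are 1-based as on the slide, and the construction is a plain comparison sort rather than the linear-time method.

```python
from typing import List


def naive_suffix_array(text: str) -> List[int]:
    """1-based start positions of the suffixes of `text`, in sorted order.

    Plain comparison sort for illustration, not the linear-time construction.
    """
    return sorted(range(1, len(text) + 1), key=lambda i: text[i - 1:])


def contains(text: str, sa: List[int], query: str) -> bool:
    """Binary-search the suffix array for a suffix that starts with `query`."""
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        start = sa[mid] - 1
        # Compare the query with the equally long prefix of the middle suffix.
        if text[start:start + len(query)] < query:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(sa) and text.startswith(query, sa[lo] - 1)


text = "CGACGCT"
sa = naive_suffix_array(text)
print(sa)                         # [3, 1, 4, 6, 2, 5, 7]
print(contains(text, sa, "CGC"))  # True
print(contains(text, sa, "GG"))   # False
```

The binary search needs only the text and one integer per suffix, which is the space advantage over the tree.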

The Text
C G A C G C T
Burrows-Wheeler Index (an array)
Suffix Array
3 1 4 6 2 5 7
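A small sketch of how a Burrows-Wheeler array can be read off a suffix array, assuming the slide refers to the standard Burrows-Wheeler transform. The function name `burrows_wheeler` and the `$` sentinel (assumed smaller than every text character) are illustrative choices, not taken from the slides, and positions are 0-based here.

```python
def burrows_wheeler(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform of `text` via a naively built suffix array."""
    t = text + sentinel  # the sentinel marks the end of the text
    sa = sorted(range(len(t)), key=lambda i: t[i:])  # 0-based suffix array
    # For each suffix in sorted order, take the character just before it;
    # for the suffix at position 0, t[-1] is the sentinel.
    return "".join(t[i - 1] for i in sa)


print(burrows_wheeler("CGACGCT"))  # TG$AGCCC
```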

How can one compute the Suffix Array in Linear Time?

Task
String of length n with characters in the range 1..n.
Sort these suffixes lexicographically.
Obtain two arrays: f[i], the sorted order of the ith suffix, and g[i], which suffix is ith highest.
With a comparison sort: O(n log n) comparisons, each taking up to n time.
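To make the two arrays concrete, here is a hedged sketch using the comparison sort mentioned on the slide. The function name `rank_and_order` is an assumption, and "ith highest" is read as "ith in sorted order". Both arrays hold 1-based values but are stored in ordinary 0-based Python lists, so f[i] from the slide is `f[i - 1]` in the code.

```python
from typing import List, Tuple


def rank_and_order(text: str) -> Tuple[List[int], List[int]]:
    """Return (f, g): f gives each suffix's sorted rank, g lists suffixes in sorted order."""
    n = len(text)
    # g: suffix start positions (1-based) in lexicographic order.
    g = sorted(range(1, n + 1), key=lambda i: text[i - 1:])
    # f is the inverse permutation of g.
    f = [0] * n
    for r, i in enumerate(g, start=1):
        f[i - 1] = r
    return f, g


f, g = rank_and_order("CGACGCT")
print(g)  # [3, 1, 4, 6, 2, 5, 7]
print(f)  # [2, 5, 1, 3, 6, 4, 7]
```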

Divide and Conquer
Separate odd and even suffixes; sort each recursively, then combine.

Sorting Even Suffixes
[Figure: characters grouped into pairs: A1 A2, A3 A4.]
Sort these n/2 pairs and map them to single chars in the range 1..n/2.
New text of half the length; sort suffixes recursively.
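The pairing step might look like the sketch below. It assumes the text is a list of positive integers, uses 0 as padding past the end, and uses 1-based positions as on the slides. The function name is hypothetical, and a direct sort of the reduced text stands in for the recursive call.

```python
from typing import List


def sort_even_suffixes(text: List[int]) -> List[int]:
    """Sorted order of the suffixes starting at even (1-based) positions."""
    n = len(text)
    padded = text + [0]                # 0 is smaller than every real character
    starts = list(range(2, n + 1, 2))  # even positions 2, 4, 6, ...
    pairs = [(padded[p - 1], padded[p]) for p in starts]
    # Rename each distinct pair by its rank. A comparison sort is used for
    # brevity; a two-pass radix sort would keep this step linear.
    names = {pair: r for r, pair in enumerate(sorted(set(pairs)), start=1)}
    reduced = [names[pair] for pair in pairs]  # new text of about half the length
    # Stand-in for the recursive call: sort the reduced text's suffixes directly.
    order = sorted(range(len(reduced)), key=lambda k: reduced[k:])
    return [starts[k] for k in order]


# CGACGCT with A=1, C=2, G=3, T=4
print(sort_even_suffixes([2, 3, 1, 2, 3, 2, 4]))  # [4, 6, 2]
```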

Sorting Odd Suffixes
[Figure: odd suffixes O1, O2, O3, O4 written as pairs (A1, E1), (A2, E2), (A3, E3), (A4, E4).]
Sort these n/2 pairs; the E's are the even suffixes, whose order we know.
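A sketch of this step, assuming the even suffixes have already been sorted: each odd suffix is its first character followed by the even suffix that starts one position later, so sorting the pairs (character, rank of that even suffix) is enough. The function name is hypothetical, positions are 1-based, and rank 0 stands for the empty suffix past the end of the text.

```python
from typing import List


def sort_odd_suffixes(text: str, even_order: List[int]) -> List[int]:
    """Sorted order of odd-position suffixes, given the sorted even-position ones."""
    n = len(text)
    # Rank of each even suffix in the known order (1 = smallest).
    rank = {p: r for r, p in enumerate(even_order, start=1)}
    odd = range(1, n + 1, 2)
    # Comparison sort for brevity; radix sorting the pairs keeps this linear.
    return sorted(odd, key=lambda p: (text[p - 1], rank.get(p + 1, 0)))


print(sort_odd_suffixes("CGACGCT", [4, 6, 2]))  # [3, 1, 5, 7]
```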

Time Complexity
T(n) = O(n) + T(n/2) + Time for merging even and odd suffixes
O(n)

Merging
[Figure: an odd suffix O written as (A, E) and an even suffix E written as (B, O).]
Do we have any info to determine the relative order of an odd suffix and an even one?

The Trick
Sanders, Kärkkäinen
[Figure: text positions labelled 0, 1, 2.]
Split suffixes into 3 groups instead of 2: 0 mod 3, 1 mod 3 and 2 mod 3.

Sorting 0 and 1 Together
ABCDEFGHIJKL
Sort these 2n/3 triplets and map them to single chars.
New text of length 2n/3; sort suffixes recursively.
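The triplet step could be sketched as follows, under stated assumptions: positions are 0-based, the text is a list of positive integers with 0 as padding, and a direct sort of the reduced text stands in for the recursive call. The function name and the dummy all-zero triplet (added when n is a multiple of 3, so that the group 0 half always ends in a padded triplet) are choices made for this illustration.

```python
from typing import List


def sample_suffix_order(text: List[int]) -> List[int]:
    """Sorted order of suffixes starting at 0-based positions p with p % 3 in {0, 1}."""
    n = len(text)
    padded = text + [0, 0, 0]
    group0 = list(range(0, n, 3))
    if n % 3 == 0:
        group0.append(n)  # dummy all-zero triplet separates the two halves
    group1 = list(range(1, n, 3))
    starts = group0 + group1
    triplets = [tuple(padded[p:p + 3]) for p in starts]
    # Rename each distinct triplet by its rank (radix sort in a linear-time version).
    names = {t: r for r, t in enumerate(sorted(set(triplets)), start=1)}
    reduced = [names[t] for t in triplets]  # new text of length about 2n/3
    # Stand-in for the recursive call on the reduced text.
    order = sorted(range(len(reduced)), key=lambda k: reduced[k:])
    return [starts[k] for k in order if starts[k] < n]


# CGACGCT with A=1, C=2, G=3, T=4
print(sample_suffix_order([2, 3, 1, 2, 3, 2, 4]))  # [0, 3, 1, 4, 6]
```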

Sorting Suffixes in Group 2
[Figure: group 2 suffixes 2_1, 2_2, 2_3, 2_4 written as pairs (A1, 0_1), (A2, 0_2), (A3, 0_3), (A4, 0_4).]
Sort these n/3 pairs; the 0's are the mod 0 suffixes, whose order we know.
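As a sketch with 0-based positions: a suffix starting at p with p mod 3 = 2 is its first character followed by the suffix at p + 1, which lies in group 0 and is already ranked. The function name and the `sample_order` argument (the sorted order of all group 0 and group 1 suffixes) are assumptions for this illustration.

```python
from typing import List


def sort_group2(text: str, sample_order: List[int]) -> List[int]:
    """Sorted order of suffixes at 0-based positions p with p % 3 == 2."""
    n = len(text)
    rank = {p: r for r, p in enumerate(sample_order, start=1)}
    # Rank 0 stands for the empty suffix when p + 1 runs past the end.
    return sorted(range(2, n, 3), key=lambda p: (text[p], rank.get(p + 1, 0)))


print(sort_group2("CGACGCT", [0, 3, 1, 4, 6]))  # [2, 5]
```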

Merging
[Figure: a group 1 suffix written as (AB, 0) and a group 2 suffix written as (CD, 1).]
We know the order of all 0,1 suffixes!
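A hedged sketch of the merge, with 0-based positions and hypothetical names. Against a group 2 suffix, a group 0 suffix needs one character before both sides reach a ranked suffix, and a group 1 suffix needs two characters; the latter is the (AB, 0) versus (CD, 1) case on the slide.

```python
from typing import List


def merge(text: str, sample_order: List[int], group2_order: List[int]) -> List[int]:
    """Merge the sorted group 0/1 suffixes with the sorted group 2 suffixes."""
    n = len(text)
    rank = {p: r for r, p in enumerate(sample_order, start=1)}

    def r(p: int) -> int:
        return rank.get(p, 0)            # positions past the end get rank 0

    def ch(p: int) -> str:
        return text[p] if p < n else ""  # "" sorts before any character

    def sample_first(s: int, t: int) -> bool:
        """True if sample suffix s precedes group 2 suffix t."""
        if s % 3 == 0:
            # s + 1 is in group 1 and t + 1 is in group 0: both are ranked.
            return (ch(s), r(s + 1)) < (ch(t), r(t + 1))
        # s is in group 1: s + 2 is in group 0 and t + 2 is in group 1.
        return (ch(s), ch(s + 1), r(s + 2)) < (ch(t), ch(t + 1), r(t + 2))

    out: List[int] = []
    i = j = 0
    while i < len(sample_order) and j < len(group2_order):
        if sample_first(sample_order[i], group2_order[j]):
            out.append(sample_order[i])
            i += 1
        else:
            out.append(group2_order[j])
            j += 1
    return out + sample_order[i:] + group2_order[j:]


print(merge("CGACGCT", [0, 3, 1, 4, 6], [2, 5]))  # [2, 0, 3, 5, 1, 4, 6]
```

Adding 1 to every entry gives 3 1 4 6 2 5 7, the suffix array shown on the earlier slides.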

Time Complexity
T(n) = O(n) + T(2n/3) + O(n)
O(n)
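The recurrence can be unrolled as a geometric series. In the sketch below, the two O(n) terms are written together as cn, where c is an assumed constant not named on the slide.

```latex
\begin{aligned}
T(n) &\le T\!\left(\tfrac{2n}{3}\right) + cn \\
     &\le cn \sum_{k \ge 0} \left(\tfrac{2}{3}\right)^{k} + O(1) \\
     &= 3cn + O(1) = O(n).
\end{aligned}
```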

Generalization
Set D of indices mod v
[Figure: text positions marked at v, 2v, 3v.]
This string has size |D|n/v.
Time taken to create this string is O(n|D|).
Sorting suffixes of this string gives the sorted order of all suffixes which begin at indices j such that j mod v is in D.
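A tiny sketch of which suffixes are selected, assuming 0-based indices; the function name `sample_positions` is hypothetical. It only illustrates the |D|n/v count and does not build the renamed string itself.

```python
from typing import List, Set


def sample_positions(n: int, D: Set[int], v: int) -> List[int]:
    """Indices j in [0, n) with j mod v in D."""
    return [j for j in range(n) if j % v in D]


print(len(sample_positions(12, {0, 1}, 3)))     # 8, i.e. |D| n / v = 2 * 12 / 3
print(len(sample_positions(21, {1, 2, 4}, 7)))  # 9, i.e. |D| n / v = 3 * 21 / 7
```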

Key Property of D
[Figure: two offsets, each labelled x<v.]
For any 2 indices i and j, (i-j) mod v is the distance between some two beads in D.
D is a Difference Cover if distances between beads in D generate 0,1,…,v-1.
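The property can be checked directly for small cases. In this sketch the function names are assumptions; `common_shift` searches for the offset x < v after which both indices land on beads of D, which is what makes two arbitrary suffixes comparable using known ranks.

```python
from typing import Optional, Set


def is_difference_cover(D: Set[int], v: int) -> bool:
    """True if the differences of elements of D cover every residue mod v."""
    return {(a - b) % v for a in D for b in D} == set(range(v))


def common_shift(i: int, j: int, D: Set[int], v: int) -> Optional[int]:
    """Smallest x < v with (i + x) mod v and (j + x) mod v both in D, if any."""
    for x in range(v):
        if (i + x) % v in D and (j + x) % v in D:
            return x
    return None


print(is_difference_cover({0, 1}, 3))     # True
print(is_difference_cover({1, 2, 4}, 7))  # True
print(is_difference_cover({0, 1}, 4))     # False
print(common_shift(1, 2, {0, 1}, 3))      # 2
print(common_shift(0, 2, {0, 1}, 3))      # 1
```

The last two results match the merging step for v = 3: a group 1 suffix meets a group 2 suffix after two characters, a group 0 suffix after one.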

Size of D
[Figure: two spans, each labelled sqrt(v).]
There exists a Difference Cover of size 1.5*sqrt(v)!
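For intuition, here is one simple construction of a cover with at most about 2*sqrt(v) elements. It is not taken from the slides and does not reach the 1.5*sqrt(v) bound quoted there. It combines r consecutive residues with r multiples of r, where r is the ceiling of sqrt(v); the function names are hypothetical.

```python
import math
from typing import Set


def is_difference_cover(D: Set[int], v: int) -> bool:
    """True if the differences of elements of D cover every residue mod v."""
    return {(a - b) % v for a in D for b in D} == set(range(v))


def simple_difference_cover(v: int) -> Set[int]:
    """A difference cover mod v with at most 2 * ceil(sqrt(v)) elements."""
    r = math.isqrt(v - 1) + 1                      # ceil(sqrt(v)) for v >= 1
    short_steps = {a % v for a in range(r)}        # 0, 1, ..., r - 1
    long_steps = {(k * r) % v for k in range(1, r + 1)}  # r, 2r, ..., r * r
    # Every d in [0, v) can be written as k * r - a with 0 <= a < r and k <= r.
    return short_steps | long_steps


D = simple_difference_cover(7)
print(sorted(D))                  # [0, 1, 2, 3, 6]
print(is_difference_cover(D, 7))  # True
```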

Time Complexity
T(n) = O(n|D|) + T(|D|n/v) + O(nv)
T(n) = O(n sqrt(v)) + T(n/sqrt(v)) + O(nv)
For |D|=2.5 sqrt(v)
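A sketch of how this recurrence resolves, writing |D| = a sqrt(v) for an assumed constant a and absorbing the O(n sqrt(v)) term into b n v for an assumed constant b. The constants a, b and the ratio rho are illustrative symbols, not from the slides, and the argument needs v large enough that rho < 1.

```latex
\begin{aligned}
T(n) &\le T\!\left(\rho n\right) + b\,n v, \qquad \rho = \frac{a}{\sqrt{v}} < 1 \\
     &\le b\,n v \sum_{k \ge 0} \rho^{k} + O(1) \\
     &= \frac{b\,n v}{1 - \rho} + O(1) = O(n v).
\end{aligned}
```

For a fixed v this is linear in n.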
