Alignment of raw reads in Avadis NGS

Pioneering Scientific Intelligence




DNA/Small RNA Alignment
in Avadis NGS 1.3

Strictly Confidential   © Strand Life Sciences
Questions we will seek to answer in this presentation:
What is an Alignment algorithm?
What issues must an Alignment algorithm consider?
How do Alignment algorithms work?
How does CoBWeb work?
How does CoBWeb compare with other algorithms?
How is CoBWeb exposed in Avadis NGS?
What is the future evolution of CoBWeb?

What is an Alignment algorithm?




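Subject’s genome:
AGGCTACGCATTTCCCATAAAGACCCACGCTTAAGTTC
Reference genome (close to, but not quite the same as, the Subject’s genome):
AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC
The two sequences differ at three positions. An alignment algorithm takes short sequenced pieces (reads) of the Subject’s genome and finds where each one best matches in the Reference, allowing for such differences.
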
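Comparing the two sequences above position by position makes the differences concrete. A minimal Python sketch, assuming the sequences are the same length and already in register (no gaps), so a simple position-by-position comparison suffices; `mismatch_positions` is a hypothetical helper name:

```python
def mismatch_positions(subject, reference):
    """Return the 0-based positions where two equal-length sequences differ."""
    if len(subject) != len(reference):
        raise ValueError("this simple comparison assumes equal-length, gap-free sequences")
    return [i for i, (s, r) in enumerate(zip(subject, reference)) if s != r]


subject = "AGGCTACGCATTTCCCATAAAGACCCACGCTTAAGTTC"
reference = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"
print(mismatch_positions(subject, reference))  # [11, 20, 28]
```

Without gaps every difference is a single-base mismatch; real reads also contain insertions and deletions, which is why aligners must handle gaps as well.
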
What issues must an Alignment algorithm consider?

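Mismatches and Gaps
Figure: reads aligned against the Reference genome, illustrating a deletion (which the alignment must represent as a gap) and a SNP (a mismatch).
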
Handling paired reads
Figure: the two reads of a pair from the Subject’s genome, mapped to a Reference genome that contains two copies of a repeat region; one candidate placement is marked ×.

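Why pairing helps: a read that falls in a repeat region matches equally well at every copy of the repeat, but its mate often maps uniquely, and the two reads of a pair are expected to lie roughly a known distance apart. A minimal sketch of that filtering step, assuming a hypothetical expected distance between mate start positions and a tolerance (real aligners model the insert-size distribution and read orientation, which this ignores); `consistent_placements` is a hypothetical name:

```python
def consistent_placements(repeat_mate_candidates, unique_mate_pos, expected_distance, tolerance):
    """Keep candidate positions of the ambiguous mate whose distance to the
    uniquely placed mate is within `tolerance` of `expected_distance`."""
    return [
        pos
        for pos in repeat_mate_candidates
        if abs(abs(unique_mate_pos - pos) - expected_distance) <= tolerance
    ]


# Mate 1 matches both copies of a repeat (positions 1000 and 50000);
# mate 2 maps uniquely at 1300, and mates are expected about 300 bases apart.
print(consistent_placements([1000, 50000], 1300, expected_distance=300, tolerance=50))  # [1000]
```

Only the copy of the repeat near the uniquely placed mate survives; the other placement is rejected.
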
A variety of Read Lengths
Short reads: about 50 bases, with few mismatches and gaps.
Long reads: a few hundred to a few thousand bases, with many more mismatches and gaps.

Speed and Memory
Handle billions of reads.
Run in 4 GB of RAM.
Allow use of multiple cores/processors.
Scale speed with more memory.

How do Alignment algorithms work?




Indexing the Genome to find Seed Matches
Scanning the whole Reference for each Read takes too long. Instead, the Reference is indexed, and the Index very quickly yields the locations in the Reference where some part (a seed) of the Read matches.
Figure: one seed of the Read occurs at Reference locations x1, x2…; another seed occurs at Reference locations x3, x4…

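The idea of a seed index can be illustrated with the simplest possible index: a hash table from every k-mer of the Reference to its positions. This is only an illustration of seed lookup. CoBWeb uses a Burrows-Wheeler based index, not a hash table, and the seed length k, the non-overlapping seed layout, and the function names here are assumptions made for the example:

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every length-k substring (k-mer) of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return dict(index)


def seed_hits(index, read, k):
    """Look up non-overlapping k-mer seeds of the read.

    Returns (seed offset in read, match position in reference, implied read start)
    triples; hits that agree on the implied read start support the same placement.
    """
    hits = []
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], []):
            hits.append((offset, pos, pos - offset))
    return hits


reference = "ACGTTACGTACGA"
index = build_kmer_index(reference, k=4)
print(index["ACGT"])                      # [0, 5]
print(seed_hits(index, "TACGTACG", k=4))  # [(0, 4, 4), (0, 8, 8), (4, 4, 0), (4, 8, 4)]
```

Both seeds agree that the Read starts at Reference position 4, which is where it matches exactly; the other hits are the spurious seed locations that the detailed alignment step must rule out.
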
Detailed Alignment at Seed Match Locations
Figure: a Read placed against the Reference at a seed match.
At each seed match location, how many mismatches and gaps are needed for the Read to match around the seed? This is computed with Smith-Waterman or a similar dynamic programming alignment.

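A minimal Smith-Waterman sketch that scores the best local alignment between a Read and a stretch of Reference around a seed match. The scoring values (match +2, mismatch -1, gap -2 per base, i.e. linear gap costs) are illustrative assumptions, not CoBWeb's parameters, and only the score is returned, not the traceback:

```python
def smith_waterman_score(read, ref, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between read and ref, with a linear cost per gap base."""
    rows, cols = len(read) + 1, len(ref) + 1
    # H[i][j] = best score of a local alignment ending at read[i-1] and ref[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # read base aligned to a gap (insertion in the read)
            left = H[i][j - 1] + gap  # reference base aligned to a gap (deletion in the read)
            H[i][j] = max(0, diag, up, left)  # 0 lets a local alignment start anywhere
            best = max(best, H[i][j])
    return best


print(smith_waterman_score("ACGT", "TTACGTTT"))       # 8: four matches
print(smith_waterman_score("ACGTACGT", "ACGTTACGT"))  # 14: eight matches, one gap
```

The second call shows the deletion case from the earlier slide: the Reference has one extra base, and the best alignment pays a single gap penalty rather than a run of mismatches.
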
The Burrows-Wheeler based Index
The original Reference: C G A C $
All its circular shifts, sorted lexicographically (in this example $ sorts after A, C and G), each labelled with its circular shift index:

Index   Circular shift
2       A C $ C G
0       C G A C $
3       C $ C G A
1       G A C $ C
4       $ C G A C

The last column (G $ A C C) is the BWT. The Index comprises the BWT and the circular shift indices, along with some housekeeping data structures. The circular shift indices can be sampled to fit into reduced memory, at the expense of speed but without sacrificing correctness.

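The construction above can be reproduced directly. This sketch sorts the circular shifts using the same symbol order as the table ($ after A, C, G and T), then shows how sampled circular shift indices can be recovered by walking backwards with the LF mapping. Naive rotation sorting uses memory quadratic in the text length and suits only toy inputs; real indexes use suffix-array construction algorithms. The function names and the sampling step are assumptions for the example:

```python
# Symbol order matching the table: '$' sorts after A, C, G and T.
ORDER = {"A": 0, "C": 1, "G": 2, "T": 3, "$": 4}


def bwt_with_shift_indices(text):
    """Sort all circular shifts of `text`; return (BWT = last column, shift index per row)."""
    n = len(text)
    shifts = sorted(range(n), key=lambda i: [ORDER[c] for c in text[i:] + text[:i]])
    bwt = "".join(text[(i - 1) % n] for i in shifts)  # last char of the shift starting at i
    return bwt, shifts


def lf_mapping(bwt):
    """LF[row] = row of the circular shift that starts one position earlier."""
    counts = {c: bwt.count(c) for c in ORDER}
    first_row, total = {}, 0
    for c in sorted(ORDER, key=ORDER.get):  # first row of the block of shifts starting with c
        first_row[c] = total
        total += counts[c]
    seen = dict.fromkeys(ORDER, 0)
    lf = []
    for c in bwt:  # the k-th c in the last column is the k-th c in the first column
        lf.append(first_row[c] + seen[c])
        seen[c] += 1
    return lf


def sample_shift_indices(shifts, step):
    """Keep only shift indices that are multiples of `step` (row -> shift index)."""
    return {row: s for row, s in enumerate(shifts) if s % step == 0}


def locate(row, samples, lf, n):
    """Recover the shift index of `row` from the sampled table by LF-walking."""
    steps = 0
    while row not in samples:
        row = lf[row]  # move to the shift starting one position earlier
        steps += 1
    return (samples[row] + steps) % n


bwt, shifts = bwt_with_shift_indices("CGAC$")
print(bwt, shifts)  # G$ACC [2, 0, 3, 1, 4]
```

Keeping every `step`-th shift index divides that table's memory by roughly `step`, while each lookup may take up to `step - 1` extra LF steps; the answer is always exact, which is the speed-for-memory trade described above.
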
The Burrows-Wheeler based Index
Figure: a Read matching the Reference EXACTLY.
All exact matches of a Read (no mismatches or gaps) in the Reference can be found in time proportional to the length of the Read and largely independent of the size of the Reference.

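How exact matching can run in time proportional to the Read length: backward search processes the Read from its last base to its first, narrowing a range of rows in the sorted circular-shift table using only the BWT and symbol counts. A self-contained sketch under stated assumptions: it rebuilds a toy index naively, uses the same symbol order as the table ($ last), and counts occurrences by scanning the BWT, whereas a real index precomputes these counts so that each step takes constant time. The function names are hypothetical:

```python
ORDER = {"A": 0, "C": 1, "G": 2, "T": 3, "$": 4}  # '$' sorts last, as in the table


def build_index(text):
    """Naively sort circular shifts; return (BWT, shift indices, first-row offsets)."""
    n = len(text)
    shifts = sorted(range(n), key=lambda i: [ORDER[c] for c in text[i:] + text[:i]])
    bwt = "".join(text[(i - 1) % n] for i in shifts)
    first_row, total = {}, 0
    for c in sorted(ORDER, key=ORDER.get):
        first_row[c] = total  # rows whose shift starts with c begin here
        total += text.count(c)
    return bwt, shifts, first_row


def find_exact(pattern, bwt, shifts, first_row):
    """Backward search: return sorted start positions of exact matches of pattern."""
    lo, hi = 0, len(bwt)  # current row range [lo, hi)
    for c in reversed(pattern):
        if c not in first_row:
            return []
        # rows starting with c and followed by the suffix of pattern processed so far
        lo = first_row[c] + bwt[:lo].count(c)
        hi = first_row[c] + bwt[:hi].count(c)
        if lo >= hi:
            return []
    return sorted(shifts[lo:hi])


index = build_index("CGAC$")
print(find_exact("AC", *index))  # [2]
print(find_exact("C", *index))   # [0, 3]
```

The loop runs once per base of the Read, and the final row range lists every match at once, which is why the number of steps does not grow with the size of the Reference.
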
How does CoBWeb work?




Seeding Strategy
Use the BW-based index, augmented with additional data structures for speed, to find one or more long seed matches in the Reference.
Figure: one 15-mer of the Read occurs at locations x1, x2…; another 15-mer occurs at locations x3, x4…; the whole 30-mer occurs at a single location, x5.
Justification: most long Reads do not have mismatches and gaps strewn across their length; there are usually long stretches that match exactly. And long seeds will have few matching locations.

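The justification can be illustrated by growing a seed from the start of a Read one base at a time: the number of matching Reference locations can only shrink as the seed gets longer. A naive sketch that scans the Reference at each step; CoBWeb instead uses its augmented BW index, and its exact seed-extension strategy is not described here, so the function names and the left-to-right extension are assumptions:

```python
def occurrences(reference, seed):
    """All start positions where seed occurs exactly in reference (naive scan)."""
    return [i for i in range(len(reference) - len(seed) + 1) if reference.startswith(seed, i)]


def longest_seed(reference, read, start=0):
    """Extend a seed from read[start] until it no longer occurs in the reference.

    Returns (seed length, positions) for the longest exactly matching seed."""
    best_len, best_pos = 0, []
    for end in range(start + 1, len(read) + 1):
        positions = occurrences(reference, read[start:end])
        if not positions:
            break
        best_len, best_pos = end - start, positions
    return best_len, best_pos


reference = "ACGTTACGTACGA"
read = "ACGTACGG"
for k in (1, 4, 7):
    print(k, occurrences(reference, read[:k]))
# 1 [0, 5, 9, 12]
# 4 [0, 5]
# 7 [5]
print(longest_seed(reference, read))  # (7, [5])
```

Because the seed length is determined by where exact matching stops rather than fixed in advance, the same procedure applies unchanged to short and long Reads.
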
Advantages
Seed length is not specified in advance, so long and short reads can be handled seamlessly.
Separating the Smith-Waterman phase from the BW index search allows an unlimited number of gaps and mismatches.

How does CoBWeb compare with other algorithms?

Comparison with BWA
Read length 50: CoBWeb with a 94% alignment score and up to 2 gaps, versus BWA with 4% error plus 1 gap (possibly several bases long).
Read length 150: chart not reproduced.
Result: CoBWeb is a little faster than BWA, with comparable results.

How is CoBWeb exposed in Avadis NGS?

Entry
Two new experiment types: DNA Alignment and Small-RNA Alignment.

The Alignment Workflow
Run Alignment, then create a DNA Variant or ChIP-Seq Experiment from the results.

Alignment Parameters
Specify the number of mismatches and gaps, and how multiple matches are handled.
Specify adaptor trimming (Small RNA only) and 3’ and 5’ trimming based on quality.
Screen against contaminant databases.

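The two trimming options can be sketched as follows. These are simple illustrations, not the Avadis NGS implementation: the quality threshold, the minimum adaptor overlap, the example adaptor sequence, and the function names are all assumptions, and real trimmers usually also tolerate sequencing errors inside the adaptor:

```python
def trim_by_quality(seq, quals, min_q=20):
    """Trim bases from the 5' and 3' ends while their Phred quality is below min_q."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


def trim_adaptor(seq, adaptor, min_overlap=3):
    """Remove a 3' adaptor: a full copy anywhere, or a partial copy at the read's end."""
    idx = seq.find(adaptor)
    if idx != -1:
        return seq[:idx]
    # The read may end partway into the adaptor: try the longest adaptor prefix first.
    for k in range(min(len(adaptor) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adaptor[:k]):
            return seq[:-k]
    return seq


print(trim_by_quality("ACGTACGT", [5, 30, 30, 30, 30, 30, 10, 2]))
# ('CGTAC', [30, 30, 30, 30, 30])
small_rna = "TGAGGTAGTAGGTTGTATAGTT"  # example insert sequence
adaptor = "TGGAATTCTCGG"              # example 3' adaptor (assumption)
print(trim_adaptor(small_rna + adaptor + "AAAA", adaptor) == small_rna)  # True
print(trim_adaptor(small_rna + "TGGAA", adaptor) == small_rna)           # True
```

Adaptor trimming matters mostly for Small RNA because the inserts are shorter than the read length, so the sequencer reads on into the adaptor.
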
What is the future evolution of CoBWeb?

ToDos
Chimeric Reads
RNA-Seq Alignment
Base Quality Recalibration
Affine Gap Costs

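For reference, affine gap costs charge a gap of length k as an opening cost o plus an extension cost e for each further base, with o larger than e, so one long gap is cheaper than several short ones. The standard way to add them to Smith-Waterman is Gotoh's three-matrix recurrence, shown here for local alignment of sequences a and b with substitution score s (textbook formulation, not a description of CoBWeb's planned implementation; boundaries are H = 0 and E = F = -∞ in row and column 0):

```latex
\begin{aligned}
w(k)    &= o + e\,(k-1), \qquad k \ge 1 \\
E_{i,j} &= \max\left(E_{i,j-1} - e,\; H_{i,j-1} - o\right) \\
F_{i,j} &= \max\left(F_{i-1,j} - e,\; H_{i-1,j} - o\right) \\
H_{i,j} &= \max\left(0,\; H_{i-1,j-1} + s(a_i, b_j),\; E_{i,j},\; F_{i,j}\right)
\end{aligned}
```

A gap entered from H pays o for its first base and e for each extension, so a gap of length k costs exactly w(k).
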
http://www.avadis-ngs.com




