SSAHA_pileup:
A Genome Variation Detection Pipeline for Various Sequencing Platforms

Photo Credit: saynine on flickr.com

Ben Blackburne
Wellcome Trust Sanger Institute
Acknowledgments


●Zemin Ning
●Yong Gu
●Antony Cox
●Adam Spargo
●Hannes Ponstingl
Introduction
●New sequencing technologies
  – More data
  – Different kinds of data
     ●Solexa, 454
     ●capillary, too
  – Diploid genomes
  – SNPs, indels, VNTRs




                              Photo Credit: mknowles on flickr.com
SSAHA_pileup
●Sequence Search and Alignment by Hashing
 Algorithm
●SSAHA_SNP
  – Global positioning with SSAHA algorithm
  – Fast Smith-Waterman implementation (from
    Cross_Match)
  – Identification of best match
●SSAHA_pileup
  – Determines SNPs from set of best alignments
●Works on Solexa, 454, and capillary reads
The Toolchain
[Diagram: Reference Genome + Reads -> SSAHA_snp / SSAHA2 -> Alignments -> SSAHA_pileup -> variations, with a "refinement" step shown beneath the alignments]
SSAHA_SNP
●Reference genome is “hashed”
  – table made of all k-mer words
  – overlapping or not, at user's option
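
A minimal sketch of the hashing step described above, assuming a plain in-memory dictionary. The function name `build_kmer_index` and its parameters are hypothetical and are not taken from SSAHA_SNP itself.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int, overlapping: bool = True) -> dict[str, list[int]]:
    """Map every k-mer word in the reference to the positions where it starts.

    overlapping=True samples a word at every base; overlapping=False steps k
    bases at a time, giving a smaller table with fewer sampled words.
    """
    step = 1 if overlapping else k
    index = defaultdict(list)
    for pos in range(0, len(reference) - k + 1, step):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


# Toy reference, for illustration only
index = build_kmer_index("ACGTACGT", k=4)
print(index["ACGT"])  # start positions of the word ACGT
```

The non-overlapping option trades sensitivity for memory: the table is roughly k times smaller, but a read can only seed on words that happen to start at a sampled position.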
SSAHA_SNP
●k-mer matches found for query in reference
[Figure: k-mer matches shown on chr n and chr m]
SSAHA_SNP
Global Mapping
[Figure: chr n and chr m]
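
One common way to turn k-mer hits into a rough global position is to group them by diagonal (reference position minus query position) and count the votes. The sketch below illustrates that idea under this assumption. It is not SSAHA_SNP's code, and `kmer_votes` and the toy sequences are made up.

```python
from collections import Counter, defaultdict


def kmer_votes(references: dict[str, str], query: str, k: int) -> Counter:
    """Count k-mer hits per (sequence name, diagonal) for one query read."""
    # Hash every overlapping k-mer of every reference sequence.
    index = defaultdict(list)
    for name, seq in references.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))

    # Each query k-mer votes for the diagonal its hit implies.
    votes = Counter()
    for qpos in range(len(query) - k + 1):
        for name, rpos in index.get(query[qpos:qpos + k], ()):
            votes[(name, rpos - qpos)] += 1
    return votes


references = {"chr_n": "TTTTACGTTGCATTTT", "chr_m": "GGGGACGTGGGG"}
votes = kmer_votes(references, "ACGTTGCA", k=4)
print(votes.most_common())  # the best-supported diagonal comes first
```

In this toy case the read shares one word with chr_m but five consecutive words with chr_n, so chr_n wins the vote before any base-level alignment is attempted.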
SSAHA_SNP
Local Mapping (Smith-Waterman)
[Figure: chr n, score: 126; chr m, score: 113]
SSAHA_SNP
Select best match
[Figure: chr n, score: 126; chr m, score: 113]
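
The local mapping and best-match steps can be sketched as a score-only Smith-Waterman followed by a maximum over candidates. This is a textbook quadratic version with linear gap penalties, not the fast Cross_Match-derived implementation the slides mention. The scoring values (+1, -1, -2) and the function names are assumptions chosen for illustration.

```python
def smith_waterman_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def select_best(read: str, candidates: dict[str, str]) -> tuple[str, dict[str, int]]:
    """Score the read against every candidate region and keep the top scorer."""
    scores = {name: smith_waterman_score(read, seq) for name, seq in candidates.items()}
    return max(scores, key=scores.get), scores


# Toy candidate regions, standing in for the placements found by global mapping
best, scores = select_best("ACGTACGT", {"chr_n": "TTACGTACGTTT", "chr_m": "TTACGTTCGTTT"})
print(best, scores)
```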
SSAHA_SNP
●Read pair information
  – currently possible with extra step using SSAHA2
  – being integrated into SSAHA_SNP
  – Removes incorrectly mapped pairs




                              Photo Credit: Matthew Fang on flickr.com
SSAHA_pileup
                      Reference
...GGTCCCACAGAGCTGGAGAAAG...
   GGTCCCACGGAGCTGGAG
       CCACGGAGCTGGAGAAAGCCT
     TCCCACGGAGCTGGAGAAAGCCT
     TCCCACGGAGCTGGAGAAAGCCT
     TCCCACGGAGCTGGAGAAAGCCT
                      Aligned reads
Homozygous SNP
SSAHA_pileup
                      Reference
...GGTCCCACAGAGCTGGAGAAAG...
   GGTCCCACAGAGCTGGAG
       CCACAGAGCTGGAGAAAGCCT
     TCCCACGGAGCTGGAGAAAGCCT
     TCCCACGGAGCTGGAGAAAGCCT
     TCCCACGGAGCTGGAGAAAGCCT
                      Aligned reads
Heterozygous SNP
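
The two pileups above differ only in whether any reads still carry the reference base at the variant site. A minimal sketch of that distinction follows, assuming the calls are made from simple base counts in one pileup column. The thresholds `min_depth` and `min_fraction` and the function name are hypothetical, not SSAHA_pileup's actual rules.

```python
from collections import Counter


def call_site(ref_base: str, read_bases: str, min_depth: int = 3, min_fraction: float = 0.2) -> str:
    """Classify one pileup column from the bases the reads show at that site."""
    depth = len(read_bases)
    if depth < min_depth:
        return "no_call"
    counts = Counter(read_bases)
    # Support for the most common non-reference base (0 if there is none).
    alt_count = max((c for b, c in counts.items() if b != ref_base), default=0)
    if alt_count / depth < min_fraction:
        return "reference"
    if counts[ref_base] / depth < min_fraction:
        return "homozygous_snp"
    return "heterozygous_snp"


# Columns at the variant site in the two example pileups (reference base A)
print(call_site("A", "GGGGG"))  # every read shows G
print(call_site("A", "AAGGG"))  # two reads show A, three show G
```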
SSAHA_pileup
                      Reference
...GGTCCCACAGAGCTGGAGAAAG...
   GGTCCCACAGAGCTGGAG
       CCACAGAGCTGGAGAAAGCCT
     TCCCACggagCTGGAGAAAGCCT
     TCCCACggagcTGGAGAAAGCCT
     TCCCacggagcTGGAGAAAGCCT
                      Aligned reads
Heterozygous SNP??
(Probably not)
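
Assuming the lowercase bases in the pileup above mark low-quality base calls, the variant loses its support once those bases are ignored. A minimal sketch of that filtering step, with a hypothetical Phred cutoff of 20 and made-up quality values:

```python
def high_quality_bases(bases: str, quals: list[int], min_qual: int = 20) -> str:
    """Keep only the bases whose Phred quality is at least min_qual."""
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")
    return "".join(b for b, q in zip(bases, quals) if q >= min_qual)


# Column at the candidate site: two confident A calls, three weak G calls.
column = "AAGGG"
quals = [40, 38, 9, 12, 7]  # illustrative Phred scores, not real data
print(high_quality_bases(column, quals))  # only the reference-supporting bases remain
```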
SSAHA_pileup
                      Reference
...GGTCCCACAGAGCTGGAGAAAG...
   GGTCCCAC-----TGGAG
       CCAC-----TGGAGAAAGCCT
     TCCCACGGAGCTGGAGAAAGCCT
     TCCCACGGAGCTGGAGAAAGCCT
     TCCCACGGAGCTGGAGAAAGCCT
                      Aligned reads
Heterozygous indel
How well does it work?
Datasets
●Venter: ABI capillary reads
  – Celera: 19,397,599 reads, 55% in pairs
  – JCVI: 12,541,352 reads, 98% in pairs
  – Total: 31,938,951 reads, 72% in pairs (90% mapped)
●Watson: 454 GS FLX reads
  – Baylor & Roche: 74,198,831 reads (90.5% mapped)
  – single-end reads, 150-280 bp long
●Chromosome X Illumina reads
  – 278,557,156 reads (71.6% mapped)
  – paired, with insert size 200 bp
How conservative should we be?
Or...
How liberal should we be?
How do we even know if we are winning?
dbSNP
(but not ideal)
Filtering
●Processes that cause bogus SNPs
  – Incorrect global mapping
  – Incorrect local alignment
  – Poor quality reads
  – Sequence amplification errors
Global Mapping Problems
●Reads from unmapped regions of the genome
  – Lead to absurdly high apparent coverage
[Figure, three frames: reads drawn between chr n and chr m; in the last frame only chr n remains, followed by the label "SNPs"]
Solution:
Filter out SNPs called from abnormally high read depths
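
A minimal sketch of such a depth filter, assuming each candidate SNP carries the read depth at its position and that the expected coverage is known. The cutoff of twice the expected depth, the record layout, and the function name are hypothetical.

```python
def filter_by_depth(snps: list[dict], expected_depth: float, max_ratio: float = 2.0) -> list[dict]:
    """Drop SNPs whose read depth exceeds max_ratio times the expected depth."""
    cutoff = max_ratio * expected_depth
    return [snp for snp in snps if snp["depth"] <= cutoff]


# Illustrative candidates: the last one sits in a pile of mis-placed reads
candidates = [
    {"pos": 1_200, "depth": 10},
    {"pos": 5_731, "depth": 12},
    {"pos": 9_004, "depth": 95},
]
kept = filter_by_depth(candidates, expected_depth=11)
print([snp["pos"] for snp in kept])
```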
Global Mapping Problems
●Incorrectly aligned reads
[Figure: one read placed on chr n with score: 132 and on chr m with score: 136]
Solution:
Filter out SNPs where 2nd best score is too close
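
A minimal sketch of this check, assuming each read comes with the alignment scores of all its candidate placements. The minimum gap of 10 is a made-up threshold; the scores are the ones shown on the slides.

```python
def is_unique_placement(scores: list[int], min_gap: int = 10) -> bool:
    """True if the best score beats the second best by at least min_gap."""
    if len(scores) < 2:
        return bool(scores)  # a single placement has no competitor
    top, second = sorted(scores, reverse=True)[:2]
    return top - second >= min_gap


print(is_unique_placement([132, 136]))  # scores only 4 apart
print(is_unique_placement([126, 113]))  # scores 13 apart
```

SNPs supported only by reads that fail this check would then be filtered out, since either placement could be the true origin of the read.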
Local Alignment Problems
●Misalignment
  – Uncaught incorrect global alignment
  – Variations in short repeats
Local Misalignment
                      Reference
...GGTCCCACAGAGCTGGAGAAAA...
   GGTCCCACT---CTAGTG
       CCACT---CTAGTGAAAA
     TCCCACT---CTAGTGAAAA
                      Aligned reads
Real SNPs?
Local Misalignment
                      Reference
..TAATAATAATAATAATAATAAGAAG..
   AATAATAAGAAGAAGAAGAAGAAG
   AATAATAAGAAGAAGAAGAAGAAG
   AATAATAAGAAGAAGAAGAAGAAG
                      Aligned reads
Real SNPs?
Solution:
Filter out short blocks of many SNPs
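
One way to sketch this filter is a sliding window over sorted SNP positions that discards every SNP belonging to a window holding too many calls. The window size (10 bp), the limit (2 SNPs per window), and the function name are assumptions for illustration.

```python
def filter_snp_clusters(positions: list[int], window: int = 10, max_snps: int = 2) -> list[int]:
    """Remove SNPs that fall in any window-bp stretch holding more than max_snps calls."""
    positions = sorted(positions)
    clustered = set()
    start = 0
    for end, pos in enumerate(positions):
        # Shrink the window until it spans fewer than `window` bases.
        while pos - positions[start] >= window:
            start += 1
        if end - start + 1 > max_snps:
            clustered.update(positions[start:end + 1])
    return [p for p in positions if p not in clustered]


# Two isolated calls and a dense block of four calls within a few bases
print(filter_snp_clusters([100, 503, 505, 506, 508, 900]))
```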
Venter SNP Calling (Capillary)

                     count        fraction in dbSNP
Homozygous SNPs      1,347,806    97.1%
Heterozygous SNPs    1,857,167    90.9%
Total SNPs           3,204,973    93.5%
Watson SNP Calling (454)

                     count        fraction in dbSNP
Homozygous SNPs      1,298,309    93.0%
Heterozygous SNPs    1,767,951    63.9%
Total SNPs           3,066,260    76.3%
X Chromosome SNPs (Solexa)

                     count        fraction in dbSNP
Homozygous SNPs      27,708       92.8%
Heterozygous SNPs    63,197       81.8%
Total SNPs           90,905       85.1%
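
The "Total SNPs" rows in the three tables can be cross-checked from the homozygous and heterozygous rows: the counts add up, and the total dbSNP fraction is the count-weighted average of the two fractions. The sketch below does that arithmetic on the slide figures. Because the input percentages are rounded, the recomputed totals agree with the slides only to within about 0.1 percentage point.

```python
def combine(rows: list[tuple[int, float]]) -> tuple[int, float]:
    """Return (total count, count-weighted dbSNP fraction) for (count, fraction) rows."""
    total = sum(count for count, _ in rows)
    in_dbsnp = sum(count * fraction for count, fraction in rows)
    return total, in_dbsnp / total


# (homozygous, heterozygous) rows as printed on the slides
datasets = {
    "Venter": [(1_347_806, 0.971), (1_857_167, 0.909)],
    "Watson": [(1_298_309, 0.930), (1_767_951, 0.639)],
    "X (Solexa)": [(27_708, 0.928), (63_197, 0.818)],
}
for name, rows in datasets.items():
    total, fraction = combine(rows)
    print(f"{name}: {total:,} SNPs, {fraction:.1%} in dbSNP")
```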
Venter-Watson Overlap
[Venn diagram]
  Venter only:    1,593,791
  Shared:         1,611,182
  Watson only:    1,455,078
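X Chromosome Overlap
[Venn diagram: Solexa X reads, Venter, Watson]
  Solexa only:          40,625
  Solexa and Venter:    19,978
  Solexa and Watson:    12,590
  All three:            17,712
  Venter only:          26,502
  Venter and Watson:     6,588
  Watson only:          22,872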
Conclusions
●SSAHA_pileup is effective across both new and
 old sequencing technologies
●Questions
  – When is a SNP not a SNP?
  – Homozygous/Heterozygous SNPs
●Length matters...?
  – But it's what you do with it that counts
Obtaining SSAHA_pileup
SSAHA_pileup: ftp://ftp.sanger.ac.uk/pub/zn1/ssaha_pileup/
SSAHA2: http://www.sanger.ac.uk/Software/analysis/SSAHA2/
These Slides: http://slideshare.net/bpb/
