Performance Metrics & Figures of Merit


         David Jenkins on behalf of
             Justin H. Johnson
         Director of Bioinformatics
CLIA #21D2039005   MD State License 1853
Who are we?

• Justin Johnson
  – Managing Director of Services
  – Director of Bioinformatics
  – 10 Years at JCVI before EdgeBio
  – Project Manager - Archon Genomics XPrize
• EdgeBio
  – CLIA Lab
  – Illumina HiSeq & MiSeq, Ion Proton & PGM
Overview – GIAB as I See It.

•   Which genomes?
•   How do we sequence them?
•   How do we analyze them?
•   How do we enable their usage?
Overview
• Experimental Data
   – Sequence Data & Variation
   – Metadata
• Bioinformatics Data Integration / Representation
• Database
   – RM vs. Reference
   – Every Base
• Visualize and Filter
   – Browser over DB
   – Query by Experiment Data
• Compare and Report
   – Single Genome Browser
   – ValidationProtocol.org
• Refine and Feedback

Experimental Data = Combination of Prep / Sequencing / Analysis
Experimental Data
• GeT-RM Model for Collection
    – http://www.ncbi.nlm.nih.gov/projects/variation/get-rm/
• Preparation
   – Link to published prep protocol
   – Regions of interest (ROI) in BED/GFF/GBK format
• Sequencing
   – Platform Information (Minimally - Name)
   – Chemistry (Minimally - Version)
• Analysis
   – Link to published analysis protocol or best practices
   – Read Data (fastq, sra, hdf5, others)
   – Alignment/Assembly Data (bam)
       • Minimal Tag Set TBD
   – Variation (VCF or gVCF)
       • Minimal Tag Set TBD in INFO field of VCF or define external XSD
       • https://sites.google.com/site/gvcftools/home/about-gvcf
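The collection model above could be captured as one record per experiment. The following is only a sketch: the class name, field names, and the choice of which fields count as the minimum are assumptions made for illustration, since the slide leaves the minimal tag sets as TBD.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExperimentRecord:
    """One experiment = prep + sequencing + analysis (hypothetical schema)."""
    # Preparation
    prep_protocol_url: str
    roi_file: Optional[str]            # BED/GFF/GBK path, if the prep was targeted
    # Sequencing: the slide asks for at least a platform name and chemistry version
    platform_name: str
    chemistry_version: str
    # Analysis
    analysis_protocol_url: str
    read_files: List[str] = field(default_factory=list)       # fastq, sra, hdf5, ...
    alignment_files: List[str] = field(default_factory=list)  # bam
    variant_files: List[str] = field(default_factory=list)    # VCF or gVCF


def missing_minimum_fields(rec: ExperimentRecord) -> List[str]:
    """Return names of fields this sketch treats as required that are empty."""
    required = ["prep_protocol_url", "platform_name",
                "chemistry_version", "analysis_protocol_url"]
    problems = [name for name in required if not getattr(rec, name).strip()]
    if not rec.variant_files:
        problems.append("variant_files")
    return problems
```

A submission front end could call `missing_minimum_fields` before accepting an upload and report which items still need to be supplied.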
gVCF




       https://sites.google.com/site/gvcftools/home/about-gvcf
Meta Data
• All required fields in VCF 4.1
• Others (Examples)
   –   AA : ancestral allele
   –   AC : allele count in genotypes, for each ALT allele, in the same order as listed
   –   AF : allele frequency for each ALT allele in the same order as listed: use this when estimated from primary data, not
       called genotypes
   –   AN : total number of alleles in called genotypes
   –   BQ : RMS base quality at this position
   –   CIGAR : cigar string describing how to align an alternate allele to the reference allele
   –   DB : dbSNP membership
   –   DP : combined depth across samples, e.g. DP=154
   –   END : end position of the variant described in this record (for use with symbolic alleles)
   –   H2 : membership in hapmap2
   –   VALIDATED : validated by follow-up experiment

• Reference Block Implementations
• Handle Indel Conflicts and Resolution
• Genotype Quality for non-variant sites (GQX)
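As an illustration of how the INFO tags listed above might be read, here is a minimal sketch of an INFO parser and of expanding a record to the positions it covers via the END tag. The function names are hypothetical, and the span logic is a simplification: it ignores REF allele length and treats any record without END as covering a single position.

```python
def parse_info(info: str) -> dict:
    """Parse a VCF INFO column into a dict; flag keys such as DB map to True."""
    result = {}
    if info in ("", "."):
        return result
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        # Values are kept as strings; "KEY" with no "=" is a flag.
        result[key] = value if sep else True
    return result


def block_span(pos: int, info: dict) -> range:
    """1-based positions covered by a record; END extends it to a block."""
    end = int(info.get("END", pos))
    return range(pos, end + 1)
```

For example, `parse_info("DP=154;DB;END=1005")` yields a depth string, a dbSNP flag, and an END value that `block_span(1001, ...)` turns into positions 1001 through 1005.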
Database

• Store Each Base + Meta of RM versus Reference for each
  Experiment from gVCF
   – Distinguish missing versus homozygous reference
   – Include copy number and phasing when available, not
     required
• Engine that drives front end visualization (Genome Browser)
• Build on GeT-RM/NCBI Database Work
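The distinction between missing and homozygous reference can be sketched as a per-base lookup built from gVCF-style records. This is an assumption-laden illustration, not the proposed database design: the `(start, end, genotype)` tuple is a hypothetical simplified record, and copy number and phasing are left out.

```python
from enum import Enum
from typing import Dict, Iterable, Tuple


class BaseCall(Enum):
    NO_CALL = "missing"
    HOM_REF = "homozygous reference"
    VARIANT = "variant"


# Hypothetical simplified record: (start, end, genotype), 1-based inclusive.
Record = Tuple[int, int, str]


def per_base_calls(records: Iterable[Record],
                   region_start: int, region_end: int) -> Dict[int, BaseCall]:
    """Label every base in the region; bases no record covers stay NO_CALL."""
    calls = {pos: BaseCall.NO_CALL
             for pos in range(region_start, region_end + 1)}
    for start, end, genotype in records:
        alleles = genotype.replace("|", "/").split("/")
        if "." in alleles:
            state = BaseCall.NO_CALL
        elif all(a == "0" for a in alleles):
            state = BaseCall.HOM_REF
        else:
            state = BaseCall.VARIANT
        # Clip the record to the requested region before labelling.
        for pos in range(max(start, region_start), min(end, region_end) + 1):
            calls[pos] = state
    return calls
```

The point of the default `NO_CALL` is that an uncovered base is never silently reported as matching the reference.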
Visualize and Filter
•   Build on GeT-RM/NCBI Browser Work
•   Single RM -> Many Experiments
•   Not all metadata will be visual, but most/all will be filterable
•   Filter data to generate ROI or VOI (variants of interest)
    – Canned: e.g. Intersect of All Platforms + Analysis, All OMIM SNPs,
      Clinical Cert SNV List, etc.
    – Dynamic: allowing people to explore prep, sequence, or analysis bias
• Slice, Dice, Export VOI to compare-and-report SW
• Allow user defined tracks
• A by-product is a community educational resource
    – I have an ROI for a test and want to know which platform, prep, exome
      kit version, etc. covers it best. What do I do?
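A canned filter such as the intersect of all platforms and analyses reduces to set operations once each experiment's calls are keyed the same way. The sketch below assumes a `(chrom, pos, ref, alt)` key and hypothetical experiment names; a real implementation would also need to normalize variant representation before comparing.

```python
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def intersect_all(calls_by_experiment: Dict[str, Set[Variant]]) -> Set[Variant]:
    """Variants called by every experiment (the canned 'intersect' filter)."""
    sets = list(calls_by_experiment.values())
    if not sets:
        return set()
    return set.intersection(*sets)


def unique_to(name: str,
              calls_by_experiment: Dict[str, Set[Variant]]) -> Set[Variant]:
    """Variants seen only in one experiment, a starting point for bias checks."""
    others = set().union(
        *(calls for key, calls in calls_by_experiment.items() if key != name)
    )
    return calls_by_experiment[name] - others
```

`intersect_all` gives a conservative VOI list, while `unique_to` supports the dynamic case of exploring what a single prep, platform, or analysis contributes on its own.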
Parallel Database, Filter Effort (Gemini)
     Quinlan Lab at UVA - https://github.com/arq5x/gemini


 • Gemini – simple, flexible, and powerful
   framework for exploring genetic variation
 • Basic browser capabilities being developed
 • Flexible custom annotation and metadata
   addition to DB
 • Leverage the expressive power of SQL while
   overcoming fundamental challenges associated
   with using databases for very large datasets
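To give a feel for SQL-driven variant exploration, here is a small sketch using Python's built-in `sqlite3` module. The table layout and column names are invented for this example and are not Gemini's actual schema; consult the Gemini repository for its real tables and query interface.

```python
import sqlite3

# In-memory database with a hypothetical, heavily simplified variants table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE variants (
           chrom TEXT, pos INTEGER, ref TEXT, alt TEXT,
           platform TEXT, depth INTEGER, in_dbsnp INTEGER)"""
)
rows = [
    ("chr1", 100, "A", "G", "HiSeq", 35, 1),
    ("chr1", 200, "C", "T", "HiSeq", 28, 0),
    ("chr2", 50, "G", "A", "Proton", 12, 0),
]
conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def novel_well_covered(connection: sqlite3.Connection, min_depth: int) -> list:
    """Variants absent from dbSNP with at least min_depth coverage."""
    cursor = connection.execute(
        "SELECT chrom, pos, ref, alt, platform FROM variants "
        "WHERE in_dbsnp = 0 AND depth >= ? ORDER BY chrom, pos",
        (min_depth,),
    )
    return cursor.fetchall()
```

The same pattern extends to custom annotation columns: once metadata sits in a table, filtering by platform, depth, or database membership is a `WHERE` clause rather than new code.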
Gemini




         http://dl.dropbox.com/u/515640/posters_and_slides/Quinlan-Gemini-Poster.pdf
Compare and Reporting

• Take in ROI or VOI from the visualize and filter stage
• Take in user defined VOI or VOI + ROI
• Leverage SW under ValidationProtocol.org to generate reports
  and files including, but not limited to:
   – Summary of completeness, accuracy, phasing
   – Discordant variants in VCF
   – Concordant variants in VCF
   – Phasing errors in VCF
• Provide an intuitive way to feed these results into downstream
  analysis SW (VariantViz, IO8) or back into the browser (User
  Defined Track)
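The concordant and discordant outputs listed above can be sketched as a set comparison between a test call set and the reference call set. This is not the ValidationProtocol.org software: the function name and the mapping of the summary to sensitivity and precision are assumptions, and phasing errors are not handled here.

```python
from typing import Dict, Optional, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def compare_calls(test: Set[Variant], reference: Set[Variant]) -> Dict[str, object]:
    """Split calls into concordant/discordant/missed and summarize them."""
    concordant = test & reference
    discordant = test - reference      # called, but not in the reference set
    missed = reference - test          # in the reference set, but not called
    sensitivity: Optional[float] = (
        len(concordant) / len(reference) if reference else None
    )
    precision: Optional[float] = (
        len(concordant) / len(test) if test else None
    )
    return {
        "concordant": concordant,
        "discordant": discordant,
        "missed": missed,
        "sensitivity": sensitivity,
        "precision": precision,
    }
```

Each of the three sets could then be written out as its own VCF or loaded back into the browser as a user defined track.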
Archon Genomics XPrize (AGXP)
• $10 million prize competition to showcase whole
  genome sequencing technology
• Award to the team(s) who can most completely,
  accurately and affordably sequence 100 human
  genomes in 30 days or less
• Competing Teams will sequence the genomes of the
  100 centenarians who have evaded the usual
  diseases of aging such as heart disease, diabetes,
  cancer and Alzheimer’s
AGXP Validation Study Overview
AGXP Validation Study Analysis

• 2 Major Phases using NA19239 and NA12878
   – Develop Reference Standards
       • Fosmid Reconstruction, Variation Discovery
       • Technology Comparison and Bias Removal
   – Develop Performance Metrics
       • Software Development
       • Help labs use the data
Compare and Report

• The validationprotocol.org website provides a
  simple way for anyone to compare their
  variant calls against the public reference
  genomes.
• Encourages submission and analysis in public
  tools like Galaxy through transparent
  interoperability with GenomeSpace.
Compare and Report
Follow On

• Export different categories
  (Concordant/Discordant/Phasing Error)
  variants to VariantViz IO8
• Visualize Quality, Allele Frequencies, Depth,
  etc Info to detect patterns in and between
  variant categories
Concordant SNPs




Potential false positives
Xprize Team
• Justin H. Johnson and Team - EdgeBio

• Brad Chapman Harvard: automated high-throughput analysis pipelines
  with custom visualization and processing tools

• Gabor Marth Boston College: Read mapping, single-nucleotide and
  insertion-deletion polymorphism detection, and discovery of structural
  variants.

• Aaron Quinlan University of Virginia: structural variation (SV)

• Granger Sutton JCVI: Oversight Committee

• Victor Jongeneel University of Illinois and NCSA: Oversight Committee

• Larry Kedes UCLA: Oversight Committee
EdgeBio Team

• LAB
  – Joy Adigun
  – Ryan Mease
  – Jennifer Sheffield
  – Aaron Johnson
  – Jackie Jackson
• IFX
  – David Jenkins
  – Anju Varadarajan
  – Vani Rajan
  – Karthik Kota
  – Phil Dagasto
• Adam Bennett
• Isabel Llorente
Thank You!
      More info available at
       http://bit.ly/agxpval
http://www.genomeinabottle.org

Mar2013 Performance Metrics Working Group
