Benchmarking results for the GIAB Ashkenazim Jewish Trio, demonstrating
our recommendations for visualizing benchmarking results.
See Zook et al. 2019 for the methods used to generate the dataset.
Figure 6. Example IGV Session for Manual Curation.
Interpreting Small Variant Benchmarking Results
Figure 5. Precision-Recall scatter plots. Points represent different
stratifications. Poorly performing stratifications are in the top right
corner of the plots. Colors indicate low-performing stratifications.
Manually curating a subset of FPs and FNs ensures they are errors in
the query set and helps determine biases responsible for the error.
Figure 4. Performance metrics for high-level stratifications. Colored circles indicate the
observed performance for the three genomes. The gray bars represent the 95% highest
density intervals (HDI) and black crossbars the estimated mean, calculated using
happyR (https://illumina.github.io/happyR/).
1 - Precision and 1 - Recall scatter plots help identify poorly performing
stratifications. For better separation, plot 1 minus the metric on a log10
scale.
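The log10 transform recommended above can be sketched in a few lines. This is a minimal illustration, not the code used for Figure 5: the stratification names, the metric values, and the `floor` parameter (which keeps a perfect score finite) are all assumptions made for the example.

```python
import math

# Hypothetical per-stratification metrics; names and values are illustrative only.
strata = {
    "all regions": {"precision": 0.999, "recall": 0.995},
    "low mappability": {"precision": 0.98, "recall": 0.90},
    "segmental duplications": {"precision": 0.95, "recall": 0.80},
}


def log10_error(metric, floor=1e-6):
    """Return log10(1 - metric), flooring the error so a perfect score stays finite."""
    return math.log10(max(1.0 - metric, floor))


# One (x, y) point per stratification for a 1 - Precision vs 1 - Recall scatter plot.
points = {
    name: (log10_error(m["precision"]), log10_error(m["recall"]))
    for name, m in strata.items()
}

for name, (x, y) in points.items():
    print(f"{name}: log10(1 - precision) = {x:.2f}, log10(1 - recall) = {y:.2f}")
```

On this scale, each unit corresponds to a tenfold change in error rate, so stratifications with recall of 0.99 and 0.999 are as far apart as those with 0.9 and 0.99.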
Confidence intervals aid stratification comparisons by accounting
for stratification size differences.
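To illustrate why intervals help when stratifications differ in size, the sketch below approximates a 95% highest density interval for recall by sampling. It is not the happyR implementation: the uniform Beta(1, 1) prior, the Monte Carlo approach, the function name `beta_hdi`, and the counts are all assumptions made for the example.

```python
import math
import random


def beta_hdi(successes, failures, mass=0.95, draws=20000, seed=0):
    """Approximate the HDI of a Beta(successes + 1, failures + 1) posterior by sampling."""
    rng = random.Random(seed)
    samples = sorted(
        rng.betavariate(successes + 1, failures + 1) for _ in range(draws)
    )
    window = math.ceil(mass * draws)  # number of samples the interval must contain
    # The HDI is the narrowest window that holds the requested share of samples.
    start = min(
        range(draws - window + 1),
        key=lambda i: samples[i + window - 1] - samples[i],
    )
    return samples[start], samples[start + window - 1]


# Hypothetical counts: the same observed recall (0.99) in a large and a small stratification.
large = beta_hdi(successes=9900, failures=100)
small = beta_hdi(successes=99, failures=1)
print("large stratification HDI:", large)
print("small stratification HDI:", small)
```

Both stratifications have the same point estimate, but the smaller one yields a much wider interval, which is the information a bare scatter point hides.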
The structural variant (SV) benchmark set was developed using variant calls and
assemblies from multiple technologies (Fig. 8, Zook et al. 2019).
•	 Methods for structural variant benchmarking
•	 TRUVARI (https://github.com/spiralgenetics/truvari)
•	 SVanalyzer (https://svanalyzer.readthedocs.io/en/latest/)
Figure 8. SV Benchmark set development process. The objective is that, when
compared to a new callset, most FPs and FNs are errors in the callset, not the benchmark.
Structural Variant Benchmarking
New stratifications enable benchmarking of GRCh38 variant calls.
We are validating the GRCh38 stratifications by comparing GRCh37 and GRCh38
stratification genome coverage (Fig. 8), region size distributions (Fig. 9),
and benchmarking results.
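The genome coverage comparison can be sketched as a ratio of bases covered by two BED files. This is a minimal illustration under stated assumptions, not the validation code behind the figures: the function name `covered_bases` and the toy intervals are hypothetical, and real stratification files would be read from disk.

```python
def covered_bases(bed_lines):
    """Count the bases covered by BED intervals, merging overlaps per chromosome."""
    by_chrom = {}
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom.setdefault(chrom, []).append((int(start), int(end)))

    total = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or adjacent interval: extend the current run.
                cur_end = max(cur_end, end)
            else:
                # Gap: close the current run and start a new one.
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


# Hypothetical toy stratifications for the two reference builds (BED is 0-based, half-open).
grch37 = ["1\t100\t200", "1\t150\t300", "2\t0\t50"]
grch38 = ["chr1\t100\t320", "chr2\t0\t40"]

ratio = covered_bases(grch38) / covered_bases(grch37)
print(f"GRCh38 / GRCh37 bases covered: {ratio:.2f}")
```

A ratio near 1 suggests the two builds' versions of a stratification cover a similar amount of sequence; a large deviation flags a stratification worth inspecting.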
GRCh38 Stratifications
Figure 9. Region size distribution comparison
between GRCh37 and GRCh38 functional region
stratifications.
Figure 9. Ratio of bases covered by the GRCh37 and GRCh38 functional region stratifications.
We extended the small variant benchmark set to include more challenging
regions, such as low complexity regions and segmental duplications.
•	 We extended the benchmark set by using long- and linked-read data to characterize
low complexity regions, and redefined filtering parameters for more
precise method-specific filtering.
•	 The V4 benchmark set covers ~93.2% of the reference genome, a 5.4% increase
over V3.3.2 with GRCh37, including more SNPs, indels, and bases in
segmental duplications (Fig. 7).
GIAB Small Variant V4 Benchmark Set
Subset                    v3.3.2 FNs    v4 FNs
All SNPs                  8,594         30,229
Low mappability           6,708         25,295
Segmental duplications    1,429         14,008
Benchmarking results are benchmark set dependent. For example, there
are ~3X more false negatives in the Illumina Real Time Genomics HG002
variant callset when compared to V4 versus V3.3.2 (Table 3).
Figure 7. Comparison of the GIAB HG002 V3.3.2 and V4 beta benchmark sets.
Table 3. Number of false negatives (FNs) in the Illumina Real Time Genomics HG002 variant callset, for all SNPs and within
low mappability regions and segmental duplications, for V3.3.2 and V4.
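As a worked check on the Table 3 counts, the short sketch below computes the fold change in false negatives between the two benchmark versions. The counts are copied from Table 3; the variable names are illustrative.

```python
# False-negative counts copied from Table 3: (v3.3.2 FNs, v4 FNs).
table3 = {
    "All SNPs": (8594, 30229),
    "Low mappability": (6708, 25295),
    "Segmental duplications": (1429, 14008),
}

# Fold change in FNs when the same callset is compared to V4 instead of V3.3.2.
fold_change = {subset: v4 / v332 for subset, (v332, v4) in table3.items()}

for subset, fold in fold_change.items():
    print(f"{subset}: {fold:.1f}x more FNs against V4")
```

The fold changes come out at roughly 3.5x for all SNPs, 3.8x in low mappability regions, and 9.8x in segmental duplications, so the increase is concentrated in the difficult regions that V4 adds.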
The GIAB benchmark set includes benchmark small variant calls and benchmark
regions (Fig. 1). When users compare a query variant callset to the
benchmark set, differences are likely errors in the query set.
Figure 1. Small variant integration pipeline (Zook et al. 2019). Colored squares
represent different input small variant callsets. Tan hexagons indicate callset-specific
parameters used to filter low-confidence variants. A machine learning
outlier detection algorithm is used to identify callset-specific high-confidence
variants. Finally, a heuristic-based arbitration algorithm is used to generate the
benchmark callset.
[Figure 1 labels: Input Callsets, Filters, Outlier Detection, High Confidence Variants, Arbitration, Benchmark Set. Legend: Expert-Defined Parameters and Heuristics; Computational Process.]
GIAB Small Variant Benchmark Set
•	 The GA4GH Benchmarking Team has defined best practices for small variant
benchmarking (Fig. 2), including:
•	 Variant comparison methods that account for differences in variant representation
(Fig. 3),
•	 Defined matching stringencies and performance metrics (Table 1), and
•	 Stratification of performance metrics by variant type and genomic context.
•	 The GA4GH best practices are summarized in Table 2 (Supplementary Table 1 of
Krusche et al. 2019).
•	 The benchmarking team implemented these best practices in the tool hap.py
(https://github.com/Illumina/hap.py), which is available as a web application on
the precisionFDA platform (https://precision.fda.gov; an account is required for
access and is available upon request).
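The performance metrics named in the list above can be sketched from benchmarking counts as follows. This is a generic illustration rather than hap.py's code: the function name, the stratification labels, and the counts are hypothetical, and counting true positives separately for the truth and query sets is an assumption of this sketch (the two can differ when one benchmark variant is represented as several query records).

```python
def benchmark_metrics(truth_tp, truth_fn, query_tp, query_fp):
    """Compute recall, precision, and F1 from benchmarking counts."""
    recall = truth_tp / (truth_tp + truth_fn)        # share of benchmark variants found
    precision = query_tp / (query_tp + query_fp)     # share of query calls that are correct
    f1 = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}


# Hypothetical counts, stratified by variant type and genomic context.
counts = {
    "SNP, all regions": dict(truth_tp=9900, truth_fn=100, query_tp=9900, query_fp=50),
    "INDEL, low complexity": dict(truth_tp=800, truth_fn=200, query_tp=810, query_fp=90),
}

metrics = {name: benchmark_metrics(**c) for name, c in counts.items()}

for name, m in metrics.items():
    print(f"{name}: recall={m['recall']:.3f} precision={m['precision']:.3f} f1={m['f1']:.3f}")
```

Computing the metrics per stratification, as here, is what exposes a weak variant type or genomic context that a genome-wide average would mask.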
Small Variant Benchmarking
Figure 2. Benchmarking workflow described in Krusche et al. 2019 and implemented
in hap.py (https://github.com/Illumina/hap.py).
Figure 3. Example alternative variant representations (Krusche et al. 2019).
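A small sketch can show why variant representation matters for comparison: different sets of records can describe the same sequence. The example below is not hap.py's comparison method; it simply applies variants to a toy reference and compares the resulting haplotypes. The function name, the sequences, and the 0-based `(position, ref, alt)` tuples are assumptions made for illustration.

```python
def apply_variants(reference, variants):
    """Apply non-overlapping (position, ref, alt) variants to a reference string.

    Positions are 0-based. Variants are applied right to left so that earlier
    coordinates stay valid after an insertion or deletion.
    """
    haplotype = reference
    for pos, ref, alt in sorted(variants, reverse=True):
        if haplotype[pos:pos + len(ref)] != ref:
            raise ValueError(f"REF allele {ref!r} does not match the reference at {pos}")
        haplotype = haplotype[:pos] + alt + haplotype[pos + len(ref):]
    return haplotype


# Hypothetical example 1: one multi-nucleotide variant versus two adjacent SNVs.
reference = "ACGT"
as_mnv = [(1, "CG", "TA")]
as_snvs = [(1, "C", "T"), (2, "G", "A")]

# Hypothetical example 2: the same 2 bp deletion placed at two positions in a CA repeat.
repeat_reference = "GCACAT"
left_aligned = [(0, "GCA", "G")]
right_shifted = [(2, "ACA", "A")]

print(apply_variants(reference, as_mnv) == apply_variants(reference, as_snvs))
print(
    apply_variants(repeat_reference, left_aligned)
    == apply_variants(repeat_reference, right_shifted)
)
```

A comparison that matched records only by position and alleles would score each of these pairs as a false positive plus a false negative, even though the underlying sequences are identical.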
Table 1. GA4GH match definitions for true positives (TP), false positives (FP), and false
negatives (FN), and the matching stringencies allele match (AL) and genotype match (GT)
(Krusche et al. 2019).
Table 2. GA4GH germline variant call benchmarking best
practices (Krusche et al. 2019).
Started in 2012, the Genome in a Bottle (GIAB) Consortium has spent the last seven
years developing and characterizing the first set of human whole genome
reference materials. GIAB has developed small variant benchmark sets for
7 genomes (HG001-HG007): NA12878 and two trios from the Personal Genome
Project. The small variant benchmark sets cover approximately 90% of the
non-gapped bases of the genome, primarily easy-to-characterize regions.
2012 GIAB Consortium formed – no human genome Reference Materials
2014 Small variant genotypes for ~77% of pilot genome NA12878
2015 NIST releases first human genome Reference Material
2016 Small variants for 90% of 7 genomes for GRCh37/38
2018 Draft SV Benchmark
2019+ Characterizing difficult variants and regions
Genome In A Bottle
Integrating variant calls across laboratories and sequencing methods requires a comprehensive
understanding of false positive and false negative rates for different types of variants in diverse
genome contexts. Similarly, the clinical use of genomics requires validation of genome sequencing
and variant calling methods. Clinical laboratories can validate their sequencing and variant calling
methods using the GA4GH benchmarking tool and the NIST human genome reference materials
developed with the Genome In A Bottle (GIAB) Consortium. The GIAB reference materials are
authoritatively characterized for differences from the human reference genome and provided as benchmark
sets enabling variant call benchmarking. The benchmark sets include benchmark calls (well-characterized
sequence differences from the reference genome) and benchmark regions (positions that
agree with either the reference genome or a benchmark call). The benchmarking tool compares a user-provided
variant callset to the GIAB benchmark set (benchmark variant calls and benchmark regions) in
a sophisticated manner that accounts for variant representation differences and disagreement types.
The tool calculates performance metrics stratified by genomic context from the comparison results.
Stratified performance metrics allow users to critically evaluate method performance. However, the
benchmarking tool’s capabilities are limited by a lack of resources for benchmarking against
GRCh38, difficult-to-interpret output, and a limited number of challenging variants in the GIAB benchmark sets.
To address these challenges, we are developing resources for benchmarking against GRCh38 and a
summary report to simplify interpretation of the benchmarking results. We are also expanding the GIAB
characterization genomic coverage to include more challenging regions, such as low complexity
regions and segmental duplications. Moreover, the GIAB consortium is developing benchmark sets
and benchmarking methods for validating structural variants, allowing for high-confidence
structural variant detection. Expanding the GIAB genome characterization and improving the interoperability
of the GA4GH benchmarking tool will help clinical genomics reach its full potential.
Abstract
Assuring Data Quality with Variant Benchmarking tools from GIAB and GA4GH
N. D. Olson¹, J. McDaniel¹, J. Wagner¹, J. M. Zook¹, the GA4GH Benchmarking Team, and the Genome In A Bottle Consortium
¹ Biosystems and Biomaterials Division, Materials Measurement Laboratory, NIST
Benchmarking Results Interpretation
•	 Developing a benchmarking report document to guide the identification
of challenging genomic contexts and variant types.
•	 Developing a new stratification approach that accounts for genome
context interactions and uses data-driven stratification definitions.
GIAB Small Variant V4 Benchmark Set
•	 Currently released as a draft, with the official release for HG002 expected later this year.
•	 V4 benchmark sets for the other 6 GIAB genomes are expected next year.
GRCh38 Stratifications
•	 Currently undergoing validation and will be released soon.
Structural Variant Benchmarking
•	 Working on a new structural variant benchmark set utilizing recent sequencing
technology advances, including PacBio HiFi sequencing and Oxford
Nanopore ultra-long reads.
•	 Integration pipeline development is underway, enabling the transparent
and streamlined generation of new benchmark sets for all 7 GIAB genomes
against both GRCh37 and GRCh38 reference genomes.
Somatic variant and diploid assembly benchmarking.
Future Work
We would like to thank our collaborators on the GA4GH Benchmarking Team and Genome in a Bottle Consortium.
Opinions expressed in this paper are the authors’ and do not necessarily reflect the policies and views of
NIST, or affiliated venues. Certain commercial equipment, instruments, or materials are identified in this
poster only to specify the experimental procedure adequately. Such identification is not intended to imply
recommendation or endorsement by the NIST, nor is it intended to imply that the materials or equipment
identified are necessarily the best available for the purpose.
Official contribution of NIST; not subject to copyrights in USA.
Acknowledgments
•	 Krusche, Peter, et al. “Best practices for benchmarking germline small-variant
calls in human genomes.” Nature Biotechnology 37.5 (2019): 555.
•	 Zook, Justin M., et al. “An open resource for accurately benchmarking small
variant and reference calls.” Nature Biotechnology 37.5 (2019): 561.
•	 Zook, Justin M., et al. “A robust benchmark for germline structural variant
detection.” bioRxiv (2019): 664623.
References

Ga4gh 2019 - Assuring data quality with benchmarking tools from GIAB and GA4GH

Interpreting Small Variant Benchmarking Results

Benchmarking results for the GIAB Ashkenazim Jewish Trio demonstrate our recommendations for visualizing benchmarking results. See Zook et al. 2019 for the methods used to generate the dataset.

Confidence intervals aid stratification comparisons by accounting for differences in stratification size.

Figure 4. Performance metrics for high-level stratifications. Colored circles indicate the observed performance for the three genomes. The gray bars represent the 95% highest density intervals (HDI) and the black crossbars the estimated mean, calculated using happyR (https://illumina.github.io/happyR/).

Scatter plots of 1 - Precision against 1 - Recall help identify poorly performing stratifications. For better separation, plot 1 minus the metric on a log10 scale.

Figure 5. Precision-Recall scatter plots. Points represent different stratifications. Poorly performing stratifications are in the top right corner of the plots. Colors indicate low-performing stratifications.

Manually curating a subset of FPs and FNs confirms that they are errors in the query set and helps determine the biases responsible for the errors.

Figure 6. Example IGV Session for Manual Curation.

Structural Variant Benchmarking

The structural variant (SV) benchmark set was developed using variant calls and assemblies from multiple technologies (Fig. 8, Zook et al. 2019).

Methods for structural variant benchmarking:
• TRUVARI (https://github.com/spiralgenetics/truvari)
• SVanalyzer (https://svanalyzer.readthedocs.io/en/latest/)

Figure 8. SV Benchmark set development process. The objective is that, when compared to a new callset, most FPs and FNs are errors in the callset, not the benchmark.

GRCh38 Stratifications

New stratifications enable benchmarking of GRCh38 variant calls.

The GRCh38 stratifications are being validated by comparing GRCh37 and GRCh38 stratification genome coverage (Fig. 8), region size distributions (Fig. 9), and benchmarking results.

Figure 9. Region size distribution comparison between GRCh37 and GRCh38 functional region stratifications.

Figure 9. Ratio of bases covered by the GRCh37 and GRCh38 functional region stratifications.

GIAB Small Variant V4 Benchmark Set

We extended the small variant benchmark set to include more challenging regions such as low complexity regions and segmental duplications.
• We extended the benchmark set by using long- and linked-read data to characterize low complexity regions and redefined filtering parameters for more precise method-specific filtering.
• The V4 benchmark set covers ~93.2% of the reference genome, a 5.4% increase over V3.3.2 with GRCh37, including more SNPs, indels, and bases in segmental duplications (Fig. 7).

Figure 7. Comparison of GIAB HG002 V3.3.2 and V4 beta benchmark sets.

Benchmarking results are benchmark set dependent. For example, there are ~3X more false negatives in the Illumina Real Time Genomics HG002 variant callset when compared to V4 versus V3.3.2 (Table 3).

Table 3. Number of Illumina Real Time Genomics HG002 variant callset false negatives (FNs) in total and in low mappability regions and segmental duplications for V3.3.2 and V4.

Subset                    V3.3.2 FNs    V4 FNs
All SNPs                       8,594    30,229
Low mappability                6,708    25,295
Segmental duplications         1,429    14,008

GIAB Small Variant Benchmark Set

The GIAB benchmark includes benchmark small variant calls and benchmark regions (Fig. 1). When users compare a query variant callset to the benchmark set, differences are likely errors in the query set.

Figure 1. Small variant integration pipeline (Zook et al. 2019). Colored squares represent different input small variant callsets. Tan hexagons indicate callset-specific parameters used to filter low-confidence variants. A machine learning outlier detection algorithm is used to identify callset-specific high-confidence variants. Finally, a heuristic-based arbitration algorithm is used to generate the benchmark callset.

Figure 1 labels: Outlier Detection; Arbitration; Expert-Defined Parameters and Heuristics; Computational Process; Input Callsets; Filters; High Confidence Variants; Benchmark Set.

Small Variant Benchmarking

• The GA4GH Benchmarking team has defined best practices for small variant benchmarking (Fig. 2), including:
  • Variant comparison methods that account for differences in variant representation (Fig. 3),
  • Defining matching stringencies and performance metrics (Table 1), and
  • Stratifying performance metrics by variant type and genomic context.
• The GA4GH best practices are summarized in Table 2 (Supplementary Table 1 of Krusche et al. 2019).
• The benchmarking team implemented these best practices in the tool hap.py (https://github.com/Illumina/hap.py), which is available as a web application on the precisionFDA platform (https://precision.fda.gov; account required for access, available upon request).

Figure 2. Benchmarking workflow described in Krusche et al. 2019 and implemented in hap.py, https://github.com/Illumina/hap.py.

Figure 3. Example alternative variant representations (Krusche et al. 2019).

Table 1. GA4GH match definitions for true positives (TP), false positives (FP), false negatives (FN), and the matching stringencies allele match (AL) and genotype match (GT) (Krusche et al. 2019).

Table 2. GA4GH germline variant call benchmarking best practices (Krusche et al. 2019).

Started in 2012, the Genome in a Bottle Consortium has over the last seven years developed and characterized the first set of human whole genome reference materials. GIAB has developed small variant benchmark sets for 7 genomes (HG001-HG007): NA12878 and two trios from the Personal Genome Project. Approximately 90% of the non-gapped bases of the genome, primarily easy-to-characterize regions, are covered by the small variant benchmark sets.
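As a minimal sketch of how the TP, FP, and FN counts defined in Table 1 turn into performance metrics, and of the "1 minus metric on a log10 scale" transform recommended for the precision-recall scatter plots, the Python snippet below may help. The function names, the stratification names, and all counts are hypothetical and chosen only for illustration; the snippet uses a single TP count per stratification and does not reproduce hap.py's variant comparison or its exact metric definitions.

```python
import math


def precision(tp: int, fp: int) -> float:
    """Fraction of query calls that match the benchmark: TP / (TP + FP)."""
    total = tp + fp
    return tp / total if total else float("nan")


def recall(tp: int, fn: int) -> float:
    """Fraction of benchmark calls recovered by the query: TP / (TP + FN)."""
    total = tp + fn
    return tp / total if total else float("nan")


def log10_error(metric: float, floor: float = 1e-6) -> float:
    """Return log10(1 - metric) for plotting.

    The floor (an arbitrary choice made here) keeps a perfect score of 1.0
    finite instead of producing log10(0).
    """
    return math.log10(max(1.0 - metric, floor))


# Hypothetical counts for two stratifications (not real benchmarking results).
strata = {
    "all_regions": {"tp": 99_000, "fp": 100, "fn": 1_000},
    "low_complexity": {"tp": 9_000, "fp": 500, "fn": 1_000},
}


def summarize(counts: dict) -> dict:
    """Compute the metrics and their plotting coordinates for one stratification."""
    p = precision(counts["tp"], counts["fp"])
    r = recall(counts["tp"], counts["fn"])
    return {
        "precision": p,
        "recall": r,
        "log10_1m_precision": log10_error(p),
        "log10_1m_recall": log10_error(r),
    }


summary = {name: summarize(counts) for name, counts in strata.items()}

for name, row in summary.items():
    print(
        f"{name}: precision={row['precision']:.4f} recall={row['recall']:.4f} "
        f"log10(1-P)={row['log10_1m_precision']:.2f} "
        f"log10(1-R)={row['log10_1m_recall']:.2f}"
    )
```

On the log10 scale, a stratification with 90% recall and one with 99% recall sit a full unit apart, which is why the transformed axes separate stratifications that would otherwise cluster near 1.0.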
Genome In A Bottle

2012    GIAB Consortium formed; no human genome Reference Materials
2014    Small variant genotypes for ~77% of pilot genome NA12878
2015    NIST releases first human genome Reference Material
2016    Small variants for 90% of 7 genomes for GRCh37/38
2018    Draft SV Benchmark
2019+   Characterizing difficult variants and regions

Assuring Data Quality with Variant Benchmarking tools from GIAB and GA4GH

N. D. Olson1, J. McDaniel1, J. Wagner1, J. M. Zook1, the GA4GH Benchmarking Team, and the Genome In A Bottle Consortium
1 Biosystems and Biomaterials Division, Materials Measurement Laboratory, NIST

Abstract

Integrating variant calls across laboratories and sequencing methods requires a comprehensive understanding of false positive and false negative rates for different types of variants in diverse genome contexts. Similarly, the clinical use of genomics requires validation of genome sequencing and variant calling methods. Clinical laboratories can validate their sequencing and variant calling methods using the GA4GH benchmarking tool and the NIST human genome reference materials developed with the Genome In A Bottle (GIAB) Consortium. The GIAB reference materials are authoritatively characterized for differences to the human reference genome and provided as benchmark sets enabling variant call benchmarking. The benchmark sets include benchmark calls, which are well-characterized sequence differences from the reference genome, and benchmark regions, which are positions that agree with either the reference genome or a benchmark call. The benchmarking tool compares a user-provided variant callset to the GIAB benchmark set (benchmark variant calls and benchmark regions) in a sophisticated manner that accounts for variant representation differences and disagreement types. The tool calculates performance metrics stratified by genomic context from the comparison results. Stratified performance metrics allow users to critically evaluate method performance. However, the benchmarking tool's utility is limited by a lack of resources for benchmarking against GRCh38, difficult-to-interpret output, and a limited number of challenging variants in the GIAB benchmark sets.

To address these challenges, we are developing resources for benchmarking against GRCh38 and a summary report to simplify interpretation of the benchmarking results. We are also expanding the genomic coverage of the GIAB characterization to include more challenging regions such as low complexity regions and segmental duplications. Moreover, the GIAB consortium is developing benchmark sets and benchmarking methods for validating structural variants, allowing for high-confidence structural variant detection. Expanding the GIAB genome characterization and improving the interoperability of the GA4GH benchmarking tool will help clinical genomics reach its full potential.

Future Work

Benchmarking Results Interpretation
• Developing a benchmarking report document that will guide the identification of challenging genomic contexts and variant types.
• Developing a new stratification approach that accounts for genome context interactions and uses data-driven stratification definitions.

GIAB Small Variant V4 Benchmark Set
• Currently released as a draft, with the official release for HG002 later this year.
• V4 benchmark sets for the other 6 GIAB genomes are expected next year.

GRCh38 Stratifications
• Currently undergoing validation; release is expected soon.

Structural Variant Benchmarking
• Working on a new structural variant benchmark set utilizing recent sequencing technology advances, including PacBio HiFi sequencing and Oxford Nanopore ultra-long reads.
• Integration pipeline development is underway, enabling the transparent and streamlined generation of new benchmark sets for all 7 GIAB genomes against both the GRCh37 and GRCh38 reference genomes.

Somatic variant and diploid assembly benchmarking.

Acknowledgments

We would like to thank our collaborators on the GA4GH Benchmarking Team and the Genome in a Bottle Consortium. Opinions expressed in this paper are the authors' and do not necessarily reflect the policies and views of NIST or affiliated venues. Certain commercial equipment, instruments, or materials are identified in this poster only to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by NIST, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose. Official contribution of NIST; not subject to copyrights in USA.

References

• Krusche, Peter, et al. “Best practices for benchmarking germline small-variant calls in human genomes.” Nature Biotechnology 37.5 (2019): 555.
• Zook, Justin M., et al. “An open resource for accurately benchmarking small variant and reference calls.” Nature Biotechnology 37.5 (2019): 561.
• Zook, Justin M., et al. “A robust benchmark for germline structural variant detection.” bioRxiv (2019): 664623.