Giab aug2015 intro and update 150821.pptx
1. genomeinabottle.org
Genome in a Bottle Consortium
August 2015
NIST, Gaithersburg, MD
Reference Materials for Clinical Applications of
Human Genome Sequencing
Marc Salit, Ph.D. and Justin Zook, Ph.D
National Institute of Standards and Technology
3. genomeinabottle.org
GIAB Scope
• The Genome in a Bottle Consortium is
developing the reference materials, reference
methods, and reference data needed to
assess confidence in human whole genome
variant calls.
• A principal motivation for this consortium is to
enable performance assessment of
sequencing and science-based regulatory
oversight of clinical sequencing.
4. genomeinabottle.org
Genome in a Bottle
Consortium Development
• NIST met with sequencing
technology developers to assess
standards needs
– Stanford, June 2011
• Open, exploratory workshop
– ASHG, Montreal, Canada
– October 2011
• Small workshop at NIST to develop
consortium for human genome
reference materials
– FDA, NCBI, NHGRI, NCI, CDC, Wash
U, Broad, technology developers,
clinical labs, CAP, PGP, Partners,
ABRF, others
– developed draft work plan
– April 2012
• Open, public meetings of GIAB
– August 2012 at NIST
– March 2013 at Xgen
– August 2013 at NIST
– January 2014 at Stanford
– August 2014 at NIST
– January 2015 at Stanford
– August 2015 at NIST
– January 28-29, 2016 at Stanford
• Website
– www.genomeinabottle.org
5. genomeinabottle.org
Well-characterized, stable RMs
• Obtain metrics for validation,
QC, QA, PT
• Determine sources and types
of bias/error
• Learn to resolve difficult
structural variants
• Improve reference genome
assembly
• Optimization
– integration of data from
multiple platforms
– sequencing and analysis
• Enable regulated applications
[Figure: Comparison of SNP Calls for NA12878 on 2 platforms, 3 analysis methods]
6. genomeinabottle.org
NGS Validation Process using
Genomes in Bottles
Workflow steps: Sample → gDNA isolation → Library Prep → Sequencing → Alignment/Mapping → Variant Calling → Confidence Estimates → Downstream Analysis
Diagram labels: Pre-Analytical Process; Analytical Process; Clinical Interpretation; Genome in a Bottle Scope; GIAB Data
7. genomeinabottle.org
Genome in a Bottle Consortium (GIAB)
Hosted by US National Institute of Standards and Technology
Goal: Provide infrastructure for
performance assessment of NGS
• Appropriately consented widely
available DNA samples, distributed by
the Coriell Institute
– Also, QCed Reference Material (RM)
versions from controlled lots will be
available from NIST
– Pilot NIST RM 8398:
tinyurl.com/giabpilot
• High-accuracy reference data for these
samples
• Tools to facilitate their use
– With the Global Alliance Data Working
Group Benchmarking Team
ga4gh.org
8. genomeinabottle.org
High-confidence SNP/indel calls
Zook et al., Nature Biotechnology, 2014.
• methods to develop
SNP/indel call set
described in manuscript
• broad and quick
adoption of call set for
benchmarking
– struck nerve
9. genomeinabottle.org
Highlights
This workshop
• Progress Update
• Breakouts
– Analyses for PGP GIAB Trios
– Other RMs
• GIAB Roadmap
– Coordinating analyses
– Other RM plans
– Papers?
• Using GIAB Products for
analytical validation of clinical
NGS assays
Future GIAB work
• Beyond support,
improvement/development
and maintenance of existing
GIAB products…
– What future work should
GIAB do that would uniquely
take advantage of the
momentum we’ve built?
10. genomeinabottle.org
Agenda
Thursday
• Welcome and Status Update
• Break
• Breakout presentations
– Analysis Team
– Other Reference Materials
• Lunch (on your own in
cafeteria)
• GIAB Roadmap
• Break
• Breakouts to plan to carry out
the roadmap
• Plenary to discuss Roadmap
plans
Friday
• Additional Analysis breakout
if needed
• Using GIAB products for
Analytical Validation
• Break
• GIAB products for analytical
validation?
• Lunch (on your own in
cafeteria)
• Steering committee meeting
11. genomeinabottle.org
Agenda
Monday
• Breakfast and registration
• Welcome and Context Setting
• NIST RM Update and Status Report
• Charge to Working Groups
• Coffee Break
• Working Group Breakout Discussions
• Lunch (provided)
• Informal Working Group Reports
• Coffee Break
• Breakout Topical Discussions
– Topic #1: Moving beyond the 'easy'
variants and regions of the genome
– Topic #2: Selecting future genomes for
Reference Materials
Tuesday
• Breakfast and registration
• Use cases: Experiences using the pilot
Reference Material
• Discussion of plans to release pilot
Reference Material
• Coffee Break
• Working Group Breakout discussions
• Lunch (provided)
• Working Group leaders present plans
and discussion
• Steering committee Overview
• First meeting of the Steering
Committee (others adjourn)
Please Note
Slides will be made available on SlideShare after
the workshop (see genomeinabottle.org).
Tweets are welcome unless the speaker requests
otherwise. Please use #giab as the hashtag.
12. GIAB Roadmap: Where are we,
Where are we going?
• Reference Materials
– Germline
– Somatic
• Informatics
– Analysis of GIAB data
– Benchmarking
• Documentary Standards/Publications
– Documentation of methods
– Supporting Use
13. GIAB
• Germline Genomes
– Pilot RM: High-confidence SNPs/indels; RM Release; High-confidence SVs
– PGP RMs: High-confidence SNPs/indels; RM Release; High-confidence SVs
– Other ancestries: Do we need trios? Other large families?
– Sample panels: Many samples with clinically important mutations; Pharmacogenomics
• In depth analyses
– Characterize harder parts of the genome
– Diploid de novo assemblies
– Assign confidence scores to variants in RMs
• Somatic mutation RMs
– Interlaboratory study
– ctDNA/cfDNA/fetal DNA
– Whole cancer genomes
• Benchmarking tools
– Define performance metrics
– Stratification - Assign confidence to types of variants
• Documents/Publications
– Analyses
– Best practices/analytic validation
– Documentary standards
14. genomeinabottle.org
Others working in this space…
Well-characterized genomes
• Illumina Platinum Genomes
• CDC GeT-RM
• Korean Genome Project
• Human Longevity, Inc.
• Hydatidiform mole haploid
cell line
• Genome Reference
Consortium
• 1000 Genomes SV group
Performance Metrics
• Global Alliance for
Genomics and Health
Benchmarking Team
• NCBI/CDC GeT-RM Browser
• GCAT website
15. What should GIAB do?
• Beyond support, improvement/development
and maintenance of existing in-process GIAB
products…
– What future work should GIAB do that would take
advantage of the momentum and unique
community we’ve built?
17. genomeinabottle.org
NIST Human Genome
Reference Materials (RMs)
• NIST RM 8398 is available!
– tinyurl.com/giabpilot
– DNA isolated from large
growth cell cultures
– Stable, homogeneous
– Best for regulated uses
– DNA from same cell line at
Coriell (NA12878)
• New AJ and Asian Samples
– Available from Coriell now
– NIST RM available in 2016
18. genomeinabottle.org
Using high-confidence NIST-GIAB
genotypes for NA12878
• NIST has released
several versions of high-
confidence genotypes
for its pilot RM
• These data are
presently being used for
benchmarking
– prior to release of RMs
– SNPs & indels
• ~77% of the genome
• Data on FTP now well-organized
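A figure such as "~77% of the genome" can be checked against a high-confidence BED file by summing its interval lengths. The following is a minimal sketch, not the consortium's method: the function names are hypothetical, and the genome size used as the denominator is an assumption the caller must supply (the slide does not say which one was used).

```python
def covered_bases(bed_lines):
    """Sum the bases covered by BED intervals, merging overlaps per chromosome.

    BED coordinates are 0-based, half-open: an interval (start, end)
    covers end - start bases.
    """
    by_chrom = {}
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom.setdefault(chrom, []).append((int(start), int(end)))

    total = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or adjacent: extend the current interval.
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


def covered_fraction(bed_lines, genome_size):
    """Fraction of a genome of `genome_size` bases (caller's choice) covered."""
    return covered_bases(bed_lines) / genome_size


# Toy example (made-up intervals, not GIAB data).
example = ["chr1\t0\t100", "chr1\t50\t150", "chr2\t10\t20"]
print(covered_bases(example), covered_fraction(example, 320))
```

In practice the lines would come from an open BED file, for example `covered_fraction(open(path), genome_size)`.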
20. genomeinabottle.org
GeT-RM Browser from NCBI and CDC
• http://www.ncbi.nlm.nih.gov/variation/tools/get-rm/
• Allows visualization of data underlying each call
21. genomeinabottle.org
Uses of GIAB NA12878
Oncology – Molecular and Cellular Tumor Markers
“Next Generation” Sequencing (NGS) guidelines for
somatic genetic variant detection
www.bioplanet.com/gcat
22. genomeinabottle.org
Global Alliance for Genomics and Health
Benchmarking Task Team
• Formed June 2014 to develop
methods and tools for comparing
variant calls to a benchmark
• Developed standardized definitions
for performance metrics like TP, FP,
and FN.
• Initial focus on germline SNPs/indels
• Developing benchmarking tools
• Comparison engine
• Pluggable web interface with
modules for:
• Reporting/calculation of metrics
• Visualization/user interface
• Working with Genome in a Bottle
Consortium to host data and calls
from their well-characterized
genomes
www.bioplanet.com/gcat
Example User Interface
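Once a comparison engine has produced TP, FP, and FN counts, summary metrics follow from simple arithmetic. The sketch below uses the conventional precision, recall, and F1 formulas. The slide names only TP, FP, and FN, so the choice of summary metrics and the function name are assumptions, not the Benchmarking Task Team's standardized definitions.

```python
def benchmark_metrics(tp, fp, fn):
    """Conventional summary metrics from true/false positive and false negative counts."""
    nan = float("nan")
    precision = tp / (tp + fp) if (tp + fp) else nan  # fraction of calls that are correct
    recall = tp / (tp + fn) if (tp + fn) else nan     # fraction of true variants found
    if tp:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = nan
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy counts, not real benchmarking results.
print(benchmark_metrics(tp=90, fp=10, fn=30))
```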
23. genomeinabottle.org
Global Alliance for Genomics and Health
Benchmarking Task Team
Credit: Rebecca Truty, Complete Genomics
How should we interpret this complex variant on chr21?
24. genomeinabottle.org
Global Alliance for Genomics and Health
Benchmarking Task Team
Credit: Rebecca Truty, Complete Genomics
Beyond simple T/F classification: Genotype errors
Truth | Callset | Description | Proposed Name(s) | CM#1 region match | CM#2 allele match | CM#3 genotype match
0/1 | 1/1 | zygosity/genotype error | GE | TP | 1TP, 1GE | FN
1/1 | 0/1 | zygosity/genotype error | GE | TP | 1TP, 1GE | FN
1/2 | 0/1, 1/1, 0/2, 2/2 | common allele, FN allele | GE_FN | TP | 1TP, 1GE, 1FN | FN
0/1 | 1/2 | common allele, FP allele | GE_FP | TP | 1TP, 1GE, 1FP | FP, FN
1/1 | 1/2 | common allele, FP allele | GE_FP | TP | 1TP, 1GE, 1FP | FP, FN
1/2 | 1/3 | common allele, FP allele, FN allele | GE_FP_FN | TP | 1TP, 1GE, 1FP, 1FN | FP, FN
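The proposed names for genotype errors can be derived by comparing the sets of non-reference alleles in the truth and callset genotypes. The following is a minimal sketch of that idea, assuming VCF-style genotype strings with no missing alleles. The function names and the "MATCH" and "NO_COMMON_ALLELE" labels are hypothetical additions for cases the table does not cover.

```python
def _alleles(gt):
    """Split a VCF-style genotype such as '0/1' or '1|2' into allele strings."""
    return gt.replace("|", "/").split("/")


def classify_genotype_error(truth, call):
    """Name a truth/callset genotype pair following the proposed GE* scheme."""
    if sorted(_alleles(truth)) == sorted(_alleles(call)):
        return "MATCH"  # hypothetical label: not a row in the table
    truth_alt = {a for a in _alleles(truth) if a != "0"}
    call_alt = {a for a in _alleles(call) if a != "0"}
    if not truth_alt & call_alt:
        return "NO_COMMON_ALLELE"  # hypothetical label: outside the table
    name = "GE"
    if call_alt - truth_alt:  # callset has an allele that truth lacks
        name += "_FP"
    if truth_alt - call_alt:  # truth has an allele the callset missed
        name += "_FN"
    return name


for truth, call in [("0/1", "1/1"), ("1/2", "0/1"), ("0/1", "1/2"), ("1/2", "1/3")]:
    print(truth, call, classify_genotype_error(truth, call))
```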
25. genomeinabottle.org
Global Alliance for Genomics and Health
Benchmarking Task Team
Credit: Rebecca Truty, Complete Genomics
Beyond simple T/F classification: no-calls and half-calls
Truth | Callset | Description | Proposed Name(s) | CM#1 region match | CM#2 allele match | CM#3 genotype match
0/1 | ./1 | half-call, TP allele | HC_TP | NC, NCV, TP | 1NC, 1NCV, 1TP, 1GE | TP
1/1 | ./1 | half-call, TP allele | HC_TP | NC, NCV, TP | 1NC, 1NCV, 1TP, 1GE | FN
0/1, 1/1 | ./0 | half call, FN allele(s) | HC_FN | NC, NCV, TP | 1NC, 1NCV, 1FN | FN
1/2 | ./0 | half call, FN allele(s) | HC_FN | NC, NCV, TP | 1NC, 2NCV, 2FN | FN
1/2 | ./1, ./2 | half-call, TP allele, FN allele | HC_TP_FN | NC, NCV, TP | 1NC, 1NCV, 1TP, 1GE, 1FN | FN
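Half-calls can be named in the same way, by comparing the one called allele against the truth alleles. The following is a minimal sketch, assuming a half-call is a genotype with exactly one missing (".") allele. The function name and the "UNCLASSIFIED" label (used when the called allele is absent from truth, a case the table does not list) are hypothetical.

```python
def classify_half_call(truth, call):
    """Name a truth/half-call pair following the proposed HC_* scheme."""
    truth_alleles = truth.replace("|", "/").split("/")
    call_alleles = call.replace("|", "/").split("/")
    if call_alleles.count(".") != 1:
        raise ValueError("expected a half-call such as './1'")

    truth_alt = {a for a in truth_alleles if a != "0"}
    call_alt = {a for a in call_alleles if a not in ("0", ".")}

    if call_alt - truth_alt:
        return "UNCLASSIFIED"  # hypothetical: called allele is not in truth
    found = truth_alt & call_alt    # truth alleles present in the half-call
    missed = truth_alt - call_alt   # truth alleles absent from the half-call
    if found and missed:
        return "HC_TP_FN"
    if found:
        return "HC_TP"
    return "HC_FN"


for truth, call in [("0/1", "./1"), ("1/1", "./0"), ("1/2", "./2")]:
    print(truth, call, classify_half_call(truth, call))
```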
27. genomeinabottle.org
Data from GIAB PGP Trios
Dataset | Characteristics | Coverage | Availability | Most useful for…
Illumina Paired-end | 150x150bp | ~300x/individual | on SRA/FTP | SNPs/indels/some SVs
Illumina Long Mate pair | ~6000 bp insert | ~20x/individual | on FTP | SVs
Illumina “moleculo” | Custom library | ~20-30x by long fragments | on FTP | SVs/phasing/assembly
Complete Genomics | | 100x/individual | on SRA/FTP | SNPs/indels/some SVs
Complete Genomics | LFR | | on SRA/FTP | SNPs/indels/phasing
Ion Proton | Exome | 1000x/individual | on SRA/FTP | SNPs/indels in exome
BioNano Genomics | 200-250kbp optical map reads | ~100x/AJ individual; 57x on Asian son | Raw reads and assemblies on FTP | SVs/assembly
10X | Linked reads | 30-45x/individual | on FTP | SVs/phasing/assembly
PacBio | ~10kb reads | ~70x on AJ son, ~30x on each AJ parent | on SRA/FTP | SVs/phasing/assembly/STRs
28. genomeinabottle.org
GIAB Analysis Group – New Data Sets
Leaders
• Francisco de la Vega
– Annai Systems
• Chris Mason
– Weill Cornell Medical Center
• Tina Graves
– Washington University
• Valerie Schneider
– NCBI
• and Justin and Marc
Status
• Analysis Group Responsibilities:
– https://docs.google.com/document/d/10eA0DwB4iYTSFM_LPO9_2LyyN2xEqH49OXHhtNH1uzw/edit?usp=sharing
• Analysis Milestones:
– https://docs.google.com/spreadsheets/d/1Pj4nSzH742g40wJz2fA6f8kFtZYAToZpSZYVPiC5st4/edit?usp=sharing
• Analysis Methods
– https://docs.google.com/spreadsheets/d/1Je2g85H7oK6kMXbBOoqQ1FMNrvGnFuUJTJn7deyYiS8/edit?usp=sharing
• Analysis Plan:
– https://drive.google.com/file/d/0B7Ao1qqJJDHQdnVEaVdqbWdEdkE/view?usp=sharing
• Collecting Data into a Central FTP Site
• Recruiting people to help with the work.
This could be you.
We need volunteers!
Goal: Establish and distribute a set of authoritative benchmark variant calls of all types and
sizes, as well as homozygous reference regions, on GIAB PGP trios
29. genomeinabottle.org
Data Release Plans: Real-time,
Open, Public Release
Individual Datasets
• Uploaded to GIAB FTP site
as it is collected
• Includes raw reads, aligned
reads, and
variant/reference calls
Integrated High-confidence Calls
• First develop SNP, indel,
and homozygous reference
calls
• Then develop SV and non-
SV calls
• Released calls are versioned
• Preliminary callsets will be
made available to be
critiqued
30. genomeinabottle.org
SNP/Indel Integration Method Update
• Implementing refined integration methods on
DNAnexus
– Others can readily reproduce results
– Consistent results for all GIAB genomes
• Validating with released NA12878 RM data
– Planned completion Sep 2015
• Then, apply to PGP trios
– Plan to analyze AJ trio by Nov 2015
– Release of NIST RMs in early 2016
31. genomeinabottle.org
Integration to form high-confidence
SNP/indel calls
Input: VCFs with 0 FP in PASS calls and 0 FN in PASS+filtered calls within BED files
• If 1+ datasets PASS and all PASSing datasets have same genotype
– High-confidence variant, include in high-confidence regions
• If all datasets are filtered or outside BED
– Unless alignments are manually inspected: not high-confidence, exclude ±50 bp from high-confidence regions
• If PASSing datasets disagree about genotype or variant
– Unless alignments are manually inspected: not high-confidence, exclude ±50 bp from high-confidence regions
• If inside BED and not in VCF for 1+ datasets, and no datasets have PASSing variants
– High-confidence region
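The decision rules on this slide can be written as a small per-site function. This is a minimal sketch that follows the slide's four cases literally; it is not the NIST integration pipeline. The `Observation` type, the status constants, and the returned labels are hypothetical names introduced for illustration, and the manual-inspection override is left out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical per-dataset statuses at one site.
PASS = "PASS"                # variant in VCF, passes filters
FILTERED = "FILTERED"        # variant in VCF, filtered
OUTSIDE_BED = "OUTSIDE_BED"  # site is outside this dataset's BED
REF_IN_BED = "REF_IN_BED"    # inside BED and no variant in VCF


@dataclass(frozen=True)
class Observation:
    status: str
    genotype: Optional[str] = None  # only meaningful when status is PASS


def integrate_site(observations: Sequence[Observation]) -> str:
    """Apply the slide's rules to one site; returns a hypothetical label."""
    if not observations:
        raise ValueError("need at least one dataset")
    passing = [o for o in observations if o.status == PASS]
    if passing:
        if len({o.genotype for o in passing}) == 1:
            return "HIGH_CONF_VARIANT"  # 1+ PASS, all PASSing genotypes agree
        return "EXCLUDE_50BP"           # PASSing datasets disagree
    if any(o.status == REF_IN_BED for o in observations):
        return "HIGH_CONF_REFERENCE"    # in BED, not in VCF, no PASS anywhere
    return "EXCLUDE_50BP"               # all datasets filtered or outside BED


site = [Observation(PASS, "0/1"), Observation(PASS, "0/1"), Observation(FILTERED)]
print(integrate_site(site))
```

Sites labelled `EXCLUDE_50BP` correspond to the slide's "exclude ±50 bp from high-confidence regions" outcome; a fuller version would let manual inspection of the alignments override that label.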
32. genomeinabottle.org
Forming high-confidence calls on AJ Trio
• Generate candidate calls with multiple analysis methods from multiple types of data
• Compare/integrate candidate calls and manually inspect data to understand differences; refine calls?
• Generate integrated calls with several methods (MetaSV, Parliament, svclassify, others?)
• Combine integrated calls (with heuristics and/or machine learning) to generate high-confidence calls
Milestone dates: August 30, 2015; Nov 1, 2015; Dec 1, 2015; Jan 26, 2016
https://docs.google.com/spreadsheets/d/1Pj4nSzH742g40wJz2fA6f8kFtZYAToZpSZYVPiC5st4/edit?usp=sharing
33. genomeinabottle.org
Analysis Progress: AJ Trio
• SNPs/indels
– Several candidate callsets
– NIST working on integration
• Assembly
– 2 de novo assemblies of AJ trio (MHAP and Falcon/Bionano)
– Will be used by at least 2 groups for SV calling
• Structural variants
– Candidate calls being generated by 14+ groups with >14 different
algorithms and 6 datasets
– 3 integration methods: MetaSV, Parliament, svclassify
• Long-range Phasing
– 2 phased calls so far (CG LFR and 10X)
– Integration methods needed!