genomeinabottle.org
Genome in a Bottle Consortium
August 2015
NIST, Gaithersburg, MD
Reference Materials for Clinical Applications of
Human Genome Sequencing
Marc Salit, Ph.D. and Justin Zook, Ph.D.
National Institute of Standards and Technology
NIST Released the GIAB Pilot Genome
as RM 8398 in May 2015
GIAB Scope
• The Genome in a Bottle Consortium is
developing the reference materials, reference
methods, and reference data needed to
assess confidence in human whole genome
variant calls.
• A principal motivation for this consortium is to
enable performance assessment of
sequencing and science-based regulatory
oversight of clinical sequencing.
Genome in a Bottle
Consortium Development
• NIST met with sequencing
technology developers to assess
standards needs
– Stanford, June 2011
• Open, exploratory workshop
– ASHG, Montreal, Canada
– October 2011
• Small workshop at NIST to develop
consortium for human genome
reference materials
– FDA, NCBI, NHGRI, NCI, CDC, Wash
U, Broad, technology developers,
clinical labs, CAP, PGP, Partners,
ABRF, others
– developed draft work plan
– April 2012
• Open, public meetings of GIAB
– August 2012 at NIST
– March 2013 at Xgen
– August 2013 at NIST
– January 2014 at Stanford
– August 2014 at NIST
– January 2015 at Stanford
– August 2015 at NIST
– January 28-29, 2016 at Stanford
• Website
– www.genomeinabottle.org
Well-characterized, stable RMs
• Obtain metrics for validation,
QC, QA, PT
• Determine sources and types
of bias/error
• Learn to resolve difficult
structural variants
• Improve reference genome
assembly
• Optimization
– integration of data from
multiple platforms
– sequencing and analysis
• Enable regulated applications
[Figure: Comparison of SNP calls for NA12878 on 2 platforms, 3 analysis methods]
NGS Validation Process using
Genomes in Bottles
Sample → gDNA isolation → Library Prep → Sequencing → Alignment/Mapping → Variant Calling → Confidence Estimates → Downstream Analysis
Diagram labels: Pre-Analytical Process, Analytical Process, Clinical Interpretation, Genome in a Bottle Scope, GIAB Data
Genome in a Bottle Consortium (GIAB)
Hosted by US National Institute of Standards and Technology
Goal: Provide infrastructure for
performance assessment of NGS
• Appropriately consented widely
available DNA samples, distributed by
the Coriell Institute
– Also, QCed Reference Material (RM)
versions from controlled lots will be
available from NIST
– Pilot NIST RM 8398:
tinyurl.com/giabpilot
• High-accuracy reference data for these
samples
• Tools to facilitate their use
– With the Global Alliance Data Working
Group Benchmarking Team
ga4gh.org
High-confidence SNP/indel calls
Zook et al., Nature Biotechnology, 2014.
• methods to develop
SNP/indel call set
described in manuscript
• broad and quick
adoption of call set for
benchmarking
– struck a nerve
Highlights
This workshop
• Progress Update
• Breakouts
– Analyses for PGP GIAB Trios
– Other RMs
• GIAB Roadmap
– Coordinating analyses
– Other RM plans
– Papers?
• Using GIAB Products for
analytical validation of clinical
NGS assays
Future GIAB work
• Beyond support,
improvement/development
and maintenance of existing
GIAB products…
– What future work should
GIAB do that would uniquely
take advantage of the
momentum we’ve built?
Agenda
Thursday
• Welcome and Status Update
• Break
• Breakout presentations
– Analysis Team
– Other Reference Materials
• Lunch (on your own in
cafeteria)
• GIAB Roadmap
• Break
• Breakouts to plan to carry out
the roadmap
• Plenary to discuss Roadmap
plans
Friday
• Additional Analysis breakout
if needed
• Using GIAB products for
Analytical Validation
• Break
• GIAB products for analytical
validation?
• Lunch (on your own in
cafeteria)
• Steering committee meeting
Agenda
Monday
• Breakfast and registration
• Welcome and Context Setting
• NIST RM Update and Status Report
• Charge to Working Groups
• Coffee Break
• Working Group Breakout Discussions
• Lunch (provided)
• Informal Working Group Reports
• Coffee Break
• Breakout Topical Discussions
– Topic #1: Moving beyond the 'easy'
variants and regions of the genome
– Topic #2: Selecting future genomes for
Reference Materials
Tuesday
• Breakfast and registration
• Use cases: Experiences using the pilot
Reference Material
• Discussion of plans to release pilot
Reference Material
• Coffee Break
• Working Group Breakout discussions
• Lunch (provided)
• Working Group leaders present plans
and discussion
• Steering committee Overview
• First meeting of the Steering
Committee (others adjourn)
Please Note
Slides will be made available on SlideShare after
the workshop (see genomeinabottle.org).
Tweets are welcome unless the speaker requests
otherwise. Please use #giab as the hashtag.
GIAB Roadmap: Where are we,
Where are we going?
• Reference Materials
– Germline
– Somatic
• Informatics
– Analysis of GIAB data
– Benchmarking
• Documentary Standards/Publications
– Documentation of methods
– Supporting Use
GIAB
• Germline Genomes
– Pilot RM: High-confidence SNPs/indels; RM Release; High-confidence SVs
– PGP RMs: High-confidence SNPs/indels; RM Release; High-confidence SVs
– Other ancestries: Do we need trios? Other large families?
– Sample panels: Many samples with clinically important mutations; Pharmacogenomics
• In depth analyses
– Characterize harder parts of the genome
– Diploid de novo assemblies
– Assign confidence scores to variants in RMs
• Somatic mutation RMs
– Interlaboratory study
– ctDNA/cfDNA/fetal DNA
– Whole cancer genomes
• Benchmarking tools
– Define performance metrics
– Stratification - Assign confidence to types of variants
• Documents/Publications
– Analyses
– Best practices/analytic validation
– Documentary standards
Others working in this space…
Well-characterized genomes
• Illumina Platinum Genomes
• CDC GeT-RM
• Korean Genome Project
• Human Longevity, Inc.
• Hydatidiform mole haploid
cell line
• Genome Reference
Consortium
• 1000 Genomes SV group
Performance Metrics
• Global Alliance for
Genomics and Health
Benchmarking Team
• NCBI/CDC GeT-RM Browser
• GCAT website
What should GIAB do?
• Beyond support, improvement/development
and maintenance of existing in-process GIAB
products…
– What future work should GIAB do that would take
advantage of the momentum and unique
community we’ve built?
GIAB Progress Update
August 2015
NIST Human Genome
Reference Materials (RMs)
• NIST RM 8398 is available!
– tinyurl.com/giabpilot
– DNA isolated from large
growth cell cultures
– Stable, homogeneous
– Best for regulated uses
– DNA from same cell line at
Coriell (NA12878)
• New AJ and Asian Samples
– Available from Coriell now
– NIST RM available in 2016
Using high-confidence NIST-GIAB
genotypes for NA12878
• NIST has released
several versions of high-
confidence genotypes
for its pilot RM
• These data are
presently being used for
benchmarking
– prior to release of RMs
– SNPs & indels
• ~77% of the genome
• Data on FTP are now well organized
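A figure such as "~77% of the genome" can be checked against a set of high-confidence regions by summing the interval lengths in the regions file. The following is a minimal sketch, assuming BED-style half-open intervals already parsed into tuples; the function names and the genome length passed in are illustrative assumptions, not part of any GIAB tool.

```python
from typing import Iterable, Tuple

# (chrom, start, end) with BED-style 0-based, half-open coordinates
Interval = Tuple[str, int, int]


def covered_bases(intervals: Iterable[Interval]) -> int:
    """Sum the bases covered by the intervals, counting overlaps only once."""
    total = 0
    current_chrom = None
    current_end = 0
    for chrom, start, end in sorted(intervals):
        if chrom != current_chrom:
            # New chromosome: nothing has been covered on it yet.
            current_chrom, current_end = chrom, 0
        # Skip the part of this interval that earlier intervals already cover.
        start = max(start, current_end)
        if end > start:
            total += end - start
            current_end = end
    return total


def covered_fraction(intervals: Iterable[Interval], genome_length: int) -> float:
    """Fraction of a genome of the given length covered by the intervals."""
    return covered_bases(intervals) / genome_length


if __name__ == "__main__":
    regions = [("chr1", 0, 100), ("chr1", 50, 150), ("chr2", 10, 20)]
    print(covered_fraction(regions, genome_length=320))
```

The denominator matters: the fraction depends on whether `genome_length` counts all reference bases or only non-N bases, and the slide does not say which convention was used.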
GeT-RM Browser from NCBI and CDC
• http://www.ncbi.nlm.nih.gov/variation/tools/get-rm/
• Allows visualization of the data underlying each call
Uses of GIAB NA12878
Oncology – Molecular and Cellular Tumor Markers
“Next Generation” Sequencing (NGS) guidelines for
somatic genetic variant detection
www.bioplanet.com/gcat
Global Alliance for Genomics and Health
Benchmarking Task Team
• Formed June 2014 to develop
methods and tools for comparing
variant calls to a benchmark
• Developed standardized definitions
for performance metrics like TP, FP,
and FN.
• Initial focus on germline SNPs/indels
• Developing benchmarking tools
• Comparison engine
• Pluggable web interface with
modules for:
• Reporting/calculation of metrics
• Visualization/user interface
• Working with Genome in a Bottle
Consortium to host data and calls
from their well-characterized
genomes
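Once a comparison engine has counted TP, FP, and FN against a benchmark, summary metrics follow directly from the counts. The sketch below uses the common textbook formulas for precision and recall; the class and function names are hypothetical, and the formulas are an assumption for illustration rather than the Benchmarking Task Team's published definitions.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkCounts:
    tp: int  # calls that match the benchmark
    fp: int  # calls absent from the benchmark (inside benchmark regions)
    fn: int  # benchmark variants that were not called


def precision(counts: BenchmarkCounts) -> float:
    """TP / (TP + FP); NaN when no calls were made."""
    called = counts.tp + counts.fp
    return counts.tp / called if called else float("nan")


def recall(counts: BenchmarkCounts) -> float:
    """TP / (TP + FN), also called sensitivity; NaN when the benchmark is empty."""
    truth = counts.tp + counts.fn
    return counts.tp / truth if truth else float("nan")


if __name__ == "__main__":
    counts = BenchmarkCounts(tp=90, fp=10, fn=30)
    print(precision(counts), recall(counts))
```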
www.bioplanet.com/gcat
Example User Interface
Global Alliance for Genomics and Health
Benchmarking Task Team
Credit: Rebecca Truty, Complete Genomics
How should we interpret this complex variant on chr21?
Global Alliance for Genomics and Health
Benchmarking Task Team
Credit: Rebecca Truty, Complete Genomics
Beyond simple T/F classification: Genotype errors
Truth | Callset | Description | Proposed Name(s) | CM#1 region match | CM#2 allele match | CM#3 genotype match
0/1 | 1/1 | zygosity/genotype error | GE | TP | 1TP, 1GE | FN
1/1 | 0/1 | zygosity/genotype error | GE | TP | 1TP, 1GE | FN
1/2 | 0/1, 1/1, 0/2, 2/2 | common allele, FN allele | GE_FN | TP | 1TP, 1GE, 1FN | FN
0/1 | 1/2 | common allele, FP allele | GE_FP | TP | 1TP, 1GE, 1FP | FP, FN
1/1 | 1/2 | common allele, FP allele | GE_FP | TP | 1TP, 1GE, 1FP | FP, FN
1/2 | 1/3 | common allele, FP allele, FN allele | GE_FP_FN | TP | 1TP, 1GE, 1FP, 1FN | FP, FN
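The proposed names in this table can be derived from the sets of non-reference alleles in the truth and callset genotypes. The sketch below is one possible reading of the table, not the Benchmarking Task Team's implementation; the function names and the `NO_COMMON_ALLELE` label (a case the table does not cover) are hypothetical.

```python
def split_alleles(genotype: str) -> list:
    """Split a VCF-style genotype such as '0/1' or '1|2' into allele strings."""
    return genotype.replace("|", "/").split("/")


def nonref_alleles(genotype: str) -> set:
    """Non-reference alleles in a genotype ('0' is the reference allele)."""
    return {a for a in split_alleles(genotype) if a not in ("0", ".")}


def classify_genotype_pair(truth: str, call: str) -> str:
    """Name a truth/callset genotype pair following the table's proposed names."""
    if sorted(split_alleles(truth)) == sorted(split_alleles(call)):
        return "TP"  # identical genotypes
    truth_alt, call_alt = nonref_alleles(truth), nonref_alleles(call)
    if not (truth_alt & call_alt):
        return "NO_COMMON_ALLELE"  # assumption: outside the scope of the table
    label = "GE"  # a common allele exists but the genotypes differ
    if call_alt - truth_alt:
        label += "_FP"  # the callset has an allele the truth lacks
    if truth_alt - call_alt:
        label += "_FN"  # the truth has an allele the callset lacks
    return label


if __name__ == "__main__":
    for truth, call in [("0/1", "1/1"), ("1/2", "0/1"), ("0/1", "1/2"), ("1/2", "1/3")]:
        print(truth, call, classify_genotype_pair(truth, call))
```

The sketch assumes both genotypes describe the same site with the same allele numbering; comparing differently represented complex variants requires normalization first.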
Global Alliance for Genomics and Health
Benchmarking Task Team
Credit: Rebecca Truty, Complete Genomics
Beyond simple T/F classification: no-calls and half-calls
Truth | Callset | Description | Proposed Name(s) | CM#1 region match | CM#2 allele match | CM#3 genotype match
0/1 | ./1 | half-call, TP allele | HC_TP | NC, NCV, TP | 1NC, 1NCV, 1TP, 1GE | TP
1/1 | ./1 | half-call, TP allele | HC_TP | NC, NCV, TP | 1NC, 1NCV, 1TP, 1GE | FN
0/1, 1/1 | ./0 | half call, FN allele(s) | HC_FN | NC, NCV, TP | 1NC, 1NCV, 1FN | FN
1/2 | ./0 | half call, FN allele(s) | HC_FN | NC, NCV, TP | 1NC, 2NCV, 2FN | FN
1/2 | ./1, ./2 | half-call, TP allele, FN allele | HC_TP_FN | NC, NCV, TP | 1NC, 1NCV, 1TP, 1GE, 1FN | FN
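The half-call names can be sketched the same way, by comparing the alleles that were called with the non-reference alleles in the truth genotype. This is an illustrative reading of the table only: the function names are hypothetical, and a half-call carrying an allele absent from the truth is not covered by the table and is not modeled here.

```python
def nonref_alleles(genotype: str) -> set:
    """Non-reference, non-missing alleles in a genotype such as './1' or '1/2'."""
    alleles = genotype.replace("|", "/").split("/")
    return {a for a in alleles if a not in ("0", ".")}


def classify_half_call(truth: str, call: str) -> str:
    """Name a half-call (one allele is '.') following the table's proposed names."""
    if "." not in call.replace("|", "/").split("/"):
        raise ValueError("not a half-call: " + call)
    truth_alt, call_alt = nonref_alleles(truth), nonref_alleles(call)
    label = "HC"
    if truth_alt & call_alt:
        label += "_TP"  # the called allele matches a truth allele
    if truth_alt - call_alt:
        label += "_FN"  # at least one truth allele was not called
    return label


if __name__ == "__main__":
    for truth, call in [("0/1", "./1"), ("0/1", "./0"), ("1/2", "./1")]:
        print(truth, call, classify_half_call(truth, call))
```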
Stratifying False Positives
[Figure panels: GC Content; TR Unit 1; TR Unit 2; TR Unit 3; TR Unit 4; TR Unit <7; TR Unit >=7]
Credit: Abby Beeler, Ellie Wood
GA4GH - Stratification
Data from GIAB PGP Trios
Dataset | Characteristics | Coverage | Availability | Most useful for…
Illumina Paired-end | 150x150bp | ~300x/individual | on SRA/FTP | SNPs/indels/some SVs
Illumina Long Mate pair | ~6000 bp insert | ~20x/individual | on FTP | SVs
Illumina “moleculo” | Custom library | ~20-30x by long fragments | on FTP | SVs/phasing/assembly
Complete Genomics | | 100x/individual | on SRA/FTP | SNPs/indels/some SVs
Complete Genomics | LFR | | on SRA/FTP | SNPs/indels/phasing
Ion Proton | Exome | 1000x/individual | on SRA/FTP | SNPs/indels in exome
BioNano Genomics | 200-250kbp optical map reads | ~100x/AJ individual; 57x on Asian son | Raw reads and assemblies on FTP | SVs/assembly
10X | Linked reads | 30-45x/individual | on FTP | SVs/phasing/assembly
PacBio | ~10kb reads | ~70x on AJ son, ~30x on each AJ parent | on SRA/FTP | SVs/phasing/assembly/STRs
GIAB Analysis Group – New Data Sets
Leaders
• Francisco de la Vega
– Annai Systems
• Chris Mason
– Weill Cornell Medical Center
• Tina Graves
– Washington University
• Valerie Schneider
– NCBI
• and Justin and Marc
Status
• Analysis Group Responsibilities:
– https://docs.google.com/document/d/10eA0DwB4iYTSFM_LPO9_2LyyN2xEqH49OXHhtNH1uzw/edit?usp=sharing
• Analysis Milestones:
– https://docs.google.com/spreadsheets/d/1Pj4nSzH742g40wJz2fA6f8kFtZYAToZpSZYVPiC5st4/edit?usp=sharing
• Analysis Methods:
– https://docs.google.com/spreadsheets/d/1Je2g85H7oK6kMXbBOoqQ1FMNrvGnFuUJTJn7deyYiS8/edit?usp=sharing
• Analysis Plan:
– https://drive.google.com/file/d/0B7Ao1qqJJDHQdnVEaVdqbWdEdkE/view?usp=sharing
• Collecting Data into a Central FTP Site
• Recruiting people to help with the work.
This could be you.
We need volunteers!
Goal: Establish and distribute a set of authoritative benchmark variant calls of all types and
sizes, as well as homozygous reference regions, on GIAB PGP trios
Data Release Plans: Real-time,
Open, Public Release
Individual Datasets
• Uploaded to GIAB FTP site
as it is collected
• Includes raw reads, aligned
reads, and
variant/reference calls
Integrated High-confidence Calls
• First develop SNP, indel,
and homozygous reference
calls
• Then develop SV and non-SV calls
• Released calls are versioned
• Preliminary callsets will be
made available to be
critiqued
SNP/Indel Integration Method Update
• Implementing refined integration methods on
DNAnexus
– Others can readily reproduce results
– Consistent results for all GIAB genomes
• Validating with released NA12878 RM data
– Planned completion Sep 2015
• Then, apply to PGP trios
– Plan to analyze AJ trio by Nov 2015
– Release of NIST RMs in early 2016
Integration to form high-confidence SNP/indel calls
Input: VCFs with 0 FP PASS and 0 FN PASS+filtered in BED files
• If 1+ datasets PASS and all PASSing datasets have the same genotype → high-confidence variant; include in high-confidence regions
• If all datasets are filtered or outside BED → not high-confidence unless alignments are manually inspected; exclude ±50 bp from high-confidence regions
• If PASSing datasets disagree about genotype or variant → not high-confidence unless alignments are manually inspected; exclude ±50 bp from high-confidence regions
• If inside BED and not in VCF for 1+ datasets, and no datasets have PASSing variants → high-confidence region
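The decision rules on this slide can be written as a small per-site function. What follows is a minimal sketch under stated assumptions, not the NIST integration pipeline: the `DatasetCall` structure and the outcome labels are hypothetical, genotypes are assumed to be normalized so equal strings mean equal genotypes, a PASS call only counts when the site lies inside that dataset's BED regions, and the manual-inspection override is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetCall:
    in_bed: bool                     # site is inside this dataset's BED regions
    genotype: Optional[str] = None   # genotype in the VCF, or None if the site is not in the VCF
    passed: bool = False             # True if the VCF record is PASS (not filtered)


def integrate_site(calls: List[DatasetCall]) -> str:
    """Apply the slide's integration rules to one site across datasets."""
    passing = [c for c in calls if c.in_bed and c.genotype is not None and c.passed]
    if passing:
        if len({c.genotype for c in passing}) == 1:
            # 1+ datasets PASS and all PASSing datasets agree on the genotype.
            return "high_confidence_variant"
        # PASSing datasets disagree about the genotype or variant.
        return "exclude_50bp"
    if any(c.in_bed and c.genotype is None for c in calls):
        # Inside BED with no variant for 1+ datasets, and no PASSing variants anywhere.
        return "high_confidence_region"
    # Every dataset is filtered or outside its BED.
    return "exclude_50bp"


if __name__ == "__main__":
    site = [DatasetCall(True, "0/1", True), DatasetCall(True, "0/1", False)]
    print(integrate_site(site))
```

A site labeled `exclude_50bp` would then have 50 bp removed on each side from the high-confidence regions, as the slide describes.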
Forming high-confidence calls on AJ Trio
Generate candidate calls with
multiple analysis methods from
multiple types of data
Compare/integrate candidate calls
and manually inspect data to
understand differences; refine calls?
Generate integrated calls with
several methods (MetaSV,
Parliament, svclassify, others?)
Combine integrated calls (with
heuristics and/or machine learning)
to generate high-confidence calls
https://docs.google.com/spreadsheets/d/1Pj4nSzH742g40wJz2fA6f8kFtZYAToZpSZYVPiC5st4/edit?usp=sharing
August 30, 2015
Nov 1, 2015
Dec 1, 2015
Jan 26, 2016
Analysis Progress: AJ Trio
• SNPs/indels
– Several candidate callsets
– NIST working on integration
• Assembly
– 2 de novo assemblies of AJ trio (MHAP and Falcon/Bionano)
– Will be used by at least 2 groups for SV calling
• Structural variants
– Candidate calls being generated by 14+ groups with >14 different
algorithms and 6 datasets
– 3 integration methods: MetaSV, Parliament, svclassify
• Long-range Phasing
– 2 phased calls so far (CG LFR and 10X)
– Integration methods needed!

Best Rate (Patna ) Call Girls Patna ⟟ 8617370543 ⟟ High Class Call Girl In 5 ...
 
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...
Night 7k to 12k Navi Mumbai Call Girl Photo 👉 BOOK NOW 9833363713 👈 ♀️ night ...
 

Giab aug2015 intro and update 150821.pptx

  • 1. genomeinabottle.org Genome in a Bottle Consortium August 2015 NIST, Gaithersburg, MD Reference Materials for Clinical Applications of Human Genome Sequencing Marc Salit, Ph.D. and Justin Zook, Ph.D. National Institute of Standards and Technology
  • 2. genomeinabottle.org NIST Released the GIAB Pilot Genome as RM 8398 in May 2015
  • 3. genomeinabottle.org GIAB Scope • The Genome in a Bottle Consortium is developing the reference materials, reference methods, and reference data needed to assess confidence in human whole genome variant calls. • A principal motivation for this consortium is to enable performance assessment of sequencing and science-based regulatory oversight of clinical sequencing.
  • 4. genomeinabottle.org Genome in a Bottle Consortium Development • NIST met with sequencing technology developers to assess standards needs – Stanford, June 2011 • Open, exploratory workshop – ASHG, Montreal, Canada – October 2011 • Small workshop at NIST to develop consortium for human genome reference materials – FDA, NCBI, NHGRI, NCI, CDC, Wash U, Broad, technology developers, clinical labs, CAP, PGP, Partners, ABRF, others – developed draft work plan – April 2012 • Open, public meetings of GIAB – August 2012 at NIST – March 2013 at Xgen – August 2013 at NIST – January 2014 at Stanford – August 2014 at NIST – January 2015 at Stanford – August 2015 at NIST – January 28-29, 2016 at Stanford • Website – www.genomeinabottle.org
  • 5. genomeinabottle.org Well-characterized, stable RMs • Obtain metrics for validation, QC, QA, PT • Determine sources and types of bias/error • Learn to resolve difficult structural variants • Improve reference genome assembly • Optimization – integration of data from multiple platforms – sequencing and analysis • Enable regulated applications (Figure: Comparison of SNP Calls for NA12878 on 2 platforms, 3 analysis methods)
  • 6. genomeinabottle.org NGS Validation Process using Genomes in Bottles (Diagram) Steps: Sample; gDNA isolation; Library Prep; Sequencing; Alignment/Mapping; Variant Calling; Confidence Estimates; Downstream Analysis. Diagram labels: Pre-Analytical Process; Analytical Process; Clinical Interpretation; Genome in a Bottle Scope; GIAB Data
  • 7. genomeinabottle.org Genome in a Bottle Consortium (GIAB) Hosted by US National Institute of Standards and Technology Goal: Provide infrastructure for performance assessment of NGS • Appropriately consented widely available DNA samples, distributed by the Coriell Institute – Also, QCed Reference Material (RM) versions from controlled lots will be available from NIST – Pilot NIST RM 8398: tinyurl.com/giabpilot • High-accuracy reference data for these samples • Tools to facilitate their use – With the Global Alliance Data Working Group Benchmarking Team ga4gh.org
  • 8. genomeinabottle.org High-confidence SNP/indel calls Zook et al., Nature Biotechnology, 2014. • methods to develop SNP/indel call set described in manuscript • broad and quick adoption of call set for benchmarking – struck nerve
  • 9. genomeinabottle.org Highlights This workshop • Progress Update • Breakouts – Analyses for PGP GIAB Trios – Other RMs • GIAB Roadmap – Coordinating analyses – Other RM plans – Papers? • Using GIAB Products for analytical validation of clinical NGS assays Future GIAB work • Beyond support, improvement/development and maintenance of existing GIAB products… – What future work should GIAB do that would uniquely take advantage of the momentum we’ve built?
  • 10. genomeinabottle.org Agenda Thursday • Welcome and Status Update • Break • Breakout presentations – Analysis Team – Other Reference Materials • Lunch (on your own in cafeteria) • GIAB Roadmap • Break • Breakouts to plan to carry out the roadmap • Plenary to discuss Roadmap plans Friday • Additional Analysis breakout if needed • Using GIAB products for Analytical Validation • Break • GIAB products for analytical validation? • Lunch (on your own in cafeteria) • Steering committee meeting
  • 11. genomeinabottle.org Agenda Monday • Breakfast and registration • Welcome and Context Setting • NIST RM Update and Status Report • Charge to Working Groups • Coffee Break • Working Group Breakout Discussions • Lunch (provided) • Informal Working Group Reports • Coffee Break • Breakout Topical Discussions – Topic #1: Moving beyond the 'easy' variants and regions of the genome – Topic #2: Selecting future genomes for Reference Materials Tuesday • Breakfast and registration • Use cases: Experiences using the pilot Reference Material • Discussion of plans to release pilot Reference Material • Coffee Break • Working Group Breakout discussions • Lunch (provided) • Working Group leaders present plans and discussion • Steering committee Overview • First meeting of the Steering Committee (others adjourn) Please Note Slides will be made available on SlideShare after the workshop (see genomeinabottle.org). Tweets are welcome unless the speaker requests otherwise. Please use #giab as the hashtag.
  • 12. GIAB Roadmap: Where are we, Where are we going? • Reference Materials – Germline – Somatic • Informatics – Analysis of GIAB data – Benchmarking • Documentary Standards/Publications – Documentation of methods – Supporting Use
  • 13. GIAB • Germline Genomes – Pilot RM: High-confidence SNPs/indels; RM Release; High-confidence SVs – PGP RMs: High-confidence SNPs/indels; RM Release; High-confidence SVs – Other ancestries: Do we need trios? Other large families? – Sample panels: Many samples with clinically important mutations; Pharmacogenomics • In depth analyses – Characterize harder parts of the genome – Diploid de novo assemblies – Assign confidence scores to variants in RMs • Somatic mutation RMs – Interlaboratory study – ctDNA/cfDNA/fetal DNA – Whole cancer genomes • Benchmarking tools – Define performance metrics – Stratification - Assign confidence to types of variants • Documents/Publications – Analyses – Best practices/analytic validation – Documentary standards
  • 14. genomeinabottle.org Others working in this space… Well-characterized genomes • Illumina Platinum Genomes • CDC GeT-RM • Korean Genome Project • Human Longevity, Inc. • Hydatidiform mole haploid cell line • Genome Reference Consortium • 1000 Genomes SV group Performance Metrics • Global Alliance for Genomics and Health Benchmarking Team • NCBI/CDC GeT-RM Browser • GCAT website
  • 15. What should GIAB do? • Beyond support, improvement/development and maintenance of existing in-process GIAB products… – What future work should GIAB do that would take advantage of the momentum and unique community we’ve built?
  • 17. genomeinabottle.org NIST Human Genome Reference Materials (RMs) • NIST RM 8398 is available! – tinyurl.com/giabpilot – DNA isolated from large growth cell cultures – Stable, homogeneous – Best for regulated uses – DNA from same cell line at Coriell (NA12878) • New AJ and Asian Samples – Available from Coriell now – NIST RM available in 2016
  • 18. genomeinabottle.org Using high-confidence NIST-GIAB genotypes for NA12878 • NIST has released several versions of high-confidence genotypes for its pilot RM • These data are presently being used for benchmarking – prior to release of RMs – SNPs & indels • ~77% of the genome • Data on FTP now well-organized
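Because the high-confidence genotypes cover only part of the genome (~77% per the slide), a benchmark comparison is normally restricted to calls that fall inside the high-confidence regions. The following is a minimal sketch of that restriction, not GIAB tooling. It assumes the regions are supplied as non-overlapping BED intervals (0-based, half-open) and that variant positions are 1-based as in VCF; the function names `load_regions`, `in_regions`, and `covered_fraction` are hypothetical.

```python
from bisect import bisect_right


def load_regions(bed_lines):
    """Parse BED lines into {chrom: sorted [(start, end), ...]} (0-based, half-open)."""
    regions = {}
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        chrom, start, end = line.split()[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    for intervals in regions.values():
        intervals.sort()
    return regions


def in_regions(regions, chrom, vcf_pos):
    """Return True if a 1-based VCF position lies inside a region.

    Assumes the intervals for each chromosome do not overlap.
    """
    intervals = regions.get(chrom, [])
    pos0 = vcf_pos - 1  # convert the 1-based VCF position to 0-based
    # Index of the last interval whose start is <= pos0.
    i = bisect_right(intervals, (pos0, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos0 < intervals[i][1]


def covered_fraction(regions, genome_length):
    """Fraction of the genome covered by the (non-overlapping) regions."""
    covered = sum(end - start for ivs in regions.values() for start, end in ivs)
    return covered / genome_length
```

For example, `in_regions(load_regions(["chr1\t100\t200"]), "chr1", 101)` is true because VCF position 101 corresponds to 0-based position 100, the first base of the interval.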
  • 20. genomeinabottle.org GeT-RM Browser from NCBI and CDC • http://www.ncbi.nlm.nih.gov/variation/tools/get-rm/ • Allows visualization of data underlying each call
  • 21. genomeinabottle.org Uses of GIAB NA12878 Oncology – Molecular and Cellular Tumor Markers “Next Generation” Sequencing (NGS) guidelines for somatic genetic variant detection www.bioplanet.com/gcat
  • 22. genomeinabottle.org Global Alliance for Genomics and Health Benchmarking Task Team • Formed June 2014 to develop methods and tools for comparing variant calls to a benchmark • Developed standardized definitions for performance metrics like TP, FP, and FN. • Initial focus on germline SNPs/indels • Developing benchmarking tools • Comparison engine • Pluggable web interface with modules for: • Reporting/calculation of metrics • Visualization/user interface • Working with Genome in a Bottle Consortium to host data and calls from their well-characterized genomes www.bioplanet.com/gcat Example User Interface
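The slide names TP, FP, and FN as the standardized counts but does not list the summary metrics derived from them. As a hedged illustration only, the sketch below computes sensitivity (recall) and precision using their conventional definitions; the function name `benchmark_metrics` and the choice of these two metrics are assumptions, not the Benchmarking Task Team's specification.

```python
def benchmark_metrics(tp, fp, fn):
    """Compute conventional summary metrics from benchmark counts.

    tp: calls matching the benchmark; fp: calls absent from the benchmark;
    fn: benchmark variants that were not called.
    Returns NaN for a metric whose denominator is zero.
    """
    nan = float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else nan  # also called recall
    precision = tp / (tp + fp) if (tp + fp) else nan
    return {"sensitivity": sensitivity, "precision": precision}
```

With 90 true positives, 10 false positives, and 30 false negatives, this gives a sensitivity of 90/120 and a precision of 90/100.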
  • 23. genomeinabottle.org Global Alliance for Genomics and Health Benchmarking Task Team Credit: Rebecca Truty, Complete Genomics How should we interpret this complex variant on chr21?
  • 24. genomeinabottle.org Global Alliance for Genomics and Health Benchmarking Task Team Credit: Rebecca Truty, Complete Genomics Beyond simple T/F classification: Genotype errors
    Columns: Truth | Callset | Description | Proposed Name(s) | CM#1 region match | CM#2 allele match | CM#3 genotype match
    – 0/1 | 1/1 | zygosity/genotype error | GE | TP | 1TP, 1GE | FN
    – 1/1 | 0/1 | zygosity/genotype error | GE | TP | 1TP, 1GE | FN
    – 1/2 | 0/1, 1/1, 0/2, 2/2 | common allele, FN allele | GE_FN | TP | 1TP, 1GE, 1FN | FN
    – 0/1 | 1/2 | common allele, FP allele | GE_FP | TP | 1TP, 1GE, 1FP | FP, FN
    – 1/1 | 1/2 | common allele, FP allele | GE_FP | TP | 1TP, 1GE, 1FP | FP, FN
    – 1/2 | 1/3 | common allele, FP allele, FN allele | GE_FP_FN | TP | 1TP, 1GE, 1FP, 1FN | FP, FN
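The allele-match (CM#2) counts on this slide can be reproduced with a simple set comparison of alternate alleles. The sketch below is one reading of the table rather than the Task Team's implementation: it assumes unphased, fully called diploid genotypes written as `a/b` at a single site where truth and callset already use the same allele numbering, and it counts a genotype error (GE) whenever the two genotypes share an alternate allele but are not identical. The function name `compare_alleles` is hypothetical.

```python
def compare_alleles(truth_gt, call_gt):
    """Allele-level comparison of two unphased genotypes such as '0/1' and '1/2'.

    Assumes no no-calls ('.') and a shared allele numbering between truth and callset.
    """
    truth = sorted(truth_gt.split("/"))
    call = sorted(call_gt.split("/"))
    truth_alts = {a for a in truth if a != "0"}  # alternate alleles in the truth
    call_alts = {a for a in call if a != "0"}    # alternate alleles in the callset
    tp = len(truth_alts & call_alts)   # alternate alleles found in both
    fn = len(truth_alts - call_alts)   # truth alleles that were missed
    fp = len(call_alts - truth_alts)   # called alleles absent from the truth
    # Genotype error: an allele is shared but the overall genotypes differ.
    ge = 1 if tp > 0 and truth != call else 0
    return {"TP": tp, "GE": ge, "FP": fp, "FN": fn}
```

For instance, truth `1/2` against callset `1/3` yields one TP, one GE, one FP, and one FN, matching the GE_FP_FN row. Half-calls such as `./1` are outside the scope of this sketch.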
  • 25. genomeinabottle.org Global Alliance for Genomics and Health Benchmarking Task Team Credit: Rebecca Truty, Complete Genomics Beyond simple T/F classification: no-calls and half-calls
    Columns: Truth | Callset | Description | Proposed Name(s) | CM#1 region match | CM#2 allele match | CM#3 genotype match
    – 0/1 | ./1 | half-call, TP allele | HC_TP | NC, NCV, TP | 1NC, 1NCV, 1TP, 1GE | TP
    – 1/1 | ./1 | half-call, TP allele | HC_TP | NC, NCV, TP | 1NC, 1NCV, 1TP, 1GE | FN
    – 0/1, 1/1 | ./0 | half call, FN allele(s) | HC_FN | NC, NCV, TP | 1NC, 1NCV, 1FN | FN
    – 1/2 | ./0 | half call, FN allele(s) | HC_FN | NC, NCV, TP | 1NC, 2NCV, 2FN | FN
    – 1/2 | ./1, ./2 | half-call, TP allele, FN allele | HC_TP_FN | NC, NCV, TP | 1NC, 1NCV, 1TP, 1GE, 1FN | FN
  • 26. genomeinabottle.org Stratifying False Positives (Figure panels: GC Content; TR Unit 1; TR Unit 2; TR Unit 3; TR Unit 4; TR Unit <7; TR Unit >=7) Credit: Abby Beeler, Ellie Wood GA4GH - Stratification
  • 27. genomeinabottle.org Data from GIAB PGP Trios
    Columns: Dataset | Characteristics | Coverage | Availability | Most useful for…
    – Illumina Paired-end | 150x150bp | ~300x/individual | on SRA/FTP | SNPs/indels/some SVs
    – Illumina Long Mate pair | ~6000 bp insert | ~20x/individual | on FTP | SVs
    – Illumina “moleculo” | Custom library | ~20-30x by long fragments | on FTP | SVs/phasing/assembly
    – Complete Genomics | | 100x/individual | on SRA/FTP | SNPs/indels/some SVs
    – Complete Genomics LFR | | | on SRA/FTP | SNPs/indels/phasing
    – Ion Proton | Exome | 1000x/individual | on SRA/FTP | SNPs/indels in exome
    – BioNano Genomics | 200-250kbp optical map reads | ~100x/AJ individual; 57x on Asian son | Raw reads and assemblies on FTP | SVs/assembly
    – 10X | Linked reads | 30-45x/individual | on FTP | SVs/phasing/assembly
    – PacBio | ~10kb reads | ~70x on AJ son, ~30x on each AJ parent | on SRA/FTP | SVs/phasing/assembly/STRs
  • 28. genomeinabottle.org GIAB Analysis Group – New Data Sets Leaders • Francisco de la Vega – Annai Systems • Chris Mason – Weill Cornell Medical Center • Tina Graves – Washington University • Valerie Schneider – NCBI • and Justin and Marc Status • Analysis Group Responsibilities: – https://docs.google.com/document/d/10eA0DwB4iYTSFM_LPO9_2LyyN2xEqH49OXHhtNH1uzw/edit?usp=sharing • Analysis Milestones: – https://docs.google.com/spreadsheets/d/1Pj4nSzH742g40wJz2fA6f8kFtZYAToZpSZYVPiC5st4/edit?usp=sharing • Analysis Methods – https://docs.google.com/spreadsheets/d/1Je2g85H7oK6kMXbBOoqQ1FMNrvGnFuUJTJn7deyYiS8/edit?usp=sharing • Analysis Plan: – https://drive.google.com/file/d/0B7Ao1qqJJDHQdnVEaVdqbWdEdkE/view?usp=sharing • Collecting Data into a Central FTP Site • Recruiting people to help with the work. This could be you. We need volunteers! Goal: Establish and distribute a set of authoritative benchmark variant calls of all types and sizes, as well as homozygous reference regions, on GIAB PGP trios
  • 29. genomeinabottle.org Data Release Plans: Real-time, Open, Public Release Individual Datasets • Uploaded to GIAB FTP site as it is collected • Includes raw reads, aligned reads, and variant/reference calls Integrated High-confidence Calls • First develop SNP, indel, and homozygous reference calls • Then develop SV and non-SV calls • Released calls are versioned • Preliminary callsets will be made available to be critiqued
  • 30. genomeinabottle.org SNP/Indel Integration Method Update • Implementing refined integration methods on DNAnexus – Others can readily reproduce results – Consistent results for all GIAB genomes • Validating with released NA12878 RM data – Planned completion Sep 2015 • Then, apply to PGP trios – Plan to analyze AJ trio by Nov 2015 – Release of NIST RMs in early 2016
  • 31. genomeinabottle.org Integration to form high-confidence SNP/indel calls. VCFs with 0 FP PASS and 0 FN PASS+filtered in BED files
    – If 1+ datasets PASS and all PASSing datasets have same genotype: High-confidence variant, include in high-confidence regions
    – If all datasets are filtered or outside BED: Unless manually inspect alignments: not high-confidence, exclude ±50 bp from high-confidence regions
    – If PASSing datasets disagree about genotype or variant: Unless manually inspect alignments: not high-confidence, exclude ±50 bp from high-confidence regions
    – If inside BED and not in VCF for 1+ datasets, and no datasets have PASSing variants: High-confidence region
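The decision rules on this slide can be sketched as a small per-site function. This is an illustrative reading of the slide, not the NIST integration pipeline: the `Status` enum, the function names, and the returned labels are hypothetical, the manual-inspection override is not modeled, and a site where some datasets are filtered while another is inside its BED with no variant is treated as a high-confidence region, which is an assumption about how the fourth rule takes precedence.

```python
from enum import Enum


class Status(Enum):
    PASS = "pass"                            # variant called and passing filters
    FILTERED = "filtered"                    # variant called but filtered
    OUTSIDE_BED = "outside_bed"              # site outside the dataset's callable BED
    NO_VARIANT_IN_BED = "no_variant_in_bed"  # inside BED, no variant in the VCF


def integrate_site(observations):
    """Classify one site from per-dataset (Status, genotype-or-None) pairs."""
    passing = [gt for status, gt in observations if status is Status.PASS]
    if passing:
        if len(set(passing)) == 1:
            return "high_confidence_variant"  # all PASSing datasets agree
        return "exclude"                      # PASSing datasets disagree
    if any(status is Status.NO_VARIANT_IN_BED for status, _ in observations):
        return "high_confidence_region"       # callable, no PASSing variant
    return "exclude"                          # everything filtered or outside BED


def excluded_interval(pos, pad=50):
    """1-based interval removed from the high-confidence regions around an excluded site."""
    return (max(1, pos - pad), pos + pad)
```

A site returned as `exclude` would then have `excluded_interval(pos)` subtracted from the high-confidence regions, corresponding to the ±50 bp padding on the slide.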
  • 32. genomeinabottle.org Forming high-confidence calls on AJ Trio Generate candidate calls with multiple analysis methods from multiple types of data Compare/integrate candidate calls and manually inspect data to understand differences; refine calls? Generate integrated calls with several methods (MetaSV, Parliament, svclassify, others?) Combine integrated calls (with heuristics and/or machine learning) to generate high-confidence calls https://docs.google.com/spreadsheets/d/1Pj4nSzH742g40wJz2fA6f8kFtZYAToZpSZYVPiC5st4/edit?usp=sharing August 30, 2015 Nov 1, 2015 Dec 1, 2015 Jan 26, 2016
  • 33. genomeinabottle.org Analysis Progress: AJ Trio • SNPs/indels – Several candidate callsets – NIST working on integration • Assembly – 2 de novo assemblies of AJ trio (MHAP and Falcon/Bionano) – Will be used by at least 2 groups for SV calling • Structural variants – Candidate calls being generated by 14+ groups with >14 different algorithms and 6 datasets – 3 integration methods: MetaSV, Parliament, svclassify • Long-range Phasing – 2 phased calls so far (CG LFR and 10X) – Integration methods needed!