Bacterial Pathogen Genomics at NCBI
FDA, USDA, CDC State, Local and
Foreign Public Health Agencies
Industry/Academia Additional
DATA ANALYSIS
DATA ASSEMBLY AND
STORAGE and Analysis
DATA ACQUISITION
NCBI, EMBL, DDBJ (INSDC)
(Public Access Database)
Our Current Model – Publicly available data
National Network of Sequencers
International Network of Sequencers
Automated Bacterial Assembly
SRA Reads
sample 1
Trim reads
(Ns, adaptor)
Reference
Distance tree
Find closest reference genome(s)
ArgoCA (Combined Assembly)
De novo assembly panel
Argo (Reference
assisted
assembly)
SOAP denovo
GS-assembler
(newbler)
MaSuRCA
Celera
Assembler
Reads remapped to combined assembly
Contig fasta
Read placements (bam)
Quality profile
SPAdes
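As a rough illustration of the "Trim reads (Ns, adaptor)" step in the flow above, the sketch below clips a read at the first exact occurrence of an adaptor and then strips flanking Ns. The function name `trim_read` and the adaptor string passed to it are assumptions made for this example; the slides do not say which trimming tool or parameters the pipeline uses, and production trimmers also handle mismatches and partial adaptors.

```python
def trim_read(seq: str, adaptor: str) -> str:
    """Clip a read at the first exact adaptor match, then strip flanking Ns.

    Hypothetical helper for illustration only: exact matching is a
    simplification of what real read trimmers do.
    """
    idx = seq.find(adaptor)
    if idx != -1:
        # Everything from the adaptor onward is not part of the sample.
        seq = seq[:idx]
    # Remove uncalled bases (N) from both ends of what remains.
    return seq.strip("N")


if __name__ == "__main__":
    # Illustrative adaptor sequence, supplied by the caller.
    example_adaptor = "AGATCGGAAGAGC"
    print(trim_read("NNACGTACGT" + example_adaptor + "NN", example_adaptor))
```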
WGS & Epidemiologically Relevant Distance (ERD)
• WGS allows high resolution genotypic comparison of
pathogen isolates
• What is the epidemiological relevance of genotypic
distance?
• Many methods to compute – we need some common
principles…
Since all approaches start with sequence reads, we must
retain the reads for independent confirmation
[Figure: SRA format bytes per sequenced base versus number of bases (in millions) in MiSeq runs, with and without quality scores. Data: FDA-CFSAN, microbial foodborne pathogen research]
[Figure: SRA format bytes per sequenced base versus number of bases (in millions) in MiSeq and HiSeq runs, with and without quality scores. Data: Oxford University, Population Genomics of Mycobacterium tuberculosis]
Storage is manageable…
Reliable, transparent, high throughput, high
resolution ERDs?
Major challenge is to distinguish independent
events (SNPs) from single events that generate
multiple nucleotide differences
i.e. collapsed repeats and other artifacts,
alignment errors (reference-based alignments),
sequence quality, & recombination
Fairly uniform distribution
of differences along the
two genomes…?
Cumulative count of differences
Iterative density filtering
(Richa Agarwala modification of Science. 2011 Jan 28;331(6016):430-4.)
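The idea of filtering by density can be sketched generically as follows. This is not the NCBI implementation or the cited method, only a minimal illustration under stated assumptions: differences between two genomes are given as sorted coordinates, a window is "dense" when it holds more differences than a multiple (`fold`) of the current genome-wide average, and the average is re-estimated after each round of removal. The names `iterative_density_filter`, `window`, `fold` and `min_cutoff` are all hypothetical.

```python
from bisect import bisect_right


def iterative_density_filter(positions, genome_length, window=1000,
                             fold=5.0, min_cutoff=3):
    """Drop differences that sit in unusually dense windows, repeatedly.

    Each round re-estimates the background rate from the differences
    that are still kept, so removing one dense cluster can lower the
    cutoff and expose another. Stops when a round removes nothing.
    """
    kept = sorted(set(positions))
    while kept:
        # Expected number of differences per window at the current rate.
        expected = len(kept) * window / genome_length
        cutoff = max(min_cutoff, fold * expected)
        dense = set()
        for i, pos in enumerate(kept):
            # Count kept differences in [pos, pos + window).
            j = bisect_right(kept, pos + window - 1)
            if j - i > cutoff:
                dense.update(kept[i:j])
        if not dense:
            break
        kept = [p for p in kept if p not in dense]
    return kept


if __name__ == "__main__":
    clustered = list(range(5000, 5200, 10))   # 20 differences in 200 bp
    scattered = [10000, 30000, 50000, 70000, 90000]
    print(iterative_density_filter(clustered + scattered, 100_000))
```

With a fixed cutoff a single pass would be enough; the iteration only matters because the cutoff is tied to the remaining background rate.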
Table: Samples currently processed (as of Sept 5, 2014) in NCBI Pathogen Pipeline
                        Organisms
Center                  Listeria   Salmonella   E. coli   Total
CDC                     903        -            -         903
FDA + State Partners*   858        6129         307       7294
100K                    -          565          34        599
FERA                    14         -            -         14
Total                   1775       6694         341       8810
Processing Status
How to measure the system?
need the raw data (sequence reads) in unprocessed form
any read trimming/filtering along with the assembly can be regenerated
Assembly metrics
map the reads back to the assembly and generate a profile of each position
(coverage, alleles, qualities)
compare the assembly against other assemblies of the same organism (genus,
species) and check the expected genome size, or similarity to related genomes
annotation metrics such as frameshifted proteins
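A per-position profile of the kind described under "Assembly metrics" can be sketched as below. The input format is an assumption chosen to keep the example self-contained: each read placement is a tuple of 0-based start, bases and per-base qualities, with no gaps or clipping. A real pipeline would read placements from the BAM file instead; `position_profile` is a hypothetical name.

```python
from collections import Counter


def position_profile(assembly_length, placements):
    """Build coverage, allele counts and summed quality for each position.

    placements: iterable of (start, bases, quals) with a 0-based start
    and ungapped bases. This is a simplification of real alignments.
    """
    profile = [
        {"coverage": 0, "alleles": Counter(), "qual_sum": 0}
        for _ in range(assembly_length)
    ]
    for start, bases, quals in placements:
        for offset, (base, qual) in enumerate(zip(bases, quals)):
            pos = start + offset
            if 0 <= pos < assembly_length:
                column = profile[pos]
                column["coverage"] += 1
                column["alleles"][base] += 1
                column["qual_sum"] += qual
    return profile


if __name__ == "__main__":
    reads = [
        (0, "ACGT", [30, 30, 30, 30]),
        (2, "GTAA", [20, 20, 20, 20]),
        (2, "CT", [10, 10]),
    ]
    for pos, column in enumerate(position_profile(6, reads)):
        print(pos, column["coverage"], dict(column["alleles"]))
```

Positions where more than one allele has substantial support, or where coverage is very low, are the ones an assembly review would look at first.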
What is the actual measurement for sequence
similarity?
the number of pairwise SNPs between two genomes
What is the threshold?
a pairwise distance (an observationally determined cutoff below which a cluster of 2
or more isolates is considered close enough to warrant further investigation)
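The two answers above (pairwise SNP count as the measurement, a distance cutoff as the threshold) can be illustrated with a small sketch. It assumes the genomes are already aligned to equal length, ignores positions where either sequence has an ambiguous base or gap, and links isolates whose distance is at or below the cutoff (single linkage). The function names and the toy sequences are hypothetical, and the slides do not state a cutoff value.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count positions where two aligned sequences differ (A/C/G/T only)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x in "ACGT" and y in "ACGT"
    )


def cluster_by_threshold(seqs: dict, threshold: int) -> list:
    """Group isolates linked by a pairwise distance <= threshold.

    Returns only clusters of 2 or more isolates, each sorted by name.
    """
    parent = {name: name for name in seqs}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    return [sorted(c) for c in clusters.values() if len(c) >= 2]


if __name__ == "__main__":
    isolates = {
        "A": "ACGTACGT",
        "B": "ACGTACGA",
        "C": "TTTTACGA",
        "D": "ACGNACGT",
    }
    print(cluster_by_threshold(isolates, threshold=1))
```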
Sensitivity vs. Specificity
sequence clustering
sensitivity – fraction of isolates that truly belong to a cluster (within the
epidemiologically relevant distance) and are placed in it
true positives / (true positives + false negatives), where false negatives are members not correctly identified
specificity – fraction of isolates that truly fall outside a cluster (beyond the
epidemiologically relevant distance) and are excluded from it
true negatives / (true negatives + false positives)
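As a worked illustration of the two ratios, the sketch below scores one predicted cluster against a known outbreak set. The isolate names and the function `cluster_sensitivity_specificity` are made up for the example; only the formulas come from the slide.

```python
def cluster_sensitivity_specificity(predicted, truth, all_isolates):
    """Return (sensitivity, specificity) for one predicted cluster.

    predicted: isolates the clustering placed in the cluster
    truth: isolates that really belong to it (e.g. from epidemiology)
    all_isolates: every isolate that was considered
    """
    predicted, truth, all_isolates = set(predicted), set(truth), set(all_isolates)
    tp = len(predicted & truth)                  # correctly included
    fn = len(truth - predicted)                  # missed members
    fp = len(predicted - truth)                  # wrongly included
    tn = len(all_isolates - predicted - truth)   # correctly excluded
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    everyone = set("abcdefghij")
    outbreak = {"a", "b", "c", "d"}
    called = {"a", "b", "c", "e"}
    sens, spec = cluster_sensitivity_specificity(called, outbreak, everyone)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Here three of four true members are found (sensitivity 0.75) and five of six non-members are excluded (specificity about 0.83). Loosening the distance cutoff tends to raise the first number at the expense of the second.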
Organism | Total Samples | Not expected species (1) | Mixed organisms | Less than 5X coverage | Duplicates | PacBio | Poor 2nd read | Failed assembly stage
Listeria 1775 20 2 (?) 1 5 1
Salmonella 6694 35 5 9 12
E. coli 341 8 1
1. not L. monocytogenes, S. enterica, or E. coli
Processing Problems
PROBLEMS!
Reference Materials
Streptococcus massiliensis 4401825 - CANO - GCA_000341525.1
Streptococcus massiliensis DSM 18628 - ARCE - GCA_000380065.1
Streptococcus intermedius BA1 - ANFT - GCA_000313655.1
Streptococcus intermedius B196 - - GCA_000463355.1
Streptococcus intermedius C270 - - GCA_000463385.1
Streptococcus intermedius F0413 - AFXO - GCA_000234035.1
Streptococcus intermedius SK54 - AJKN - GCA_000258445.1
Streptococcus intermedius JTH08 - - GCA_000306805.1
Streptococcus intermedius ATCC 27335 - ATFK - GCA_000413475.1
Streptococcus intermedius F0395 - AFXN - GCA_000234015.1
Streptococcus sp. AS20 - JANS - GCA_000524255.1
Streptococcus constellatus subsp. constellatus SK53 - AICQ - GCA_000257785.1
Streptococcus constellatus subsp. constellatus SK53 - BASU - GCA_000474075.1
Streptococcus constellatus subsp. pharyngis C1050 - - GCA_000463425.1
Streptococcus constellatus subsp. pharyngis SK1060 = CCUG 46377 - AFUP - GCA_000223295.2
Streptococcus constellatus subsp. pharyngis SK1060 = CCUG 46377 - BASX - GCA_000474135.1
Streptococcus constellatus subsp. pharyngis C232 - - GCA_000463395.1
Streptococcus constellatus subsp. pharyngis C818 - - GCA_000463445.1
Streptococcus anginosus SK1138 - ALJO - GCA_000287595.1
Streptococcus sp. CM7 - JATP - GCA_000526035.1
Streptococcus sp. OBRC6 - JACR - GCA_000517685.1
Streptococcus anginosus F0211 - AECT - GCA_000184365.2
Streptococcus anginosus 1505 - BASW - GCA_000474115.1
Streptococcus sp. ACC21 - JAQU - GCA_000524375.1
Streptococcus sp. AC15 - JDFJ - GCA_000565055.1
Streptococcus anginosus subsp. whileyi MAS624 - - GCA_000478925.1
Streptococcus anginosus subsp. whileyi CCUG 39159 - AICP - GCA_000257765.1
Streptococcus anginosus C238 - - GCA_000463505.1
Streptococcus anginosus DORA_7 - AZMF - GCA_000508545.1
Streptococcus anginosus 1_2_62CV - ADME - GCA_000186545.1
Streptococcus anginosus C1051 - - GCA_000463465.1
Streptococcus anginosus T5 - BASY - GCA_000474155.1
Streptococcus anginosus SK52 = DSM 20563 - AFIM - GCA_000214555.2
Streptococcus anginosus SK52 = DSM 20563 - AREF - GCA_000373605.1
Streptococcus anginosus SK52 = DSM 20563 - BAST - GCA_000474055.1
Streptococcus intermedius SK54 - BASV - GCA_000474095.1
[Tree scale bar: 0.05]
Escherichia coli KTE179 - ANYQ - GCA_000326485.1
Escherichia coli KTE229 - ANXK - GCA_000353165.1
Escherichia coli H252 - AEFI - GCA_000190895.1
Escherichia coli HVH 180 (4-3051617) - AVYH - GCA_000458685.1
Escherichia coli HVH 73 (4-2393174) - AVUX - GCA_000457025.1
Escherichia coli HVH 104 (4-6977960) - AVVT - GCA_000457455.1
Escherichia coli HVH 19 (4-7154984) - AVTL - GCA_000456265.1
Escherichia coli 908675 - AXTY - GCA_000488755.1
Escherichia coli HVH 127 (4-7303629) - AVWO - GCA_000457855.1
Escherichia coli HVH 12 (4-7653042) - AVTG - GCA_000494955.1
Escherichia coli KOEGE 32 (66a) - AWAD - GCA_000459635.1
Escherichia coli UMEA 3041-1 - AWAW - GCA_000460015.1
Escherichia coli HVH 148 (4-3192490) - AVXH - GCA_000495015.1
Escherichia coli HVH 59 (4-1119338) - AVUQ - GCA_000456885.1
Escherichia coli HVH 222 (4-2977443) - AVZU - GCA_000459455.1
Escherichia coli UMEA 3140-1 - AWBK - GCA_000460295.1
Escherichia coli HVH 178 (4-3189163) - AVYG - GCA_000495055.1
Escherichia coli KTE4 - ANSO - GCA_000350645.1
Escherichia coli KTE3 - ASTO - GCA_000407685.1
Escherichia coli KTE240 - ASUS - GCA_000408305.1
Escherichia coli BIDMC 49b - JAPT - GCA_000522365.1
Escherichia coli BIDMC 49a - JAPU - GCA_000522385.1
Escherichia coli APEC O1 - - GCA_000014845.1
Escherichia coli DSM 30083 = JCM 1649 = ATCC 11775 - BAIM - GCA_000613265.1
Escherichia coli JCM 20135 - BAKV - GCA_000614505.1
Escherichia coli DSM 30083 = JCM 1649 = ATCC 11775 - AGSE - GCA_000690815.1
Escherichia coli DSM 30083 = JCM 1649 = ATCC 11775 - JMST - GCA_000734955.1
Escherichia coli HVH 214 (4-3062198) - AZJN - GCA_000507665.1
Escherichia coli UMEA 3162-1 - AWBU - GCA_000460475.1
Escherichia coli HVH 191 (3-9341900) - AVYR - GCA_000458875.1
Escherichia coli HVH 170 (4-3026949) - AVYA - GCA_000458555.1
Escherichia coli S88 - - GCA_000026285.1
Escherichia coli UMEA 3893-1 - AWEI - GCA_000461775.1
Escherichia coli HVH 217 (4-1022806) - AVZQ - GCA_000459375.1
Escherichia coli KTE5 - ANSP - GCA_000350665.1
Escherichia coli KTE7 - ASTP - GCA_000407705.1
Escherichia coli HVH 32 (4-3773988) - AVTX - GCA_000456505.1
Escherichia coli UMEA 3206-1 - AWCK - GCA_000460795.1
Escherichia coli UMEA 3203-1 - AWCJ - GCA_000460775.1
Escherichia coli KTE62 - ANUK - GCA_000351605.1
Escherichia coli KTE27 - ASTY - GCA_000407885.1
Escherichia coli cloneA_i1 - AEYT - GCA_000233675.2
Escherichia coli 597 - AYQU - GCA_000503475.1
Escherichia coli HVH 203 (4-3126218) - AVZD - GCA_000459115.1
Escherichia coli UMEA 3702-1 - AWDZ - GCA_000461595.1
Escherichia coli UMEA 3662-1 - AWDU - GCA_000461495.1
Escherichia coli HVH 5 (4-7148410) - AVTB - GCA_000456085.1
Escherichia coli HVH 102 (4-6906788) - AVVR - GCA_000465155.1
Escherichia coli HVH 201 (4-4459431) - AVZB - GCA_000459075.1
Escherichia coli HM605 - AJWU - GCA_000264175.1
Escherichia coli HM605 - CADZ - GCA_000285375.1
[Tree scale bar: 0.01]
http://www.ncbi.nlm.nih.gov/assembly/?term=%22anomalous%22[Properties]
Contamination (multiple organisms)
Assembly for sample SAMN02727350
Type                               Number of contigs   Sum of contig lengths
Full assembly                      667                 5251272
contigs with Listeria hits         37                  3031650
contigs with Staphylococcus hits   630                 2203573
Contamination (carryover contamination)
Contamination (multiple strains)
Table: Assembly stats for SAMN02693748
measurement result
num_input_reads 4212706
aligned_reads 4040070
assembly_num_bases 3180478
assembly_num_contigs 50
assembly_N50 2817733
poor_quality_support_bases 132321
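For readers unfamiliar with the N50 figure in the table: it is the contig length at which, going from the longest contig down, the running total first reaches half of the assembly. The sketch below computes it from a list of contig lengths and also derives two ratios from the table values. The helper name `n50` and the toy lengths are assumptions; reading the ratios as a screen for mixed strains is an interpretation, not a rule stated on the slide.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # First contig at which the cumulative length reaches half the total.
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))  # toy lengths, not from the slides

    # Values copied from the assembly stats table for SAMN02693748.
    num_input_reads = 4212706
    aligned_reads = 4040070
    assembly_num_bases = 3180478
    poor_quality_support_bases = 132321

    print(f"aligned fraction: {aligned_reads / num_input_reads:.4f}")
    print(f"poorly supported fraction: "
          f"{poor_quality_support_bases / assembly_num_bases:.4f}")
```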
Organism                                         Biosample      SRA Run      Similarity to:
Listeria monocytogenes IEH-NGS-LIS-00100         SAMN02567873   SRR1207486   Listeria SLCC7179
                                                                SRR1220750   Listeria J0161
Salmonella enterica Enteritidis MDH-2014-00798   SAMN02741943   SRR1553852   Schwarzengrund str. CVM19633
                                                                SRR1272871   Enteritidis str. P125109
Salmonella enterica Fluntern MDH-2013-00153      SAMN02378158   SRR1067624   Javiana and Schwarzengrund
                                                                SRR1395304   Cubana and Agona
Proficiency Testing
• Replicate results (phylogeny, SNPs) from published studies
• Resequencing
  - same isolate on multiple platforms
  - same isolate in multiple libraries
  - same isolate in multiple labs
• Blinded submissions
  - already-characterized isolates
  - mixed sample isolates
  - metagenomic isolates
• Corner cases
  - Extreme coverage
  - Duplicates
  - Sample mixups
Acknowledgements
National Center for Biotechnology Information – National Library of Medicine – Bethesda MD 20892 USA
Richa Agarwala
Azat Badretdin
Slava Brover
Joshua Cherry
Vyacheslav Chetvernin
Robert Cohen
Michael DiCuccio
Mike Feldgarden
Dan Haft
William Klimke
Arjun Prasad
Edward Rice
Kirill Rotmistrovskyy
Stephen Sherry
Sergey Shiryev
Martin Shumway
Tatiana Tatusova
Igor Tolstoy
Chunlin Xiao
Leonid Zaslavsky
Alexander Zasypkin
Alejandro A. Schaffer
Lukas Wagner
Aleksandr Morgulis
David Lipman
James Ostell
NCBI
• This research was supported by the Intramural
Research Program of the NIH, National Library of
Medicine. http://www.ncbi.nlm.nih.gov
CDC
FDA/CFSAN
NHGRI
UC-Davis
USDA
Vendors: PacBio, Illumina, Roche

Pests of Sunflower_Binomics_Identification_Dr.UPRPirithiRaju
 
Abnormal LFTs rate of deco and NAFLD.pptx
Abnormal LFTs rate of deco and NAFLD.pptxAbnormal LFTs rate of deco and NAFLD.pptx
Abnormal LFTs rate of deco and NAFLD.pptxzeus70441
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxRitchAndruAgustin
 

Último (20)

6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
 
Explainable AI for distinguishing future climate change scenarios
Explainable AI for distinguishing future climate change scenariosExplainable AI for distinguishing future climate change scenarios
Explainable AI for distinguishing future climate change scenarios
 
bonjourmadame.tumblr.com bhaskar's girls
bonjourmadame.tumblr.com bhaskar's girlsbonjourmadame.tumblr.com bhaskar's girls
bonjourmadame.tumblr.com bhaskar's girls
 
Forensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxForensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptx
 
办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书
办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书
办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
DOG BITE management in pediatrics # for Pediatric pgs# topic presentation # f...
DOG BITE management in pediatrics # for Pediatric pgs# topic presentation # f...DOG BITE management in pediatrics # for Pediatric pgs# topic presentation # f...
DOG BITE management in pediatrics # for Pediatric pgs# topic presentation # f...
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
 
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdfKDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
 
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptx
 
Oxo-Acids of Halogens and their Salts.pptx
Oxo-Acids of Halogens and their Salts.pptxOxo-Acids of Halogens and their Salts.pptx
Oxo-Acids of Halogens and their Salts.pptx
 
Q4-Mod-1c-Quiz-Projectile-333344444.pptx
Q4-Mod-1c-Quiz-Projectile-333344444.pptxQ4-Mod-1c-Quiz-Projectile-333344444.pptx
Q4-Mod-1c-Quiz-Projectile-333344444.pptx
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptx
 
Replisome-Cohesin Interfacing A Molecular Perspective.pdf
Replisome-Cohesin Interfacing A Molecular Perspective.pdfReplisome-Cohesin Interfacing A Molecular Perspective.pdf
Replisome-Cohesin Interfacing A Molecular Perspective.pdf
 
Environmental acoustics- noise criteria.pptx
Environmental acoustics- noise criteria.pptxEnvironmental acoustics- noise criteria.pptx
Environmental acoustics- noise criteria.pptx
 
Immunoblott technique for protein detection.ppt
Immunoblott technique for protein detection.pptImmunoblott technique for protein detection.ppt
Immunoblott technique for protein detection.ppt
 
Pests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPRPests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPR
 
Abnormal LFTs rate of deco and NAFLD.pptx
Abnormal LFTs rate of deco and NAFLD.pptxAbnormal LFTs rate of deco and NAFLD.pptx
Abnormal LFTs rate of deco and NAFLD.pptx
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
 

Bacterial Pathogen Genomics at NCBI

  • 3. Our Current Model – Publicly available data. FDA, USDA, CDC; State, Local and Foreign Public Health Agencies; Industry/Academia; Additional. DATA ANALYSIS; DATA ASSEMBLY AND STORAGE and Analysis; DATA ACQUISITION. NCBI, EMBL, DDBJ (INSDC) (Public Access Database). National Network of Sequencers; International Network of Sequencers.
  • 4. Automated Bacterial Assembly. SRA reads (sample 1); trim reads (Ns, adaptor); reference; distance tree; find closest reference genome(s); Argo (reference-assisted assembly); de novo assembly panel: SOAPdenovo, GS-assembler (newbler), MaSuRCA, Celera Assembler, SPAdes; ArgoCA (combined assembly); reads remapped to combined assembly; contig fasta; read placements (bam); quality profile.
  • 5. WGS & Epidemiologically Relevant Distance (ERD) • WGS allows high resolution genotypic comparison of pathogen isolates • What is the epidemiological relevance of genotypic distance? • Many methods to compute – we need some common principles…
  • 6. Since all approaches start with sequence reads, we must retain them for independent confirmation. [Figure: FDA-CFSAN, microbial foodborne pathogen research. SRA format bytes per sequenced base versus number of bases (millions) in MiSeq runs, with and without qualities.] [Figure: Oxford University, Population Genomics of Mycobacterium tuberculosis. SRA format bytes per sequenced base versus number of bases (millions) in MiSeq and HiSeq runs, with and without qualities.] Storage is manageable…
  • 7. Reliable, transparent, high throughput, high resolution ERDs? The major challenge is to distinguish independent events (SNPs) from single events that generate multiple nucleotide differences, i.e. collapsed repeats and other artifacts, alignment errors (reference-based alignments), sequence quality, and recombination.
  • 8. Fairly uniform distribution of differences along the two genomes…? Cumulative count of differences
  • 9. Iterative density filtering (Richa Agarwala modification of Science. 2011 Jan 28;331(6016):430-4.)
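The idea in slides 7 to 9, separating isolated differences from dense runs that a single event probably produced, can be illustrated with a generic sketch. This is not the modification credited on the slide, whose details the deck does not give. It assumes a hypothetical rule: a position is dropped when the number of differences within `window` bases of it exceeds `factor` times the count expected from the current genome-wide rate, and the rate is re-estimated after each round. The names `iterative_density_filter`, `window`, `factor` and `min_count` are all illustrative.

```python
from bisect import bisect_left, bisect_right


def iterative_density_filter(positions, genome_length, window=1000,
                             factor=5.0, min_count=3, max_rounds=10):
    """Return the difference positions that survive density filtering.

    Hypothetical rule (an assumption, not the published method): a position
    is dropped when the number of kept differences within +/- `window` bases
    exceeds max(min_count, factor * expected count at the current rate).
    The rate is re-estimated from the survivors after every round.
    """
    kept = sorted(set(positions))
    for _ in range(max_rounds):
        if not kept:
            break
        # Expected number of differences in a (2 * window + 1)-base span.
        expected = len(kept) * (2 * window + 1) / genome_length
        limit = max(min_count, factor * expected)
        dense = {
            p for p in kept
            if bisect_right(kept, p + window) - bisect_left(kept, p - window) > limit
        }
        if not dense:
            break  # nothing left that looks clustered
        kept = [p for p in kept if p not in dense]
    return kept


# Four isolated differences plus a run of ten differences within 100 bases.
isolated = [10_000, 250_000, 600_000, 900_000]
cluster = list(range(500_000, 500_100, 10))
survivors = iterative_density_filter(isolated + cluster, genome_length=1_000_000)
```

Re-estimating the background rate each round is what makes the iteration matter: once dense runs are removed, the expected count falls and the limit can tighten.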
  • 11. Table: Samples currently processed (as of Sept 5, 2014) in NCBI Pathogen Pipeline (Processing Status). Columns: Center; Listeria; Salmonella; E. coli; Total. CDC: 903; -; -; 903. FDA + State Partners*: 858; 6129; 307; 7294. 100K: -; 565; 34; 599. FERA: 14; -; -; 14. Total: 1775; 6694; 341; 8810.
  • 12. How to measure the system? We need the raw data (sequence reads) in unprocessed form; any read trimming/filtering, along with the assembly, can be regenerated.
  • 13. Assembly metrics: map the reads back to the assembly and generate a profile of each position (coverage, alleles, qualities); compare the assembly against other assemblies of the same organism (genus, species) and check the expected genome size, or similarity to related genomes; annotation metrics such as frameshifted proteins.
  • 14. What is the actual measurement for sequence similarity? The number of pairwise SNPs between two genomes. What is the threshold? A pairwise distance: an observationally determined cutoff below which a cluster of 2 or more isolates is considered close enough to warrant further investigation.
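A minimal sketch of the measurement and threshold described in slide 14, assuming the isolates are already available as equal-length aligned sequences. The function names, the single-linkage grouping, and the example cutoff are illustrative assumptions, not the pipeline's actual method.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, ignored="N-"):
    """Count positions where two aligned sequences differ.

    Positions where either sequence has an ambiguous base or a gap
    (characters in `ignored`) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for a, b in zip(seq_a, seq_b)
        if a != b and a not in ignored and b not in ignored
    )


def clusters_within(aligned, threshold):
    """Group isolates by single linkage: two isolates are linked when their
    pairwise SNP distance is <= threshold. Only clusters of 2 or more
    isolates are returned."""
    parent = {name: name for name in aligned}

    def find(name):
        # Walk to the root of this isolate's group, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(aligned, 2):
        if snp_distance(aligned[a], aligned[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in aligned:
        groups.setdefault(find(name), []).append(name)
    return [sorted(members) for members in groups.values() if len(members) >= 2]


# Toy alignment; real inputs would be genome-scale.
aligned = {
    "isoA": "ACGTACGT",
    "isoB": "ACGTACGA",
    "isoC": "ACGAACGA",
    "isoD": "TTTTTTTT",
}
close_clusters = clusters_within(aligned, threshold=1)
```

Single linkage lets isoA and isoC share a cluster through isoB even though they differ from each other by 2 SNPs; a stricter rule that requires every pair to be under the cutoff would split them.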
  • 15. Sensitivity vs. Specificity (sequence clustering). Sensitivity: the fraction of isolates belonging to the cluster within the epidemiologically relevant distance that are correctly included, i.e. true positives / (true positives + false negatives). Specificity: the fraction of isolates not belonging to the cluster that are correctly excluded, i.e. true negatives / (true negatives + false positives).
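The two ratios in slide 15 can be computed from a predicted cluster and a reference ("truth") cluster as sketched below. The function name `score_cluster` and the toy isolate labels are hypothetical, and the sketch assumes both denominators are non-zero.

```python
def score_cluster(predicted, truth, all_isolates):
    """Return (sensitivity, specificity) for one predicted cluster.

    predicted:    isolates the clustering placed in the cluster
    truth:        isolates that epidemiologically belong to the cluster
    all_isolates: every isolate that was considered
    """
    predicted, truth, everything = set(predicted), set(truth), set(all_isolates)
    tp = len(predicted & truth)               # correctly included
    fn = len(truth - predicted)               # missed members
    fp = len(predicted - truth)               # wrongly included
    tn = len(everything - predicted - truth)  # correctly excluded
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Ten isolates, four true members; the clustering finds three and adds one outsider.
all_isolates = list("abcdefghij")
sens, spec = score_cluster(predicted="abce", truth="abcd", all_isolates=all_isolates)
```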
  • 16. Processing Problems. Columns: Organism; Total samples; Not expected species (1); Mixed organisms; Less than 5X coverage; Duplicates; PacBio; Poor 2nd read; Failed assembly stage. Values are listed in column order with blank cells omitted. Listeria: 1775; 20; 2 (?); 1; 5; 1. Salmonella: 6694; 35; 5; 9; 12. E. coli: 341; 8; 1. (1) Not L. monocytogenes, S. enterica, or E. coli.
  • 20. Tree labels: Streptococcus massiliensis 4401825 - CANO - GCA_000341525.1; Streptococcus massiliensis DSM 18628 - ARCE - GCA_000380065.1; Streptococcus intermedius BA1 - ANFT - GCA_000313655.1; Streptococcus intermedius B196 - - GCA_000463355.1; Streptococcus intermedius C270 - - GCA_000463385.1; Streptococcus intermedius F0413 - AFXO - GCA_000234035.1; Streptococcus intermedius SK54 - AJKN - GCA_000258445.1; Streptococcus intermedius JTH08 - - GCA_000306805.1; Streptococcus intermedius ATCC 27335 - ATFK - GCA_000413475.1; Streptococcus intermedius F0395 - AFXN - GCA_000234015.1; Streptococcus sp. AS20 - JANS - GCA_000524255.1; Streptococcus constellatus subsp. constellatus SK53 - AICQ - GCA_000257785.1; Streptococcus constellatus subsp. constellatus SK53 - BASU - GCA_000474075.1; Streptococcus constellatus subsp. pharyngis C1050 - - GCA_000463425.1; Streptococcus constellatus subsp. pharyngis SK1060 = CCUG 46377 - AFUP - GCA_000223295.2; Streptococcus constellatus subsp. pharyngis SK1060 = CCUG 46377 - BASX - GCA_000474135.1; Streptococcus constellatus subsp. pharyngis C232 - - GCA_000463395.1; Streptococcus constellatus subsp. pharyngis C818 - - GCA_000463445.1; Streptococcus anginosus SK1138 - ALJO - GCA_000287595.1; Streptococcus sp. CM7 - JATP - GCA_000526035.1; Streptococcus sp. OBRC6 - JACR - GCA_000517685.1; Streptococcus anginosus F0211 - AECT - GCA_000184365.2; Streptococcus anginosus 1505 - BASW - GCA_000474115.1; Streptococcus sp. ACC21 - JAQU - GCA_000524375.1; Streptococcus sp. AC15 - JDFJ - GCA_000565055.1; Streptococcus anginosus subsp. whileyi MAS624 - - GCA_000478925.1; Streptococcus anginosus subsp. whileyi CCUG 39159 - AICP - GCA_000257765.1; Streptococcus anginosus C238 - - GCA_000463505.1; Streptococcus anginosus DORA_7 - AZMF - GCA_000508545.1; Streptococcus anginosus 1_2_62CV - ADME - GCA_000186545.1; Streptococcus anginosus C1051 - - GCA_000463465.1; Streptococcus anginosus T5 - BASY - GCA_000474155.1; Streptococcus anginosus SK52 = DSM 20563 - AFIM - GCA_000214555.2; Streptococcus anginosus SK52 = DSM 20563 - AREF - GCA_000373605.1; Streptococcus anginosus SK52 = DSM 20563 - BAST - GCA_000474055.1; Streptococcus intermedius SK54 - BASV - GCA_000474095.1. Tree scale: 0.05.
  • 22. Tree labels: Escherichia coli KTE179 - ANYQ - GCA_000326485.1; Escherichia coli KTE229 - ANXK - GCA_000353165.1; Escherichia coli H252 - AEFI - GCA_000190895.1; Escherichia coli HVH 180 (4-3051617) - AVYH - GCA_000458685.1; Escherichia coli HVH 73 (4-2393174) - AVUX - GCA_000457025.1; Escherichia coli HVH 104 (4-6977960) - AVVT - GCA_000457455.1; Escherichia coli HVH 19 (4-7154984) - AVTL - GCA_000456265.1; Escherichia coli 908675 - AXTY - GCA_000488755.1; Escherichia coli HVH 127 (4-7303629) - AVWO - GCA_000457855.1; Escherichia coli HVH 12 (4-7653042) - AVTG - GCA_000494955.1; Escherichia coli KOEGE 32 (66a) - AWAD - GCA_000459635.1; Escherichia coli UMEA 3041-1 - AWAW - GCA_000460015.1; Escherichia coli HVH 148 (4-3192490) - AVXH - GCA_000495015.1; Escherichia coli HVH 59 (4-1119338) - AVUQ - GCA_000456885.1; Escherichia coli HVH 222 (4-2977443) - AVZU - GCA_000459455.1; Escherichia coli UMEA 3140-1 - AWBK - GCA_000460295.1; Escherichia coli HVH 178 (4-3189163) - AVYG - GCA_000495055.1; Escherichia coli KTE4 - ANSO - GCA_000350645.1; Escherichia coli KTE3 - ASTO - GCA_000407685.1; Escherichia coli KTE240 - ASUS - GCA_000408305.1; Escherichia coli BIDMC 49b - JAPT - GCA_000522365.1; Escherichia coli BIDMC 49a - JAPU - GCA_000522385.1; Escherichia coli APEC O1 - - GCA_000014845.1; Escherichia coli DSM 30083 = JCM 1649 = ATCC 11775 - BAIM - GCA_000613265.1; Escherichia coli JCM 20135 - BAKV - GCA_000614505.1; Escherichia coli DSM 30083 = JCM 1649 = ATCC 11775 - AGSE - GCA_000690815.1; Escherichia coli DSM 30083 = JCM 1649 = ATCC 11775 - JMST - GCA_000734955.1; Escherichia coli HVH 214 (4-3062198) - AZJN - GCA_000507665.1; Escherichia coli UMEA 3162-1 - AWBU - GCA_000460475.1; Escherichia coli HVH 191 (3-9341900) - AVYR - GCA_000458875.1; Escherichia coli HVH 170 (4-3026949) - AVYA - GCA_000458555.1; Escherichia coli S88 - - GCA_000026285.1; Escherichia coli UMEA 3893-1 - AWEI - GCA_000461775.1; Escherichia coli HVH 217 (4-1022806) - AVZQ - GCA_000459375.1; Escherichia coli KTE5 - ANSP - GCA_000350665.1; Escherichia coli KTE7 - ASTP - GCA_000407705.1; Escherichia coli HVH 32 (4-3773988) - AVTX - GCA_000456505.1; Escherichia coli UMEA 3206-1 - AWCK - GCA_000460795.1; Escherichia coli UMEA 3203-1 - AWCJ - GCA_000460775.1; Escherichia coli KTE62 - ANUK - GCA_000351605.1; Escherichia coli KTE27 - ASTY - GCA_000407885.1; Escherichia coli cloneA_i1 - AEYT - GCA_000233675.2; Escherichia coli 597 - AYQU - GCA_000503475.1; Escherichia coli HVH 203 (4-3126218) - AVZD - GCA_000459115.1; Escherichia coli UMEA 3702-1 - AWDZ - GCA_000461595.1; Escherichia coli UMEA 3662-1 - AWDU - GCA_000461495.1; Escherichia coli HVH 5 (4-7148410) - AVTB - GCA_000456085.1; Escherichia coli HVH 102 (4-6906788) - AVVR - GCA_000465155.1; Escherichia coli HVH 201 (4-4459431) - AVZB - GCA_000459075.1; Escherichia coli HM605 - AJWU - GCA_000264175.1; Escherichia coli HM605 - CADZ - GCA_000285375.1. Tree scale: 0.01.
  • 29. Assembly for sample SAMN02727350. Columns: Type; Number of contigs; Sum of contig lengths. Full assembly: 667; 5251272. Contigs with Listeria hits: 37; 3031650. Contigs with Staphylococcus hits: 630; 2203573.
  • 34. Table: Assembly stats for SAMN02693748 (measurement: result). num_input_reads: 4212706; aligned_reads: 4040070; assembly_num_bases: 3180478; assembly_num_contigs: 50; assembly_N50: 2817733; poor_quality_support_bases: 132321.
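As an illustration of how figures like those in slide 34 can be derived, the sketch below computes N50 from a list of contig lengths using the common definition (the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly), plus two simple ratios from the reported counts. Whether the pipeline uses exactly this N50 definition is an assumption; the contig lengths in the example are made up.

```python
def n50(contig_lengths):
    """Length of the contig at which the cumulative sum, going from the
    longest contig down, first reaches half of the total assembly length."""
    ordered = sorted(contig_lengths, reverse=True)
    if not ordered:
        raise ValueError("no contigs given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Made-up contig lengths, for illustration only.
example_n50 = n50([80, 70, 50, 40, 30, 20])

# Ratios derived from the values reported for SAMN02693748.
aligned_fraction = 4040070 / 4212706       # aligned_reads / num_input_reads
poor_support_fraction = 132321 / 3180478   # poor_quality_support_bases / assembly_num_bases
```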
  • 40. Columns: Organism; Biosample; SRA Run; Similarity to. Listeria monocytogenes IEH-NGS-LIS-00100; SAMN02567873; SRR1207486: Listeria SLCC7179; SRR1220750: Listeria J0161. Salmonella enterica Enteritidis MDH-2014-00798; SAMN02741943; SRR1553852: Schwarzengrund str. CVM19633; SRR1272871: Enteritidis str. P125109. Salmonella enterica Fluntern MDH-2013-00153; SAMN02378158; SRR1067624: Javiana and Schwarzengrund; SRR1395304: Cubana and Agona.
  • 42. Proficiency Testing • Replicate results (phylogeny, SNPs) from published studies • Resequencing: same isolate on multiple platforms; same isolate in multiple libraries; same isolate in multiple labs • Blinded submissions: already-characterized isolates; mixed sample isolates; metagenomic isolates • Corner cases: extreme coverage; duplicates; sample mixups
  • 47. Acknowledgements. National Center for Biotechnology Information, National Library of Medicine, Bethesda MD 20892 USA: Richa Agarwala, Azat Badretdin, Slava Brover, Joshua Cherry, Vyacheslav Chetvernin, Robert Cohen, Michael DiCuccio, Mike Feldgarden, Dan Haft, William Klimke, Arjun Prasad, Edward Rice, Kirill Rotmistrovskyy, Stephen Sherry, Sergey Shiryev, Martin Shumway, Tatiana Tatusova, Igor Tolstoy, Chunlin Xiao, Leonid Zaslavsky, Alexander Zasypkin, Alejandro A. Schaffer, Lukas Wagner, Aleksandr Morgulis, David Lipman, James Ostell. NCBI • This research was supported by the Intramural Research Program of the NIH, National Library of Medicine. http://www.ncbi.nlm.nih.gov • CDC, FDA/CFSAN, NHGRI, UC-Davis, USDA. Vendors: PacBio, Illumina, Roche.