Presentation from the ECDC expert consultation on Whole Genome Sequencing organised by the European Centre for Disease Prevention and Control - Stockholm, 19 November 2015
4. Population genomics:
the gene-by-gene approach
Complete
Sequence
Annotation
Bacterial Isolate
Genome Sequence
Database
(BIGSDB)
Contigs
Gene sequences
Provenance/phenotype
information
Jolley, K. A. & Maiden, M. C. (2010). BIGSdb: Scalable analysis of bacterial
genome variation at the population level. BMC Bioinformatics 11, 595.
5. Data submitters:
currently >1300;
Data curators:
currently >90 MLST
schemes
Sequence
definitions
MLST, rMLST,
antigen genes, core
genome, pan-
genome
GeneA
GeneB
GeneC
GeneD
Allele1: TTTGATACTGTTGCCGAAGGTTT
Allele2: TTTGATACCGTTGCCGAAGGTTT
Allele3: TTTGATTCCGTTGCCGAAGGTTT
>750 citations
Isolate datasets
• provenance
• phenotype
• gene content
• allelic variation
• genomes
Linked to:
Population
annotation
• locus classification
• description
• biochemical
pathway
• Core + accessory
genome analysis
• Association studies
Comparative
genomics
PubMLST
1998*, 2003
Gene-by-gene
analysis using
reference genome or
defined loci
Molecular typing
Species identification
Epidemiology
Vaccine coverage/
impact
Linking genotype
to phenotype
Outbreak investigation
Population structure
>8000 unique visitors/month
* http://mlst.zoo.ox.ac.uk
6. PubMLST RESTful API facilitates data exchange
• All data accessible
via JSON API
• Authenticated
(OAuth) access to
protected resources
• Data submission
available soon
http://rest.pubmlst.org
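As a minimal sketch of how a client might read from the JSON API, the Python below builds a resource URL and decodes the response. The database name `pubmlst_neisseria_seqdef`, the path layout `/db/<database>/loci/<locus>/alleles/<id>` and the helper names `allele_url` and `fetch_json` are assumptions made for illustration; the API's own documentation is the authority on current routes.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://rest.pubmlst.org"  # address given on the slide


def allele_url(database, locus, allele_id, base=BASE_URL):
    """Build the URL of one allele record (path layout is an assumption)."""
    return f"{base}/db/{quote(database)}/loci/{quote(locus)}/alleles/{allele_id}"


def fetch_json(url, timeout=30):
    """GET a resource and decode its JSON body into Python objects."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Only the URL is built here; no network request is made.
print(allele_url("pubmlst_neisseria_seqdef", "abcZ", 2))
```

Calling `fetch_json(allele_url(...))` needs network access, and protected resources would additionally need the OAuth step that this sketch leaves out.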
7. WGS determination, interpretation
and dissemination pipeline
Isolate growth
DNA Extraction
Sequencing
(Illumina)
de novo assembly
(VELVET)
Database deposition
(BIGSDB)
Autotagged, web
accessible
sequences
Bacterial cells
Purified DNA
Short-read sequences
Assembled contiguous
sequences
Phenotype & provenance
linkage and annotation
‘Plain language’ data
Bratcher, H. B., Bennett, J. S. & Maiden, M. C. J.
(2012). Evolutionary and genomic insights into
meningococcal biology. Future Microbiology 7, 873-885.
Deposited
8. MLST
(7 loci)
16S rRNA
sequences
(1 locus)
Ribosomal MLST
(53 loci)
Strain
Lineage/
Clonal Complex
Species
Family
Order
Class
Phylum
Genus
Whole genome
MLST
(>500 loci)
- Core genome
MLST
- Accessory
genome MLST
Hierarchical genome analysis
Clone
Meroclone
Maiden, M. C., van Rensburg, M. J., Bray, J. E., Earle, S. G., Ford, S. A.,
Jolley, K. A. & McCarthy, N. D. MLST revisited: the gene-by-gene approach to
bacterial genomics. Nat Rev Microbiol. 2013 Sep 2. doi: 10.1038/nrmicro3093.
PMCID: PMC3980634
9. Neisseria structure and
characterisation
Jolley, K. A., Brehony, C. & Maiden, M. C. (2007). Molecular typing of
meningococci: recommendations for target choice and nomenclature. FEMS
Microbiol Rev 31, 89-96.
Component           Phenotypic               Genotypic
Capsule             Serogroup                cps region
OMPs                Serotype, subtype, etc.  porA, porB, fetA, etc.
Housekeeping genes  MLEE                     MLST
Ribosomes           MALDI-TOF                16S rRNA, rMLST
Neisseria meningitidis B: P1.7,16: F3-3: ST-32 (cc32)
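The designation above packs serogroup, PorA subtype, FetA variant, sequence type and clonal complex into one colon-separated string. The following Python sketch splits such a string into named fields; the function name `parse_designation` and the field names are hypothetical, and the parser assumes exactly the four-part layout shown on the slide.

```python
import re


def parse_designation(text):
    """Split 'serogroup: PorA: FetA: ST (cc)' into named fields."""
    serogroup, por_a, fet_a, st_part = [part.strip() for part in text.split(":")]
    match = re.fullmatch(r"ST-(\d+)\s*\((cc[^)]+)\)", st_part)
    if match is None:
        raise ValueError(f"unrecognised sequence type field: {st_part!r}")
    return {
        "serogroup": serogroup,
        "porA": por_a,  # PorA subtype, e.g. P1.7,16
        "fetA": fet_a,  # FetA variant, e.g. F3-3
        "sequence_type": int(match.group(1)),
        "clonal_complex": match.group(2),
    }


print(parse_designation("B: P1.7,16: F3-3: ST-32 (cc32)"))
```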
10. Validation of WGS pipeline
• 108 diverse meningococcal isolates,
sequenced with 54bp Illumina
reads.
• Assembled with VELVET and
uploaded into BIGSDB.
• Comparison of 24 typing loci (total
of 2592 loci) previously
characterised by Sanger sequencing
in all isolates.
• There were 34 (1.3%) allelic
differences found in 20 of the de
novo assembled genomes.
• 30 discrepancies (1.15%)
attributable to Sanger sequence
errors (mislabelling, editing errors).
• 4 discrepancies (0.15%) attributable
to Velvet assembly. These were all
in the same porA allele (a repeat
sequence).
Bratcher, H. B., Corton, C., Jolley, K. A., Parkhill, J. & Maiden, M. C. (2014). A gene-by-
gene population genomics platform: de novo assembly, annotation and genealogical analysis of
108 representative Neisseria meningitidis genomes. BMC Genomics 15, 1138.
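The percentages quoted for the validation follow directly from the counts on the slide: 108 isolates at 24 loci give 2592 locus comparisons. A short Python check, with variable names chosen here for illustration:

```python
ISOLATES = 108
TYPING_LOCI = 24
total = ISOLATES * TYPING_LOCI  # 2592 locus comparisons

discrepancies = {
    "all allelic differences": 34,
    "attributed to Sanger sequence errors": 30,
    "attributed to Velvet assembly": 4,
}


def percent(count, denominator):
    """Express count as a percentage of denominator."""
    return 100.0 * count / denominator


for label, count in discrepancies.items():
    print(f"{label}: {count}/{total} = {percent(count, total):.3f}%")
```

Note that 30/2592 evaluates to roughly 1.157%, which the slide reports as 1.15%.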
11. Genome and phenotype
• Whole genome MLST
(wgMLST).
• Autotagger – runs
regularly – tags all loci with
known alleles (>2200 in
Neisseria database).
• Each unique sequence
given new allele number.
• Loci grouped into
schemes.
• Linkage to phenotype &
other information.
Jolley, K. A. & Maiden, M. C. (2013). Automated extraction of typing information for bacterial pathogens
from whole genome sequence data: Neisseria meningitidis as an exemplar. Euro Surveill 18 (4): 20379.
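The tagging idea can be sketched in a few lines of Python: search the contigs for known allele sequences, and give any previously unseen sequence the next allele number. This is a simplified illustration, not the BIGSdb implementation: it uses exact forward-strand string matching only, the function names are hypothetical, the allele fragments are the short examples shown earlier in the deck, and the flanking bases are made up.

```python
def tag_locus(contigs, known_alleles):
    """Return the number of the first known allele found in any contig, else None."""
    for number, sequence in known_alleles.items():
        if any(sequence in contig for contig in contigs):
            return number
    return None


def define_allele(sequence, known_alleles):
    """Give a unique sequence the next allele number; reuse the number if known."""
    for number, existing in known_alleles.items():
        if existing == sequence:
            return number
    number = max(known_alleles, default=0) + 1
    known_alleles[number] = sequence
    return number


locus_alleles = {
    1: "TTTGATACTGTTGCCGAAGGTTT",
    2: "TTTGATACCGTTGCCGAAGGTTT",
    3: "TTTGATTCCGTTGCCGAAGGTTT",
}

contigs = ["GGCA" + locus_alleles[2] + "ACGT"]  # made-up flanking bases
print(tag_locus(contigs, locus_alleles))  # allele number found in the contig
print(define_allele("TTTGATTCCGTTGCCGAAGCTTT", locus_alleles))  # unseen variant
```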
12. Meningitis Research Foundation
Meningococcus Genome Library
• Charity funded.
• Open access
• All available England
and Wales (& soon
Scotland)
meningococcal isolates.
• Assembled &
annotated contiguous
sequence data.
http://www.meningitis.org/current-projects/genome
13. Isolates in the MRF Genome Library –
England and Wales
[Chart: number of isolates (axis 0 to 600); legend: Z, Y, X, W/Y, W, NG, E, C, B, A]
14. National Surveillance: MRF-MGL 2010-2012
Hill, D.M.C., Lucidarme, J., Gray, S.J., Newbold, L.S., Ure, R., Brehony, C., Harrison,
O.B., Bray, J.E., Jolley, K.A., Bratcher, H.B., Parkhill, J., Tang, C.M., Borrow, R., and
Maiden, M.C.J. Genomic epidemiology of age-associated meningococcal lineages in national
surveillance: an observational cohort study. Lancet Infectious Diseases, DOI:
http://dx.doi.org/10.1016/S1473-3099(15)00267-4
• A total of 923 isolates from
England, Wales and Northern
Ireland.
• 899 from England and Wales:
• Scanned at >2000 loci;
• 2-313 alleles/locus;
• 219 STs, 22 clonal
complexes;
• 496 rSTs (ribosomal
sequence types);
• Most isolates (78%)
belonged to 6 clonal
complexes.
15. Retrospective epidemiology
[Chart: y-axis 0 to 3000; x-axis years 1975, 1985, 1995, 1999 to 2001, 2005 to 2012; legend (clonal complexes): 41/44, 269, 11, 32, 8, 213, 23, 167, 174, 22, Other, UA, NT]
Hill, D.M.C., Lucidarme, J., Gray, S.J., Newbold, L.S., Ure, R., Brehony, C., Harrison,
O.B., Bray, J.E., Jolley, K.A., Bratcher, H.B., Parkhill, J., Tang, C.M., Borrow, R., and
Maiden, M.C.J. Genomic epidemiology of age-associated meningococcal lineages in national
surveillance: an observational cohort study. Lancet Infectious Diseases, DOI:
http://dx.doi.org/10.1016/S1473-3099(15)00267-4
16. Outbreak investigation
Mulhall, RM, Brehony, C, O’Connor, L, Bennett, D, Jolley, KA, Bray, J, Maiden, MCJ,
Cunney, R. Resolution of a protracted serogroup B meningococcal outbreak in a large extended
indigenous Irish Traveller family in the Republic of Ireland during 2010 to 2013 using non-culture
PCR, WGS and publicly accessible web-based tools. In preparation.
17. High resolution international
epidemiology (W:cc11)
[Chart: W:cc11, England and Wales, 2005 to 2013; y-axis n (0 to 70), x-axis year]
[Phylogeny labels: Current UK; UK Hajj; UK 1996 (n=3), 1997 (n=2), 1998 (n=2); UK 1975 (n=6), 1987 (n=1), 1989 (n=1), 1990 (n=1); UK 1996 (n=2), 1998 (n=1); Argentina 2008-2012; Brazil 2008-2011; Current South Africa]
Lucidarme, J., Hill, D.M., Bratcher, H.B., Gray, S.J., du Plessis, M., Tsang, R.S.W., Vazquez, J.A., Taha, M.-K.,
Ceyhan, M., Findlow, J., Jolley, K.A., Maiden, M.C.J., Borrow, R. (2015) Journal of Infection
[Chart: N. meningitidis cases per year among inpatients in Bamako, Mali (2002-2012); y-axis number of cases (0 to 90), x-axis year; series: Group A meningococcal cases, Group W135 meningococcal cases]
20. invasive isolate survey: proof of concept for
WGS based surveillance
Epidemiological year 2011/2012
Dominique Caugant, Holly Bratcher, Carina
Brehony, Martin Maiden, IBD-LabNet
26. Surveillance data coverage: PorA &
FetA
[Chart: number of isolates (axis 0 to 800) by percent of loci assigned (n=3): 100, 66.7, 33.3]
21 partial antigen profiles (2.6%)
216-677 contigs / genome
1-2 loci assigned / isolate
14 no PorA VR1 allele
12 no PorA VR2 allele
6 no FetA VR allele
27. Surveillance data coverage: 5 BAST
loci
[Chart: number of isolates (axis 0 to 650) by percent of loci defined: 100, 80-90, 60-70, 40-50]
Overall, 597 with partial profile
(74.8%)
14 no PorA VR1 allele
12 no PorA VR2 allele
130 no NadA peptide allele*
3 no fHbp peptide allele
19 no NHBA peptide allele
44 only 2-3 loci identified (5.5%)
average 495 contigs / genome
3 no PorA VR1 allele
11 no PorA VR1, VR2 alleles
3 no fHbp/NadA peptide alleles
19 no NHBA/NadA peptide alleles
32. Scalable genomic epidemiology
Centuries+ decades years months weeks days hours
Evolution emergence epidemiology diagnosis
• COLOMBIA 2004 (n=37): Y 32%, B 51%, W-135 3%, C 14%
• AFRICAN MENINGITIS BELT 2003-2004 (n=501): Other 1.2%, A 79%, W-135 20%
• AUSTRALIA 2004 (n=361): Other 7.2%, C 20%, A 0.3%, B 68%, W-135 3.3%, Y 2.2%
• WESTERN EUROPE 2002 (n=3,982): A 0.1%, C 29%, Other 1.0%, B 64%, W-135 3.6%, Y 2.3%
• RUSSIA 2002-2004 (n=1,899): B 32%, A 36%, C 22%, Other 10%
• CHILE 2003 (n=193): Other 5%, C 14%, B 78%, W-135 1%, Y 2%
• CANADA 2003* (n=148): W-135 7%, C 24%, B 43%, Other 1%, Y 25%
• UNITED STATES 2003 (n=200): Y 27%, C 21%, B 44%, Other 6%, W-135 2%
• TAIWAN 2001 (n=43): Y 19%, A 4.7%, W-135 41%, B 33%, C 2.3%
• THAILAND 2001 (n=36): Other 2%, B 81%, W-135 17%
• SAUDI ARABIA 2002 (n=21): B 10%, W-135 76%, A 14%
• BRAZIL 2004, Sao Paulo state (n=520): B 36%, C 58%, Other 6%
• NEW ZEALAND 2004 (n=252): C 8%, Other 0.8%, B 87%, W-135 3.6%, Y 0.4%
• SOUTH AFRICA 2003 (n=264): Other 1%, W-135 9%, B 29%, A 34%, C 11%, Y 16%
• URUGUAY 2001 (n=53): C 11%, B 83%, Other 6%
[Phylogeny, scale bar 0.1; labels: UK 1993; Case 1; Carrier 1; FAM18; USA 1983; Carrier 2; Carrier 3; Cases 3 & 6; Remote cases 1 & 2; Carrier 4; Carrier 5]
33.
34. MRF 2013/2014 assembly statistics
      Contigs  Total length  Min  Max      Mean    StdDev  N50  L50     N90  L90     N95  L95    %GC
mean  306      2,133,479     209  88,174   7,847   12,456  33   22,789  117  5,194   151  2,688  52
max   612      2,278,600     273  258,183  19,478  32,854  80   64,227  289  16,887  370  9,336  52
min   109      2,026,649     200  30,309   3,499   4,877   12   7,569   35   1,670   44   942    51
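For readers unfamiliar with the N50/N90/N95 family of statistics, the Python sketch below computes the pair of values at a given fraction of the assembly: the length of the contig at which the running total (longest first) reaches that fraction, and how many contigs it took. Naming conventions differ between tools; in the table here the "N" columns appear to hold the contig counts and the "L" columns the lengths, so the function simply returns both without labelling them. The function name and the toy lengths are made up for illustration.

```python
def n50_l50(lengths, fraction=0.5):
    """Return (contig length, contig count) at which the running total of
    contigs, sorted longest first, reaches `fraction` of the assembly."""
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("no contigs supplied")


toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # made-up lengths, total 54
print(n50_l50(toy_contigs))       # values at the 50% point
print(n50_l50(toy_contigs, 0.9))  # same idea at the 90% point
```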
36. MRF 2014/2015 assembly statistics
      Contigs  Total length  Min  Max      Mean    StdDev  N50  L50     N90  L90     N95  L95
mean  323      2,126,265     202  80,153   6,900   10,841  35   19,614  126  4,366   162  2,215
max   516      2,278,600     273  219,677  18,381  28,710  67   58,245  236  15,605  305  9,336
min   116      2,037,538     200  34,143   4,184   5,990   13   8,869   43   1,992   53   1,065
H Bratcher, C Brehony, M Maiden, D Caugant. IBD-LabNet 2015
37. Age association of meningococcal
genotypes (MRF-MGL 2010-2012)
[Chart: proportion of cases (0% to 100%) by age category (<1, 1-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-39, 40-49, 50-69, >70); legend: Minor clonal complexes, ND, ST-174 complex, ST-461 complex, ST-162 complex, ST-22 complex, ST-23 complex/Cluster A3, ST-213 complex, ST-60 complex, ST-41/44 complex/Lineage 3, ST-269 complex, ST-32 complex/ET-5 complex, ST-11 complex/ET-37 complex]
Hill, D.M.C., Lucidarme, J., Gray, S.J., Newbold, L.S., Ure, R., Brehony, C., Harrison,
O.B., Bray, J.E., Jolley, K.A., Bratcher, H.B., Parkhill, J., Tang, C.M., Borrow, R., and
Maiden, M.C.J. Genomic epidemiology of age-associated meningococcal lineages in national
surveillance: an observational cohort study. Lancet Infectious Diseases, DOI:
http://dx.doi.org/10.1016/S1473-3099(15)00267-4
38. Population annotation
Harrison, O.B., Bray, J.E., Maiden, M.C., and Caugant, D.A. (2015) Genomic Analysis of the Evolution
and Global Spread of Hyper-invasive Meningococcal Lineage 5. EBioMedicine, 2(3), 234-243
doi:10.1016/j.ebiom.2015.01.004.
39. Validation against four reference
genomes
Isolate  Loci present in    Identical     Discrepant  Incomplete  Discrepant bases
         draft genome       loci          loci        loci        in annotated loci
Z2491    1872/1867 (99.8%)  1801 (96.2%)  19 (1%)     51 (2.7%)   32 (0.002%)
FAM18    1905/1914 (99.5%)  1775 (93.2%)  23 (1.2%)   107 (5.6%)  24 (0.001%)
G2136*   1897/1904 (99.6%)  1757 (92.6%)  47 (2.5%)   93 (4.9%)   90 (0.005%)
H44/76*  1967/1975 (99.2%)  1821 (92.6%)  49 (2.5%)   97 (4.9%)   76 (0.004%)
Draft genomes generated by VELVET assembly of Illumina reads and deposited
in BIGSDB without further curation.
Annotations compared with GENOMECOMPARATOR.
* Finished genomes primarily generated with Roche 454 technology.
40. Phenotypic serogroup by country
[Chart: number of isolates (axis 0 to 250) by country; legend (phenotypic serogroup): No value, NG, Y, W, W/Y, X, E, C, B, A]
H Bratcher, C Brehony, M Maiden, D Caugant. IBD-LabNet 2015
41. Indexing the genome: Neiss loci
gene 122540..122974
/gene="rplK"
/locus_tag="NMC0119"
/db_xref="GeneID:4676186"
CDS 122540..122974
/gene="rplK"
/locus_tag="NMC0119"
/note="binds directly to 23S ribosomal RNA"
/codon_start=1
/transl_table=11
/product="50S ribosomal protein L11"
/protein_id="YP_974250.1"
/db_xref="GI:121634005"
/db_xref="GeneID:4676186"
/translation="MAKKIIGYIKLQIPAGKANPSPPVGPA
LGQRGLNIMEFCKAFNAATQGMEPGLPIPVVITAF
ADKSFTFVMKTPPASILLKKAAGLQKGSSNPLTNK
VGKLTRAQLEEIAKTKEPDLTAADLDAAVRTIAGS
ARSMGLDVEGVV"
Database: RefSeq
Entry: NC_008767
LinkDB: NC_008767
LOCUS NC_008767 2194961 bp DNA circular CON 10-JUN-2013
DEFINITION Neisseria meningitidis FAM18 chromosome, complete genome.
pubMLST.org/Neisseria
Sequence definition database
“LOCUS TAG IDENTIFIER”
NMC0119 (FAM18)
NMA0146 (020-06)
NGO1855 (FA 1090)
LOCUS “ALIASES” for ‘seed sequences’
42. Bacterial Isolate Genome Sequence Database
(BIGSDB)
CCATCCCGTTGTCGAACAGCAGGTACGCCA
CTTCACCGCCAACCACACCGACCTTGACCAC
AAACACCGCCTCATGCTGCTCACCGGCCCC
AATATGGGCGGCAAATCCACCTACATGCGCA
GGAACCCTCAAAGCCGTTTTCCCGGAAAACC
TATCCACAGCCGAACAGCTCCGCCAAGCCA
TTTTGCCCGAACCTTCCGTCTGGCTGAAAGA
CGGCAATGTCATCAACCACGGTTTTCATCCC
GAACTGGACGAATTGCGCCGCATTCAAAACC
ATGGCGACGAATTTTTGCTGGATTTGGAAGC
CAAGGAACGCGAACGTACCGGTTTGTCCAC
ACTTAAAGTCGAGTTCAACCGCGTTCACGGC
TTTTACATTGAATTGTCCAAAACCCAAGCCG
AACAAGCACCTGCCGACTACCAACGCCGGC
AAACCCTTAAAAACGCCGAACGCTTCATCAC
GCCGGAACTGAAAGCCTTTGAAGACAAAGT
GCTGACTGCTCAAGAGCAAGCCCTCGCCTT
AGAAAAACAACTCTTTGACGGCGTATTGAAA
AACCTTCAGACGGCATTGCCGCAGCTTCAAA
AAGCCGCCAAAGCCGCCGCCGCGCTGGAC
GTGTTGTCCACATTTTCAGCCTTGGCAAAAG
AGCGGAACTTCGTCCGCCCCGAGTTTGCCG
ACAAGTCGCGCTGATTGTTT
AACCTTCAGACGGCATTGCCGCAGCTTCAAA
AAGCCGCCAAAGCCGCCGCCGCGCTGGAC
GTGTTGTCCACATTTTCAGCCTTGGCAAAAG
AGCGGAACTTCGTCCGCCCCGAGTTTGCCG
ACTATCCGGTTATCCACATCGAAAACGGCCG
CCATCCCGTTGTCGAACAGCAGGTACGCCA
CTTCACCGCCAACCACACCGACCTTGACCAC
AGGAACCCTCAAAGCCGTTTTCCCGGAAAAC
CTATCCACAGCCGAACAGCTCCGCCAAGCC
ATTTTGCCCGAACCTTCCGTCTGGCTGAAAG
ACGGCAATGTCATCAACCACGGTTTTCATCC
CGAACTGGACGAATTGCGCCGCATTCAAAAC
CATGGCGACGAATTTTTGCTGGATTTGGAAG
CCAAGGAACGCGAACGTACCGGTTTGTCCA
CACTTAAAGTCGAGTTCAACCGCGTTCACGG
CTTTTACATTGAATTGTCCAAAACCCAAGCC
GCCCCGAGTTTGCCGACTATCCGGTTATCCA
CATCGAAAACGGCCGCCATCCCGTTGTCGA
ACAGCAGGTACGCCACTTCACCGCCAACCA
CACCGACCTTGACCACAAACACCGCCTCATG
CTGCTCACCGGCCCCAATATGGGCGGCAAA
TCCACCTACATGCGCCAAGTCGCGCTGATTG
TTT
AGGAACCCTCAAAGCCGTTTTCCCGGAAAAC
CTATCCACAGCCGAACAGCTCCGCCAAGCC
ATTTTGCCCGAACCTTCCGTCTGGCTGAAAG
ACGGCAATGTCATCAACCACGGTTTTCATCC
CGAACTGGACGAATTGCGCCGCATTCAAAAC
CATGGCGACGAATTTTTGCTGGATTTGGAAG
CCAAGGAACGCGAACGTACCGGTTTGTCCA
CACTTAAAGTCGAGTTCAACCGCGTTCACGG
CTTTTACATTGAATTGTCCAAAACCCAAGCC
GAACAAGCACCTGCCGACTACCAACGCCGG
CAAACCCTTAAAAACGCCGAACGCTTCATCA
CGCCGGAACTGAAAGCCTTTGAAGACAAAGT
GCTGACTGCTCAAGAGCAAGCCCTCGCCTT
AGAAAAACAACTCTTTGACGGCGTATTGAAA
AACCTTCAGACGGCATTGCCGCAGCTTCAAA
AAGCCGCCAAAGCCGCCGCCGCGCTGGAC
GTGTTGTCCACATTTTCAGCCTTGGCAAAAG
AGCGGAACTTCGTCCGCCCCGAGTTTGCCG
ACTATCCGGTTATCCACATCGAAAACGGCCG
CCATCCCGTTGTCGAACAGCAGGTACGCCA
CTTCACCGCCAACCACACCGACCTTGACCAC
AAACACCGCCTCATGCTGCTCACCGGCCCC
AATATGGGCGGCAAATCCACCTACATGCGC
CAAGTCGCGCTGATTGTTT
abcZ
adk
aroE
fumC
gdh
pdhC
pgm
porA
porB
fetA
penA
rpoB
16S
Locus X
Locus Y
Sequence
bin
Jolley, K. A. & Maiden, M. C. (2010). BIGSdb:
Scalable analysis of bacterial genome variation at
the population level. BMC Bioinformatics 11, 595.
Locus
definitions
tables:
annotation
source Locus Allele Provenance
abcZ 2 Country UK
adk 3 Year 2013
aroE 4 serogroup B
gdh 8 Disease carrier
pdhC 4 Age 23
pgm 6 Source Swab
... etc... ... etc ...
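Once the seven MLST loci of an isolate carry allele designations, the sequence type is a simple look-up of the allele profile. The Python sketch below shows that step under stated assumptions: the profile table and its ST labels are placeholders, not PubMLST definitions, and the `fumC` value is made up because the slide's table does not list it.

```python
MLST_LOCI = ("abcZ", "adk", "aroE", "fumC", "gdh", "pdhC", "pgm")

# Illustrative profile table; the ST labels are placeholders.
PROFILES = {
    (1, 1, 1, 1, 1, 1, 1): "ST-example-1",
    (2, 3, 4, 3, 8, 4, 6): "ST-example-2",
}


def lookup_st(allele_designations, profiles=PROFILES):
    """Return the ST for a complete seven-locus profile, else None."""
    try:
        key = tuple(allele_designations[locus] for locus in MLST_LOCI)
    except KeyError:
        return None  # at least one locus has no allele assigned
    return profiles.get(key)


# Allele numbers as listed on the slide (fumC is not shown there).
isolate = {"abcZ": 2, "adk": 3, "aroE": 4, "gdh": 8, "pdhC": 4, "pgm": 6}
print(lookup_st(isolate))                 # incomplete profile
print(lookup_st({**isolate, "fumC": 3}))  # fumC value is hypothetical
```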
43. Acknowledgements
Julia Bennett (WT)
Carly Bliss
Holly Bratcher (WT)
James Bray (WT)
Carina Brehony (WT)
Marianne Clemence
Ali Cody
Fran Colles
Kanny Diallo (WTF)
Sarah Earle
Suzanne Ford
Odile Harrison (WT)
Sofia Hauck
Dorothea Hill
Lisa Rebbets
Melissa Jansen van Rensburg
Keith Jolley (WT)
Jasna Kovac
Jenny MacLennan (WT)
Noel McCarthy (WTF)
Maddi Pearce
Samuel Sheppard (WTF)
Helen Strain
Eleanor Watkins
Helen Wimalarathna
44. Population genomics:
the gene-by-gene approach
Complete
Sequence
Annotation
Bacterial Isolate
Genome Sequence
Database (BIGSDB)
Contigs
Gene sequences
Provenance/phenotype
information
Jolley, K. A. & Maiden, M. C. (2010). BIGSdb: Scalable analysis of bacterial genome variation at
the population level. BMC Bioinformatics 11, 595.
45. Bacterial typing requirements
1. Universal, in that they are applicable to all bacteria.
2. Natural, reflecting genealogical relationships while retaining
the capacity to describe closely related organisms with
distinct properties.
3. Understandable, so that the output and the process by
which the system has been arrived at are transparent, easily
interpreted and reproducible, and where possible the system
should be backwards compatible with previous approaches.
4. Expandable, to account for the incompleteness of our
knowledge of diversity, and flexible enough to accommodate
changes in this knowledge.
46. Bacterial typing requirements
5. Portable, because methods need to be easily carried out in
any laboratory and the data need to be freely exchanged by
the use of generic methodologies, reagents and
bioinformatics pipelines
6. Technology independent, so that the data used are
independent of the means of their collection (this means
that schemes adopted now need to retain their validity as
data improve)
7. Readily available to the entire community
47. Bacterial typing requirements
8. Scalable, so that methods are sufficiently fast and
inexpensive to be useable in real time for large or small
numbers of isolates (this scalability is especially important for
clinical applications and large-scale bacterial population
analyses)
9. Accommodate a wide range of variation so that they can
encompass both close and distant genealogical relationships
10. Broadly accepted by those who use them and open to
contributions from members of the community.
48. Bacterial typing methods
• Universal, in that they are applicable to all bacteria
• Natural, reflecting genealogical relationships while retaining the capacity to describe closely related
organisms with distinct properties
• Understandable, so that the output and the process by which the system has been arrived at are
transparent, easily interpreted and reproducible, and where possible the system should be backwards
compatible with previous approaches
• Expandable, to account for the incompleteness of our knowledge of diversity, and flexible enough to
accommodate changes in this knowledge
• Portable, because methods need to be easily carried out in any laboratory and the data need to be freely
exchanged by the use of generic methodologies, reagents and bioinformatics pipelines
• Technology independent, so that the data used are independent of the means of their collection (this
means that schemes adopted now need to retain their validity as data improve)
• Readily available to the entire community
• Scalable, so that methods are sufficiently fast and inexpensive to be useable in real time for large or small
numbers of isolates (this scalability is especially important for clinical applications and large-scale bacterial
population analyses)
• Able to accommodate a wide range of variation so that they can encompass both close and distant
genealogical relationships
• Broadly accepted by those who use them and open to contributions from members of the community.
49. cnl meningococci & other species
Claus, H., Maiden, M. C., Maag, R., Frosch, M. & Vogel, U. (2002). Many carried
meningococci lack the genes required for capsule synthesis and transport.
Microbiology 148, 1813-1819.
Harrison, O. B., Claus, H., Jiang, Y., Bennett, J. S., Bratcher, H. B., Jolley, K. A.,
Corton, C., Care, R., Poolman, J. T., Zollinger, W. D., Frasch, C. E., Stephens, D. S.,
Feavers, I., Frosch, M., Parkhill, J., Vogel, U., Quail, M. A., Bentley, S. D. & Maiden,
M. C. J. (2013). Description and Nomenclature of Neisseria meningitidis Capsule
Locus. Emerging Infectious Diseases 19, 566-573.
50. First generation genomics:
single locus typing and MLST
aroE
gdh
pgm
adk
pdhC
fumC
porA
fetA
abcZ
Maiden, MCJ, Bygraves, JA, Feil, E, Morelli, G, Russell, JE, Urwin, R, Zhang, Q, Zhou, J, Zurth, K,
Caugant, DA, Feavers, IM, Achtman, M & Spratt, BG. 1998. Multilocus sequence typing: a portable
approach to the identification of clones within populations of pathogenic microorganisms. Proc Natl Acad
Sci USA 95, 3140-3145.
Maiden, MC. 2006. Multilocus Sequence Typing of Bacteria. Annu Rev Microbiol 60, 561-588.
Jolley KA, Brehony C, Maiden MC. 2007. Molecular typing of meningococci: recommendations for target
choice and nomenclature. FEMS Microbiol Rev 31, 89-96.
• Neisseria seven-locus ST
summarises 3284bp.
• That is 0.15% of the 2.18Mbp
genome.
• 11,001 STs in PubMLST
database (September 2014).
• 469-750 alleles per locus.
• Many polymorphisms per
locus.
52. GENOMECOMPARATOR: rapid comparative
genomics
Jolley, K. A., Hill, D. M., Bratcher, H. B., Harrison, O. B., Feavers, I. M., Parkhill, J. &
Maiden, M. C. (2012). Resolution of a meningococcal disease outbreak from whole genome
sequence data with rapid web-based analysis methods. J Clin Microbiol. 50(9):3046-53.
SPLITSTREE 4.0
NEIGHBORNET
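The core of a gene-by-gene comparison is a pairwise count of loci at which two isolates carry different alleles; the resulting distance matrix is the kind of input a network method such as NeighborNet works from. The Python below is a sketch of that counting step only, not of GENOMECOMPARATOR itself: isolate names and allele numbers are made up, and loci missing from either isolate are simply ignored, which is one of several possible policies.

```python
from itertools import combinations


def allelic_distance(profile_a, profile_b):
    """Count loci tagged in both isolates whose allele numbers differ."""
    shared = profile_a.keys() & profile_b.keys()
    return sum(1 for locus in shared if profile_a[locus] != profile_b[locus])


def distance_matrix(isolates):
    """Symmetric matrix of pairwise allelic distances, as nested dicts."""
    matrix = {name: {name: 0} for name in isolates}
    for a, b in combinations(isolates, 2):
        d = allelic_distance(isolates[a], isolates[b])
        matrix[a][b] = matrix[b][a] = d
    return matrix


# Made-up allele designations for three hypothetical isolates.
isolates = {
    "case_1": {"abcZ": 2, "adk": 3, "porA": 7, "fetA": 1},
    "case_2": {"abcZ": 2, "adk": 3, "porA": 7, "fetA": 1},
    "carrier_1": {"abcZ": 2, "adk": 5, "porA": 9},
}

matrix = distance_matrix(isolates)
for name, row in matrix.items():
    print(name, [row[other] for other in isolates])
```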
53. Ribosomal multi-locus sequence
typing, rMLST
Jolley, K. A., Bliss, C. M., Bennett, J. S., Bratcher, H. B., Brehony, C. M., Colles, F. M.,
Wimalarathna, H. M., Harrison, O. B., Sheppard, S. K., Cody, A. J. & Maiden, M. C. (2012).
Ribosomal Multi-Locus Sequence Typing: universal characterisation of bacteria from
domain to strain. Microbiology 158, 1005-1015.
• Isolate characterisation from ‘domain to
strain’.
• Indexes the 53 ribosomal genes.
• PubMLST.org/rMLST, provides a look-up table
available on the web.
• Ribosomal sequence types (rSTs) related to
appropriate nomenclatures, October 2014:
• 99,996 genome sequences;
• 977 genera;
• 2,531 unique species;
• rSTs defined for 6 groups, Neisseria and
Campylobacter to clonal complex level.
55. Lineage 5: 40 years of global disease
and reverse vaccinology
1,886 (95%) core loci
52 (3%) accessory
Harrison, O. B., Bray, J. E., Maiden, M. C. J. & Caugant, D. A. Genomic Analysis of the Evolution and
Global Spread of Hyper-invasive Meningococcal Lineage 5. EBioMedicine.
Harrison, O.B., Hill, D.M., Maiden, M.C.J. unpublished.
56. Variability across the lineage 5 (ST-32
complex) genome
229 loci identical
1,600 loci p-distance values below
0.002
Harrison, O.B., Bray, J.E., Maiden, M.C., and Caugant, D.A. (2015)
Genomic Analysis of the Evolution and Global Spread of Hyper-invasive
Meningococcal Lineage 5. EBioMedicine, 2(3), 234-243
doi:10.1016/j.ebiom.2015.01.004.
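A p-distance is the proportion of aligned sites at which two sequences differ. The Python sketch below computes it for the short allele fragments shown earlier in the deck; it assumes the sequences are already aligned to equal length and treats every character, including any gap symbol, as an ordinary site. How the per-locus values on this slide were calculated is not stated here, so this illustrates the measure only.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


allele_1 = "TTTGATACTGTTGCCGAAGGTTT"
allele_2 = "TTTGATACCGTTGCCGAAGGTTT"
allele_3 = "TTTGATTCCGTTGCCGAAGGTTT"

print(p_distance(allele_1, allele_2))  # one differing site out of 23
print(p_distance(allele_1, allele_3))  # two differing sites out of 23
```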
57. Meningitis Research Foundation
Meningococcus Genome Library
• Charity funded.
• Open access
• All available England
and Wales (& soon
Scotland)
meningococcal isolates.
• Assembled &
annotated contiguous
sequence data.
http://www.meningitis.org/current-projects/genome
58. MRF-MGL isolates 2010-2012
• A total of 923 isolates from
England, Wales and
Northern Ireland.
• 899 from England and
Wales:
• Scanned at >1600 loci;
• 2-313 alleles/locus;
• 219 STs, 22 clonal
complexes;
• 496 rSTs (ribosomal
sequence types);
• Most isolates (78%)
belonged to 6 clonal
complexes.
ST-41/44 complex: 237 isolates
ST-269 complex: 171 isolates
ST-11 complex: 59 isolates
ST-213 complex: 75 isolates
ST-23 complex: 120 isolates
ST-32 complex: 42 isolates
59. Meningococcal clonal complexes and disease: England and Wales
[Chart: y-axis 0 to 3000; x-axis years 1975, 1985, 1995, 1999 to 2001, 2005 to 2012; legend (clonal complexes): 41/44, 269, 11, 32, 8, 213, 23, 167, 174, 22, Other, UA, NT]
Hill, D.M.C., Lucidarme, J., Gray, S.J., Newbold, L.S., Ure, R., Brehony, C., Harrison,
O.B., Bray, J.E., Jolley, K.A., Bratcher, H.B., Parkhill, J., Tang, C.M., Borrow, R., and
Maiden, M.C.J. Genomic epidemiology of age-associated meningococcal lineages in national
surveillance: an observational cohort study. Submitted.
61. MRF-MGL isolates:
genogroups by epidemiological year
[Chart: number of isolates (axis 0 to 600) by epidemiological year (07/2010-06/2011, 07/2011-06/2012, 07/2012-06/2013, 07/2013-06/2014, 07/2014-06/2015); legend (genogroups): Y, X, W/Y, W, NG, E, C, B, A]
64. Contiguous sequences (contigs.)
Data sources
First generation ‘Next generation’
Archival
Short-read
sequence
data
DNA
Sequence on
preferred platform
(e.g. Illumina)
Bacterial isolate
Complete, assembled closed
genomes with annotation, available
from public databases (e.g. IMGD)
Clinical
specimen
TGGAGCAGATCGAGGAGAGCGAGTTCGACGC
TGGAGCAGATCGAGGAGAGCGAGTTCGACGC
TGGAGCAGATCGAGGAGAGCGAGTTCGACGC
TGGAGCAGATCGAGGAGAGCGAGTTCGACGC
TGGAGCAGATCGAGGAGAGCGAGTTCGACGC
TGGAGCAGATCGAGGAGAGCGAGTTCGACGC
TGGAGCAGATCGAGGAGAGCGAGTTCGACGC
TGGAGCAGATCGAGGAGAGCGAGTTCGACGC
TGGAGCAGATCGAGGAGAGCGAGTTCGACGC
Assemble with
preferred software
(e.g. VELVET)
65. wgMLST ST-32 complex isolates
2063 CDS,
1,894 present in all
isolates
Harrison, O.B., Maiden, M.C., Caugant, D.A. Unpublished.
66. Rapid automated genome
assembly
506 Isolates
Illumina Genome Analyzer GAIIx
Read Lengths: 100 Nucleotides
Average Input FASTQ Filesize: 586MB (258 million nucleotides)
Average Number of Reads: 2.58 million
K-mer Range: 21-99
Median Final K-mer: 81
Median N50: 37,503
Average Number of Contigs: 209
Average Program Time: 22 mins 31 secs
Total Program Time: 58 hours
[Plot: total AutoAssembler.pl program time (hh:mm:ss) against file size (MB), using 10 threads per assembly]
James Bray, unpublished
67. BIGSDB automated annotation
MLST definitions
[Figure: contig sequences held in the sequence bin, repeated from slide 42]
abcZ
adk
aroE
fumC
gdh
pdhC
pgm
porA
porB
fetA
penA
rpoB
16S
Locus X
Locus Y
MLST definitions
database
External
definitions
databases
Sequence
bin
Jolley, K. A. & Maiden, M. C. (2010). BIGSdb:
Scalable analysis of bacterial genome variation at
the population level. BMC Bioinformatics 11, 595.