Talk at Doherty Institute Advances in Microbial Genomics for Public Health and Clinical Microbiology symposium.
Salmonella genomics for epidemiology and emerging threats.
1. WGS for Salmonella public health: 25000 genomes and counting
Dr Philip Ashton @flashton2003
Bioinformatician
Formerly Gastrointestinal Bacteria Reference Unit
http://www.slideshare.net/PhilipAshton1/20161108-doherty-institute-symposium
8. SNP Typing Cluster Detection
[Figure: pieces of the isolate genome aligned against the reference genome]
Single nucleotide polymorphism (SNP), i.e. a different nucleotide to the reference genome
Pathogen genomes and public health – Background & Methods
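As a rough illustration of the SNP definition on this slide, the sketch below lists the positions where an isolate's consensus sequence differs from the reference. It assumes the two sequences are already aligned and of equal length, which skips the read mapping and variant calling a real pipeline performs. The function name and the isolate sequence are made up for the example.

```python
def snp_positions(reference: str, isolate: str) -> list[int]:
    """Return 1-based positions where the isolate differs from the reference.

    Assumes both sequences are already aligned and of equal length.
    Positions where the isolate has 'N' (no base call) are skipped.
    """
    if len(reference) != len(isolate):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i + 1
        for i, (ref_base, iso_base) in enumerate(zip(reference, isolate))
        if iso_base != "N" and ref_base != iso_base
    ]


reference = "AGTCGCGTATGTCTGACCC"  # reference row as shown on the slide
isolate = "AGTCGCGTATATCTGACCC"    # hypothetical consensus with one changed base
print(snp_positions(reference, isolate))  # [11]
```

The number of positions returned for a pair of isolates is the kind of value that fills a SNP distance matrix.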
9. SNP clustering & the SNP address
• Maintain a SNP distance matrix
• Hierarchical clustering on that matrix to produce a ‘SNP address’
Strain1      Strain2      SNP dist
E145346      TW09098      917
H124380427   TW09098      1050
H124380427   E145346      812
H124380427   H124340168   1
SNP threshold   250   100   50   25    10    5     0
SNP address     1     1     1    158   199   222   243
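The clustering step can be sketched as below. This is a minimal illustration, not the PHE implementation. It assumes single-linkage clustering (the slide only says "hierarchical"), treats any pair missing from the distance list as unlinked, and numbers clusters in an arbitrary order, so the numbers will not match real SNP addresses. The function name is hypothetical.

```python
THRESHOLDS = [250, 100, 50, 25, 10, 5, 0]


def snp_addresses(distances, thresholds=THRESHOLDS):
    """Build a dotted address per strain from (strain1, strain2, snp_dist) tuples.

    At each threshold, strains joined by a chain of pairs with distance
    <= threshold share a cluster (single linkage, an assumption here).
    """
    strains = sorted({s for a, b, _ in distances for s in (a, b)})
    addresses = {s: [] for s in strains}
    for t in thresholds:
        parent = {s: s for s in strains}  # union-find forest for this threshold

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path compression
                x = parent[x]
            return x

        for a, b, d in distances:
            if d <= t:
                parent[find(a)] = find(b)

        labels = {}  # cluster root -> arbitrary cluster number
        for s in strains:
            root = find(s)
            labels.setdefault(root, len(labels) + 1)
            addresses[s].append(labels[root])
    return {s: ".".join(map(str, addr)) for s, addr in addresses.items()}


# The four pairs shown on the slide (an incomplete matrix).
pairs = [
    ("E145346", "TW09098", 917),
    ("H124380427", "TW09098", 1050),
    ("H124380427", "E145346", 812),
    ("H124380427", "H124340168", 1),
]
for strain, address in snp_addresses(pairs).items():
    print(strain, address)
```

With these four pairs, the two isolates that are 1 SNP apart share every level of the address except the final 0 SNP level.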
16. SNP address
SNP threshold     250   100   50   25    10    5     0
SNP address (1)   1     1     1    158   199   222   243
SNP address (2)   1     1     1    158   199   222   256
SNP address (3)   1     2     2    35    60    125   160
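One way to read the three addresses on this slide is to ask for the tightest threshold at which two isolates still share a cluster. The sketch below does that by comparing addresses level by level. The function name is hypothetical, and it assumes the levels are nested, so comparison stops at the first mismatch.

```python
THRESHOLDS = [250, 100, 50, 25, 10, 5, 0]


def closest_shared_threshold(addr_a, addr_b, thresholds=THRESHOLDS):
    """Return the tightest SNP threshold at which two addresses share a cluster.

    Returns None if the isolates differ even at the loosest threshold.
    """
    shared = None
    for threshold, a, b in zip(thresholds, addr_a, addr_b):
        if a != b:
            break  # clusters are nested, so no tighter level can match
        shared = threshold
    return shared


address_1 = [1, 1, 1, 158, 199, 222, 243]
address_2 = [1, 1, 1, 158, 199, 222, 256]
address_3 = [1, 2, 2, 35, 60, 125, 160]

print(closest_shared_threshold(address_1, address_2))  # 5
print(closest_shared_threshold(address_1, address_3))  # 250
```

So addresses (1) and (2) fall in the same 5 SNP cluster, while address (3) only joins them at the 250 SNP level.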
17. 2. Epi Stories
Pathogen genomes and public health – Epi Stories
1. Background
2. Don’t go on holiday*
3. Don’t eat eggs*
4. Don’t have pets*
*not official PHE advice
18.
PHE sequenced Salmonella April-August 2014 (Enteritidis, Typhimurium, Typhi, Agona, Paratyphi A, Paratyphi B Java, Newport)
(10 SNP cluster, < 7 days, n = 5)
Exposure information available
Clusters with at least 5 cases with exposure information
Statistically significant positive association
Alison Waldram & Gayle Dolan
Pathogen genomes and public health – Epi Stories, background
19.
Enteritidis N = 26
Typhimurium N = 6
Typhi N = 2
How many clusters?
Traditional cluster detection, 1 for 32.
N = 287
Inns et al., Eurosurveillance, 2015
Quick et al., Genome Biology, 2015
20. Pathogen genomes and public health
15/32 clusters had > 75% foreign travel
21.
10 SNP cluster (N=27)
5 SNP cluster (N=14)
0 SNP cluster (N=6)
WGS can give geographical info, even if the patient doesn’t
Alison Waldram & Gayle Dolan
Pathogen genomes and public health – Epi Stories, we know where you went on holiday
22. Pathogen genomes and public health – Epi Stories, don’t eat eggs
Outbreak
Don’t eat eggs*
Salmonella Enteritidis
(28000 variant positions)
23. Pathogen genomes and public health – Epi Stories, don’t eat eggs, or go on holiday
Inns et al., Epi and Infection, 2016
doi:10.1017/S0950268816001941
25. Pathogen genomes and public health – Epi Stories, don’t have pets
Figure 1: S. Enteritidis PT8 cases from 2012 and 2015
Richard Elson, Sanch Kanagarajah
Don’t have pets*
26. Pathogen genomes and public health – Epi Stories, don’t have pets
Clusters identified by WGS
29. 3. Salmonella Typhimurium ST313: Just an African problem?
Pathogen genomes and public health – ST313, just an African problem?
Siân Owen
@implosian
Jay Hinton
@jay_salsa
30. Pathogen genomes and public health – ST313, just an African problem?
Salmonella Typhimurium
Non-typhoidal Salmonella (NTS) cause gastroenteritis worldwide
Thought to be mainly ST19
Sian Owen
31. Pathogen genomes and public health – ST313, just an African problem?
Invasive non-typhoidal Salmonella (iNTS) cause systemic disease in sub-Saharan Africa
How have S. Typhimurium ST313 adapted to cause systemic disease in an African human host?
390,000 deaths per year
Sian Owen
33. Pathogen genomes and public health – ST313, just an African problem?
[Diagram: Typhimurium ST19 (food poisoning); ST313 lineage I and ST313 lineage II (horrible invasive disease); ???]
Sian Owen
34. Pathogen genomes and public health – ST313, just an African problem?
Do we see ST313 in the UK?
Travel                   Lineage II   Non-lineage II
Sub-Saharan Africa       7            1
Not sub-Saharan Africa   2            41
Sample                   Lineage II   Non-lineage II
Blood                    8            3
Faeces                   1            57
[Tree labels: Lineage 1, Lineage 2; 101 SNPs]
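To put a rough number on the two 2x2 tables on this slide, the sketch below computes an odds ratio and a one-sided Fisher's exact p-value. The talk does not say which test, if any, was applied to these tables, so the choice of test is an assumption made for illustration. The function names are hypothetical.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_one_sided(a, b, c, d):
    """P(top-left cell >= a) under the hypergeometric null with fixed margins."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)  # largest value the top-left cell can take
    total = comb(n, row1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return tail / total


# Rows: travel to sub-Saharan Africa yes / no. Columns: lineage II / non-lineage II.
travel = (7, 1, 2, 41)
# Rows: blood / faeces. Columns: lineage II / non-lineage II.
sample = (8, 3, 1, 57)

print(odds_ratio(*travel), fisher_one_sided(*travel))
print(odds_ratio(*sample), fisher_one_sided(*sample))
```

The counts are small, so the odds ratios are very imprecise. The speaker notes also flag underlying conditions as an unknown confounder for the blood versus faeces comparison.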
35. Pathogen genomes and public health – ST313, just an African problem?
[Diagram: Typhimurium ST19 (food poisoning); ST313 lineage I and ST313 lineage II (horrible invasive disease); UK lineages]
The two African ST313 lineages are not actually phylogenetic neighbours…
Sian Owen
40. Acknowledgements
Salmonella WGS at PHE
Gastrointestinal Bacteria Reference Unit
Elizabeth de Pinna, Satheesh Nair, Martin Day, Tim Dallman,
Kathie Grant, Salmonella Reference Service staff
Epidemiology
Alison Waldram (FETP), Gayle Dolan, Richard Elson
Genomic Services Unit
Cath Arnold and team
Bioinformatics Unit
Jonathon Green, Anthony Underwood
Rediat Tewolde, Ulf Schaefer
University of Liverpool
Jay Hinton
Sian Owen
Editor’s Notes
So, therefore, nothing in this talk should be taken as the official PHE line.
So, you have to wait for the outcome of each process before you start the next one. You have to know what serotype it is before you start phage typing. Before you do MLVA, you have to know the phage type, and there has to have been a meeting between micro and epi deciding that there is something worth investigating, and then you start the MLVA process.
Which is fairly typical of most public health labs; it’s an unavoidable consequence of having a variety of tests conditionally applied to the isolates coming through your lab. And, at a lot of steps, it takes a person to decide which path the sample goes down, which has operational implications.
So, here we have the WGS workflow. The sample is received on day 1 … blah blah blah.
This can be the same workflow for more or less every bug in the gastro lab. Since you have pretty much all the information you could ever need in the sequence data, you can automatically push the data down the appropriate pipeline. And once you buy into a decent informatics system, you really start to reap the rewards. For example: oh, there is a cluster in the south east of England. It’s easy to automate an email to that team telling them there is a cluster, or whatever suits your workflow. The results arrive ready for a person to validate and send the sample out. The rate-limiting step is really the bioinformaticians you need to implement and maintain the pipeline.
Since you have all the information you need for ID, typing and characterisation done in one test, that means fewer post-assay delays before the result can be reported out to the customer. All the info arrives with the validator in one batch.
Ok, so if you are going to do this, which is quite an expensive test, you need to get your money’s worth, so it’s a good thing we can replace pretty much every test with a WGS equivalent. We can subspeciate at 99.7% accuracy …
Ok, so we are going to be talking about SNPs quite a lot from here on, so just to recap what a SNP is. Quite simple…
And we can use these SNPs to infer phylogenetic trees showing how the isolates are related in evolution.
However,
Another problem you are going to come across is how to communicate SNP differences to different people. The solution Tim Dallman came up with was the SNP address … blah blah blah, which leaves you with this seven-number string.
Ok, so a quick analogy to explain the SNP address.
… And this kind of format lets epidemiologists rapidly see which clusters are closely related.
Ok, so this has all been a bit fluffy so far, so let’s get into some concrete results.
We ran WGS in parallel with traditional typing for a year without using it for outbreak detection. WGS was only used for outbreak investigation once an outbreak had been identified by traditional means. As part of that, we wanted to determine thresholds to use for outbreak detection once we started running with WGS only.
Around 30% of cases in clusters had to be discarded because there wasn’t enough information available. The major problem is getting access to standard questionnaires. With WGS we are going to be moving away from reactive outbreak investigations, which is great because patients’ recall of what they ate drops off dramatically with time. In an ideal world, we would have exposure information on all cases of Salmonella before typing. Then we can go back once we have the typing and generate hypotheses. This is obviously a big shift in the number of cases, and perhaps epidemiology needs to drag itself into the 21st century to join microbiology, e.g. web forms sent to patients rather than paper forms filled out over the phone. The quality of the information might drop, but perhaps the quality of the typing will make up for it.
Of the clusters that had enough information available, 17/21 (81%) had a statistically significant association when compared against controls. The controls were sporadic isolates (> 50 SNPs from any other isolate) with exposure information, so this is a case-case study. This is a great endorsement of WGS as a typing tool for detecting linked cases.
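The control definition in this note (isolates more than 50 SNPs from any other isolate) can be sketched as a nearest-neighbour filter over pairwise distances. This is only an illustration, with a hypothetical function name. It assumes the distance list is complete, which the four example pairs borrowed from the SNP distance slide are not.

```python
def sporadic_isolates(distances, threshold=50):
    """Return isolates whose nearest neighbour is more than `threshold` SNPs away.

    `distances` is an iterable of (strain1, strain2, snp_dist) tuples and is
    assumed to cover every pair of isolates.
    """
    nearest = {}  # strain -> smallest distance seen to any other strain
    for a, b, d in distances:
        for strain in (a, b):
            nearest[strain] = min(d, nearest.get(strain, d))
    return sorted(s for s, d in nearest.items() if d > threshold)


pairs = [
    ("E145346", "TW09098", 917),
    ("H124380427", "TW09098", 1050),
    ("H124380427", "E145346", 812),
    ("H124380427", "H124340168", 1),
]
print(sporadic_isolates(pairs))  # ['E145346', 'TW09098']
```

Only the sporadic isolates that also have exposure information would then serve as controls.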
This is a really important statistic. It’s well known that food poisoning is associated with travel abroad, but what is interesting here is that cases associated with travel to certain countries form genetic clusters with a level of similarity we associate with a common exposure. This means that these cases, which are acquired abroad but cause illness (and the associated costs) in the UK on the travellers’ return, are possibly the result of point source exposures. And if we can identify those exposures, and work with holiday operators to improve food safety in, for example, problem hotels or other venues, then we can stop people getting sick. Which is great news for public health.
Interestingly, there has been a recent trend of legal firms advertising on daytime TV, similar to the ‘have you been injured in an accident at work?’ adverts (do you have those here?), asking people if they became ill on holiday, because the tour operator could be liable if the illness is the result of their negligence. Maybe this kind of financial penalty for holidaymakers getting sick will lead to tour operators having stricter criteria for their operators, and less illness acquired abroad?
So, there are two things to focus on here. Firstly, if we look at the inner ring, you can see the temporal spread of this outbreak, over 30 weeks of 2015. And then, more interestingly, in white, are cases from 2014. And now, if we had an up-to-date tree of this exact clade, you would see new cases from 2016 in here as well.
So, this is another example of a significant number of cases (136 met the outbreak definition) caused by a recurrent problem. Traceback, which took a lot of time and effort, didn’t result in any positive eggs, although we are fairly sure that it is linked with Spanish eggs. So, rather than spending a lot of time on traceback when we have another outbreak this year, we can increase prospective sampling of eggs imported from Spain, and liaise with Spanish colleagues about trying to find the source, as it could be a single source.
So, a couple of other features of using SNPs to identify clusters. The clusters can last a really long time. In this study, which only lasted three months, we had one cluster that spanned almost the entirety of that time, and actually went on for much longer. I’m going to go into some more detail about that shortly.
The other thing to say is that, as we all know, bacteria don’t respect borders, and they respect public health administrative boundaries even less. 95% of clusters had cases in more than one PHE centre, which is a large area that can have up to a few million people living in it. This has boring but important consequences for how you actually administer the investigation of these clusters. Traditionally, the centres were used to detect clusters, based on phage type, but now that we have the sensitivity and specificity of WGS, we can link clusters across borders, which muddies the waters as to who is responsible for investigation.
Ok, so this is the investigation of the cluster which lasted 106 days in the previous slide. When we looked at it, we found cases from this cluster dating back to 2012 and, last I heard, we are still getting cases associated with this cluster.
This is the number of cases of Salmonella Enteritidis PT8 between 2012 and August 2015. PT stands for phage type, an older typing method we don’t do any more, which involves testing the susceptibility of the strain to a panel of phages.
The first thing to say here is that, pre-WGS, outbreak detection was done using an exceedance method. This can be illustrated if we take August 2013 and 2014 as a baseline, and then compare with August 2012: the number of cases in 2012 exceeds the number we would expect based on 2013 and 2014.
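A toy version of the exceedance idea is sketched below. The talk does not give the actual rule or any counts, so the rule (baseline mean plus two standard deviations), the function name and the monthly counts are all assumptions for illustration. Real surveillance exceedance algorithms are more elaborate.

```python
from statistics import mean, stdev


def exceeds_baseline(observed, baseline_counts, z=2.0):
    """Flag a count above the baseline mean plus z standard deviations.

    The mean + z * SD rule is an assumption, not the method used by PHE.
    """
    expected = mean(baseline_counts)
    spread = stdev(baseline_counts) if len(baseline_counts) > 1 else 0.0
    return observed > expected + z * spread


# Hypothetical August case counts; the talk does not report the real numbers.
baseline_august = [40, 44]  # stand-ins for August 2013 and August 2014
print(exceeds_baseline(70, baseline_august))  # True: flagged as an exceedance
print(exceeds_baseline(45, baseline_august))  # False: within the expected range
```

A persistent cluster that contributes cases every month raises the baseline itself, so a count-based rule like this one would not flag it.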
So, after we started sequencing, we noticed that there was a highly related cluster that was contributing a lot of cases to this baseline. Up to 90% of the PT8 cases in any month were in this cluster, and it showed none of the seasonal variation we see with non-cluster PT8.
So, my epi colleagues looked at the exposures reported by cases in the cluster compared with exposures in the controls. The thing that really jumped out was exposure to snakes. Now, reptiles aren’t typically associated with Salmonella subspecies I; they are associated with the other subspecies.
So, it wasn’t the snakes themselves, it was the snake food, which is typically day-old mice.
These mice are bought from pet shops in large batches and kept frozen.
So, the question is, do we see any ST313 in the UK? Yes we do: we sequenced ST313 from 75 different people, one dog, and one foodstuff.
The UK ST313 are genomically heterogeneous. No L1, 9 L2 infections. Broad swathe of novel diversity.
Epidemiologically heterogeneous. Of the 9 patients with L2 infections for whom travel info was available, 7 reported travel to Africa. Only 1 out of 42 non-L2.
Clinically heterogeneous. Again, 8/9 L2 = invasive, 3/57 UK. Underlying conditions are a big unknown confounder; we don’t know HIV status, for example.
Bioinfo pipelines – good, fast, cheap. Pick 2.
You’re going to see new types of outbreak; chances are you are used to investigating the types of outbreak your typing methods can pick up. A new typing method means different kinds of outbreaks: a great opportunity, and an operational challenge.
And finally, WGS in developed countries will be a useful tool in monitoring emerging threats in LMICs.