Next-generation sequencing for
microbial ecology:
alpha diversity, beta diversity, and
biases in high-throughput sequencing
Rachel Adams
Andrew Rominger
Sara Branco
Thomas Bruns
Understudied but fundamental ecological
habitat
Implications for human health
Sick building syndrome
Metrics are practically absent: composition and
quantitative characteristics
Need comparison of “typical” buildings and high
replication across settings to detect patterns
The microbiome of the built environment
The What and Why of the indoor microbiome
[Diagram: candidate drivers of the indoor microbiome, each marked with a question mark]
Architecture
Ventilation
Building function
Environmental setting
Residents
Fungi in the indoor microbiome, and beyond
Growth forms: yeasts, filaments
Lifestyles: saprobes; symbionts, ranging from parasites (−) to mutualists (+)
Assessing environmental fungi
1. An estimated 5-20% of fungi grow in culture
2. Identification requires a fungal taxonomist
Ribosomal RNA gene region: SSU rRNA (18S), ITS1, 5.8S, ITS2, LSU rRNA (28S)
Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi - Schoch et al. 2012
High-throughput sequencing has greatly expanded capabilities in microbial ecology
Example tag sequences: ACGAGTGCGT, ACGCTCGACA, AGACGCACTC, AGCACTGTAG, ATCAGACACG
10^4 – 10^7 sequence reads
alpha, beta, gamma diversity
[Diagram: three samples with within-sample diversity α1, α2, α3; between-sample diversity β12, β13, β23; overall diversity γ]
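The three quantities on this slide can be illustrated with a small worked example. This is a minimal sketch, not taken from the talk: the count table is hypothetical, alpha and gamma are measured as simple OTU richness, and beta is measured with Jaccard dissimilarity, which is one of many possible choices (the talk itself uses Morisita-Horn later on).

```python
import numpy as np

# Hypothetical sample-by-OTU count table: 3 samples (rows) x 5 OTUs (columns).
counts = np.array([
    [10, 5, 0, 0, 1],
    [0, 8, 7, 0, 0],
    [3, 0, 0, 9, 2],
])

present = counts > 0

# Alpha diversity: number of OTUs observed within each sample.
alpha = present.sum(axis=1)

# Gamma diversity: number of OTUs observed across all samples pooled.
gamma = int(present.any(axis=0).sum())


def jaccard_dissimilarity(a, b):
    """1 - (OTUs shared by both samples / OTUs found in either sample)."""
    shared = np.logical_and(a, b).sum()
    either = np.logical_or(a, b).sum()
    return 1.0 - shared / either


# Beta diversity: pairwise compositional dissimilarity between samples.
n_samples = counts.shape[0]
beta = {
    (i + 1, j + 1): jaccard_dissimilarity(present[i], present[j])
    for i in range(n_samples)
    for j in range(i + 1, n_samples)
}

print("alpha:", alpha.tolist())   # alpha: [3, 2, 3]
print("gamma:", gamma)            # gamma: 5
for (i, j), value in beta.items():
    print(f"beta{i}{j} = {value:.2f}")
```

Samples 2 and 3 share no OTUs, so their Jaccard dissimilarity is 1, while samples 1 and 3 share two of four OTUs and sit at 0.5.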
Groundtruthing high-throughput sequencing for alpha richness
Kunin et al. 2010
αtrue < αest
Groundtruthing high-throughput sequencing
[Diagram: true samples (α1, α2, α3) pass through high-throughput sequencing to give observed samples with inflated richness (α1+, α2+, α3+)]
In terms of diversity, we know that α can be elevated in high-throughput sequenced communities...
...but how does that change conclusions about ecological processes that are based on β diversity?
[Diagram: true community (α1, α2, α3; β12, β13, β23) passes through high-throughput sequencing to give the observed community (α1+, α2+, α3+; β12?, β13?, β23?)]
A key component to community ecology: Linking
processes to this compositional variation
Adams et al., ISME Journal, 2013
Beta diversity: the variation in species composition
among sites
Question and hypotheses
Do errors that inflate alpha diversity bias conclusions on beta diversity between samples?
Why would they?
• Particular taxa in one environment grouping do not amplify, or amplify in a way that skews the relative abundance of all others*
• Clustering incorrectly groups divergent taxa or splits identical taxa
Hypothesis: No
While richness/diversity estimates will be off for any given sample, conclusions about beta diversity will be robust to the errors
Simulation process
Initial community: samples (Sample 1, Sample 2, …, Sample i) by OTUs (OTU1, OTU2, …, OTUj)
Simulated community: samples (Sample 1, Sample 2, …) by OTUs (OTU1, OTU2, …, OTUk)
1. Initial communities: expected relative abundance of OTUs
2. Variation in taxon-specific amplification: biased relative abundance
3. Sequence error: biased relative abundance + error
4. Clustering OTUs: biased relative abundance + error + clustering
5. Simulated communities
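The simulation steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code. It borrows the distributions named on the PCR bias and OTU splitting bias slides: amplification bias from Beta(0.5, 1.0), number of splits from Binomial(n=100, p), and split location from Beta(0.5, 0.5). How each draw is applied to the abundances, the lognormal starting community, and the names `simulate_observed` and `split_prob` are all hypothetical.

```python
import numpy as np


def simulate_observed(true_abund, split_prob, rng):
    """Turn a true sample-by-OTU table into a simulated 'observed' one."""
    n_otus = true_abund.shape[1]

    # Taxon-specific amplification: one efficiency per OTU, shared by all
    # samples (assumed to act as a multiplier on true abundance).
    efficiency = rng.beta(0.5, 1.0, size=n_otus)
    amplified = true_abund * efficiency

    # Sequence error plus clustering, modelled as over-splitting: each true
    # OTU spawns Binomial(100, split_prob) spurious OTUs.
    n_splits = rng.binomial(100, split_prob, size=n_otus)

    columns = []
    for j in range(n_otus):
        remaining = amplified[:, j].copy()
        for _ in range(n_splits[j]):
            # Assumed meaning of "split location": the fraction of the
            # remaining abundance handed to the spurious OTU.
            frac = rng.beta(0.5, 0.5)
            columns.append(remaining * frac)
            remaining = remaining * (1.0 - frac)
        columns.append(remaining)

    observed = np.column_stack(columns)
    # Convert to relative abundance within each sample.
    return observed / observed.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
# Hypothetical initial communities: 6 samples x 50 OTUs, lognormal abundances.
true_abund = rng.lognormal(mean=0.0, sigma=1.5, size=(6, 50))
observed = simulate_observed(true_abund, split_prob=0.0334, rng=rng)

print("true OTUs:", true_abund.shape[1])
print("simulated OTUs:", observed.shape[1])
```

Because splitting only adds columns, the simulated table never has fewer OTUs than the initial one, which mirrors the j versus k OTU counts on the slide.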
Model summary – 2 types of errors
1. Create group differences that aren’t there (Type I error)
[Figure: NMDS ordinations (NMDS1 versus NMDS2); panels: True, Perceived]
2. Lose group differences that are there (Type II error)
[Figure: NMDS ordinations (NMDS1 versus NMDS2); panels: True, Perceived]
Model summary output
1. Presence of bias: statistical categorical differences
Groups     R2     p-value
Location   0.02   0.34
Season     0.20   0.001
2. Degree of bias: percentage difference between true and simulated communities
$\text{Normalized error} = \dfrac{\text{Simulated distance} - \text{True distance}}{\text{True distance}}$
Morisita-Horn distance metric
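The degree-of-bias measure can be written out directly. The sketch below uses the standard Morisita-Horn overlap index, converted to a distance as one minus the overlap, together with the normalized error defined on the slide. The function names and the example vectors are hypothetical.

```python
import numpy as np


def morisita_horn_distance(x, y):
    """1 - Morisita-Horn overlap between two abundance vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    total_x, total_y = x.sum(), y.sum()
    # Simpson-type dominance terms for each sample.
    d_x = np.sum(x ** 2) / total_x ** 2
    d_y = np.sum(y ** 2) / total_y ** 2
    overlap = 2.0 * np.sum(x * y) / ((d_x + d_y) * total_x * total_y)
    return 1.0 - overlap


def normalized_error(true_distance, simulated_distance):
    """(Simulated distance - True distance) / True distance."""
    return (simulated_distance - true_distance) / true_distance


# Hypothetical pair of true samples sharing one of their two OTUs each.
true_d = morisita_horn_distance([10, 10, 0, 0], [10, 0, 10, 0])
print(f"true distance = {true_d:.2f}")  # true distance = 0.50

# Hypothetical simulated distance for the same pair of samples.
print(f"normalized error = {normalized_error(true_d, 0.60):.2f}")
```

A positive normalized error means the simulated communities look further apart than the true ones; a negative value means they look closer together.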
Model findings
Categorical differences are robust to high-throughput sequencing errors in alpha diversity, regardless of the underlying patterns of beta diversity
The degree of bias is not affected by the underlying patterns of beta diversity but depends on community characteristics
Model summary – Type I & II error
[Figure: p-values (0 to 1) for True versus Simulated communities; panels: No groups, Two groups]
Whether groups are different or the same will not be biased by inflated alpha diversity
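The R2 and p-value pairs reported in the model summary resemble the output of a permutational multivariate analysis of variance (PERMANOVA). The slides do not name the test, so treating it as PERMANOVA is an assumption. The sketch below is a generic one-way version on a distance matrix; the function name `permanova`, the toy data, and the use of Euclidean distance in the example are hypothetical.

```python
import numpy as np


def permanova(dist, groups, n_perm=999, seed=0):
    """One-way PERMANOVA on a square distance matrix: returns (R2, p-value)."""
    dist = np.asarray(dist, dtype=float)
    groups = np.asarray(groups)
    n = dist.shape[0]
    sq = dist ** 2
    # Total sum of squares from all pairwise squared distances.
    ss_total = sq[np.triu_indices(n, k=1)].sum() / n

    def ss_within(labels):
        total = 0.0
        for g in np.unique(labels):
            idx = np.flatnonzero(labels == g)
            sub = sq[np.ix_(idx, idx)]
            total += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
        return total

    def pseudo_f(labels):
        n_groups = len(np.unique(labels))
        ssw = ss_within(labels)
        ssa = ss_total - ssw
        return (ssa / (n_groups - 1)) / (ssw / (n - n_groups))

    f_obs = pseudo_f(groups)
    r2 = (ss_total - ss_within(groups)) / ss_total

    # Null distribution: shuffle group labels and recompute pseudo-F.
    rng = np.random.default_rng(seed)
    exceed = sum(pseudo_f(rng.permutation(groups)) >= f_obs for _ in range(n_perm))
    p_value = (exceed + 1) / (n_perm + 1)
    return r2, p_value


# Toy data: two well-separated groups of 8 samples in a 2-D space.
rng = np.random.default_rng(1)
points = np.vstack([rng.normal(0, 1, size=(8, 2)), rng.normal(5, 1, size=(8, 2))])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
labels = np.array(["A"] * 8 + ["B"] * 8)

r2, p_value = permanova(dist, labels)
print(f"R2 = {r2:.2f}, p = {p_value:.3f}")
```

Running the same test on the true and on the simulated distance matrices, and comparing the resulting p-values, is the kind of comparison the Type I and Type II error panels summarize.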
Model summary – Degree of bias
Degree of bias will be affected by
- the error rate of the platform and OTU clustering
- the gamma diversity of the environment
- the precise shape of the species abundance
distribution
But not the relationship among samples
Increasing probability of sequencing error and over-splitting OTUs increases bias
[Figure: normalized error versus probability of splitting (1e-04, 0.0334, 0.0667, 0.1); panels: No groups, Two groups]
Increasing OTU richness decreases bias
[Figure: normalized error versus number of OTUs (100, 600, 1100)]
Shape of species abundance distribution (SAD) affects bias
[Figure: rank-abundance curve (abundance versus rank)]
[Figure: normalized error versus increasing SAD variance (1.5, 2.5, 3.5)]
As true community distance increases, degree of error decreases
[Figure: normalized error versus true distance]
Clustering is the main error-producing step
[Figure: R^2 values for True, Amplified, and Split communities; Two groups]
Simulation overview
Categorical analysis is very robust to high-throughput sequencing errors
Degree of bias will be affected by the error rate of the sequencing platform and OTU clustering, the gamma diversity of the environment, and the precise shape of the species abundance distribution
High-throughput error leads to an overestimation of the difference between groups
Mean bias is ~20-40%
Incorrect OTU clustering accounts for most of that
Steps
1. In silico: Add further complexity to simulations
2. In vitro: Empirically test artificially-created
microbial communities
Air samples in a mycology classroom:
a unique source distorts perceived species richness
Mycology classroom appears to be less rich than other classrooms…
[Figure: Chao estimated richness versus number of individuals sampled; classrooms A to E]
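The richness axis in this figure is a Chao estimate. As a minimal sketch, the widely used Chao1 estimator adds a correction based on the number of singleton and doubleton OTUs to the observed richness. Which Chao variant the talk used is not stated, so Chao1 is an assumption here, and the example counts are hypothetical.

```python
def chao1(counts):
    """Chao1 richness estimate from a list of per-OTU read counts."""
    counts = [c for c in counts if c > 0]
    s_obs = len(counts)                       # observed richness
    f1 = sum(1 for c in counts if c == 1)     # singletons
    f2 = sum(1 for c in counts if c == 2)     # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2.0 * f2)
    # Bias-corrected form, used when there are no doubletons.
    return s_obs + f1 * (f1 - 1) / 2.0


# Hypothetical fairly even sample: 9 observed OTUs, 2 singletons, 2 doubletons.
print(chao1([20, 18, 22, 19, 21, 2, 2, 1, 1]))  # 10.0

# Hypothetical sample dominated by one OTU: 5 observed OTUs, 3 singletons.
print(chao1([500, 3, 1, 1, 1]))                 # 8.0
```

Because the estimate depends only on observed OTUs and on the rare-count classes, a sample whose reads are dominated by one or a few abundant taxa can return a low estimate even when many rare taxa went undetected.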
… but has higher biomass
[Figure: Penicillium spore equivalents by classroom; classrooms A to E]
Composition of non-mycology classrooms is similar
[Figure: proportion of taxa (0 to 100) by classroom; classrooms A to E]
Mycology classroom dominated by a few taxa
Puffballs dominate mycology classroom
Pisolithus, aka dog turd fungus
Battarrea, tall stiltball
Lycoperdon, common puffball
Adams et al., in review
Beta diversity of mycology classroom: distinct communities
[Figure: NMDS ordination (NMDS1 versus NMDS2); series: Observed, Taxonomy reassigned, Abundance reassigned]
Conclusions
• While deciphering alpha diversity is problematic:
- Inflated alpha due to sequence error & clustering
- Deflated alpha due to unevenness
beta diversity calculations are robust to these errors
in high-throughput sequencing
• Empirical test will be used to corroborate conclusions
of in silico simulations
• High-throughput sequencing will continue to be a
promising tool for microbial ecologists
References – potential biases in high-throughput sequencing
DNA extraction: Frostegard et al Appl Environ Microbiol 1999; DeSantis et al FEMS Microbiology 2005; Feinstein et al Appl Environ Microbiol 2009; Morgan et al PLoS ONE 2010; Delmont et al Appl Environ Microbiol 2011
PCR amplification/Relative abundance: Amend et al Mol Ecol 2010; Engelbrektson et al ISME Journal 2010; Bellemain et al BMC Microbiol 2010; Schloss et al PLoS ONE 2011; Pinto & Raskin PLoS ONE 2012; Klindworth et al Nucleic Acids Res 2013
Sequencing error/Chimeras/OTU clustering: Huse et al Genome Biol 2007; Huse et al Environ Microbiol 2010; Kunin et al Environ Microbiol 2010; Quince et al BMC Bioinformatics 2010; Lee et al PLoS ONE 2012; Pinto & Raskin PLoS ONE 2012; Bachy et al ISME Journal 2013
Sequencing platform/protocol: Morgan et al PLoS ONE 2010; Luo et al PLoS ONE 2012
Even sampling depth: Schloss et al PLoS ONE 2011; Gihring et al Environ Microbiol 2012
Denoising: Gaspar & Thomas PLoS ONE 2013
Empirical test of simulation results
[Figure: normalized error versus number of OTUs (100, 600, 1100)]
PCR bias
[Figure: density of PCR bias; beta distribution with a=0.5, b=1.0]
[Figure: amplified abundance versus true abundance; scatter around the line of true abundance]
OTU splitting bias
[Figure: density of number of splits; binomial distribution with n=100; curves for p=0.0001, p=0.001, p=0.0334, p=0.0667]
[Figure: density of split location; beta distribution with a=b=0.5]

Rachel Adams - SMBE Euks Meeting

  • 1. Next-generation sequencing for microbial ecology: alpha diversity, beta diversity, and biases in high-throughput sequencing. Rachel Adams, Andrew Rominger, Sara Branco, Thomas Bruns
  • 3. The microbiome of the built environment: understudied but fundamental ecological habitat; implications for human health (sick building syndrome); metrics are practically absent (composition and quantitative characteristics); need comparison of “typical” buildings and high replication across settings to detect patterns
  • 7. The What and Why of the indoor microbiome: architecture, ventilation, building function, environmental setting, residents
  • 9. Fungi in the indoor microbiome, and beyond: yeasts, filaments, saprobes
  • 10. Fungi in the indoor microbiome, and beyond: yeasts, saprobes, symbionts (from parasites, −, to mutualists, +)
  • 11. Assessing environmental fungi: (1) an estimated 5-20% of fungi grow in culture; (2) identification requires a fungal taxonomist
  • 12. Assessing environmental fungi. [Diagram: SSU RNA (18S), ITS1, 5.8S, ITS2, LSU RNA (28S)] “Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi” (Schoch et al. 2012)
  • 16. High-throughput sequencing has greatly expanded capabilities in microbial ecology: 10^4 to 10^7 sequence reads (sequences shown: ACGAGTGCGT, ACGCTCGACA, AGACGCACTC, AGCACTGTAG, ATCAGACACG)
  • 18. Alpha, beta, gamma diversity [Diagram: samples labeled α1, α2, α3]
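As a minimal sketch of how the three levels relate, the Python below computes alpha richness per sample, gamma richness across the pooled samples, and Whittaker's multiplicative beta (gamma divided by mean alpha). The slides do not say which beta formulation is meant here, so Whittaker's is an assumption, and the function names and sample data are hypothetical.

```python
def alpha_richness(sample):
    """Number of OTUs with non-zero abundance in one sample."""
    return sum(1 for count in sample.values() if count > 0)


def gamma_richness(samples):
    """Number of distinct OTUs present when all samples are pooled."""
    present = set()
    for sample in samples:
        present.update(otu for otu, count in sample.items() if count > 0)
    return len(present)


def whittaker_beta(samples):
    """Multiplicative beta diversity: gamma divided by mean alpha (assumed formulation)."""
    alphas = [alpha_richness(s) for s in samples]
    mean_alpha = sum(alphas) / len(alphas)
    return gamma_richness(samples) / mean_alpha


# Hypothetical OTU counts for three samples (alpha_1, alpha_2, alpha_3).
samples = [
    {"OTU1": 10, "OTU2": 5, "OTU3": 0},
    {"OTU1": 3, "OTU3": 7},
    {"OTU2": 1, "OTU4": 9},
]

alphas = [alpha_richness(s) for s in samples]
gamma = gamma_richness(samples)
beta = whittaker_beta(samples)
print(alphas, gamma, beta)
```

Spurious OTUs created by sequencing error would raise each alpha and gamma; whether beta shifts depends on how those extra OTUs are distributed among samples, which is the question the talk goes on to ask.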
  • 22. Groundtruthing high-throughput sequencing for alpha richness: α_true < α_est (Kunin et al. 2010)
  • 24. In terms of diversity, we know that α can be elevated in high-throughput sequenced communities... [Diagram: true samples (α1, α2, α3) → high-throughput sequencing → observed samples (α1+, α2+, α3+)]
  • 25. ...but how does that change conclusions of ecological processes that are based on β diversity? [Diagram: true community (α1, α2, α3; β12, β13, β23) → high-throughput sequencing → observed community (α1+, α2+, α3+; β12?, β13?, β23?)]
  • 26. Beta diversity: the variation in species composition among sites. A key component of community ecology: linking processes to this compositional variation (Adams et al., ISME Journal, 2013)
  • 29. Question and hypotheses. Question: Do errors that inflate alpha diversity bias conclusions on beta diversity between samples? Why would it? • Particular taxa in one environment grouping do not amplify, or amplify in a way that skews the relative abundance of all others* • Clustering incorrectly groups divergent taxa or splits identical taxa. Hypothesis: No. While richness/diversity estimations will be off for any given sample, conclusions of beta diversity will be robust to the errors
  • 30. Simulation process: initial community (matrix of Sample 1, Sample 2, …, Sample i by OTU1, OTU2, …, OTUj) → simulated community (matrix of Sample 1, Sample 2, … by OTU1, OTU2, …, OTUk)
  • 35. Simulation process: initial communities (expected relative abundance of OTUs) → variation in taxon-specific amplification (biased relative abundance) → sequence error (biased relative abundance + error) → clustering OTUs (biased relative abundance + error + clustering) → simulated communities
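The pipeline on this slide can be sketched roughly as follows. This is an illustrative reading, not the authors' code: the distributions (Beta(0.5, 1.0) for amplification bias, Binomial(n=100, p) for the number of splits, Beta(0.5, 0.5) for the split location) are taken from the backup slides, but how each draw is applied is an assumption. In particular, the 0.5 offset on the amplification factor, the treatment of sequence error and clustering as a single OTU-splitting step, and the choice to split independently in each sample are all hypothetical, as are the function names.

```python
import random


def amplify(sample, factors):
    """Scale each OTU's abundance by its taxon-specific amplification factor."""
    return {otu: count * factors[otu] for otu, count in sample.items()}


def split_otus(sample, p_split, rng, label, n_trials=100):
    """Move part of each OTU's abundance into spurious OTUs.

    The number of splits per OTU is Binomial(n_trials, p_split). Each split
    moves a Beta(0.5, 0.5) fraction of the parent's remaining abundance into
    a new OTU (assumed interpretation of "location of split").
    """
    out = {}
    for otu, abundance in sample.items():
        n_splits = sum(rng.random() < p_split for _ in range(n_trials))
        remaining = abundance
        for k in range(n_splits):
            moved = remaining * rng.betavariate(0.5, 0.5)
            out[f"{otu}_{label}_split{k}"] = moved
            remaining -= moved
        out[otu] = remaining
    return out


def relative(sample):
    """Convert abundances to relative abundances that sum to 1."""
    total = sum(sample.values())
    return {otu: value / total for otu, value in sample.items()}


def simulate(initial, p_split, seed=0):
    """Run amplification bias, then OTU splitting, on every sample."""
    rng = random.Random(seed)
    otus = sorted({otu for sample in initial for otu in sample})
    # One factor per taxon, shared by all samples; the 0.5 offset is an assumption.
    factors = {otu: 0.5 + rng.betavariate(0.5, 1.0) for otu in otus}
    simulated = []
    for i, sample in enumerate(initial):
        biased = amplify(sample, factors)
        split = split_otus(biased, p_split, rng, label=f"s{i}")
        simulated.append(relative(split))
    return simulated


# Hypothetical initial communities (OTU counts per sample).
initial = [
    {"OTU1": 500, "OTU2": 300, "OTU3": 200},
    {"OTU1": 100, "OTU2": 100, "OTU3": 800},
]
simulated = simulate(initial, p_split=0.0334)
for sample in simulated:
    print(len(sample), "OTUs after simulation")
```

Comparing a distance matrix computed on `initial` with one computed on `simulated` then gives the true and perceived views of beta diversity.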
  • 36. Model summary – 2 types of errors: 1. Create group differences that aren’t there (Type I error). [Figure: NMDS ordinations (NMDS1 vs. NMDS2), true vs. perceived]
  • 37. Model summary – 2 types of errors: 2. Lose group differences that are there (Type II error). [Figure: NMDS ordinations (NMDS1 vs. NMDS2), true vs. perceived]
  • 38. Model summary output
      1. Presence of bias: statistical categorical differences
           Groups     R2     p-value
           Location   0.02   0.34
           Season     0.20   0.001
      2. Degree of bias: percentage difference between true and simulated communities
           Normalized bias = (Simulated − True) / True
  • 39. Model summary output
      1. Presence of bias: statistical categorical differences
           Groups     R2     p-value
           Location   0.02   0.34
           Season     0.20   0.001
      2. Degree of bias: percentage difference between true and simulated communities (Morisita-Horn distance metric)
           Normalized error = (Simulated distance − True distance) / True distance
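A minimal sketch of the degree-of-bias calculation: the Morisita-Horn distance between two samples, and the normalized error between a true and a simulated distance as defined on this slide. The Morisita-Horn formula is the standard one (1 minus the Horn overlap); the function names and example counts are hypothetical.

```python
def morisita_horn_distance(x, y):
    """Morisita-Horn distance between two samples given as {OTU: abundance}."""
    total_x = sum(x.values())
    total_y = sum(y.values())
    # Sum of products over shared OTUs (absent OTUs contribute zero).
    cross = sum(x.get(otu, 0) * y.get(otu, 0) for otu in set(x) | set(y))
    # Simpson-type dominance terms for each sample.
    lambda_x = sum(v * v for v in x.values()) / total_x ** 2
    lambda_y = sum(v * v for v in y.values()) / total_y ** 2
    similarity = 2 * cross / ((lambda_x + lambda_y) * total_x * total_y)
    return 1 - similarity


def normalized_error(true_distance, simulated_distance):
    """(Simulated distance - True distance) / True distance."""
    return (simulated_distance - true_distance) / true_distance


# Hypothetical samples.
a = {"OTU1": 1, "OTU2": 1}
b = {"OTU1": 1, "OTU2": 3}
print(morisita_horn_distance(a, b))
print(normalized_error(0.5, 0.6))
```

A positive normalized error means the simulated (sequenced) communities look more different from each other than the true ones do.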
  • 41. Model findings: (1) Categorical differences are robust to high-throughput sequencing errors in alpha diversity, regardless of the underlying patterns of beta diversity. (2) The degree of bias is not affected by the underlying patterns of beta diversity but depends on community characteristics.
  • 44. Model summary – Type I & II error. [Figure: p-values (0.0 to 1.0) for true vs. simulated communities; panels: no groups, two groups] Whether groups are different or the same will not be biased by inflated alpha diversity
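The R2 and p-value columns in the model summary are consistent with a permutation test on a distance matrix, although the slides do not name the test. The sketch below assumes a PERMANOVA-style test: R2 is the among-group share of the total sum of squared distances, and the p-value is the fraction of label permutations that give an R2 at least as large as the observed one. For fixed group sizes the pseudo-F statistic is a monotone function of R2, so permuting on R2 gives the same p-value. The function name and data are hypothetical.

```python
import random


def permanova(dist, groups, n_perm=999, seed=0):
    """Return (R2, p-value) for group labels on a square distance matrix."""
    n = len(groups)
    # Total sum of squares from all pairwise distances (does not depend on labels).
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n

    def r_squared(labels):
        ss_within = 0.0
        for group in set(labels):
            members = [i for i in range(n) if labels[i] == group]
            pair_sum = sum(
                dist[i][j] ** 2
                for pos, i in enumerate(members)
                for j in members[pos + 1:]
            )
            ss_within += pair_sum / len(members)
        return (ss_total - ss_within) / ss_total

    observed = r_squared(groups)
    rng = random.Random(seed)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        # Small tolerance guards against floating-point ties.
        if r_squared(labels) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical example: eight samples on a line, two well-separated groups.
positions = [0, 1, 2, 3, 10, 11, 12, 13]
dist = [[abs(p - q) for q in positions] for p in positions]
groups = ["A"] * 4 + ["B"] * 4
r2, p_value = permanova(dist, groups)
print(round(r2, 3), p_value)
```

Running such a test on both the true and the simulated distance matrices, over many simulated data sets, would yield the two p-value distributions compared in the figure.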
  • 45. Model summary – Degree of bias. Degree of bias will be affected by: the error rate of the platform and OTU clustering; the gamma diversity of the environment; the precise shape of the species abundance distribution. But not by the relationship among samples
  • 46. Increasing probability of sequencing error and over-splitting OTUs increases bias. [Figure: normalized error vs. probability of splitting (1e-04, 0.0334, 0.0667, 0.1); panels: no groups, two groups]
  • 47. Increasing OTU richness decreases bias. [Figure: normalized error vs. number of OTUs (100, 600, 1100)]
  • 48. Shape of species abundance distribution (SAD) affects bias. [Figure: rank-abundance curve, abundance vs. rank]
  • 49. Shape of species abundance distribution (SAD) affects bias. [Figure: normalized error vs. increasing SAD variance (1.5, 2.5, 3.5)]
  • 50. As true community distance increases, degree of error decreases. [Figure: normalized error vs. true distance]
  • 51. Clustering is the main error-producing step. [Figure: R^2 values for true, amplified, and split communities; two groups]
  • 52. Simulation overview: Categorical analysis is very robust to errors and biases in high-throughput sequencing. Degree of bias will be affected by the error rate of the sequencing platform and OTU clustering, the gamma diversity of the environment, and the precise shape of the species abundance distribution. High-throughput error leads to an over-estimation of the difference between groups. Mean bias is ~20-40%, and incorrect OTU clustering accounts for most of that
  • 53. Steps: 1. In silico: add further complexity to simulations. 2. In vitro: empirically test artificially created microbial communities
  • 54. Question and hypotheses. Question: Do errors that inflate alpha diversity bias conclusions on beta diversity between samples? Why would it? • Particular taxa in one environment grouping do not amplify, or amplify in a way that skews the relative abundance of all others* • Clustering incorrectly groups divergent taxa or splits identical taxa. Hypothesis: No. While richness/diversity estimations will be off for any given sample, conclusions of beta diversity will be robust to the errors
  • 55. Air samples in a mycology classroom: a unique source distorts perceived species richness
  • 57. Mycology classroom appears to be less rich than other classrooms… [Figure: Chao estimated richness vs. number of individuals, classrooms A to E]
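The richness values here are Chao estimates; the slides do not say which variant. Assuming the classic Chao1 estimator, a minimal sketch is below: observed richness plus a correction based on the number of singletons (F1) and doubletons (F2), falling back to the bias-corrected form when there are no doubletons. The function name and counts are hypothetical.

```python
def chao1(counts):
    """Chao1 richness estimate from a list of per-OTU read counts."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    if doubletons > 0:
        # Classic form: S_obs + F1^2 / (2 * F2).
        return observed + singletons ** 2 / (2 * doubletons)
    # Bias-corrected form with F2 = 0: S_obs + F1 * (F1 - 1) / 2.
    return observed + singletons * (singletons - 1) / 2


# Hypothetical sample: 7 observed OTUs, 3 singletons, 2 doubletons.
counts = [10, 5, 1, 1, 1, 2, 2, 0]
estimate = chao1(counts)
print(estimate)
```

Because the estimate depends on rare OTUs being detected, a sample in which a few taxa take up most of the reads leaves fewer reads for rare taxa at the same sequencing depth, which is one way a dominant source could make a sample appear less rich.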
  • 58. … but has higher biomass. [Figure: Penicillium spore equivalents by classroom, A to E]
  • 59. Composition of non-mycology classrooms is similar. [Figure: proportion (0 to 100) by classroom, A to E]
  • 60. Mycology classroom dominated by a few taxa. [Figure: proportion (0 to 100) by classroom, A to E]
  • 61. Puffballs dominate mycology classroom: Pisolithus (aka dog turd fungus), Battarrea (tall stiltball), Lycoperdon (common puffball)
  • 62. Mycology classroom dominated by a few taxa. [Figure: proportion (0 to 100) by classroom, A to E, with asterisk annotations] (Adams et al., in review)
  • 65. Beta diversity of mycology classroom: distinct communities. [Figure: NMDS ordination (NMDS1 vs. NMDS2); series: observed, taxonomy reassigned, abundance reassigned]
  • 66. Conclusions • While deciphering alpha diversity is problematic (inflated alpha due to sequence error & clustering; deflated alpha due to unevenness), beta diversity calculations are robust to these errors in high-throughput sequencing • Empirical test will be used to corroborate conclusions of in silico simulations • High-throughput sequencing will continue to be a promising tool for microbial ecologists
  • 70. References – potential biases in high-throughput sequencing
      DNA extraction: Frostegard et al., Appl Environ Microbiol 1999; DeSantis et al., FEMS Microbiology 2005; Feinstein et al., Appl Environ Microbiol 2009; Morgan et al., PLoS ONE 2010; Delmont et al., Appl Environ Microbiol 2011
      PCR amplification/relative abundance: Amend et al., Mol Ecol 2010; Engelbrektson et al., ISME Journal 2010; Bellemain et al., BMC Microbiol 2010; Schloss et al., PLoS ONE 2011; Pinto & Raskin, PLoS ONE 2012; Klindworth et al., Nucleic Acids Res 2013
      Sequencing error/chimeras/OTU clustering: Huse et al., Genome Biol 2007; Huse et al., Environ Microbiol 2010; Kunin et al., Environ Microbiol 2010; Quince et al., BMC Bioinformatics 2010; Lee et al., PLoS ONE 2012; Pinto & Raskin, PLoS ONE 2012; Bachy et al., ISME Journal 2013
      Sequencing platform/protocol: Morgan et al., PLoS ONE 2010; Luo et al., PLoS ONE 2012
      Even sampling depth: Schloss et al., PLoS ONE 2011; Gihring et al., Environ Microbiol 2012
      Denoising: Gaspar & Thomas, PLoS ONE 2013
  • 71. Empirical test of simulation results. [Figure: normalized error vs. number of OTUs (100, 600, 1100)]
  • 72. PCR bias. [Figure: density of PCR bias, beta distribution with a=0.5, b=1.0] [Figure: scatter around line of true abundance versus amplified abundance]
  • 73. OTU splitting bias. [Figure: split bias, binomial distribution with n=100; density vs. number of splits for p=0.001, p=0.0667, p=0.0334, p=0.0001] [Figure: split location, beta distribution with a=b=0.5; density vs. location of split]