Inferring microbial community function from taxonomic composition
Morgan G.I. Langille (1,*), Jesse R.R. Zaneveld (2), J Gregory Caporaso (3), Joshua Reyes (4), Dan Knights (5), Daniel McDonald (6), Rob Knight (5), Robert G. Beiko (1), Curtis Huttenhower (4)

(1) Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada
(2) Dept. of Microbiology, Oregon State University, Corvallis, OR, USA
(3) Dept. of Computer Science, Northern Arizona University, Flagstaff, AZ, USA
(4) Dept. of Biostatistics, Harvard School of Public Health, Boston, MA, USA
(5) Dept. of Computer Science, University of Colorado, Boulder, CO, USA
(6) Biofrontiers Institute, University of Colorado, Boulder, CO, USA
(*) morgangilangille@gmail.com

Abstract
It is often most efficient to characterize microbial communities using taxonomic markers such as the 16S small subunit ribosomal RNA (rRNA) gene. The 16S gene is typically used to describe the organisms or taxonomic units present in a sample, but data from such markers do not inherently reveal the molecular functions or ecological roles of members of a microbial community. We have developed and validated a novel computational method that takes a set of observed taxonomic abundances and infers abundance profiles of enzymes and pathways from multiple functional classification schemes (KEGG, PFAM, COG, etc.). We use ancestral state reconstruction to determine approximate genomic content, taking into account 16S copy number and known functional abundance profiles from all currently available microbial genomes. We have evaluated the accuracy of this inference for different groups of taxa and for different areas of biological function. Our method, implemented as the PI-CRUST software (Phylogenetic Investigation of Communities by Reconstruction of Unobserved STates), allows 16S-based studies to be extended to predict the functional abilities of microbiomes, and allows expected and observed functions to be compared in shotgun metagenomic experiments.

1. PI-CRUST Software Pipeline

1.1 Starting Data Sources (Internally used by PI-CRUST)
• Entire GreenGenes 16S reference tree.
• A functional “Trait Table” for all completed genomes (e.g. KEGG, PFAM, etc.). This contains abundances of each functional category for each genome in the IMG database.
• 16S copy number information for each completed genome in IMG (used to normalize OTU tables).
• A map from GreenGenes identifiers to IMG completed genomes (to link information we have about completed genomes to tips in our reference tree).

1.2 PI-CRUST: Genome Functional Predictions
[Figure: flowchart. Inputs: 16S Copy Number (completed genomes only) & Genome Functional Table (completed genomes only); Reference 16S Tree (greengenes). Steps: Prune taxa with no genome information; Infer ancestral genome traits; Predict functional compositions. Outputs: 16S Copy Number Predictions; Functional Trait Predictions. Tree legend: Known functional composition (from sequenced genome); Inferred ancestral functional composition; Predicted functional composition (for unsequenced genome).]

1.3 User Input
• “OTU table”: number of OTUs (with GreenGenes identifiers) per sample.

1.4 PI-CRUST: Metagenome Functional Predictions
[Figure: flowchart. OTU Table, 16S Copy Number Predictions -> Normalized OTU Table. Normalized OTU Table, Functional Trait Predictions -> Metagenome Functional Predictions.]

3. Genome Validation

3.1 Method
1) Remove a single genome from our reference dataset (pretending it has not been sequenced)
2) Use PI-CRUST to predict the functional abundances for our “unknown” genome using only its 16S gene
3) Compare PI-CRUST predictions vs. the known functional abundances of our genome
4) Repeat for all completed genomes (>2000)
5) Plot the distribution of accuracy values for each genome (3.2) or each functional group (3.3)

3.2 PI-CRUST accuracy for completed genomes
[Figure: two panels. Panel titles: “Using Various Ancestral State Reconstruction”; “Distance to nearest genome affects accuracy”. X-axis of second panel: 16S phylogenetic distance to nearest species. Annotation: Endosymbionts & Reduced Genomes.]
Methods compared:
• “Random”: Each functional abundance is drawn at random from its distribution across all genomes.
• “Nearest Neighbour”: The functional profile of the genome with the closest 16S distance is used.
• “PIC”: Ancestral state reconstruction using least squares regression (APE R package).
• “WAGNER”: Ancestral state reconstruction using Wagner parsimony (Count package).

3.3 PI-CRUST accuracy for various functional groups
[Figure: PI-CRUST Accuracy (for each SEED function).]
The ability to predict functions from 16S varies depending on the functional class. Functions that are well conserved and evolve similarly to 16S have higher accuracy, such as “RNA metabolism” and “Cell Division and Cell Cycle”. Other groups that tend not to be inherited by vertical descent, such as “Phages, Prophages, Transposable Elements, Plasmids”, are not predicted as accurately.
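
The leave-one-out procedure of 3.1 can be sketched with the “Nearest Neighbour” baseline from 3.2. This is a minimal illustration on made-up toy data, not the PI-CRUST implementation: the function name leave_one_out, the distance matrix, and the trait values are hypothetical, and Pearson correlation is an assumed accuracy measure because the poster does not state which metric it uses.

```python
import numpy as np


def leave_one_out(dist, traits):
    """Hold out each genome in turn and predict it from its nearest neighbour.

    dist   : (n, n) pairwise 16S distances between genomes (toy values here)
    traits : (n, f) functional abundances per genome
    Returns a list of (nearest_index, distance_to_nearest, accuracy) tuples.
    """
    dist = np.asarray(dist, dtype=float)
    traits = np.asarray(traits, dtype=float)
    results = []
    for i in range(len(traits)):
        d = dist[i].copy()
        d[i] = np.inf  # pretend genome i has not been sequenced
        nearest = int(np.argmin(d))
        predicted = traits[nearest]  # nearest-neighbour functional profile
        # Assumed accuracy measure: Pearson correlation of predicted vs. known.
        accuracy = float(np.corrcoef(predicted, traits[i])[0, 1])
        results.append((nearest, float(d[nearest]), accuracy))
    return results


# Hypothetical toy data: 4 genomes, 3 functional categories.
dist = np.array([
    [0.00, 0.02, 0.30, 0.35],
    [0.02, 0.00, 0.28, 0.33],
    [0.30, 0.28, 0.00, 0.05],
    [0.35, 0.33, 0.05, 0.00],
])
traits = np.array([
    [2, 0, 5],
    [2, 1, 5],
    [0, 4, 1],
    [0, 3, 1],
])

results = leave_one_out(dist, traits)
for genome, (nearest, distance, accuracy) in enumerate(results):
    print(f"genome {genome}: nearest={nearest} "
          f"distance={distance:.2f} accuracy={accuracy:.3f}")
```

Collecting the distance to the nearest genome alongside each accuracy value is what makes it possible to plot accuracy against 16S distance, as in the second panel of 3.2.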


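
The metagenome prediction flow of 1.4 can be sketched as two array operations. This is a sketch under stated assumptions rather than the PI-CRUST code: it assumes that normalization means dividing each OTU count by its predicted 16S copy number, and that the metagenome prediction is the sum of per-OTU trait predictions weighted by the normalized counts. The function name predict_metagenome and all numbers are hypothetical.

```python
import numpy as np


def predict_metagenome(otu_table, copy_number, trait_table):
    """Predict per-sample functional abundances from an OTU table.

    otu_table   : (samples, otus) observed 16S counts
    copy_number : (otus,) predicted 16S copy number per OTU
    trait_table : (otus, functions) predicted functional abundances per OTU
    """
    otu_table = np.asarray(otu_table, dtype=float)
    copy_number = np.asarray(copy_number, dtype=float)
    trait_table = np.asarray(trait_table, dtype=float)
    if np.any(copy_number <= 0):
        raise ValueError("16S copy numbers must be positive")
    # Assumed normalization: counts divided by copy number estimate organisms.
    normalized = otu_table / copy_number
    # Weight each OTU's predicted traits by its normalized abundance.
    return normalized @ trait_table


# Hypothetical toy data: 2 samples, 2 OTUs, 3 functional categories.
otu_table = [[10, 4],
             [0, 8]]
copy_number = [2, 4]
trait_table = [[1, 0, 3],
               [2, 1, 0]]

predicted = predict_metagenome(otu_table, copy_number, trait_table)
print(predicted)
```

For the first sample the normalized abundances are 5 and 1, so the first function is predicted at 5*1 + 1*2 = 7.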
2. Metagenome Validation

2.1 Method
1) Obtain microbiome samples with both whole metagenomic and 16S sequencing
2) Use PI-CRUST with 16S data to predict functions for samples
3) Compare PI-CRUST predictions with functions observed from sequencing

2.2 PI-CRUST accuracy on HMP samples
[Figure: scatter plot. Axis: PI-CRUST predicted abundance based on 16S data. Each point represents the predicted vs. observed relative abundance for a single KEGG category.]

4. Concluding Remarks

4.1 Discussion
• Genome content has been shown in the past to vary widely even in closely related species. However, this may not be typical for the majority of bacterial and archaeal species. Our ability to predict the functions encoded in an organism based solely on its 16S gene and knowledge from the thousands of completed genomes suggests that gene content is often well correlated with 16S phylogeny.
• PI-CRUST allows 16S-only studies to be expanded to include information about functional abundances.
• Studies with full metagenomic sequencing can use PI-CRUST to identify functions that are observed but not expected based on their 16S profiles (i.e. the taxa that are present in the sample).

4.2 Availability & Future Plans
• PI-CRUST is still under development but will be freely available under the GPL at: http://picrust.sourceforge.net
• Various methods of ancestral state reconstruction and confidence weighting are still being evaluated.
• Evaluation of PI-CRUST on other paired metagenomic and 16S datasets is underway.

Acknowledgements
• MGIL is the recipient of an IHMC travel award funded by the NIH.
• MGIL and RGB are supported by a CIHR emerging team grant.
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

Inferring microbial community function from taxonomic composition

Morgan G.I. Langille1,*, Jesse R.R. Zaneveld2, J Gregory Caporaso3, Joshua Reyes4, Dan Knights5, Daniel McDonald6, Rob Knight5, Robert G. Beiko1, Curtis Huttenhower4
1Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada; 2Dept. of Microbiology, Oregon State University, Corvallis, OR, USA; 3Dept. of Computer Science, Northern Arizona University, Flagstaff, AZ, USA; 4Dept. of Biostatistics, Harvard School of Public Health, Boston, MA, USA; 5Dept. Computer Science, University of Colorado, Boulder, CO, USA; 6Biofrontiers Institute, University of Colorado, Boulder, CO, USA; *morgangilangille@gmail.com

Abstract
It is often most efficient to characterize microbial communities using taxonomic markers such as the 16S ribosomal small subunit rRNA gene. The 16S gene is typically used to describe the organisms or taxonomic units present in a sample, but data from such markers do not inherently reveal the molecular functions or ecological roles of members of a microbial community. We have developed and validated a novel computational method that takes a set of observed taxonomic abundances and infers abundance profiles of enzymes and pathways from multiple functional classification schemes (KEGG, PFAM, COG, etc.). We use ancestral state reconstruction to determine approximate genomic content, taking into account 16S copy number and known functional abundance profiles from all currently available microbial genomes. We have evaluated the accuracy of this inference for different groups of taxa and for different areas of biological function. Our method, implemented as the PI-CRUST software (Phylogenetic Investigation of Communities by Reconstruction of Unobserved STates), allows 16S metagenomic based studies to be extended to predict the functional abilities of microbiomes as well as to compare expected versus observed functions in shotgun based metagenomic experiments.

1. PI-CRUST Software Pipeline

1.1 Starting Data Sources (Internally used by PI-CRUST)
  • Entire GreenGenes 16S reference tree.
  • A functional “Trait Table” for all completed genomes (e.g. KEGG, PFAM, etc.). This contains abundances of each functional category for each genome in the IMG database.
  • 16S copy number information for each completed genome in IMG (used to normalize OTU tables).
  • GreenGenes identifier to IMG completed genomes map (to link information we have about completed genomes to tips in our reference tree).

1.2 PI-CRUST: Genome Functional Predictions
[Figure: workflow diagram. Box labels: 16S Copy Number (completed genomes only) & Genome Functional Table (completed genomes only); Reference 16S Tree (greengenes); Prune taxa with no genome information; Infer ancestral functional compositions; Predict genome traits; 16S Copy Number Predictions; Functional Trait Predictions. Legend: Known functional composition (from sequenced genome); Inferred ancestral functional composition; Predicted functional composition (for unsequenced genome).]

1.3 User Input
  • “OTU table”, Number of OTUs (with greengenes identifiers) per sample

1.4 PI-CRUST: Metagenome Functional Predictions
[Figure: workflow diagram. Box labels: OTU Table; 16S Copy Number Predictions; Normalized OTU Table; Functional Trait Predictions; Metagenome Functional Predictions.]

2. Metagenome Validation

2.1 Method
1) Obtain microbiome samples with both whole metagenomic and 16S sequencing
2) Use PI-CRUST with 16S data to predict functions for samples
3) Compare PI-CRUST predictions with functions observed from sequencing

2.2 PI-CRUST accuracy on HMP samples
[Figure: scatter plot. Axis label: PI-CRUST predicted abundance based on 16S data. Each point represents the predicted vs. observed relative abundance for a single KEGG category.]

3. Genome Validation

3.1 Method
1) Remove a single genome from our reference dataset (pretending it has not been sequenced)
2) Use PI-CRUST to predict the functional abundances for our “unknown” genome using only its 16S gene
3) Compare PI-CRUST predictions vs. the known functional abundances of our genome
4) Repeat for all completed genomes (>2000)
5) Plot the distribution of accuracy values for each genome (3.2) or each functional group (3.3)

3.2 PI-CRUST accuracy for completed genomes
[Figure: two panels. Panel titles: Using Various Ancestral State Reconstruction; Distance to nearest genome affects accuracy. Axis label: 16S phylogenetic distance to nearest species. Annotation: Endosymbionts & Reduced Genomes.]
  • “Random”: Functional abundances are chosen randomly from each of its distributions in all genomes.
  • “Nearest Neighbour”: Functional profile from genome with closest 16S distance is used.
  • “PIC”: Ancestral state reconstruction using least squares regression (APE R package).
  • “WAGNER”: Ancestral state reconstruction using Wagner parsimony (Count package).

3.3 PI-CRUST accuracy for various functional groups
[Figure: PI-CRUST Accuracy (for each SEED function).]
The ability to predict functions from 16S varies depending on the functional class. Functions that are well conserved and evolve similarly to 16S have higher accuracy, such as “RNA metabolism” and “Cell Division and Cell Cycle”. Other groups that tend not to be inherited by vertical descent such as “Phages, Prophages, Transposable Elements, Plasmids” are not predicted as accurately.

4. Concluding Remarks

4.1 Discussion
  • Genome content has been shown in the past to vary widely even in closely related species. However, this may not be typical for the majority of bacterial and archaeal species. Our ability to predict the functions encoded in an organism based solely by its 16S gene and knowledge from the thousands of completed genomes suggests that gene content often has good phylogenetic correlation with 16S.
  • PI-CRUST allows 16S-only studies to be expanded to include information about functional abundances.
  • Studies with full metagenomic sequencing can use PI-CRUST to identify functions that are observed but not expected based on their 16S profiles (i.e. the taxa that are present in the sample).

4.2 Availability & Future Plans
  • PI-CRUST is still under development but will be freely available under the GPL at: http://picrust.sourceforge.net
  • Various methods of ancestral state reconstruction and confidence weighting are still being evaluated.
  • Evaluation of PI-CRUST on other paired metagenomic and 16S datasets is underway.

Acknowledgements
  • MGIL is the recipient of an IHMC travel award funded by the NIH.
  • MGIL and RGB are supported by a CIHR emerging team grant.
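
Illustrative sketch (not part of the poster): the metagenome prediction step of section 1.4 combines an OTU table with 16S copy number predictions and functional trait predictions. The Python below is a minimal sketch of one way that arithmetic could work, assuming each input is a plain dictionary. The function names, OTU identifiers, KEGG-style function identifiers, and all counts are hypothetical and are not taken from the PI-CRUST software.

```python
def normalize_otu_table(otu_counts, copy_number):
    """Divide each OTU count by that OTU's (predicted) 16S copy number.

    otu_counts:  {sample: {otu_id: 16S read count}}   (hypothetical layout)
    copy_number: {otu_id: 16S copies per genome}
    """
    return {
        sample: {otu: count / copy_number[otu] for otu, count in otus.items()}
        for sample, otus in otu_counts.items()
    }


def predict_metagenome(normalized, trait_table):
    """Sum per-genome function counts, weighted by normalized OTU abundance.

    trait_table: {otu_id: {function_id: copies of that function per genome}}
    """
    predictions = {}
    for sample, otus in normalized.items():
        totals = {}
        for otu, abundance in otus.items():
            for function, per_genome in trait_table[otu].items():
                totals[function] = totals.get(function, 0.0) + abundance * per_genome
        predictions[sample] = totals
    return predictions


# Toy, made-up inputs for a single sample with two OTUs.
otu_counts = {"sample_A": {"otu_1": 10, "otu_2": 6}}
copy_number = {"otu_1": 5, "otu_2": 2}
trait_table = {
    "otu_1": {"K00001": 1, "K00002": 0},
    "otu_2": {"K00001": 2, "K00002": 3},
}

normalized = normalize_otu_table(otu_counts, copy_number)
metagenome = predict_metagenome(normalized, trait_table)
print(metagenome)  # {'sample_A': {'K00001': 8.0, 'K00002': 9.0}}
```

In this toy case otu_1 contributes 10 / 5 = 2 genome equivalents and otu_2 contributes 6 / 2 = 3. Dividing by copy number first keeps organisms with many 16S copies from being over-counted when their functions are summed.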