Towards better tools for fungal environmental metagenomics
1. Towards better tools for fungal environmental metagenomics
Jason Stajich
Plant Pathology & Microbiology http://lab.stajich.org
http://fungalgenomes.org
http://fungidb.org
twitter: hyphaltip, stajichlab, fungalgenomes
2. Acknowledgements
People: Peng Liu, Sapphire Ear, Brad Cavinder, Erum Khan, Rob Knight, Sofia Robb, Lorena Rivera, Daniel McDonald, Jinfeng Chen, Carlos Rojas, Anastasia Gioti, Megna Tiwari, Noah Fierer, Jessica De Anda, Scott Bates, Steven Ahrendt, Annie Nguyen, Jon Leff, Divya Sain, Ramy Wissa, Yizhou Wang, Yi Zhou, Mitch Sogin, Sue Huse, Raghu Ramamurthy, Edward Liaw, Greg Gu, Folker Meyer, Daniel Borcherding, Henrik Nilsson, Keith Seifert
Groups and institutions: Univ of Colorado, Boulder; IIGB Computational Core; Marine Biological Laboratory; Argonne National Lab
3. Molecular Ecology of microbes
• What microbes live where?
• Molecular techniques improve upon culture-based methods by reducing the bias toward fast-growing and/or culturable organisms.
• Many efforts to examine Bacterial and Archaeal diversity with sequencing have developed important standards - e.g. Human Microbiome Project.
• Efforts are under way to improve methods for studying fungi in the environment
4. [Figure: dated phylogeny of Fungi and related eukaryotes; x-axis: millions of years, 1500 to 0]
Other eukaryotes shown: Plantae, Amoebozoa, Choanozoa, Metazoa
Fungi: Microsporidia, Rozella, Chytridiomycota, Blastocladiomycota, Mucoromycotina, Entomophthoromycotina, Zoopagomycotina, Kickxellomycotina, Glomeromycota
Basidiomycota: Pucciniomycotina, Ustilaginomycotina, Agaricomycotina
Ascomycota: Taphrinomycotina, Saccharomycotina, Pezizomycotina
Traits annotated on the tree: multicellular with differentiated tissues; loss of flagellum; mitotic sporangia to mitotic conidia; regular septa; meiotic sporangia to external meiospores
Stajich et al. Current Biol 2009
5. Fungi interact with many organisms
Endophytes: Betsy Arnold, doi: 10.3389/fpls.2011.00100
Mycorrhiza: F. Martin, doi: 10.1016/j.pbi.2009.05.007
7. How many species of Fungi are there?
• 1.5 Million, based on a fungus to plant ratio of 6:1
• Don’t forget the endophytes... and the soil...
• Upwards of 6M species - Lee Taylor (pers comm)
• “Thus, the Fungi is likely equaled only by the Insecta with respect to eukaryote [...]

[Screenshot 1] Mycol. Res. 95 (6): 641-655 (1991)
Presidential address 1990
The fungal dimension of biodiversity: magnitude, significance, and conservation
D. L. HAWKSWORTH
International Mycological Institute, Kew, Surrey TW9 3AF, UK
Fungi, members of the kingdoms Chromista, Fungi S.str. and Protozoa studied by mycologists, have received scant consideration in discussions on biodiversity. The number of known species is about 69000, but that in the world is conservatively estimated at 1.5 million; six-times higher than hitherto suggested. The new world estimate is primarily based on vascular plant:fungus ratios in different regions. It is considered conservative as: (1) it is based on the lower estimates of world vascular plants; (2) no separate provision is made for the vast numbers of insects now suggested to exist; (3) ratios are based on areas still not fully known mycologically; and (4) no allowance is made for higher ratios in tropical and polar regions. Evidence that numerous new species remain to be found is presented. This realization has major implications for systematic manpower, resources, and classification. Fungi have and continue to play a vital role in the evolution of terrestrial life (especially through mutualisms), ecosystem function and the maintenance of biodiversity, human progress, and the operation of Gaia. Conservation in situ and ex situ are complementary, and the significance of culture collections is stressed. International collaboration is required to develop a world inventory, quantify functional roles, and for effective conservation.
'Biodiversity', the extent of biological variation on Earth, has come to the fore as a key issue in science and politics for the 1990s. First used as 'BioDiversity' in the title of a scientific meeting in Washington, D.C. in 1986 (Wilson, 1988: p. v), it has been rapidly adopted as a contraction of 'biotic diversity' [...] species, or populations. Knowledge of all of these is pertinent to a thorough appreciation of the fungal dimension, but here I will centre on species biodiversity; that is basal to discussions at other levels.

[Screenshot 2] American Journal of Botany 98(3): 426–438. 2011. DOI:10.3732/ajb.1000298
THE FUNGI: 1, 2, 3 … 5.1 MILLION SPECIES?
Meredith Blackwell
Department of Biological Sciences; Louisiana State University; Baton Rouge, Louisiana 70803 USA
• Premise of the study: Fungi are major decomposers in certain ecosystems and essential associates of many organisms. They provide enzymes and drugs and serve as experimental organisms. In 1991, a landmark paper estimated that there are 1.5 million fungi on the Earth. Because only 70 000 fungi had been described at that time, the estimate has been the impetus to search for previously unknown fungi. Fungal habitats include soil, water, and organisms that may harbor large numbers of understudied fungi, estimated to outnumber plants by at least 6 to 1. More recent estimates based on high-throughput sequencing methods suggest that as many as 5.1 million fungal species exist.
• Methods: Technological advances make it possible to apply molecular methods to develop a stable classification and to discover and identify fungal taxa.
• Key results: Molecular methods have dramatically increased our knowledge of Fungi in less than 20 years, revealing a monophyletic kingdom and increased diversity among early-diverging lineages. Mycologists are making significant advances in species discovery, but many fungi remain to be discovered.
• Conclusions: Fungi are essential to the survival of many groups of organisms with which they form associations. They also attract attention as predators of invertebrate animals, pathogens of potatoes and rice and humans and bats, killers of frogs and crayfish, producers of secondary metabolites to lower cholesterol, and subjects of prize-winning research. Molecular tools in use and under development can be used to discover the world’s unknown fungi in less than 1000 years predicted at current new [...]
8. Microbial Ecology is not just outside
• Most humans spend the majority of their lives indoors
• What are the organisms that live in the built environment?
• Are there beneficial organisms that influence the overall composition of communities?
• How does the composition change when environmental conditions change (moisture, temperature, food sources)?
10. Microbial Ecology in simple terms
• Collecting what’s there (sampling and PCR
amplifying) [LAB]
• Put labels on things by matching to knowns (BLAST
or other approach to see what matches in a
database) [COMPUTER]
• See what is different (compare communities)
[COMPUTER]
http://xkcd.com/1133/
11. Sampling and amplifying
• Total DNA extracted from a sample - soil,
plant tissue, swab
• PCR with primers designed to amplify a
conserved locus
• Sequencing, formerly with Sanger sequencing and now with Next Generation Sequencing
12. Metagenomics - Amplicon
• Amplify a targeted locus for sequencing.
• Works best if there are universal primers which can
amplify from all the species of interest
• For Bacteria, the most successful locus has been the ribosomal small subunit gene (16S)
• Its primers work well to amplify most groups of Bacteria and Archaea
• Other loci are useful markers, sometimes giving better species resolution (phylogenetics) or, by targeting a protein-coding gene, a view of community functional diversity
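
The point about universal primers can be made concrete with a small sketch. It checks where a degenerate primer (written with IUPAC ambiguity codes) would anneal on a set of template sequences, allowing a chosen number of mismatches. The primer and the three templates are made-up toy data, not real 16S or ITS primers, and the function names are hypothetical.

```python
# IUPAC nucleotide codes: each degenerate symbol stands for a set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer, site):
    """Count positions where the template base is not allowed by the primer symbol."""
    return sum(base not in IUPAC[sym] for sym, base in zip(primer, site))


def find_primer_sites(primer, template, max_mismatches=0):
    """Return start positions on the template where the primer matches."""
    primer = primer.upper()
    template = template.upper()
    hits = []
    for start in range(len(template) - len(primer) + 1):
        site = template[start:start + len(primer)]
        if count_mismatches(primer, site) <= max_mismatches:
            hits.append(start)
    return hits


# Toy data (hypothetical): one degenerate primer and three made-up templates.
PRIMER = "GCMGCY"
TEMPLATES = {
    "taxon_a": "TTGCAGCCAA",  # contains GCAGCC, allowed by M and Y
    "taxon_b": "AGCCGCTTTT",  # contains GCCGCT, allowed by M and Y
    "taxon_c": "TTGCTGCCAA",  # contains GCTGCC, T is not allowed by M
}

coverage = {name: find_primer_sites(PRIMER, seq) for name, seq in TEMPLATES.items()}
```

A primer is "universal" for a set of taxa only if every template yields at least one hit; here `taxon_c` is missed unless one mismatch is tolerated.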
15. Fungal Markers for molecular ecology
• A marker needs to amplify universally across all groups
• Ribosomal RNA (rRNA)
• Small Subunit and Large Subunit genes
• Internal Transcribed Spacer 1 and 2
• Protein coding genes
• EF1alpha, RPB1, RPB2 (Fungal Tree of Life project)
16. White, Bruns, Lee, Taylor
http://www.biology.duke.edu/fungi/mycolab/primers.htm
17. White, Bruns, Lee, Taylor
http://www.biology.duke.edu/fungi/mycolab/primers.htm
19. There’s a data storm coming
320k curated sequences
Roche-454: 1M sequences per run
Illumina HiSeq: 2-3 Billion sequences per run (10-14 days)
Illumina MiSeq: 3-5 M reads (1 day)
IonTorrent: 4-8 M reads (2hrs)
20. Fungal-specific Challenges
• Alignment of ITS
• Establishment of a reference tree
• Placing unalignable sequences into a tree with LSU
• Naming and Curation of datasets
22. ITS is most useful as a barcode sequence
[Screenshot of the paper's first page]
Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi
Conrad L. Schoch, Keith A. Seifert, Sabine Huhndorf, Vincent Robert, John L. Spouge, C. André Levesque, Wen Chen, and Fungal Barcoding Consortium
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892; Biodiversity (Mycology and Microbiology), Agriculture and Agri-Food Canada, Ottawa, ON, Canada K1A 0C6; Department of Botany, The Field Museum, Chicago, IL 60605; and Centraalbureau voor Schimmelcultures Fungal Biodiversity Centre (CBS-KNAW), 3508 AD, Utrecht, The Netherlands
Edited by Daniel H. Janzen, University of Pennsylvania, Philadelphia, PA, and approved February 24, 2012 (received for review October 18, 2011)

Abstract: Six DNA regions were evaluated as potential DNA barcodes for Fungi, the second largest kingdom of eukaryotic life, by a multinational, multilaboratory consortium. The region of the mitochondrial cytochrome c oxidase subunit 1 used as the animal barcode was excluded as a potential marker, because it is difficult to amplify in fungi, often includes large introns, and can be insufficiently variable. Three subunits from the nuclear ribosomal RNA cistron were compared together with regions of three representative protein-coding genes (largest subunit of RNA polymerase II, second largest subunit of RNA polymerase II, and minichromosome maintenance protein). Although the protein-coding gene regions often had a higher percent of correct identification compared with ribosomal markers, low PCR amplification and sequencing success eliminated them as candidates for a universal fungal barcode. Among the regions of the ribosomal cistron, the internal transcribed spacer (ITS) region has the highest probability of successful identification for the broadest range of fungi, with the most clearly defined barcode gap between inter- and intraspecific variation. The nuclear ribosomal large subunit, a popular phylogenetic marker in certain groups, had superior species resolution in some taxonomic groups, such as the early diverging lineages and the ascomycete yeasts, but was otherwise slightly inferior to the ITS. The nuclear ribosomal small subunit has poor species-level resolution in fungi. ITS will be formally proposed for adoption as the primary fungal barcode marker to the Consortium for the Barcode of Life, with the possibility that supplementary barcodes may be developed for particular narrowly circumscribed taxonomic groups.

Main text (excerpt): [...] the intron of the trnK gene. This system sets a precedent for reconsidering CO1 as the default fungal barcode.
CO1 functions reasonably well as a barcode in some fungal genera, such as Penicillium, with reliable primers and adequate species resolution (67% in this young lineage) (9); however, results in the few other groups examined experimentally are inconsistent, and cloning is often required (10). The degenerate primers applicable to many Ascomycota (11) are difficult to assess, because amplification failures may not reflect priming mismatches. Extreme length variation occurs because of multiple introns (9, 12–14), which are not consistently present in a species. Multiple copies of different lengths and variable sequences occur, with identical sequences sometimes shared by several species (11). Some fungal clades, such as Neocallimastigomycota (an early diverging lineage of obligately anaerobic, zoosporic gut fungi), lack mitochondria (15). Finally, because most fungi are microscopic and inconspicuous and many are unculturable, robust, universal primers must be available to detect a truly representative profile. This availability seems impossible with CO1.
The nuclear rRNA cistron has been used for fungal diagnostics and phylogenetics for more than 20 y (16), and its components are most frequently discussed as alternatives to CO1 (13, 17). The eukaryotic rRNA cistron consists of the 18S, 5.8S, and 28S rRNA genes transcribed as a unit by RNA polymerase I. Posttranscriptional processes split the cistron, removing two internal transcribed spacers. These two spacers, including the 5.8S [...]
23. Solutions
• ITS is hard to align across diverse taxa, but LSU is not.
• A marker containing both sequences would be useful for both phylogenetic placement and barcoding.
[Figure: diagram of the rRNA region; recovered labels: 5.8S, LSU]
• An ITS + LSU amplicon has been proposed; primers are being tested with Illumina. The amplicon is a bit too large for current chemistry but could work in the near future
24. Putting a name on it
• Most sequences will not have identified names
• Group all observed sequences together to define OTU (operational taxonomic unit) clusters, even if no name can be assigned
• Curated ITS databases - UNITE project
• ~300,000 sequences in UNITE, ~200,000 of which are full length (SSU + ITS + LSU)
• 50% are identified to a species level (18,000 distinct Latin binomials)
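
The OTU grouping mentioned above can be sketched as a greedy, name-free clustering: each read joins the first cluster whose seed it resembles closely enough, otherwise it seeds a new cluster. This is only an illustration under stated assumptions: the similarity score is a k-mer Jaccard index standing in for a real alignment identity, the reads are toy data, and the function names are hypothetical rather than taken from any particular tool.

```python
def kmer_set(seq, k=4):
    """All overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=4):
    """Jaccard similarity of k-mer sets; a crude stand-in for alignment identity."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def greedy_otus(reads, threshold=0.97):
    """Put each read in the first OTU whose seed it resembles, else start a new OTU."""
    otus = []  # list of (seed_sequence, [member read ids])
    for read_id, seq in reads:
        for seed, members in otus:
            if similarity(seq, seed) >= threshold:
                members.append(read_id)
                break
        else:
            # No existing seed was close enough: this read seeds a new OTU.
            otus.append((seq, [read_id]))
    return otus


# Toy reads (hypothetical): r1 and r2 are identical, r3 is unrelated.
READS = [
    ("r1", "ACGTTGCAAGCTTGCA"),
    ("r2", "ACGTTGCAAGCTTGCA"),
    ("r3", "GGGGGGCCCCCCTTTT"),
]

otus = greedy_otus(READS)
membership = {f"OTU_{i + 1}": members for i, (_, members) in enumerate(otus)}
```

Each cluster gets a placeholder label such as `OTU_1`, so sequences can be counted and compared before any taxonomic name is attached.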
26. Soil Clone Group 1 - highly abundant, uncultured organism
Porter et al. 2008
27. Soil Clone Group 1 - highly abundant, uncultured organism
Porter et al. 2008
28. What’s in a name? Would a mold by any other name smell as sweet?
• “One fungus, one name” is eliminating dual nomenclature (naming of sexual and asexual forms separately)
• How to name species from molecular data alone?
• Name by close relatives on the tree?
• Use marker loci that contain both ITS and LSU to better place sequence in tree.
• Proposal to name species in Botanical code directly from sequence
• Good old fashioned microbiology
[Figure: phylogenetic tree in which most leaves are unnamed environmental sequences (labels of the form "Uncultured fungus clone MOTU_...", "Uncultured Tremellales clone ...", "Uncultured basidiomycete"), with a few named entries such as Fibulobasidium murrhardtense and Trichosporonales sp. LM559]
Hibbett and Taylor 2013
29. From barcodes to organisms - low throughput but effective
Dilution to Extinction (d2e)
‘High throughput’ isolation from global dust samples
Sarea resinae
Cryptocoryneum rilstonei
Keith Seifert
30. Community comparisons
• Pie charts show how taxonomic composition varies across treatments
• 16S community composition varies with smoking and COPD status
Erb-Downward et al. 2011
33. Tools - QIIME: Quantitative Insights Into Microbial Ecology
• For amplicon-based datasets (16S, 18S, ITS)
• Alpha diversity - phylogenetic diversity, Chao, number of observed species
• Generate species diversity plots to assess community diversity
• Beta diversity - UniFrac distance, Bray-Curtis, Jaccard
• Phylogenetic measures such as UniFrac need a reference phylogenetic tree, which is unavailable for fungi
• Support for shotgun metagenomics
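
The tree-free alpha and beta diversity measures listed above can be written out in a few lines. This is a minimal sketch on toy OTU counts, not QIIME's own implementation: the Chao estimate uses the bias-corrected Chao1 form, and the two sample vectors are invented for illustration.

```python
def observed_species(counts):
    """Alpha diversity: number of OTUs with at least one read."""
    return sum(1 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1*(F1-1) / (2*(F2+1)),
    where F1 and F2 are the numbers of singleton and doubleton OTUs."""
    s_obs = observed_species(counts)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def bray_curtis(a, b):
    """Beta diversity on abundances: sum|a_i - b_i| / (sum(a) + sum(b))."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def jaccard_distance(a, b):
    """Beta diversity on presence/absence: 1 - |shared OTUs| / |all OTUs seen|."""
    present_a = {i for i, c in enumerate(a) if c > 0}
    present_b = {i for i, c in enumerate(b) if c > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1 - len(present_a & present_b) / len(union)


# Toy OTU count vectors (hypothetical), same OTU order in both samples.
sample_a = [10, 1, 1, 2, 0]
sample_b = [5, 0, 3, 2, 4]

alpha_a = (observed_species(sample_a), chao1(sample_a))
beta_ab = (bray_curtis(sample_a, sample_b), jaccard_distance(sample_a, sample_b))
```

None of these four measures needs a phylogenetic tree, which is why they remain usable for ITS data where UniFrac and phylogenetic diversity are not.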
34. Approaches to clustering sequences
• De novo clustering
• Requires all-vs-all searches, very expensive
• Known Knowns - “Closed reference”
• Match sequences to a database of representative known sequences
• Fast, but throws out unknowns
• Known Knowns and Known Unknowns - “Open reference”
• Match to the known set and de novo cluster the remainder
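
The difference between closed-reference and open-reference picking can be sketched as follows. This is an illustration under stated assumptions, not QIIME's workflow: `difflib.SequenceMatcher` is used as a simple stand-in for a proper sequence search, the reference set and reads are toy data, and the function names and the `denovo_N` labels are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio in [0, 1]; a simple stand-in for alignment identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def best_reference(seq, references, threshold):
    """Name of the most similar reference at or above the threshold, else None."""
    best_name, best_score = None, 0.0
    for name, ref_seq in references.items():
        score = similarity(seq, ref_seq)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


def closed_reference(reads, references, threshold=0.97):
    """Assign reads to known references; reads without a hit are set aside."""
    assigned, unmatched = {}, []
    for read_id, seq in reads:
        hit = best_reference(seq, references, threshold)
        if hit is None:
            unmatched.append((read_id, seq))
        else:
            assigned.setdefault(hit, []).append(read_id)
    return assigned, unmatched


def open_reference(reads, references, threshold=0.97):
    """Closed-reference step first, then greedy de novo clustering of the rest."""
    assigned, unmatched = closed_reference(reads, references, threshold)
    seeds = {}  # de novo OTU label -> seed sequence
    for read_id, seq in unmatched:
        hit = best_reference(seq, seeds, threshold)
        if hit is None:
            hit = f"denovo_{len(seeds) + 1}"
            seeds[hit] = seq
        assigned.setdefault(hit, []).append(read_id)
    return assigned


# Toy data (hypothetical): two known references and four reads.
REFERENCES = {"ref_A": "ACGTTGCAAGCTTGCA", "ref_B": "GGGGGGCCCCCCTTTT"}
READS = [
    ("r1", "ACGTTGCAAGCTTGCA"),  # matches ref_A
    ("r2", "ATATATATCGCGCGCG"),  # unknown
    ("r3", "ATATATATCGCGCGCG"),  # same unknown as r2
    ("r4", "GGGGGGCCCCCCTTTT"),  # matches ref_B
]

closed_assigned, closed_unmatched = closed_reference(READS, REFERENCES)
open_assigned = open_reference(READS, REFERENCES)
```

Closed-reference picking only compares each read against the reference set, so it is fast but discards `r2` and `r3`; open-reference picking keeps them as a new cluster at the cost of the extra clustering step.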
35. QIIME on fungal data
• New (Dec 2012) Fungal ITS reference database from UNITE incorporated as QIIME resource
• Can use it to match against known set (closed-reference) or match and cluster unknowns
(open reference)
• One dataset of Indoor dust samples from Kerry Kinney (UT Austin) group
• A second indoor dataset (Amend et al)
37. A previously published indoor mycobiome
• Amend et al PNAS 2010 “Indoor fungal composition is geographically patterned and more
diverse in temperate zones than in the tropics.”
• Sequencing dust from houses and office buildings
• 72 samples of fungi from 6 continents. Sampled ITS2 region and the D1-D2 region of LSU
with 454-FLX
• A primary finding was increasing species diversity with increasing latitude
44. PCA of normalized counts - Painted by rRNA type (ITS, 28S)
MG-RAST tools
45. PCA of normalized counts - Painted by sampled country
MG-RAST tools
46. PCA of normalized counts - Painted by sampled elevation
MG-RAST tools
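
As a rough illustration of what "PCA of normalized counts" involves, the sketch below converts raw counts to relative abundances, centers them, and finds the first principal component by power iteration on the covariance matrix. It is not the MG-RAST implementation, which is not described in these slides; the four samples and their counts are toy data, the function names are hypothetical, and only the first component is computed.

```python
import math


def relative_abundance(counts):
    """Normalize raw counts for one sample so that they sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def first_pc_scores(table, iterations=200):
    """Project samples (rows) onto the first principal component.

    The component is found by power iteration on the feature covariance matrix.
    """
    n, p = len(table), len(table[0])
    means = [sum(row[j] for row in table) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in table]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Start from a non-uniform vector: centered relative abundances sum to zero
    # in every row, so a uniform start vector would be mapped to zero.
    vec = [float(j + 1) for j in range(p)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    return [sum(r[j] * vec[j] for j in range(p)) for r in centered]


# Toy count table (hypothetical): rows are samples, columns are three taxa.
COUNTS = {
    "site_a1": [90, 5, 5],
    "site_a2": [80, 10, 10],
    "site_b1": [10, 45, 45],
    "site_b2": [5, 50, 45],
}

table = [relative_abundance(c) for c in COUNTS.values()]
scores = dict(zip(COUNTS, first_pc_scores(table)))
```

Plotting such scores and coloring ("painting") each sample by a metadata field such as rRNA type, country, or elevation gives the kind of view these slides describe. The sign of a principal component is arbitrary, so only the relative positions of the samples are meaningful.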
47. Metagenomics - shotgun approach
• For non-amplicon-based studies of community composition
• Likely to be the future approach for community studies as sequencing depth increases
• Metatranscriptomics for studying what is expressed
• QIIME and MG-RAST support these studies, but are limited by the diversity of genome/protein sequences that can be matched.
52. Summary
• Fungal microbial ecology is embracing high-throughput sequencing technologies for community studies
• Limitations are due to the lack of curated sequences and the properties of the marker loci used
• Building new databases and tools to help with the analyses will improve utility
• Improvements in sequencing chemistry (read length x depth) make establishing best practices a moving target
• Deeper studies will improve our understanding of fungal diversity and the role of fungi in different ecosystems - the 1000 genomes project can help provide anchor representatives of this diversity.