EVE 161:

Microbial Phylogenomics

Lecture #15:
Era IV: Shotgun Metagenomics

UC Davis, Winter 2014
Instructor: Jonathan Eisen

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Where we are going and where we have been

• Previous lecture:
  – 14: Era IV: Metagenomics
• Current lecture:
  – 15: Era IV: Shotgun Metagenomics
• Next lecture:
  – 16: Era IV: Function in Metagenomics

Era IV: Genomes in the environment

Era IV:
Shotgun Metagenomics

Environmental Shotgun Sequencing

• ESS was first applied to endosymbiont genomes
• Buchnera genome sequenced with ESS
• Endosymbionts are relatively clonal within one host, and sometimes even within one species
• Many others too

Wolbachia Metagenomic Sequencing

[Figure: shotgun sequencing schematic (labels: “shotgun”, “sequence”)]

Wolbachia pipientis wMel

Wu et al., 2004. Collaboration between Jonathan Eisen and Scott O’Neill (Yale, U. Queensland).

articles

Community structure and metabolism through reconstruction of microbial genomes from the environment
Gene W. Tyson1, Jarrod Chapman3,4, Philip Hugenholtz1, Eric E. Allen1, Rachna J. Ram1, Paul M. Richardson4, Victor V. Solovyev4, Edward M. Rubin4, Daniel S. Rokhsar3,4 & Jillian F. Banfield1,2
1Department of Environmental Science, Policy and Management, 2Department of Earth and Planetary Sciences, and 3Department of Physics, University of California, Berkeley, California 94720, USA
4Joint Genome Institute, Walnut Creek, California 94598, USA

Microbial communities are vital in the functioning of all ecosystems; however, most microorganisms are uncultivated, and their roles in natural systems are unclear. Here, using random shotgun sequencing of DNA from a natural acidophilic biofilm, we report reconstruction of near-complete genomes of Leptospirillum group II and Ferroplasma type II, and partial recovery of three other genomes. This was possible because the biofilm was dominated by a small number of species populations and the frequency of genomic rearrangements and gene insertions or deletions was relatively low. Because each sequence read came from a different individual, we could determine that single-nucleotide polymorphisms are the predominant form of heterogeneity at the strain level. The Leptospirillum group II genome had remarkably few nucleotide polymorphisms, despite the existence of low-abundance variants. The Ferroplasma type II genome seems to be a composite from three ancestral strains that have undergone homologous recombination to form a large population of mosaic genomes. Analysis of the gene complement for each organism revealed the pathways for carbon and nitrogen fixation and energy generation, and provided insights into survival strategies in an extreme environment.

The study of microbial evolution and ecology has been revolutionized by DNA sequencing and analysis1–3. However, isolates have been the main source of sequence data, and only a small fraction of microorganisms have been cultivated4–6. Consequently, focus has shifted towards the analysis of uncultivated microorganisms via cloning of conserved genes and genome fragments directly from the environment7–9. To date, only a small fraction of genes have been recovered from individual environments, limiting the analysis of …
… fluorescence in situ hybridization (FISH) revealed that all biofilms contained mixtures of bacteria (Leptospirillum, Sulfobacillus and, in a few cases, Acidimicrobium) and archaea (Ferroplasma and other members of the Thermoplasmatales). The genome of one of these archaea, Ferroplasma acidarmanus fer1, isolated from the Richmond mine, has been sequenced previously (http://www.jgi.doe.gov/JGI_microbial/html/ferroplasma/ferro_homepage.html). A pink biofilm (Fig. 1a) typical of AMD communities was …

RESEARCH ARTICLE
Environmental Genome Shotgun Sequencing of the Sargasso Sea
J. Craig Venter,1* Karin Remington,1 John F. Heidelberg,3 Aaron L. Halpern,2 Doug Rusch,2 Jonathan A. Eisen,3 Dongying Wu,3 Ian Paulsen,3 Karen E. Nelson,3 William Nelson,3 Derrick E. Fouts,3 Samuel Levy,2 Anthony H. Knap,6 Michael W. Lomas,6 Ken Nealson,5 Owen White,3 Jeremy Peterson,3 Jeff Hoffman,1 Rachel Parsons,6 Holly Baden-Tillson,1 Cynthia Pfannkoch,1 Yu-Hui Rogers,4 Hamilton O. Smith1
Shotgun metagenomics

[Figure: shotgun sequencing schematic (labels: “shotgun”, “sequence”)]
Acid Mine Drainage 2004

…uncultured Ferroplasma species distinct from fer1. We designate this as Ferroplasma type II. The dominance of this organism type was unexpected before the genomic analysis.
The first step in assignment of scaffolds to organism types was to …

Figure 1 The pink biofilm. a, Photograph of the biofilm in the Richmond mine (hand included for scale). b, FISH image of a. Probes targeting bacteria (EUBmix; fluorescein isothiocyanate (green)) and archaea (ARC915; Cy5 (blue)) were used in combination with a probe targeting the Leptospirillum genus (LF655; Cy3 (red)). Overlap of red and green (yellow) indicates Leptospirillum cells and shows the dominance of Leptospirillum. c, Relative microbial abundances determined using quantitative FISH counts.

…is internally self consistent, with 97.2% of end pairs from the same clone assembled with the appropriate orientation and separation, as expected for a low rate of mispairing error (tracking and chimaeric clones).
We assigned the roughly 3× coverage, high G+C scaffolds to Leptospirillum group III on the basis of rRNA markers (474 scaffolds up to 31 kb, totalling 2.66 Mb). Comparison of these scaffolds with those assigned to Leptospirillum group II indicates significant sequence divergence and only locally conserved gene order, confirming that the scaffolds belong to a relatively distant relative of Leptospirillum group II. A partial 16S rRNA gene sequence from Sulfobacillus thermosulfidooxidans was identified in the unassembled reads, suggesting very low coverage of this organism. If any Sulfobacillus scaffolds >2 kb were assembled, they would be grouped with the Leptospirillum group III scaffolds.
We compared the 3× coverage, low G+C scaffolds (580 scaffolds, 4.12 Mb) to the fer1 genome in order to assign them to organism types (Supplementary Fig. S6). Scaffolds with ≥96% nucleotide identity to fer1 were assigned to an environmental Ferroplasma type I genome (170 scaffolds up to 47 kb in length and comprising 1.48 Mb of sequence). The remaining low-coverage, low G+C scaffolds are tentatively assigned to G-plasma. The largest scaffold in this bin (62 kb) contains the G-plasma 16S rRNA gene. The 410 scaffolds assigned to G-plasma comprise 2.65 Mb of sequence. A partial 16S rRNA gene sequence from A-plasma was identified in the unassembled reads, suggesting low coverage of this organism. Any scaffolds from A-plasma >2 kb would be included in the G-plasma bin. Although eukaryotes are present in the AMD system, they were in low abundance in the biofilm studied. So far, no scaffolds from eukaryotes have been detected.
As independent evidence that the Leptospirillum group II and
Ferroplasma type II genomes are nearly complete, we located a full
complement of transfer RNA synthetases in each genome data set.
An almost complete set of these genes was also recovered from
Leptospirillum group III. The G-plasma bin contains more than a full
set of tRNA synthetases, consistent with inclusion of some A-plasma
scaffolds. In addition, we established that the Leptospirillum
group II, Leptospirillum group III, Ferroplasma type I, Ferroplasma
type II and G-plasma bins contained only one set of rRNA genes.

NATURE | doi:10.1038/nature02340 | www.nature.com/nature
©2004 Nature Publishing Group

Methods
• Plasmid library
• Shotgun sequence
• Assembled
• Binning
  – GC content
  – Coverage
• Potential “nearly” complete genomes
  – Leptospirillum group II
  – Ferroplasma type II
  – Evidence for completeness: housekeeping genes
• Annotation, population analysis
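The binning step sorted assembled scaffolds by GC content and read coverage; in the Tyson data, for example, the roughly 3× coverage, high G+C scaffolds went to Leptospirillum group III and the 3× coverage, low G+C scaffolds to Ferroplasma type I or G-plasma. The Python sketch below shows only that first, coarse split into four GC/coverage bins. The cut-offs (`gc_cut`, `cov_cut`), the `Scaffold` type and the toy sequences are hypothetical illustrations, not values from the paper.

```python
from dataclasses import dataclass


@dataclass
class Scaffold:
    name: str
    seq: str
    coverage: float  # mean read depth across the scaffold


def gc_fraction(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (N and other codes are ignored)."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    return (s.count("G") + s.count("C")) / acgt if acgt else 0.0


def bin_scaffolds(scaffolds, gc_cut=0.45, cov_cut=6.0):
    """Group scaffold names into (GC class, coverage class) bins.

    gc_cut and cov_cut are hypothetical thresholds; in practice they would be
    read off a GC-versus-coverage plot of the real assembly.
    """
    bins = {}
    for sc in scaffolds:
        gc_class = "high GC" if gc_fraction(sc.seq) >= gc_cut else "low GC"
        cov_class = "high cov" if sc.coverage >= cov_cut else "low cov"
        bins.setdefault((gc_class, cov_class), []).append(sc.name)
    return bins


# Hypothetical toy scaffolds
toy = [
    Scaffold("s1", "GGCCGGCCAT", 10.0),
    Scaffold("s2", "ATATATGC", 3.0),
    Scaffold("s3", "ATATATAT", 12.0),
    Scaffold("s4", "GCGCGCAT", 2.5),
]
for key, names in sorted(bin_scaffolds(toy).items()):
    print(key, names)
```

A threshold split alone cannot separate organisms that share a GC/coverage bin; in the paper the low-coverage, low G+C bin was split further by nucleotide identity to the fer1 genome (≥96% identity assigned to Ferroplasma type I) and by rRNA markers.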
Leptospirillum group II genome may reflect strong recent environmental selection for this genome type or be the result of a founder
effect.

undergone homologous recombination. It is unlikely that the reads
with pattern transitions represent variants that arose simply
through accumulation of nucleotide polymorphisms, because this

Figure 2 Segment of the Ferroplasma type II composite genome. a, A 4.2-kb region showing annotated open reading frames (ORFs) (red), average read depth (blue line), and the number of nucleotide polymorphisms in the ‘green’ and ‘yellow’ relative to the ‘pink’ strain (green and yellow lines) averaged over 60-bp windows. Black dots indicate recombination sites. b, Alignment of individual reads (XYG) for a 96-bp region in a. Letters indicate nucleotide polymorphisms in the green and yellow strains relative to the pink strain. Note the recombinant sequence (XYG48207). c, Evolutionary distance tree inferred from the ancestral strain sequences in a.
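The logic behind panel b, reading each sequence read site by site and noting which ancestral strain (pink, yellow or green) its alleles match, can be sketched as follows. This is a simplified illustration that assumes the read and the three ancestral haplotypes have already been aligned and reduced to the same polymorphic positions; the function name and toy sequences are hypothetical. A read whose informative sites switch from one ancestral strain to another is a candidate recombinant, like XYG48207 in the figure.

```python
from typing import Dict, List, Tuple


def switch_points(read: str, haplotypes: Dict[str, str]) -> List[Tuple[int, str, str]]:
    """Return (site_index, from_strain, to_strain) for each change in the
    ancestral strain matched by a read.

    read and every haplotype are equal-length strings of alleles at the same
    polymorphic sites. Sites matching zero or several strains are skipped as
    uninformative.
    """
    switches = []
    current = None
    for i, base in enumerate(read):
        matches = [name for name, hap in haplotypes.items() if hap[i] == base]
        if len(matches) != 1:
            continue  # allele shared by several strains, or matches none
        strain = matches[0]
        if current is not None and strain != current:
            switches.append((i, current, strain))
        current = strain
    return switches


# Hypothetical ancestral alleles at four polymorphic sites (last site shared by all)
ancestors = {"pink": "AAAT", "yellow": "CCCT", "green": "GGGT"}
print(switch_points("AAAT", ancestors))  # []
print(switch_points("AACT", ancestors))  # [(2, 'pink', 'yellow')]
```

A single switch is weak evidence on its own; the paper's argument rests on many reads sharing the same pattern transitions, which is hard to explain by independent point mutations.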


Figure 3 Schematic diagram illustrating a diversity of mosaic genome types within the
Ferroplasma type II population that are inferred to have arisen by homologous recombination
between three closely related ancestral genome types (pink, yellow and green).
…genes needed to fix carbon by means of the Calvin–Benson–Bassham cycle (using type II ribulose 1,5-bisphosphate carboxylase–oxygenase). All genomes recovered from the AMD system …

…fixation via the reductive acetyl coenzyme A (acetyl-CoA) pathway by some, or all, organisms. Given the large number of ABC-type sugar and amino acid transporters encoded in the Ferroplasma type …

Figure 4 Cell metabolic cartoons constructed from the annotation of 2,180 ORFs identified in the Leptospirillum group II genome (63% with putative assigned function) and 1,931 ORFs in the Ferroplasma type II genome (58% with assigned function). The cell … drainage stream (viewed in cross-section). Tight coupling between ferrous iron oxidation, pyrite dissolution and acid generation is indicated. Rubisco, ribulose 1,5-bisphosphate carboxylase–oxygenase. THF, tetrahydrofolate.
RESEARCH ARTICLE
Environmental Genome Shotgun
Sequencing of the Sargasso Sea
J. Craig Venter,1* Karin Remington,1 John F. Heidelberg,3
Aaron L. Halpern,2 Doug Rusch,2 Jonathan A. Eisen,3
Dongying Wu,3 Ian Paulsen,3 Karen E. Nelson,3 William Nelson,3
Derrick E. Fouts,3 Samuel Levy,2 Anthony H. Knap,6
Michael W. Lomas,6 Ken Nealson,5 Owen White,3
Jeremy Peterson,3 Jeff Hoffman,1 Rachel Parsons,6
Holly Baden-Tillson,1 Cynthia Pfannkoch,1 Yu-Hui Rogers,4
Hamilton O. Smith1
We have applied “whole-genome shotgun sequencing” to microbial populations collected en masse on tangential flow and impact filters from seawater samples collected from the Sargasso Sea near Bermuda. A total of 1.045 billion base pairs of nonredundant sequence was generated, annotated, and analyzed to elucidate the gene content, diversity, and relative abundance of the organisms within these environmental samples. These data are estimated to derive from at least 1800 genomic species based on sequence relatedness, including 148 previously unknown bacterial phylotypes. We have identified over 1.2 million previously unknown genes represented in these samples, including more than 782 new …

http://www.sciencemag.org/content/304/5667/66

Sargasso Sea

…two groups of scaffolds representing two distinct strains closely related to the published …
…at depths ranging from 4× to 36× (indicated with shading in table S3 with nine depicted in …

Fig. 1. MODIS-Aqua satellite image of ocean chlorophyll in the Sargasso Sea grid about the BATS site from 22 February 2003. The station locations are overlain with their respective identifications. Note the elevated levels of chlorophyll (green color shades) around station 3, which are not present around stations 11 and 13.

• Sampling Protocols. Sampling on the RV Weatherbird II was done as follows: Seawater (170 liters) from stations 11 and 13 was directly filtered through a 0.8 µm Supor membrane disc filter (Pall Life Sciences) followed in series by a 0.22 µm Supor membrane disc filter (Pall Life Sciences). The sample from station 3 was pumped into a 250 L carboy prior to being filtered through the impact filters. The time from collection of the sample until the end of the filtration step was approximately one hour. Filters were placed in 5 ml of sucrose lysis buffer (20 mM EDTA, 400 mM NaCl, 0.75 M sucrose, 50 mM Tris-HCl, pH 9.0), stored in liquid nitrogen aboard the Weatherbird II, and then kept at −80 °C until DNA extraction.
Alternatively, seawater (340 liters) was collected from 5 meters below the surface into a carboy, then filtered through a 0.8 µm Supor membrane disc filter (Pall Life Sciences), followed by concentration to 1 liter using a Pellicon tangential flow filtration (TFF) system (Millipore) with a 0.1 µm Durapore VVPP cartridge (Millipore); again, the total time for filtration and concentration was approximately one hour. Cells were pelleted at 10,000 rpm, 4 °C for 30 minutes. The impact filters and the retentate from the TFF were then handled as described above.
The carboys, tubing, and filter systems were cleaned with a 10% hydrochloric acid wash prior to each leg of the sampling. Any sampling equipment (tubing, etc.) that could reasonably be soaked was soaked in an acid bath for at least 24 hours. Sampling carboys were filled with the acid wash and “soaked” for at least 24 hours as well. All acid-washed items were subsequently rinsed very liberally with Milli-Q water. A liberal Milli-Q water rinse was also conducted between samples on the same leg. All spigots on the carboys were covered with a Ziploc bag until needed, and tubing was stored in clean Ziploc bags until needed.

Sample preparation. The impact filters were cut into quarters and placed in individual 50 ml conical tubes. TE buffer (5 ml, pH 8) containing 150 µg/ml lysozyme was added to each tube. The tubes were incubated at 37 °C for 2 hours. SDS was added to 0.1%, and the samples were then put through three freeze/thaw cycles. The lysate was then treated with Proteinase K (100 µg/ml) for one hour at 55 °C, followed by three aqueous phenol extractions and one extraction with phenol/chloroform. The supernatant was then precipitated with two volumes of 100% ethanol, and the DNA pellet was washed with 70% ethanol.

DNA preparation. DNA was randomly sheared, end-polished with consecutive BAL31 nuclease and T4 DNA polymerase treatments, and size-selected by electrophoresis on 1% low-melting-point agarose. After ligation to BstXI adapters (Invitrogen, catalog no. N408-18), DNA was purified by three rounds of gel electrophoresis to remove excess adapters, and the fragments, now with 3'-CACA overhangs, were inserted into BstXI-linearized plasmid vector with 3'-TGTG overhangs. Fragments were cloned in a medium-copy pBR322 derivative.

Sequence assembly. With default parameter settings, the highly covered genome sequences would have been treated as repetitive DNA by the Celera Assembler. Since the Celera Assembler constructs scaffolds only from a backbone of sequence heuristically classified as unique, these organisms would not have been eligible for scaffolding and would have been absent from the final assembly. However, by tuning the threshold parameter for classifying unique sequence, we were able to compensate for the apparent repetitiveness of these genomic regions and scaffold them appropriately. This was accomplished by identifying the most deeply assembling, obviously nonrepetitive contigs in an initial run of the assembler (in this case, the strong assemblies at 21–36× coverage, which were identified as gene-rich Burkholderia-like and plasmid scaffolds), and using a value slightly below the calculated “A-statistic” (an empirical uniqueness measure within the Assembler) of these contigs as the threshold parameter in a subsequent run. This allows the deep contigs to be treated as unique sequence when they would otherwise be labeled as repetitive. At the other end of the spectrum, rare organisms in the sample were sequenced only to a shallow depth of coverage. Routine assembly would not have considered the small, shallow-coverage assemblies built from fragment overlaps as an eligible basis for scaffolding, because of a minimum length requirement of 1,000 bp that is typically in place for efficiency. Therefore, in the present use case, the organisms represented by these sequences would not have been ordered and oriented with mate pairs without lowering the default minimum length to compensate for the low anticipated coverage depth and assembly length. With this selection of parameters, more suitable to the environmental project at hand, we were able to adequately assemble both the dominant and rare species simultaneously.
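The two parameter changes above can be illustrated with a toy model. The sketch assumes the Poisson log-odds form described for the Celera Assembler's A-statistic, length × r − k × ln 2 (single-copy versus a two-copy collapse), where k is the number of reads in a contig and r the expected read-arrival rate per base for the whole sample. The formula as written here, the default threshold of 5.0, the arrival rate and the toy contigs are all assumptions for illustration; this is not Celera Assembler code, and the paper does not report its actual parameter values.

```python
import math


def a_statistic(length: int, n_reads: int, arrival_rate: float) -> float:
    """Assumed Poisson log-odds that a contig is single-copy rather than a
    collapse of two copies: length * arrival_rate - n_reads * ln(2)."""
    return length * arrival_rate - n_reads * math.log(2)


def scaffold_eligible(contigs, arrival_rate, threshold, min_len):
    """Names of contigs that count as unique AND meet the minimum length,
    i.e. the contigs allowed to seed scaffolding in this toy model."""
    return sorted(
        name
        for name, (length, n_reads) in contigs.items()
        if length >= min_len and a_statistic(length, n_reads, arrival_rate) >= threshold
    )


# Hypothetical contigs: (length in bp, reads assembled into the contig)
contigs = {"deep": (20_000, 1_500), "mid": (5_000, 50), "rare": (800, 3)}
RATE = 0.01  # hypothetical reads per base for the whole sample

# Default-like settings: the deeply covered contig looks repetitive and the
# short, shallow contig falls below the length cut-off.
print(scaffold_eligible(contigs, RATE, threshold=5.0, min_len=1000))  # ['mid']

# Tuned run: threshold just below the deep contig's A-statistic, shorter minimum length.
tuned = a_statistic(*contigs["deep"], RATE) - 1.0
print(scaffold_eligible(contigs, RATE, threshold=tuned, min_len=500))  # ['deep', 'mid', 'rare']
```

The cost of lowering the threshold is visible in the formula: any genuine collapsed repeat scoring above the new value would also be treated as unique, which is one reason to anchor the threshold to contigs already judged obviously nonrepetitive, as the text describes.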

Methods
• Plasmid library
• Shotgun sequence
• Assembled
• No Major Binning
• Potential “nearly” complete genomes
• Annotation, population analysis, phylogenetic analysis

[Cropped page excerpt; the left edge of the column is cut off in the slide. Legible content: the whole-genome shotgun (WGS) assembly was deposited at DDBJ/EMBL/GenBank under project accession AACY00000000, with traces deposited in the TraceDB trace archive; unlike a conventional WGS entry, the deposit includes not only contigs but also unassembled paired singletons and singletons, to reflect the diversity in the sample; the analysis focused on well-sampled scaffolds, 333 of which met the depth criterion (table S3), accounting for roughly 410,000 reads, or 25% of the assembly data set; several criteria, including depth of coverage and oligo…, were used to sort assembly pieces into tentative organism bins.]

Fig. 2. Gene conservation among closely related Prochlorococcus. The outermost concentric circle of the diagram depicts the completed genomic sequence of Prochlorococcus marinus MED4 (11). Fragments from environmental sequencing were compared to this completed Prochlorococcus genome and are shown in the inner concentric circles with boxed outlines. Genes in the outermost circle have been assigned pseudospectrum colors based on their position along the chromosome: genes nearer to the start of the genome are colored red, and genes nearer to the end of the genome are colored blue. Fragments from environmental sequencing were subjected to an analysis that identifies conserved gene order between those fragments and the completed Prochlorococcus MED4 genome. Genes on the environmental genome segments that exhibited conserved gene order are colored with the same color assignments as the Prochlorococcus MED4 chromosome. Colored regions on the environmental segments whose colors differ from the adjacent outermost concentric circle result from conserved gene order with other MED4 regions and probably represent chromosomal rearrangements. Genes that did not exhibit conserved gene order are colored black.
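The conserved-gene-order analysis described in this caption can be approximated by scanning each environmental fragment for runs of consecutive genes whose best matches are also consecutive in MED4, in either orientation. The sketch assumes each fragment gene has already been mapped to the index of its best MED4 hit (or `None` for no hit); the mapping step, the minimum run length and the toy indices are hypothetical, and the published analysis may use different criteria.

```python
from typing import List, Optional, Tuple


def conserved_runs(hits: List[Optional[int]], min_len: int = 2) -> List[Tuple[int, int]]:
    """Find runs of consecutive fragment genes whose best reference hits are
    also consecutive (all +1 steps or all -1 steps) in the reference genome.

    hits[i] is the reference gene index of fragment gene i's best match,
    or None when gene i has no match. Returns inclusive (start, end) pairs.
    """
    runs = []
    i = 0
    while i < len(hits):
        if hits[i] is None:
            i += 1
            continue
        j, step = i, None
        while j + 1 < len(hits) and hits[j + 1] is not None:
            d = hits[j + 1] - hits[j]
            if d not in (1, -1) or (step is not None and d != step):
                break  # adjacency or orientation broken
            step = d
            j += 1
        if j - i + 1 >= min_len:
            runs.append((i, j))
        i = j + 1
    return runs


# Hypothetical fragment: genes 0-2 match MED4 genes 10-12 (same order),
# gene 3 has no hit, genes 4-5 match MED4 genes 40 and 39 (reverse order),
# gene 6 matches a distant MED4 gene and stays unlinked.
print(conserved_runs([10, 11, 12, None, 40, 39, 7]))  # [(0, 2), (4, 5)]
```

In the figure's terms, genes inside a run would take the pseudospectrum color of their MED4 positions, a run mapping to a MED4 region far from the neighboring outer-circle genes would show up as a color mismatch (a candidate rearrangement), and genes outside any run would be black.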

www.sciencemag.org SCIENCE VOL 304 2 APRIL 2004
Fig. 3. Comparison of Sargasso Sea scaffolds to Crenarchaeal clone 4B7. Predicted proteins from 4B7 and the scaffolds showing significant homology to 4B7 by tBLASTx are arrayed in positional order along the x and y axes. Colored boxes represent BLASTp matches scoring at least 25% similarity and with an E-value better than 1e-5. Black vertical and horizontal lines delineate scaffold borders.
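The match filter in this caption (keep a BLASTp match only if it scores at least 25% similarity and has an E-value better than 1e-5) and the positional grid it feeds can be sketched in a few lines. The `Hit` record, its field names and the text rendering are hypothetical conveniences; real input would come from parsed BLAST output.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query_pos: int     # order of the 4B7 protein along the x axis
    subject_pos: int   # order of the scaffold protein along the y axis
    similarity: float  # percent similarity of the BLASTp alignment
    evalue: float


def dotplot_cells(hits, min_similarity=25.0, max_evalue=1e-5):
    """Grid cells to color: matches with similarity >= min_similarity and
    E-value strictly below max_evalue."""
    return sorted({
        (h.query_pos, h.subject_pos)
        for h in hits
        if h.similarity >= min_similarity and h.evalue < max_evalue
    })


def render(cells, n_query, n_subject):
    """Text dot plot: one row per scaffold protein, one column per 4B7 protein."""
    marked = set(cells)
    return [
        "".join("#" if (x, y) in marked else "." for x in range(n_query))
        for y in range(n_subject)
    ]


# Hypothetical hits
hits = [
    Hit(0, 0, 40.0, 1e-20),
    Hit(1, 1, 30.0, 1e-8),
    Hit(2, 2, 20.0, 1e-30),  # dropped: similarity below 25%
    Hit(2, 1, 50.0, 1e-3),   # dropped: E-value not better than 1e-5
]
for row in render(dotplot_cells(hits), 3, 3):
    print(row)
```

Conserved gene order between 4B7 and a scaffold then shows up as a diagonal line of marked cells, as in the figure.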

Fig. 4. Circular diagrams of nine complete megaplasmids. Genes encoded in the forward direction
are shown in the outer concentric circle; reverse coding genes are shown in the inner concentric
circle. The genes have been given role category assignment and colored accordingly: amino acid
biosynthesis, violet; biosynthesis of cofactors, prosthetic groups, and carriers, light blue; cell
envelope, light green; cellular processes, red; central intermediary metabolism, brown; DNA
metabolism, gold; energy metabolism, light gray; fatty acid and phospholipid metabolism, magenta;
protein fate and protein synthesis, pink; purines, pyrimidines, nucleosides, and nucleotides, orange;
regulatory functions and signal transduction, olive; transcription, dark green; transport and binding
proteins, blue-green; genes with no known homology to other proteins and genes with homology
to genes with no known function, white; genes of unknown function, gray. Tick marks are placed
at 10-kb intervals.


…homogenous blend of discrepancies from consensus without any apparent separation into haplotypes, such as the Prochlorococcus scaffold region (Fig. 5). Indeed, the Prochlorococcus scaffolds display considerable heterogeneity not only at the nucleotide sequence level (Fig. 5) but also at the genomic level, where multiple scaffolds align with the same region of the MED4 (11) genome but differ due to gene or genomic island insertion, deletion, rearrangement events. This observation is consistent with previous findings (12). For instance, scaffolds 2221918 and 2223700 share gene synteny with each other and MED4 but differ by the insertion of 15 genes of probable phage origin, likely representing an integrated bacteriophage. These genomic differences are displayed graphically in Fig. 2, where it is evident that up to … conflicting scaffolds can align with the same region of the MED4 genome. More than 8… of the Prochlorococcus MED4 genome can be aligned with Sargasso Sea scaffolds greater than 10 kb; however, there appear to be a couple of regions of MED4 that are not represented in the 10-kb scaffolds (Fig. 2). The larger of these two regions (PMM1187 to PMM1277) consists primarily of a gene cluster coding for surface polysaccharide biosynthesis, which may represent a MED4-specific polysaccharide absent or highly diverged in our Sargasso Sea Prochlorococcus bacteria. The heterogeneity of the Prochlorococcus scaffolds suggests that the scaffolds are not derived from a single discrete strain, but instead probably represent a conglomerate assembled from a population of closely related Prochlorococcus biotypes.

The gene complement of the Sargasso …
The heterogeneity of the Sargasso sequences complicates the identification of microbial genes. The typical approach for microbial annotation, model-based gene finding, relies entirely on training with a subset of manu…

frames (5). A total of 69,901 novel genes belonging to 15,601 single link clusters were identified. The predicted genes were categorized
Table 1. Gene count breakdown by TIGR role category. Gene set includes those found on assemblies from samples 1 to 4 and fragment reads from samples 5 to 7. A more detailed table, separating Weatherbird II samples from the Sorcerer II samples, is presented in the SOM (table S4). Note that there are 28,023 genes which were classified in more than one role category.

TIGR role category                                          Total genes
Amino acid biosynthesis                                          37,118
Biosynthesis of cofactors, prosthetic groups, and carriers       25,905
Cell envelope                                                    27,883
Cellular processes                                               17,260
Central intermediary metabolism                                  13,639
DNA metabolism                                                   25,346
Energy metabolism                                                69,718
Fatty acid and phospholipid metabolism                           18,558
Mobile and extrachromosomal element functions                     1,061
Protein fate                                                     28,768
Protein synthesis                                                48,012
Purines, pyrimidines, nucleosides, and nucleotides               19,912
Regulatory functions                                              8,392
Signal transduction                                               4,817
Transcription                                                    12,756
Transport and binding proteins                                   49,185
Unknown function                                                 38,067
Miscellaneous                                                     1,864
Conserved hypothetical                                          794,061

Total number of roles assigned                                1,242,230
Total number of genes                                         1,214,207
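The totals at the bottom of Table 1 can be reconciled with the table note: the difference between the number of role assignments and the number of genes equals the 28,023 figure given for genes classified in more than one role category. The same numbers show that conserved hypothetical proteins make up about 65% of all predicted genes. A quick check, using only values copied from the table:

```python
total_roles_assigned = 1_242_230  # Table 1, "Total number of roles assigned"
total_genes = 1_214_207           # Table 1, "Total number of genes"
conserved_hypothetical = 794_061  # Table 1, "Conserved hypothetical"

extra_assignments = total_roles_assigned - total_genes
share_conserved_hypothetical = conserved_hypothetical / total_genes

print(extra_assignments)                      # 28023
print(f"{share_conserved_hypothetical:.1%}")  # 65.4%
```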

Fig. 5. Prochlorococcus-related scaffold 2223290 illustrating … nity of closely related organisms, distinctly nonpunctate … global structure of scaffold 2223290 with respect to asse… sequence alignment. Blue segments, contigs; green segments … stages of the assembly of fragments into the resulting … fragments were initially assembled in several different … form the final contig structure. The multiple sequence … homogenous blend of haplotypes, none with sufficient … separate assembly.

…curated genes. With the vast majority of the Sargasso sequence in short (less …) unassociated scaffolds and singletons from hundreds of different organisms, it is … to apply this approach. Instead, we … an evidence-based gene finder (5). Evidence in the form of protein alignments to sequences in the bacterial portion of the nonredundant amino acid (nraa) data set was used to determine the most likely reading frame. Likewise, approximate start and stop sites were determined from the boundaries of the alignments and refined to specific start and stop codons. This … identified 1,214,207 genes covering … of the total data set. This represents approximately an order of magnitude more sequences than currently archived in the SwissProt database (14), which con…
rRNA phylotyping from metagenomics

Shotgun Sequencing Allows Alternative Anchors (e.g., RecA)

http://www.sciencemag.org/content/304/5667/66

…taxonomic group using the phylogenetic analysis described for rRNA. For example, our data set … marker genes, is roughly comparable to the 97% cutoff traditionally used for rRNA. Thus …

http://www.sciencemag.org/content/304/5667/66

Fig. 6. Phylogenetic diversity of Sargasso Sea sequences using multiple phylogenetic markers. The
relative contribution of organisms from different major phylogenetic groups (phylotypes) was
measured using multiple phylogenetic markers that have been used previously in phylogenetic
studies of prokaryotes: 16S rRNA, RecA, EF-Tu, EF-G, HSP70, and RNA polymerase B (RpoB). The
relative proportion of different phylotypes for each sequence (weighted by the depth of coverage
of the contigs from which those sequences came) is shown. The phylotype distribution was
determined as follows: (i) Sequences in the Sargasso data set corresponding to each of these genes
were identified using HMM and BLAST searches. (ii) Phylogenetic analysis was performed for each
phylogenetic marker identified in the Sargasso data separately compared with all members of that
gene family in all complete genome sequences (only complete genomes were used to control for
the differential sampling of these markers in GenBank). (iii) The phylogenetic affinity of each
sequence was assigned based on the classification of the nearest neighbor in the phylogenetic tree.
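Step (iii), combined with the depth weighting described at the start of the caption, amounts to a weighted tally. Here is a minimal Python sketch. It assumes each marker sequence has already been given a phylotype from its nearest neighbor in the tree and carries the depth of coverage of its contig. `phylotype_proportions` and the example values are hypothetical.

```python
from collections import defaultdict


def phylotype_proportions(assignments):
    """Return each phylotype's share of marker sequences, weighted by depth.

    `assignments` is a list of (phylotype, contig_depth) pairs, one per marker
    sequence; the phylotype is assumed to come from the nearest neighbor in a
    phylogenetic tree.
    """
    totals = defaultdict(float)
    for phylotype, depth in assignments:
        if depth <= 0:
            raise ValueError("depth of coverage must be positive")
        totals[phylotype] += depth
    grand_total = sum(totals.values())
    return {group: weight / grand_total for group, weight in totals.items()}


if __name__ == "__main__":
    # Hypothetical RecA hits: two from a deeper contig, one from a shallower one.
    hits = [("Alphaproteobacteria", 3.0), ("Alphaproteobacteria", 3.0),
            ("Cyanobacteria", 2.0)]
    print(phylotype_proportions(hits))
```

Weighting by contig depth lets a sequence from a highly covered (abundant) organism count for more than one from a rarely sampled organism.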

…method based on fitting the observed depth of coverage to a theoretical model of assembly progress for a sample corresponding to a mixture … that a minimum of 12-fold deeper sampling would be required to obtain 95% of the unique sequence. However, these are only lower …
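The 12-fold figure comes from the authors' fit of a mixture model to the observed depth of coverage. The sketch below is a simpler illustration of the same idea, not the paper's model. It assumes reads are Poisson-distributed, so a class sequenced to depth c has about 1 - e^(-c) of its sequence covered. It then searches for the sampling multiplier that reaches 95% of the unique sequence across a mixture of abundance classes. The function names and class values are hypothetical.

```python
import math


def fraction_covered(classes, scale=1.0):
    """Expected fraction of unique sequence hit by at least one read.

    `classes` is a list of (total_length_bp, depth) pairs, one per abundance
    class. Under a Poisson (Lander-Waterman style) assumption, a class at
    depth c is covered to an expected fraction 1 - exp(-c); `scale`
    multiplies every depth to model deeper sampling.
    """
    total = sum(length for length, _ in classes)
    covered = sum(length * (1.0 - math.exp(-scale * depth))
                  for length, depth in classes)
    return covered / total


def fold_needed(classes, target=0.95, tol=1e-6):
    """Smallest sampling multiplier reaching `target` coverage (bisection)."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be between 0 and 1")
    if any(depth <= 0 for _, depth in classes):
        raise ValueError("every class needs a positive depth of coverage")
    lo, hi = 1.0, 2.0
    if fraction_covered(classes, lo) >= target:
        return lo  # current sampling is already deep enough
    while fraction_covered(classes, hi) < target:
        hi *= 2.0  # expand the bracket until it contains the answer
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if fraction_covered(classes, mid) < target:
            lo = mid
        else:
            hi = mid
    return hi
```

For a single class at depth 1, the multiplier is ln(20), about 3.0. Adding rare, shallowly sampled classes pushes it up quickly, which is why such estimates are lower bounds when many organisms are barely sampled.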

Table 2. Diversity of ubiquitous single copy protein coding phylogenetic markers. Protein column uses
symbols that identify six proteins encoded by exactly one gene in virtually all known bacteria. Sequence
ID specifies the GenBank identifier for corresponding E. coli sequence. Ortholog cutoff identifies BLASTx
e-value chosen to identify orthologs when querying the E. coli sequence against the complete Sargasso
Sea data set. Maximum fragment depth shows the number of reads satisfying the ortholog cutoff at the
point along the query for which this value is maximal. Observed “species” shows the number of distinct
clusters of reads from the maximum fragment depth column, after grouping reads whose containing
assemblies had an overlap of at least 40 bp with > 94% nucleotide identity (single-link clustering).
Singleton “species” shows the number of distinct clusters from the observed “species” column that
consist of a single read. Most abundant column shows the fraction of the maximum fragment depth that
consists of single largest cluster.

Protein   Sequence ID     Ortholog cutoff   Max. fragment depth   Observed “species”   Singleton “species”   Most abundant (%)
AtpD      NTL01EC03653    1e-32             836                   456                  317                   6
GyrB      NTL01EC03620    1e-11             924                   569                  429                   4
Hsp70     NT01EC0015      1e-31             812                   515                  394                   4
RecA      NTL01EC02639    1e-21             592                   341                  244                   8
RpoB      NTL01EC03885    1e-41             669                   428                  331                   7
TufA      NTL01EC03262    1e-41             597                   397                  307                   3
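The grouping rule in the Table 2 caption is single-link clustering: two reads fall in the same “species” whenever a chain of qualifying overlaps connects them. Here is a minimal Python sketch using union-find. It assumes the pairwise overlap length and percent identity between the reads' containing assemblies have already been computed. The input format and function names are hypothetical.

```python
from collections import Counter


def cluster_reads(read_ids, overlaps, min_overlap=40, min_identity=94.0):
    """Single-link clustering of reads into "species" via union-find.

    `overlaps` holds precomputed (read_a, read_b, overlap_bp, identity_pct)
    tuples; a pair is linked when overlap_bp >= min_overlap and
    identity_pct > min_identity, mirroring the Table 2 thresholds.
    Returns a dict mapping each read to its cluster representative.
    """
    parent = {r: r for r in read_ids}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, overlap_bp, identity in overlaps:
        if overlap_bp >= min_overlap and identity > min_identity:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a
    return {r: find(r) for r in read_ids}


def summarize(clusters):
    """Observed "species", singleton "species", and most abundant (%)."""
    if not clusters:
        return 0, 0, 0.0
    sizes = Counter(clusters.values())
    observed = len(sizes)
    singletons = sum(1 for size in sizes.values() if size == 1)
    most_abundant = 100.0 * max(sizes.values()) / len(clusters)
    return observed, singletons, most_abundant
```

Single linkage means that one qualifying overlap is enough to merge two clusters, so chains of reads can join groups whose end members share little sequence.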


Table 3. Diversity models based on depth of coverage. Each row corresponds to an abundance class of organisms. The first column in each model, “fr(asm)”, gives the fraction of the assembly consensus modeled … column) in the sample. The third … a genome expected to be s… gives the resulting estimat…
http://www.sciencemag.org/content/304/5667/66
Figure S6. Accumulation curve for rpoB. Observed (black) OTU counts for rpoB (based
on the fragment grouping summarized in Table 2), as well as the Chao1-corrected
estimate of total species (red; see (3)). Points are mean values of 1000 shufflings of the
observed data, while bars show 90% confidence intervals.
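Chao1 and the shuffled accumulation curve in Figure S6 can be sketched in a few lines of Python. The sketch assumes one OTU label per read (for example, the rpoB clusters of Table 2). It uses the classic Chao1 form S_obs + F1^2/(2*F2), switching to the bias-corrected form when no doubletons are present, and this may differ from the exact variant the authors used. The function names are hypothetical.

```python
import random
from collections import Counter


def chao1(otu_labels):
    """Chao1 richness estimate from per-read OTU labels.

    Classic form S_obs + F1^2 / (2 * F2), where F1 and F2 count OTUs seen
    exactly once and twice; when F2 == 0 it falls back to the bias-corrected
    form S_obs + F1 * (F1 - 1) / 2.
    """
    counts = Counter(otu_labels)
    s_obs = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    f2 = sum(1 for c in counts.values() if c == 2)
    if f2 > 0:
        return s_obs + f1 * f1 / (2.0 * f2)
    return s_obs + f1 * (f1 - 1) / 2.0


def accumulation_curve(otu_labels, shuffles=1000, seed=0):
    """Mean observed OTU count after each read, over random read orders.

    Also returns simple index-based 5th and 95th percentiles at each step,
    i.e. a 90% interval like the bars in Figure S6.
    """
    rng = random.Random(seed)
    labels = list(otu_labels)
    per_step = [[] for _ in labels]
    for _ in range(shuffles):
        rng.shuffle(labels)
        seen = set()
        for i, label in enumerate(labels):
            seen.add(label)
            per_step[i].append(len(seen))
    means, lows, highs = [], [], []
    for values in per_step:
        values.sort()
        means.append(sum(values) / len(values))
        lows.append(values[int(0.05 * (len(values) - 1))])
        highs.append(values[int(0.95 * (len(values) - 1))])
    return means, lows, highs
```

A curve that is still rising steeply at the last read, together with a Chao1 estimate well above the observed count, indicates that the sample has not reached saturation.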
http://www.sciencemag.org/content/304/5667/66
MS 1093857: Environmental Genome Shotgun Sequencing of the Sargasso Sea
Venter et al., revised

Figure S7. Each point in the figure corresponds to a scaffold from the assembly
(restricted to scaffolds > 10kb). Scaffolds were placed in separate panels of the figure
according to the most closely related organism as indicated by the BLAST searches
described in the text. Within a panel, a scaffold is shown with x coordinate equal to its
length, y coordinate equal to its estimated depth of coverage, and color determined by
which of 6 k-mer composition clusters it was assigned to. Depth of coverage was
estimated as the total base pairs in reads belonging to a given assembly piece divided by
the length of the consensus sequence for the piece. K-mer composition clusters were
determined by representing each scaffold as a vector of the frequencies of all possible 4-mers, considering both the forward and reverse strands of the sequence, and then
applying the k-means clustering algorithm.
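The composition and coverage calculations in the Figure S7 caption can be sketched in pure Python. This is a minimal illustration under two assumptions: windows containing non-ACGT characters are skipped, and k-means is seeded by farthest-point selection because the caption does not say how the clusters were initialized. All function names are hypothetical.

```python
import itertools

TETRAMERS = ["".join(p) for p in itertools.product("ACGT", repeat=4)]
INDEX = {kmer: i for i, kmer in enumerate(TETRAMERS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def tetramer_profile(seq):
    """Frequencies of all 256 4-mers, counted on both strands."""
    seq = seq.upper()
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    counts = [0] * len(TETRAMERS)
    for strand in (seq, reverse_complement):
        for i in range(len(strand) - 3):
            idx = INDEX.get(strand[i:i + 4])
            if idx is not None:  # skip windows with N or other symbols
                counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def depth_of_coverage(read_lengths, consensus_length):
    """Total bp in reads assigned to a piece divided by its consensus length."""
    if consensus_length <= 0:
        raise ValueError("consensus length must be positive")
    return sum(read_lengths) / consensus_length


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=100):
    """Lloyd's k-means with farthest-point seeding; one label per vector."""
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(
            squared_distance(v, c) for c in centroids))
        centroids.append(list(farthest))

    def nearest(v):
        return min(range(k), key=lambda j: squared_distance(v, centroids[j]))

    labels = None
    for _ in range(iterations):
        new_labels = [nearest(v) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Calling `kmeans(profiles, 6)` on the scaffold profiles would correspond to the six composition clusters used in the figure. Counting both strands makes each profile independent of which strand a scaffold happens to be reported on.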

http://www.sciencemag.org/content/304/5667/66
Functional Diversity of Proteorhodopsins?

http://www.sciencemag.org/content/304/5667/66


Figure S10. Scaffold 2217664, containing the gene encoding Proteorhodopsin. Genes are
colored using color assignments described in Fig. 2, and contig boundaries are indicated
with red vertical lines. In this scaffold, rhodopsin is associated with a DNA-directed
RNA polymerase, sigma subunit (rpoD) originating in the CFB group.
http://www.sciencemag.org/content/304/5667/66
Binning challenge (diagram with elements labeled A to G and T to Z)
Binning challenge (diagram with elements labeled A to G and T to Z)

Best binning method: reference genomes
Glassy Winged Sharpshooter
• Feeds on xylem sap
• Vector for Pierce’s Disease
• Potential bioterror agent
• Collaboration with Nancy Moran to sequence symbiont genomes
• Funded by NSF
• Published in PLoS Biology 2006

Wu et al. 2006 PLoS Biology 4: e188.
Sharpshooter Shotgun Sequencing

shotgun

Collaboration with Nancy Moran’s lab
Wu et al. 2006 PLoS Biology 4: e188.
Binning challenge (diagram with elements labeled A to G and T to Z)

No reference genome? What do you do?
Phylogeny ...
CFB Phyla
Baumannia makes vitamins and cofactors

Sulcia makes amino acids

Wu et al. 2006 PLoS Biology 4: e188.
Sorcerer II GOS Expedition

Figure 1. Sampling Sites
Microbial populations were sampled from locations in the order shown. Samples were collected at approximately 200-mile (320-km) intervals along the
eastern North American coast through the Gulf of Mexico into the equatorial Pacific. Samples 00 and 01 identify sets of sites sampled as part of the
Sargasso Sea pilot study [19]. Samples 27 through 36 were sampled off the Galapagos Islands (see inset). Sites shown in gray were not analyzed as part
of this study.
doi:10.1371/journal.pbio.0050077.g001

environments as well as a few nonmarine aquatic samples for contrast (Table 1). … the pilot Sargasso Sea study, 200 L of surface seawater was filtered to isolate microorganisms for metagenomic analysis.
Stalking the Fourth Domain in Metagenomic Data:
Searching for, Discovering, and Interpreting Novel, Deep
Branches in Marker Gene Phylogenetic Trees
Dongying Wu1, Martin Wu1,4, Aaron Halpern2,3, Douglas B. Rusch2,3, Shibu Yooseph2,3, Marvin Frazier2,3,
J. Craig Venter2,3, Jonathan A. Eisen1*
1 Department of Evolution and Ecology, Department of Medical Microbiology and Immunology, University of California Davis Genome Center, University of California
Davis, Davis, California, United States of America, 2 The J. Craig Venter Institute, Rockville, Maryland, United States of America, 3 The J. Craig Venter Institute, La Jolla,
California, United States of America, 4 University of Virginia, Charlottesville, Virginia, United States of America

Abstract
Background: Most of our knowledge about the ancient evolutionary history of organisms has been derived from data
associated with specific known organisms (i.e., organisms that we can study directly such as plants, metazoans, and
culturable microbes). Recently, however, a new source of data for such studies has arrived: DNA sequence data generated
directly from environmental samples. Such metagenomic data has enormous potential in a variety of areas including, as we
argue here, in studies of very early events in the evolution of gene families and of species.
Methodology/Principal Findings: We designed and implemented new methods for analyzing metagenomic data and used
them to search the Global Ocean Sampling (GOS) Expedition data set for novel lineages in three gene families commonly
used in phylogenetic studies of known and unknown organisms: small subunit rRNA and the recA and rpoB superfamilies.
Though the methods available could not accurately identify very deeply branched ss-rRNAs (largely due to difficulties in
making robust sequence alignments for novel rRNA fragments), our analysis revealed the existence of multiple novel
branches in the recA and rpoB gene families. Analysis of available sequence data likely from the same genomes as these
novel recA and rpoB homologs was then used to further characterize the possible organismal source of the novel sequences.
Conclusions/Significance: Of the novel recA and rpoB homologs identified in the metagenomic data, some likely come from
uncharacterized viruses while others may represent ancient paralogs not yet seen in any cultured organism. A third
possibility is that some come from novel cellular lineages that are only distantly related to any organisms for which
sequence data is currently available. If there exist any major, but so-far-undiscovered, deeply branching lineages in the tree
of life, we suggest that methods such as those described herein currently offer the best way to search for them.
Citation: Wu D, Wu M, Halpern A, Rusch DB, Yooseph S, et al. (2011) Stalking the Fourth Domain in Metagenomic Data: Searching for, Discovering, and
Interpreting Novel, Deep Branches in Marker Gene Phylogenetic Trees. PLoS ONE 6(3): e18011. doi:10.1371/journal.pone.0018011
Editor: Robert Fleischer, Smithsonian Institution National Zoological Park, United States of America
Received October 25, 2010; Accepted February 20, 2011; Published March 18, 2011
This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration which stipulates that, once placed in the public
domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.


Funding: The development and main work on this project was supported by the National Science Foundation via an ‘‘Assembling the Tree of Life’’ grant
(number 0228651) to Jonathan A. Eisen and Naomi Ward. The final work on this project was funded by the Gordon and Betty Moore Foundation (through
Stalking the Fourth Domain

Figure 1. Phylogenetic tree of the RecA superfamily. All RecA sequences were grouped into clusters using the Lek algorithm. Representatives of each cluster that contained >2 members were then selected and aligned using MUSCLE. A phylogenetic tree was built from this alignment using PHYML; bootstrap values are based on 100 replicates. The Lek cluster ID precedes each sequence accession ID. Proposed subfamilies in the RecA superfamily are shaded and given a name on the right. Five of the proposed subfamilies contained only GOS sequences at the time of our initial analysis (RecA-like SAR, Phage SAR1, Phage SAR2, Unknown 1 and Unknown 2) and are highlighted by colored shading. As noted on the tree and in the text, sequences from two Archaea that were released after our initial analysis group in the Unknown 2 subfamily.
doi:10.1371/journal.pone.0018011.g001
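Bootstrap support on trees like this one comes from resampling alignment columns with replacement, rebuilding a tree from each pseudo-alignment, and counting how often each clade reappears. Here is a minimal Python sketch of the resampling and counting steps. It assumes the alignment is held as a dict of equal-length strings and that clade sets have already been extracted from each replicate tree. The tree inference itself (PHYML in the paper) is not shown, and the function names are hypothetical.

```python
import random


def bootstrap_alignments(alignment, replicates=100, seed=0):
    """Yield bootstrap pseudo-alignments by resampling columns with replacement.

    `alignment` maps sequence IDs to equal-length aligned strings. Each
    replicate keeps the same number of columns and the same sequence IDs.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    rng = random.Random(seed)
    for _ in range(replicates):
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        # The same column indices are used for every sequence, so each
        # resampled column stays a column of the original alignment.
        yield {name: "".join(seq[c] for c in cols)
               for name, seq in alignment.items()}


def support(clade, replicate_clade_sets):
    """Percentage of replicate trees whose clade set contains `clade`."""
    clade = frozenset(clade)
    hits = sum(1 for clades in replicate_clade_sets if clade in clades)
    return 100.0 * hits / len(replicate_clade_sets)
```

Resampling whole columns, rather than individual characters, preserves the homology statements that each alignment column encodes.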

PLoS ONE | www.plosone.org | March 2011 | Volume 6 | Issue 3 | e18011

Five RecA subfamilies were identified as being novel (i.e., only seen in metagenomic data) in our initial analyses. GOS metagenome assemblies that encode members of
these subfamilies were identified and the genes neighboring the novel RecAs were characterized. The neighboring gene descriptions are based on the top BLASTP hits
against the NRAA database; taxonomy assignments are based on their closest neighbor in phylogenetic trees built from the top NRAA BLASTP hits.
doi:10.1371/journal.pone.0018011.t002

Figure 2. The largest assembly from the GOS data that encodes a novel RecA subfamily member (a representative of subfamily
Unknown 2). This GOS assembly (ID 1096627390330) encodes 33 annotated genes plus 16 hypothetical proteins, including several with similarity to
known archaeal genes (e.g., DNA primase, translation initiation factor 2, Table 2). The arrow indicates a novel recA homolog from the Unknown 2
subfamily (cluster ID 9).
doi:10.1371/journal.pone.0018011.g002


Figure 3. Phylogenetic tree of the RpoB superfamily. All RpoB sequences were grouped into clusters using the Lek algorithm. Representatives of each cluster that contained >2 members were then selected and aligned using MUSCLE. A phylogenetic tree was built from this alignment using PHYML; bootstrap values are based on 100 replicates. The Lek cluster ID precedes each sequence accession ID. Proposed subfamilies in the RpoB superfamily are shaded and given a name on the right. The two novel RpoB clades that contain only GOS sequences are highlighted by the colored panels.
doi:10.1371/journal.pone.0018011.g003

Methods

Identification of deeply-branching ss-rRNA sequences

… these 340 sequences were extracted from the European Ribosomal RNA database [66] and then manually curated to remove poor alignment columns with more than 90% gaps or with …

EVE198 Summer Session 2 Class 2 Vaccines
 
EVE198 Spring2021 Class1 Introduction
EVE198 Spring2021 Class1 IntroductionEVE198 Spring2021 Class1 Introduction
EVE198 Spring2021 Class1 Introduction
 
EVE198 Spring2021 Class2
EVE198 Spring2021 Class2EVE198 Spring2021 Class2
EVE198 Spring2021 Class2
 
EVE198 Spring2021 Class5 Vaccines
EVE198 Spring2021 Class5 VaccinesEVE198 Spring2021 Class5 Vaccines
EVE198 Spring2021 Class5 Vaccines
 
EVE198 Winter2020 Class 8 - COVID RNA Detection
EVE198 Winter2020 Class 8 - COVID RNA DetectionEVE198 Winter2020 Class 8 - COVID RNA Detection
EVE198 Winter2020 Class 8 - COVID RNA Detection
 
EVE198 Winter2020 Class 1 Introduction
EVE198 Winter2020 Class 1 IntroductionEVE198 Winter2020 Class 1 Introduction
EVE198 Winter2020 Class 1 Introduction
 
EVE198 Winter2020 Class 3 - COVID Testing
EVE198 Winter2020 Class 3 - COVID TestingEVE198 Winter2020 Class 3 - COVID Testing
EVE198 Winter2020 Class 3 - COVID Testing
 
EVE198 Winter2020 Class 5 - COVID Vaccines
EVE198 Winter2020 Class 5 - COVID VaccinesEVE198 Winter2020 Class 5 - COVID Vaccines
EVE198 Winter2020 Class 5 - COVID Vaccines
 
EVE198 Winter2020 Class 9 - COVID Transmission
EVE198 Winter2020 Class 9 - COVID TransmissionEVE198 Winter2020 Class 9 - COVID Transmission
EVE198 Winter2020 Class 9 - COVID Transmission
 
EVE198 Fall2020 "Covid Mass Testing" Class 8 Vaccines
EVE198 Fall2020 "Covid Mass Testing" Class 8 VaccinesEVE198 Fall2020 "Covid Mass Testing" Class 8 Vaccines
EVE198 Fall2020 "Covid Mass Testing" Class 8 Vaccines
 
EVE198 Fall2020 "Covid Mass Testing" Class 2: Viruses, COIVD and Testing
EVE198 Fall2020 "Covid Mass Testing" Class 2: Viruses, COIVD and TestingEVE198 Fall2020 "Covid Mass Testing" Class 2: Viruses, COIVD and Testing
EVE198 Fall2020 "Covid Mass Testing" Class 2: Viruses, COIVD and Testing
 
EVE198 Fall2020 "Covid Mass Testing" Class 1 Introduction
EVE198 Fall2020 "Covid Mass Testing" Class 1 IntroductionEVE198 Fall2020 "Covid Mass Testing" Class 1 Introduction
EVE198 Fall2020 "Covid Mass Testing" Class 1 Introduction
 

Último

JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...anjaliyadav012327
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...Pooja Nehwal
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 

Último (20)

JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 

Lecture 15: Era IV Shotgun Metagenomics

  • 1. EVE 161: Microbial Phylogenomics. Lecture #15: Era IV: Shotgun Metagenomics. UC Davis, Winter 2014. Instructor: Jonathan Eisen. Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
  • 2. Where we are going and where we have been
    • Previous lecture: 14: Era IV: Metagenomics
    • Current lecture: 15: Era IV: Shotgun Metagenomics
    • Next lecture: 16: Era IV: Function in Metagenomics
  • 3. Era IV: Genomes in the environment. Era IV: Shotgun Metagenomics
  • 4. Environmental Shotgun Sequencing (ESS)
    • ESS was first applied to endosymbiont genomes
      - Endosymbionts are relatively clonal within one host, and sometimes even within one species
    • The Buchnera genome was sequenced with ESS
      - Many others too
  • 5. Wolbachia Metagenomic Sequencing [diagram: shotgun, sequence]
  • 10. Wolbachia pipientis wMel. Wu et al., 2004. Collaboration between Jonathan Eisen and Scott O’Neill (Yale, U. Queensland).
  • 11. Overlapping title pages of the two papers covered next:
    • Tyson et al., "Community structure and metabolism through reconstruction of microbial genomes from the environment" (Article). Introduction (excerpt): The study of microbial evolution and ecology has been revolutionized by DNA sequencing and analysis (refs 1–3). However, isolates have been the main source of sequence data, and only a small fraction of microorganisms have been cultivated (refs 4–6). Consequently, focus has shifted towards the analysis of uncultivated microorganisms via cloning of conserved genes and genome fragments directly from the environment (refs 7–9). To date, only a small fraction of genes have been recovered from individual environments, limiting the analysis of ... fluorescence in situ hybridization (FISH) revealed that all biofilms contained mixtures of bacteria (Leptospirillum, Sulfobacillus and, in a few cases, Acidimicrobium) and archaea (Ferroplasma and other members of the Thermoplasmatales). The genome of one of these archaea, Ferroplasma acidarmanus fer1, isolated from the Richmond mine, has been sequenced previously (http://www.jgi.doe.gov/JGI_microbial/html/ferroplasma/ferro_homepage.html). A pink biofilm (Fig. 1a) typical of AMD communities ...
    • Venter et al., "Environmental Genome Shotgun Sequencing of the Sargasso Sea" (Research Article).
  • 12. Shotgun metagenomics [diagram: shotgun, sequence]
  • 13. Article: Community structure and metabolism through reconstruction of microbial genomes from the environment. Gene W. Tyson¹, Jarrod Chapman³,⁴, Philip Hugenholtz¹, Eric E. Allen¹, Rachna J. Ram¹, Paul M. Richardson⁴, Victor V. Solovyev⁴, Edward M. Rubin⁴, Daniel S. Rokhsar³,⁴ & Jillian F. Banfield¹,². ¹Department of Environmental Science, Policy and Management, ²Department of Earth and Planetary Sciences, and ³Department of Physics, University of California, Berkeley, California 94720, USA; ⁴Joint Genome Institute, Walnut Creek, California 94598, USA.
    Abstract: Microbial communities are vital in the functioning of all ecosystems; however, most microorganisms are uncultivated, and their roles in natural systems are unclear. Here, using random shotgun sequencing of DNA from a natural acidophilic biofilm, we report reconstruction of near-complete genomes of Leptospirillum group II and Ferroplasma type II, and partial recovery of three other genomes. This was possible because the biofilm was dominated by a small number of species populations and the frequency of genomic rearrangements and gene insertions or deletions was relatively low. Because each sequence read came from a different individual, we could determine that single-nucleotide polymorphisms are the predominant form of heterogeneity at the strain level. The Leptospirillum group II genome had remarkably few nucleotide polymorphisms, despite the existence of low-abundance variants. The Ferroplasma type II genome seems to be a composite from three ancestral strains that have undergone homologous recombination to form a large population of mosaic genomes. Analysis of the gene complement for each organism revealed the pathways for carbon and nitrogen fixation and energy generation, and provided insights into survival strategies in an extreme environment.
    Introduction (excerpt): The study of microbial evolution and ecology has been revolutionized by DNA sequencing and analysis (refs 1–3). However, isolates have been the main source of sequence data, and only a small fraction of ... fluorescence in situ hybridization (FISH) revealed that all biofilms contained mixtures of bacteria (Leptospirillum, Sulfobacillus and, in a few cases, Acidimicrobium) and archaea (Ferroplasma and other ...
  • 14. Acid Mine Drainage 2004 (Tyson et al., Results excerpt)
    ... is internally self-consistent, with 97.2% of end pairs from the same clone assembled with the appropriate orientation and separation, as expected for a low rate of mispairing error (tracking and chimaeric clones). ... uncultured Ferroplasma species distinct from fer1. We designate this as Ferroplasma type II. The dominance of this organism type was unexpected before the genomic analysis.
    The first step in assignment of scaffolds to organism types was to ...
    We assigned the roughly 3× coverage, high G+C scaffolds (474 scaffolds up to 31 kb, totalling 2.66 Mb) to Leptospirillum group III on the basis of rRNA markers. Comparison of these scaffolds with those assigned to Leptospirillum group II indicates significant sequence divergence and only locally conserved gene order, confirming that the scaffolds belong to a relatively distant relative of Leptospirillum group II. A partial 16S rRNA gene sequence from Sulfobacillus thermosulfidooxidans was identified in the unassembled reads, suggesting very low coverage of this organism. If any Sulfobacillus scaffolds >2 kb were assembled, they would be grouped with the Leptospirillum group III scaffolds. We compared the 3× coverage, low G+C scaffolds (580 scaffolds, 4.12 Mb) to the fer1 genome in order to assign them to organism types (Supplementary Fig. S6). Scaffolds with ≥96% nucleotide identity to fer1 were assigned to an environmental Ferroplasma type I genome (170 scaffolds up to 47 kb in length and comprising 1.48 Mb of sequence). The remaining low-coverage, low G+C scaffolds are tentatively assigned to G-plasma. The largest scaffold in this bin (62 kb) contains the G-plasma 16S rRNA gene. The 410 scaffolds assigned to G-plasma comprise 2.65 Mb of sequence. A partial 16S rRNA gene sequence from A-plasma was identified in the unassembled reads, suggesting low coverage of this organism. Any scaffolds from A-plasma >2 kb would be included in the G-plasma bin. Although eukaryotes are present in the AMD system, they were in low abundance in the biofilm studied. So far, no scaffolds from eukaryotes have been detected.
    As independent evidence that the Leptospirillum group II and Ferroplasma type II genomes are nearly complete, we located a full complement of transfer RNA synthetases in each genome data set. An almost complete set of these genes was also recovered from Leptospirillum group III. The G-plasma bin contains more than a full set of tRNA synthetases, consistent with inclusion of some A-plasma scaffolds. In addition, we established that the Leptospirillum group II, Leptospirillum group III, Ferroplasma type I, Ferroplasma type II and G-plasma bins contained only one set of rRNA genes.
    Figure 1: The pink biofilm. a, Photograph of the biofilm in the Richmond mine (hand included for scale). b, FISH image of a. Probes targeting bacteria (EUBmix; fluorescein isothiocyanate (green)) and archaea (ARC915; Cy5 (blue)) were used in combination with a probe targeting the Leptospirillum genus (LF655; Cy3 (red)). Overlap of red and green (yellow) indicates Leptospirillum cells and shows the dominance of Leptospirillum. c, Relative microbial abundances determined using quantitative FISH counts.
    NATURE | doi:10.1038/nature02340 | www.nature.com/nature | © 2004 Nature Publishing Group
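Several of the bins above were sequenced to only about 3× coverage. As an added illustration (not part of Tyson et al.), the standard Lander–Waterman approximation relates mean coverage c = N·L/G (N reads of length L over a genome of size G) to the expected fraction of the genome hit by at least one read, 1 − e^(−c), and to a rough contig count N·e^(−c). This minimal sketch assumes reads land uniformly at random and ignores the minimum overlap needed to join reads, so it is optimistic for real metagenomes; the function names and the genome and read sizes are hypothetical.

```python
import math


def mean_coverage(n_reads: int, read_len: float, genome_size: float) -> float:
    """Mean depth c = N * L / G (Lander-Waterman notation)."""
    return n_reads * read_len / genome_size


def fraction_covered(c: float) -> float:
    """Expected fraction of bases covered by >= 1 read (Poisson model)."""
    return 1.0 - math.exp(-c)


def expected_contigs(n_reads: int, c: float) -> float:
    """Rough expected number of contigs ("islands"), N * exp(-c),
    i.e. Lander-Waterman with the minimum-overlap fraction set to zero."""
    return n_reads * math.exp(-c)


if __name__ == "__main__":
    genome_size = 2_000_000  # hypothetical 2 Mb genome
    read_len = 800           # hypothetical Sanger-length reads
    for n_reads in (2_500, 7_500, 25_000):
        c = mean_coverage(n_reads, read_len, genome_size)
        print(f"{c:4.1f}x: {fraction_covered(c):.3f} covered, "
              f"~{expected_contigs(n_reads, c):.0f} contigs")
```

At c = 3 the model still leaves roughly 5% of bases unsampled, which is one reason low-coverage bins tend to assemble into many short scaffolds rather than complete genomes.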
  • 15. Methods
    • Plasmid library
    • Shotgun sequencing
    • Assembly
    • Binning
      - GC content
      - Coverage
    • Potential "nearly" complete genomes
      - Leptospirillum group II
      - Ferroplasma type II
      - Evidence for completeness: housekeeping genes
    • Annotation, population analysis
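The binning step can be pictured as sorting scaffolds on two axes, G+C content and read depth. This is a minimal sketch, not the authors' procedure. It assumes a hypothetical `Scaffold` record and hypothetical fixed cutoffs (`gc_cut`, `depth_cut`), and it only shows how the two measurements together can separate, for example, ~3× high-G+C scaffolds (the Leptospirillum group III bin) from ~3× low-G+C scaffolds (later split into Ferroplasma type I and G-plasma).

```python
from dataclasses import dataclass


@dataclass
class Scaffold:
    name: str
    seq: str
    depth: float  # mean read depth over the scaffold


def gc_fraction(seq: str) -> float:
    """Fraction of G or C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def bin_label(s: Scaffold, gc_cut: float = 0.45, depth_cut: float = 6.0) -> str:
    """Assign a scaffold to one of four coarse bins (cutoffs are hypothetical)."""
    gc = "high_gc" if gc_fraction(s.seq) >= gc_cut else "low_gc"
    cov = "high_cov" if s.depth >= depth_cut else "low_cov"
    return f"{gc}/{cov}"


def bin_scaffolds(scaffolds: list[Scaffold]) -> dict[str, list[str]]:
    """Group scaffold names by bin label."""
    bins: dict[str, list[str]] = {}
    for s in scaffolds:
        bins.setdefault(bin_label(s), []).append(s.name)
    return bins


if __name__ == "__main__":
    toy = [
        Scaffold("s1", "GCGCGCATGC", 10.0),
        Scaffold("s2", "ATATATGCAT", 9.5),
        Scaffold("s3", "GGCCGCATCG", 3.0),
        Scaffold("s4", "ATTATAGCTA", 2.8),
    ]
    for label, names in sorted(bin_scaffolds(toy).items()):
        print(label, names)
```

Fixed thresholds are used only for readability. Real data form overlapping clouds in GC/depth space, which is why further evidence such as rRNA markers and identity to the fer1 genome was needed to assign bins.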
  • 16. (Tyson et al., excerpt) ... Leptospirillum group II genome may reflect strong recent environmental selection for this genome type or be the result of a founder effect. ... undergone homologous recombination. It is unlikely that the reads with pattern transitions represent variants that arose simply through accumulation of nucleotide polymorphisms, because this ...
    Figure 2: Segment of the Ferroplasma type II composite genome. a, A 4.2-kb region showing annotated open reading frames (ORFs) (red), average read depth (blue line), and the number of nucleotide polymorphisms in the 'green' and 'yellow' relative to the 'pink' strain (green and yellow lines) averaged over 60-bp windows. Black dots indicate recombination sites. b, Alignment of individual reads (XYG) for a 96-bp region in a. Letters indicate nucleotide polymorphisms in the green and yellow strains relative to the pink strain. Note the recombinant sequence (XYG48207). c, Evolutionary distance tree inferred from the ancestral strain sequences in a.
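Panel a of Figure 2 plots polymorphism counts between strains in 60-bp windows. This minimal sketch computes that kind of profile. It is not the authors' pipeline. It assumes the two strain sequences are already aligned to equal length, that windows do not overlap, and that gap characters ("-") are skipped rather than counted. The function names are hypothetical.

```python
def snp_positions(ref: str, alt: str) -> list[int]:
    """0-based positions where two aligned sequences differ (gaps ignored)."""
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i
        for i, (a, b) in enumerate(zip(ref.upper(), alt.upper()))
        if a != b and "-" not in (a, b)
    ]


def windowed_snp_counts(ref: str, alt: str, window: int = 60) -> list[int]:
    """Number of polymorphisms in each non-overlapping window of `window` bp."""
    n_windows = (len(ref) + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for pos in snp_positions(ref, alt):
        counts[pos // window] += 1
    return counts


if __name__ == "__main__":
    pink = "A" * 120                                      # reference strain
    green = "A" * 10 + "C" + "A" * 59 + "GT" + "A" * 48  # variant strain
    counts = windowed_snp_counts(pink, green)
    print(counts, [c / 60 for c in counts])               # counts and per-bp density
```

Dividing each count by the window length gives a per-base density. A sliding (overlapping) window would give a smoother curve at the cost of correlated neighbouring values.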
  • 17. Figure 3 (Tyson et al.): Schematic diagram illustrating a diversity of mosaic genome types within the Ferroplasma type II population that are inferred to have arisen by homologous recombination between three closely related ancestral genome types (pink, yellow and green).
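Figure 3 describes reads whose polymorphism pattern switches from one ancestral type to another. This minimal sketch shows how such switches can be spotted. It assumes, unlike the real analysis where the ancestral types had to be inferred from the reads, that the three ancestral sequences are known and already aligned to the read. At each informative site, the read is labelled with the single ancestor it matches. A change of label between consecutive labelled sites brackets a candidate recombination point. The names and toy sequences are hypothetical.

```python
def informative_labels(read: str, ancestors: dict[str, str]) -> list[tuple[int, str]]:
    """(position, ancestor) for each site where ancestors disagree and
    exactly one ancestor matches the read base."""
    for name, seq in ancestors.items():
        if len(seq) != len(read):
            raise ValueError(f"ancestor {name!r} is not aligned to the read")
    labels = []
    for i, base in enumerate(read):
        column = {name: seq[i] for name, seq in ancestors.items()}
        if len(set(column.values())) == 1:
            continue  # all ancestors agree: site carries no information
        matches = [name for name, b in column.items() if b == base]
        if len(matches) == 1:
            labels.append((i, matches[0]))
    return labels


def switch_points(labels: list[tuple[int, str]]) -> list[tuple[int, int, str, str]]:
    """Intervals (left_site, right_site, from, to) where the matching ancestor changes."""
    return [
        (p0, p1, a0, a1)
        for (p0, a0), (p1, a1) in zip(labels, labels[1:])
        if a0 != a1
    ]


if __name__ == "__main__":
    ancestors = {
        "pink": "AAAAAAAAAA",
        "yellow": "ACAAAAAGAA",
        "green": "AAATAAAAAC",
    }
    read = "ACAAAAAAAC"  # yellow-like at the left end, green-like at the right end
    labels = informative_labels(read, ancestors)
    print(labels)
    print(switch_points(labels))
```

Sites that match two ancestors are skipped. As a result, the crossover can only be localized to the interval between the flanking labelled sites, not to a single base.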
  • 18. (Tyson et al., excerpt) ... genes needed to fix carbon by means of the Calvin–Benson–Bassham cycle (using type II ribulose 1,5-bisphosphate carboxylase–oxygenase). All genomes recovered from the AMD system ... fixation via the reductive acetyl coenzyme A (acetyl-CoA) pathway by some, or all, organisms. Given the large number of ABC-type sugar and amino acid transporters encoded in the Ferroplasma type ...
    Figure 4: Cell metabolic cartoons constructed from the annotation of 2,180 ORFs identified in the Leptospirillum group II genome (63% with putative assigned function) and 1,931 ORFs in the Ferroplasma type II genome (58% with assigned function). The cell ... drainage stream (viewed in cross-section). Tight coupling between ferrous iron oxidation, pyrite dissolution and acid generation is indicated. Rubisco, ribulose 1,5-bisphosphate carboxylase–oxygenase. THF, tetrahydrofolate.
  • 19. Research Article: Environmental Genome Shotgun Sequencing of the Sargasso Sea. J. Craig Venter¹*, Karin Remington¹, John F. Heidelberg³, Aaron L. Halpern², Doug Rusch², Jonathan A. Eisen³, Dongying Wu³, Ian Paulsen³, Karen E. Nelson³, William Nelson³, Derrick E. Fouts³, Samuel Levy², Anthony H. Knap⁶, Michael W. Lomas⁶, Ken Nealson⁵, Owen White³, Jeremy Peterson³, Jeff Hoffman¹, Rachel Parsons⁶, Holly Baden-Tillson¹, Cynthia Pfannkoch¹, Yu-Hui Rogers⁴, Hamilton O. Smith¹. http://www.sciencemag.org/content/304/5667/66
    Abstract (excerpt): We have applied "whole-genome shotgun sequencing" to microbial populations collected en masse on tangential flow and impact filters from seawater samples collected from the Sargasso Sea near Bermuda. A total of 1.045 billion base pairs of nonredundant sequence was generated, annotated, and analyzed to elucidate the gene content, diversity, and relative abundance of the organisms within these environmental samples. These data are estimated to derive from at least 1800 genomic species based on sequence relatedness, including 148 previously unknown bacterial phylotypes. We have identified over 1.2 million previously unknown genes represented in these samples, including more than 782 new ...
    Main text (cropped column): surface water was collected from three sites in February 2003 and at "station S" in May; genomic libraries with inserts of 2 to 6 kb were made, and plasmid clones were sequenced from both ends.
  • 20. (Venter et al., excerpt) ... two groups of scaffolds representing two distinct strains closely related to the published ... at depths ranging from 4× to 36× (indicated with shading in table S3) ... with nine depicted in Fig. ...
    Fig. 1: MODIS-Aqua satellite image of ocean chlorophyll in the Sargasso Sea grid about the BATS site from 22 February 2003. The station locations are overlain with their respective identifications. Note the elevated levels of chlorophyll (green color shades) around station 3, which are not present around stations 11 and 13.
    Fig. 2: Gene conservation among closely ... (caption truncated)
  • 21. Sampling protocols. Sampling on the RV Weatherbird II was done as follows: Seawater (170 liters) from stations 11 and 13 was directly filtered through a 0.8 µm Supor membrane disc filter (Pall Life Sciences) followed in series by a 0.22 µm Supor membrane disc filter (Pall Life Sciences). The sample from station 3 was pumped into a 250 L carboy prior to being filtered through the impact filters. The length of time from collection of the sample until the end of the filtration step was approximately one hour. Filters were placed in 5 ml of sucrose lysis buffer (20 mM EDTA, 400 mM NaCl, 0.75 M sucrose, 50 mM Tris-HCl, pH 9.0), stored in liquid nitrogen aboard the Weatherbird II, and then placed at −80 °C until DNA extractions were done. Alternatively, seawater (340 liters) was collected from 5 meters below the surface into a carboy, then filtered through a 0.8 µm Supor membrane disc filter (Pall Life Sciences), followed by concentration to 1 liter using a Pellicon tangential flow filtration (TFF) system (Millipore) with a 0.1 µm Durapore VVPP cartridge (Millipore); again, the total time for filtration and concentration was approximately one hour. Cells were pelleted at 10,000 rpm, 4 °C for 30 minutes. The impact filters and the retentate from the TFF were then handled as described above. The carboys, tubing and filter systems were cleaned with a 10% hydrochloric acid wash prior to each leg of the sampling. Any sampling equipment (tubing, etc.) that could reasonably be soaked was soaked in an acid bath for at least 24 hours. Sampling carboys were filled with the acid wash and “soaked” for at least 24 hours as well. All acid-washed items were subsequently rinsed very liberally with Milli-Q water. A liberal Milli-Q water rinse was also conducted between samples on the same leg. All spigots from the carboys were covered with a Ziploc bag until needed. Tubing was stored in clean Ziploc bags until needed.
  • 22. Sample preparation. The impact filters were cut into quarters and placed in individual 50 ml conical tubes. TE buffer (5 ml, pH 8) containing 150 µg/ml lysozyme was added to each tube, and the tubes were incubated at 37 °C for 2 hours. SDS was added to 0.1%, and the samples were put through three freeze/thaw cycles. The lysate was then treated with proteinase K (100 µg/ml) for one hour at 55 °C, followed by three aqueous phenol extractions and one phenol/chloroform extraction. The supernatant was precipitated with two volumes of 100% ethanol, and the DNA pellet was washed with 70% ethanol.
  • 23. DNA preparation. DNA was randomly sheared, end-polished with consecutive BAL31 nuclease and T4 DNA polymerase treatments, and size-selected by electrophoresis on 1% low-melting-point agarose. After ligation to BstXI adapters (Invitrogen, catalog no. N408-18), the DNA was purified by three rounds of gel electrophoresis to remove excess adapters, and the fragments, now carrying 3'-CACA overhangs, were inserted into BstXI-linearized plasmid vector with 3'-TGTG overhangs. Fragments were cloned in a medium-copy pBR322 derivative.
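The adapter scheme works because the two kinds of sticky ends are complementary to each other but not to themselves, so inserts cannot concatemerize and the vector cannot close on itself. The short Python sketch below checks this by reverse-complementing the overhangs. It assumes each overhang is written 5'→3' (reading the slide's notation the other way round gives the same result), and the helper names are hypothetical.

```python
# Sketch: why BstXI-adapter cloning resists self-ligation.
# Assumption: each overhang is a single-stranded 3' extension written 5'->3';
# two sticky ends can anneal only if one is the reverse complement of the other.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def can_anneal(overhang_a: str, overhang_b: str) -> bool:
    """True if two overhangs (each written 5'->3') are complementary."""
    return overhang_a == reverse_complement(overhang_b)

insert_end = "CACA"  # overhang on adapter-ligated fragments
vector_end = "TGTG"  # overhang on BstXI-linearized vector

pairings = {
    "insert-vector": can_anneal(insert_end, vector_end),
    "insert-insert": can_anneal(insert_end, insert_end),
    "vector-vector": can_anneal(vector_end, vector_end),
}
for name, ok in pairings.items():
    print(f"{name}: {'compatible' if ok else 'incompatible'}")
# insert-vector: compatible
# insert-insert: incompatible
# vector-vector: incompatible
```

Because neither overhang is its own reverse complement, only insert-to-vector joins are possible.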
  • 24. Sequence assembly. With default parameter settings, the highly covered genome sequences would have been treated as repetitive DNA by the Celera Assembler. Because the Celera Assembler builds scaffolds only from a backbone of sequence heuristically classified as unique, these organisms would not have been eligible for scaffolding and would have been absent from the final assembly. However, by tuning the threshold parameter for classifying unique sequence, we were able to compensate for the apparent repetitiveness of these genomic regions and scaffold them appropriately. We did this by identifying the most deeply covered, clearly nonrepetitive contigs in an initial run of the assembler (here, the strong assemblies at 21–36× coverage, identified as gene-rich Burkholderia-like and plasmid scaffolds) and using a value slightly below the calculated “A-statistic” (an empirical uniqueness measure within the assembler) of these contigs as the threshold parameter in a subsequent run. This allows the deep contigs to be treated as unique sequence when they would otherwise be labeled repetitive. At the other end of the spectrum, rare organisms in the sample were sequenced only to a shallow depth of coverage. Routine assembly would not have considered the short, shallow-coverage assemblies built from fragment overlaps as an eligible basis for scaffolding, because of a minimum length requirement of 1000 bp that is typically in place for efficiency. Therefore, in this use case, the organisms represented by these sequences would not have been ordered and oriented with mate pairs unless the default minimum length was lowered to compensate for the low anticipated coverage depth and assembly length. With this choice of parameters, better suited to the environmental project at hand, we were able to adequately assemble both the dominant and the rare species simultaneously.
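To make the threshold tuning concrete, here is a minimal Python sketch. It assumes the A-statistic takes the log-odds form usually described for the Celera Assembler (Myers et al., 2000): for a contig of length Δ containing k reads, A = Δ·n/G − k·ln 2, where n/G is the global read-arrival rate (total reads over estimated genome size). Deeply covered contigs score strongly negative and look like collapsed repeats. The contig numbers, the default threshold, the lowered minimum length, and the helper names are all hypothetical; only the 1000 bp minimum length comes from the text.

```python
import math
from dataclasses import dataclass

@dataclass
class Contig:
    name: str
    length_bp: int  # consensus length (Delta)
    n_reads: int    # reads assembled into the contig (k)

def a_statistic(contig: Contig, arrival_rate: float) -> float:
    """Log-odds that a contig is single-copy rather than a collapsed
    two-copy repeat, assuming read starts follow a Poisson process with
    `arrival_rate` starts per bp (total reads / estimated genome size)."""
    return contig.length_bp * arrival_rate - contig.n_reads * math.log(2)

def eligible_for_scaffolding(contig: Contig, arrival_rate: float,
                             a_threshold: float, min_length_bp: int) -> bool:
    """A contig seeds scaffolds only if it looks unique AND is long enough."""
    return (a_statistic(contig, arrival_rate) >= a_threshold
            and contig.length_bp >= min_length_bp)

# Hypothetical sample: 1.6 M reads over an assumed 200 Mb of sequence.
ARRIVAL_RATE = 1_600_000 / 200_000_000  # 0.008 read starts per bp

contigs = [
    Contig("typical", length_bp=10_000, n_reads=80),  # about average depth
    Contig("deep", length_bp=20_000, n_reads=750),    # dominant organism
    Contig("rare", length_bp=800, n_reads=2),         # shallow and short
]

DEFAULT_A_THRESHOLD = 1.0  # hypothetical default cutoff
DEFAULT_MIN_LENGTH = 1000  # minimum length quoted in the text

# Tuned run: threshold just below the A-statistic of the trusted deep contig,
# plus a lowered (hypothetical) minimum length for shallow assemblies.
tuned_threshold = a_statistic(contigs[1], ARRIVAL_RATE) - 1.0
TUNED_MIN_LENGTH = 500

for c in contigs:
    default = eligible_for_scaffolding(c, ARRIVAL_RATE, DEFAULT_A_THRESHOLD, DEFAULT_MIN_LENGTH)
    tuned = eligible_for_scaffolding(c, ARRIVAL_RATE, tuned_threshold, TUNED_MIN_LENGTH)
    print(f"{c.name:8s} A={a_statistic(c, ARRIVAL_RATE):8.1f} default={default} tuned={tuned}")
```

Under the defaults only the typical contig qualifies: the deep contig fails the uniqueness test and the rare one fails the length test. Lowering the threshold also lowers the bar for genuine repeats, which is why the text anchors the new value just below the A-statistic of contigs already known to be nonrepetitive.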
  • 25. Methods • Plasmid library • Shotgun sequencing • Assembly • No major binning • Potential “nearly” complete genomes • Annotation, population analysis, phylogenetic analysis
  • 26. …relatively limited depth of coverage given the level of diversity in the sample. … whole-genome shotgun (WGS) assembly … deposited at DDBJ/EMBL/GenBank under the project accession AACY00000000, … have been deposited in a corresponding … trace archive. The version described in this paper is the first version, …00. Unlike a conventional WGS entry, … deposited not just contigs and scaffolds but also the unassembled paired singletons and … singletons in order to accurately reflect the diversity in the sample and … across the entire sample within … database. … large assemblies. Our … focused on the well-sampled genomes, characterizing scaffolds with at least … depth. There were 333 scaffolds …26 contigs and spanning 30.9 … that met this criterion (table S3), accounting for roughly 410,000 reads, or 25% of the … data set. From this set of well-sampled material, we were able to cluster and … assemblies by organism; from the rare … in our sample, we used sequence similarity … together with computational … to obtain both qualitative and quantitative estimates of genomic and functional diversity within this particular marine environment. We employed several criteria to sort the … assembly pieces into tentative organism …; these include depth of coverage, oligo…
Fig. 2. Gene conservation among closely related Prochlorococcus. The outermost concentric circle of the diagram depicts the completed genomic sequence of Prochlorococcus marinus MED4 (11). Fragments from environmental sequencing were compared to this completed Prochlorococcus genome; they are shown in the inner concentric circles and were given boxed outlines. Genes on the outermost circle have been assigned pseudospectrum colors based on their position along the chromosome: genes nearer the start of the genome are colored red, and genes nearer the end are colored blue. Fragments from environmental sequencing were subjected to an analysis that identifies conserved gene order between those fragments and the completed Prochlorococcus MED4 genome. Genes on the environmental genome segments that exhibited conserved gene order are colored with the same color assignments as the Prochlorococcus MED4 chromosome. Colored regions on the environmental segments whose colors differ from the adjacent outermost concentric circle result from conserved gene order with other MED4 regions and probably represent chromosomal rearrangements. Genes that did not exhibit conserved gene order are colored black. http://www.sciencemag.org/content/304/5667/66 www.sciencemag.org SCIENCE VOL 304 2 APRIL 2004
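The Fig. 2 caption describes two steps: coloring reference genes by chromosomal position, and transferring those colors to environmental genes that keep the reference gene order. A minimal Python sketch of that idea follows. It assumes each environmental gene has already been matched to a reference (MED4) gene index, treats two or more consecutive genes whose reference indices step by +1 or −1 as conserved order, and maps position to a hue from 0° (red) to 240° (blue). The paper's actual gene-order criteria are not given here, so the run rule and the function names are hypothetical.

```python
from typing import List, Optional

def position_hue(ref_index: int, n_ref_genes: int) -> float:
    """Map a reference gene's chromosomal rank to a hue: 0 deg (red) at the
    start of the genome, 240 deg (blue) at the end."""
    return 240.0 * ref_index / max(n_ref_genes - 1, 1)

def conserved_order_colors(hits: List[Optional[int]], n_ref_genes: int,
                           min_run: int = 2) -> List[Optional[float]]:
    """`hits[i]` is the reference gene index best matching gene i on an
    environmental fragment (None = no match). Genes inside a run of at least
    `min_run` neighbours whose reference indices step consistently by +1 or -1
    (same or reversed orientation) get their reference gene's hue; all other
    genes get None, i.e. they would be drawn black."""
    colors: List[Optional[float]] = [None] * len(hits)
    i = 0
    while i < len(hits):
        j = i  # j marks the last gene of the run starting at i
        if hits[i] is not None and i + 1 < len(hits) and hits[i + 1] is not None:
            step = hits[i + 1] - hits[i]
            if step in (1, -1):
                j = i + 1
                while (j + 1 < len(hits) and hits[j + 1] is not None
                       and hits[j + 1] - hits[j] == step):
                    j += 1
        if j - i + 1 >= min_run and hits[i] is not None:
            for k in range(i, j + 1):
                colors[k] = position_hue(hits[k], n_ref_genes)
        i = j + 1
    return colors

fragment = [10, 11, 12, None, 500, 300, 299]
print(conserved_order_colors(fragment, n_ref_genes=1001))
# [2.4, 2.64, 2.88, None, None, 72.0, 71.76]
```

The isolated match to gene 500 stays black because it has no order-preserving neighbour, while the reversed pair 300, 299 keeps its color.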
  • 27. Fig. 3. Comparison of Sargasso Sea scaffolds to Crenarchaeal clone 4B7. Predicted proteins from 4B7 and the scaffolds showing significant homology to 4B7 by tBLASTx are arrayed in positional order along the x and y axes. Colored boxes represent BLASTp matches scoring at least 25% similarity with an E-value better than 1e-5. Black vertical and horizontal lines delineate scaffold borders. http://www.sciencemag.org/content/304/5667/66
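The match criteria in the Fig. 3 caption amount to a filter followed by placing each surviving hit at (query position, scaffold position). The sketch below assumes hits are already parsed into records with a percent-similarity field (in BLAST+ tabular output this would correspond to percent positives, `ppos`, rather than percent identity, `pident`); the record type, gene names, and values are invented for illustration.

```python
from typing import List, NamedTuple, Tuple

class Hit(NamedTuple):
    query: str             # predicted protein on clone 4B7
    subject: str           # predicted protein on a Sargasso scaffold
    similarity_pct: float  # percent similarity (positives) over the alignment
    evalue: float

def dot_plot_points(hits: List[Hit], query_order: List[str], subject_order: List[str],
                    min_similarity: float = 25.0,
                    max_evalue: float = 1e-5) -> List[Tuple[int, int]]:
    """Return (x, y) grid cells for hits with at least `min_similarity`
    percent similarity and an E-value better (smaller) than `max_evalue`."""
    x_of = {name: i for i, name in enumerate(query_order)}
    y_of = {name: j for j, name in enumerate(subject_order)}
    return sorted((x_of[h.query], y_of[h.subject]) for h in hits
                  if h.similarity_pct >= min_similarity and h.evalue < max_evalue)

hits = [
    Hit("4B7_orf1", "scaf_A_g1", 62.0, 1e-40),
    Hit("4B7_orf2", "scaf_A_g2", 31.5, 3e-8),
    Hit("4B7_orf3", "scaf_B_g1", 22.0, 1e-12),  # fails the similarity cutoff
    Hit("4B7_orf4", "scaf_B_g2", 40.0, 2e-3),   # fails the E-value cutoff
]
points = dot_plot_points(hits,
                         ["4B7_orf1", "4B7_orf2", "4B7_orf3", "4B7_orf4"],
                         ["scaf_A_g1", "scaf_A_g2", "scaf_B_g1", "scaf_B_g2"])
print(points)  # [(0, 0), (1, 1)]
```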
  • 28. Fig. 4. Circular diagrams of nine complete megaplasmids. Genes encoded in the forward direction are shown in the outer concentric circle; reverse-coding genes are shown in the inner concentric circle. The genes have been given role category assignments and colored accordingly: amino acid biosynthesis, violet; biosynthesis of cofactors, prosthetic groups, and carriers, light blue; cell envelope, light green; cellular processes, red; central intermediary metabolism, brown; DNA metabolism, gold; energy metabolism, light gray; fatty acid and phospholipid metabolism, magenta; protein fate and protein synthesis, pink; purines, pyrimidines, nucleosides, and nucleotides, orange; regulatory functions and signal transduction, olive; transcription, dark green; transport and binding proteins, blue-green; genes with no known homology to other proteins and genes with homology to genes with no known function, white; genes of unknown function, gray. Tick marks are placed at 10-kb intervals. …homogeneous blend of discrepancies from consensus without any apparent separation into haplotypes, such as the Prochlorococcus scaffold region (Fig. 5). Indeed, the Prochlorococcus scaffolds display considerable heterogeneity not only at the nucleotide sequence level (Fig. 5) but also at the genomic level, where multiple scaffolds align with the same region of the MED4 (11) genome but differ because of gene or genomic island insertion, deletion, or rearrangement events. This observation is consistent with previous findings (12). For instance, scaffolds 2221918 and 2223700 share gene synteny with each other and with MED4 but differ by the insertion of 15 genes of probable phage origin, likely representing an integrated bacteriophage. These genomic differences are displayed graphically in Fig. 2, where it is evident that up to … conflicting scaffolds can align with the same region of the MED4 genome.
More than 8… of the Prochlorococcus MED4 genome can be aligned with Sargasso Sea scaffolds greater than 10 kb; however, there appear to be a couple of regions of MED4 that are not represented in the 10-kb scaffolds (Fig. 2). The larger of these two regions (PMM1187–PMM1277) consists primarily of a gene cluster coding for surface polysaccharide biosynthesis, which may represent a MED4-specific polysaccharide absent or highly diverged in our Sargasso Sea Prochlorococcus bacteria. The heterogeneity of the Prochlorococcus scaffolds suggests that they are not derived from a single discrete strain but instead probably represent a conglomerate assembled from a population of closely related Prochlorococcus biotypes. The gene complement of the Sargasso Sea. The heterogeneity of the Sargasso sequences complicates the identification of microbial genes. The typical approach for microbial annotation, model-based gene finding, relies entirely on training with a subset of manually… http://www.sciencemag.org/content/304/5667/66
  • 29. …frames (5). A total of 69,901 novel genes belonging to 15,601 single-link clusters were identified. The predicted genes were categorized …
    Table 1. Gene count breakdown by TIGR role category. The gene set includes those found on assemblies from samples 1 to 4 and fragment reads from samples 5 to 7. A more detailed table, separating Weatherbird II samples from Sorcerer II samples, is presented in the SOM (table S4). Note that 28,023 genes were classified in more than one role category.
    TIGR role category: total genes
    Amino acid biosynthesis: 37,118
    Biosynthesis of cofactors, prosthetic groups, and carriers: 25,905
    Cell envelope: 27,883
    Cellular processes: 17,260
    Central intermediary metabolism: 13,639
    DNA metabolism: 25,346
    Energy metabolism: 69,718
    Fatty acid and phospholipid metabolism: 18,558
    Mobile and extrachromosomal element functions: 1,061
    Protein fate: 28,768
    Protein synthesis: 48,012
    Purines, pyrimidines, nucleosides, and nucleotides: 19,912
    Regulatory functions: 8,392
    Signal transduction: 4,817
    Transcription: 12,756
    Transport and binding proteins: 49,185
    Unknown function: 38,067
    Miscellaneous: 1,864
    Conserved hypothetical: 794,061
    Total number of roles assigned: 1,242,230
    Total number of genes: 1,214,207
    Fig. 5. Prochlorococcus-related scaffold 2223290 illustrates … of closely related organisms, distinctly nonpunctate … global structure of scaffold 2223290 with respect to assembly … sequence alignment. Blue segments, contigs; green segments … stages of the assembly of fragments into the resulting … fragments were initially assembled in several different … form the final contig structure. The multiple sequence … homogeneous blend of haplotypes, none with sufficient … separate assembly. http://www.sciencemag.org/content/304/5667/66
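The note under Table 1 follows from its two totals: every gene contributes at least one role assignment, so the surplus of assignments over genes counts the extra roles. The sketch below shows that bookkeeping on a toy, hypothetical gene-to-role mapping and then applies it to the table's totals. The surplus equals the number of multi-role genes only if each such gene carries exactly two roles, which is an assumption.

```python
from typing import Dict, List, Tuple

def role_bookkeeping(gene_roles: Dict[str, List[str]]) -> Tuple[int, int, int]:
    """Return (total genes, total role assignments, genes with >1 role)."""
    total_genes = len(gene_roles)
    total_roles = sum(len(roles) for roles in gene_roles.values())
    multi_role = sum(1 for roles in gene_roles.values() if len(roles) > 1)
    return total_genes, total_roles, multi_role

# Hypothetical toy annotation.
toy = {
    "gene1": ["Energy metabolism"],
    "gene2": ["Transport and binding proteins", "Cell envelope"],
    "gene3": ["Conserved hypothetical"],
}
print(role_bookkeeping(toy))  # (3, 4, 1)

# Table 1 totals: surplus role assignments over genes.
surplus = 1_242_230 - 1_214_207
print(surplus)  # 28023
```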
  • 30. …curated genes. With the vast majority of the Sargasso sequence in short (less than …) unassociated scaffolds and singletons from hundreds of different organisms, it is impractical to apply this approach. Instead, we used an evidence-based gene finder (5). Evidence in the form of protein alignments to sequences in the bacterial portion of the nonredundant amino acid (nraa) data set was used to determine the most likely reading frame. Likewise, approximate start and stop sites were determined from the boundaries of the alignments and refined to specific start and stop codons. This approach identified 1,214,207 genes covering … of the total data set. This represents approximately an order of magnitude more sequences than currently archived in the SwissProt database (14), which con… http://www.sciencemag.org/content/304/5667/66
  • 32. rRNA phylotyping from metagenomics http://www.sciencemag.org/content/304/5667/66
  • 33. Shotgun Sequencing Allows Alternative Anchors (e.g., RecA) http://www.sciencemag.org/content/304/5667/66
  • 34. …taxonomic group using the phylogenetic analysis described for rRNA. For example, our data set … marker genes, is roughly comparable to the 97% cutoff traditionally used for rRNA. Thus … http://www.sciencemag.org/content/304/5667/66 Fig. 6. Phylogenetic diversity of Sargasso Sea sequences using multiple phylogenetic markers. The relative contribution of organisms from different major phylogenetic groups (phylotypes) was measured using multiple phylogenetic markers that have been used previously in phylogenetic studies of prokaryotes: 16S rRNA, RecA, EF-Tu, EF-G, HSP70, and RNA polymerase B (RpoB). The relative proportion of different phylotypes for each sequence (weighted by the depth of coverage of the contigs from which those sequences came) is shown. The phylotype distribution was determined as follows: (i) sequences in the Sargasso data set corresponding to each of these genes were identified using HMM and BLAST searches; (ii) phylogenetic analysis was performed separately for each phylogenetic marker identified in the Sargasso data, comparing it with all members of that gene family in all complete genome sequences (only complete genomes were used, to control for the differential sampling of these markers in GenBank); (iii) the phylogenetic affinity of each sequence was assigned based on the classification of its nearest neighbor in the phylogenetic tree.
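Step (iii) plus the coverage weighting in the Fig. 6 caption can be sketched as follows: each marker sequence, already assigned a phylotype from its nearest tree neighbor, contributes the depth of coverage of its contig rather than a count of one. This is a minimal sketch; the phylotype labels and depths in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def phylotype_proportions(assignments: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Coverage-weighted share of each phylotype for one marker gene.
    `assignments` holds (phylotype, contig_depth) pairs, one per sequence."""
    weight: Dict[str, float] = defaultdict(float)
    for phylotype, depth in assignments:
        weight[phylotype] += depth
    total = sum(weight.values())
    return {p: w / total for p, w in weight.items()}

recA_hits = [
    ("Alphaproteobacteria", 3.0),
    ("Cyanobacteria", 6.0),
    ("Alphaproteobacteria", 1.0),
]
print(phylotype_proportions(recA_hits))
# {'Alphaproteobacteria': 0.4, 'Cyanobacteria': 0.6}
```

Unweighted counting would give Alphaproteobacteria two thirds of this toy sample; weighting by depth lets one deeply covered cyanobacterial contig dominate instead.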
  • 35. …method based on fitting the observed depth of coverage to a theoretical model of assembly progress for a sample corresponding to a mix… that a minimum of 12-fold deeper sampling would be required to obtain 95% of the unique sequence. However, these are only lower …
    Table 2. Diversity of ubiquitous single-copy protein-coding phylogenetic markers. The Protein column uses symbols that identify six proteins encoded by exactly one gene in virtually all known bacteria. Sequence ID specifies the GenBank identifier for the corresponding E. coli sequence. Ortholog cutoff identifies the BLASTx E-value chosen to identify orthologs when querying the E. coli sequence against the complete Sargasso Sea data set. Maximum fragment depth shows the number of reads satisfying the ortholog cutoff at the point along the query for which this value is maximal. Observed “species” shows the number of distinct clusters of reads from the maximum fragment depth column, after grouping reads whose containing assemblies had an overlap of at least 40 bp with > 94% nucleotide identity (single-link clustering). Singleton “species” shows the number of distinct clusters from the observed “species” column that consist of a single read. Most abundant shows the fraction of the maximum fragment depth that consists of the single largest cluster.
    Protein   Sequence ID    Ortholog cutoff   Max. fragment depth   Observed “species”   Singleton “species”   Most abundant (%)
    AtpD      NTL01EC03653   1e-32             836                   456                  317                   6
    GyrB      NTL01EC03620   1e-11             924                   569                  429                   4
    Hsp70     NT01EC0015     1e-31             812                   515                  394                   4
    RecA      NTL01EC02639   1e-21             592                   341                  244                   8
    RpoB      NTL01EC03885   1e-41             669                   428                  331                   7
    TufA      NTL01EC03262   1e-41             597                   397                  307                   3
    Table 3. Diversity models based on depth of coverage. Each row corresponds to an abundance class of organisms. The first column in each model, “fr(asm)”, gives the fraction of the assembly consensus … column) in the sample. The thi… a genome expected to be s… modeled gives the resulting estimat… http://www.sciencemag.org/content/304/5667/66
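The “observed species” column in Table 2 comes from single-link clustering: any qualifying overlap joins two groups, and groups chain together transitively. A union-find structure does this in a few lines. This Python sketch is a simplification that treats overlaps as given directly between read pairs (the paper grouped reads via their containing assemblies); the read IDs and overlap values are made up.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def single_link_clusters(reads: List[str],
                         overlaps: Iterable[Tuple[str, str, int, float]],
                         min_overlap_bp: int = 40,
                         min_identity_pct: float = 94.0) -> Tuple[int, int, float]:
    """Return (observed clusters, singleton clusters, fraction of reads in the
    largest cluster). Two reads are linked if they overlap by at least
    `min_overlap_bp` with identity strictly above `min_identity_pct`."""
    parent = {r: r for r in reads}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, length, identity in overlaps:
        if length >= min_overlap_bp and identity > min_identity_pct:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b  # merge the two clusters

    sizes = Counter(find(r) for r in reads)
    observed = len(sizes)
    singletons = sum(1 for s in sizes.values() if s == 1)
    most_abundant = max(sizes.values()) / len(reads)
    return observed, singletons, most_abundant

reads = ["r1", "r2", "r3", "r4", "r5", "r6"]
overlaps = [
    ("r1", "r2", 120, 98.5),
    ("r2", "r3", 60, 95.0),   # chains r3 to r1 through r2
    ("r4", "r5", 35, 99.0),   # overlap too short
    ("r5", "r6", 200, 90.0),  # identity too low
]
print(single_link_clusters(reads, overlaps))  # (4, 3, 0.5)
```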
  • 36. Figure S6. Accumulation curve for rpoB. Observed (black) OTU counts for rpoB (based on the fragment grouping summarized in Table 2), as well as the Chao1-corrected estimate of total species (red; see (3)). Points are mean values of 1000 shufflings of the observed data, while bars show 90% confidence intervals. http://www.sciencemag.org/content/304/5667/66
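The curve in Figure S6 can be reproduced in outline by shuffling the reads, counting distinct OTUs as reads are added, and recording a richness estimate at each depth. This Python sketch uses the classic Chao1 formula, S_obs + F1²/(2·F2), with F1 and F2 the numbers of singleton and doubleton OTUs, falling back to S_obs + F1(F1 − 1)/2 when F2 is zero. Whether reference (3) uses exactly this variant is an assumption, and the 90% interval is taken from the 5th and 95th percentiles of the shuffles.

```python
import random
from collections import Counter
from statistics import mean

def chao1_from(s_obs: int, f1: int, f2: int) -> float:
    """Classic Chao1: S_obs + F1^2 / (2 F2); uses F1 (F1 - 1) / 2 when F2 == 0."""
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2

def chao1(otu_labels) -> float:
    counts = Counter(otu_labels)
    f1 = sum(1 for c in counts.values() if c == 1)
    f2 = sum(1 for c in counts.values() if c == 2)
    return chao1_from(len(counts), f1, f2)

def accumulation_curve(otu_labels, n_shuffles=1000, seed=0):
    """For each sampling depth: (depth, mean observed OTUs, 5th percentile,
    95th percentile, mean Chao1 estimate) over random shufflings."""
    rng = random.Random(seed)
    labels = list(otu_labels)
    n = len(labels)
    observed = [[] for _ in range(n)]
    estimated = [[] for _ in range(n)]
    for _ in range(n_shuffles):
        rng.shuffle(labels)
        counts = Counter()
        f1 = f2 = 0
        for i, label in enumerate(labels):
            old = counts[label]
            counts[label] = old + 1
            # update singleton/doubleton tallies incrementally
            f1 += (old + 1 == 1) - (old == 1)
            f2 += (old + 1 == 2) - (old == 2)
            observed[i].append(len(counts))
            estimated[i].append(chao1_from(len(counts), f1, f2))
    curve = []
    for depth in range(n):
        obs = sorted(observed[depth])
        lo = obs[int(0.05 * (len(obs) - 1))]
        hi = obs[int(0.95 * (len(obs) - 1))]
        curve.append((depth + 1, mean(obs), lo, hi, mean(estimated[depth])))
    return curve

labels = ["a", "a", "a", "b", "b", "c", "d"]  # hypothetical OTU per read
print(chao1(labels))  # 6.0
curve = accumulation_curve(labels, n_shuffles=200)
print(curve[-1])  # final depth: every shuffle has seen all 4 OTUs
```

With two singletons and one doubleton among four observed OTUs, Chao1 predicts two unseen OTUs, which is the gap the red curve in Figure S6 tracks.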
  • 37. MS 1093857: Environmental Genome Shotgun Sequencing of the Sargasso Sea, Venter et al., revised. Figure S7. Each point in the figure corresponds to a scaffold from the assembly (restricted to scaffolds > 10 kb). Scaffolds were placed in separate panels according to the most closely related organism as indicated by the BLAST searches described in the text. Within a panel, a scaffold is shown with x coordinate equal to its length, y coordinate equal to its estimated depth of coverage, and color determined by which of 6 k-mer composition clusters it was assigned to. Depth of coverage was estimated as the total base pairs in reads belonging to a given assembly piece divided by the length of the consensus sequence for that piece. K-mer composition clusters were determined by representing each scaffold as a vector of the frequencies of all possible 4-mers, considering both the forward and reverse strands of the sequence, and then applying the K-means clustering algorithm. http://www.sciencemag.org/content/304/5667/66
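Both quantities in the Figure S7 caption are straightforward to compute. The Python sketch below builds the 256-dimensional tetranucleotide profile over both strands, computes depth as read bases over consensus length, and runs a plain K-means. It is a minimal sketch: it uses a deterministic farthest-first initialization instead of random seeding so the toy example is reproducible, the sequences are synthetic, and the example uses 2 clusters where the paper used 6.

```python
from itertools import product
from typing import List, Sequence

KMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 4-mers
INDEX = {k: i for i, k in enumerate(KMERS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def tetra_profile(seq: str) -> List[float]:
    """Frequencies of all 256 4-mers, counted on both strands."""
    seq = seq.upper()
    counts = [0] * len(KMERS)
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):
        for i in range(len(strand) - 3):
            kmer = strand[i:i + 4]
            if kmer in INDEX:  # skip windows containing N or other symbols
                counts[INDEX[kmer]] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

def depth_of_coverage(read_bp_total: int, consensus_length: int) -> float:
    """Total read bases in an assembly piece divided by its consensus length."""
    return read_bp_total / consensus_length

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors: List[List[float]], k: int, iterations: int = 50) -> List[int]:
    # farthest-first initialisation keeps this sketch deterministic
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Synthetic scaffolds: two AT-rich and two GC-rich sequences.
scaffolds = ["ATTATAAT" * 20, "TTATAATA" * 20, "GCCGCGGC" * 20, "CCGCGGCG" * 20]
labels = kmeans([tetra_profile(s) for s in scaffolds], k=2)
print(labels)  # [0, 0, 1, 1]
print(depth_of_coverage(30_000, 10_000))  # 3.0
```

Counting both strands makes a scaffold's profile independent of which strand the assembler happened to report, which is why the caption specifies it.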
  • 38. Functional Diversity of Proteorhodopsins? http://www.sciencemag.org/content/304/5667/66
  • 39. Figure S10. Scaffold 2217664, containing the gene encoding proteorhodopsin. Genes are colored using the color assignments described in Fig. 2, and contig boundaries are indicated with red vertical lines. In this scaffold, rhodopsin is associated with a DNA-directed RNA polymerase sigma subunit (rpoD) originating in the CFB group. http://www.sciencemag.org/content/304/5667/66
  • 40. Binning challenge
  • 41. Binning challenge. Best binning method: reference genomes
  • 42. Glassy-winged sharpshooter • Feeds on xylem sap • Vector for Pierce’s disease • Potential bioterror agent • Collaboration with Nancy Moran to sequence symbiont genomes • Funded by NSF • Published in PLoS Biology 2006
  • 43. Wu et al. 2006 PLoS Biology 4: e188.
  • 44. Sharpshooter shotgun sequencing. Collaboration with Nancy Moran’s lab. Wu et al. 2006 PLoS Biology 4: e188.
  • 46. Binning challenge. No reference genome? What do you do? Phylogeny …
  • 47. CFB Phyla
  • 48. Baumannia makes vitamins and cofactors; Sulcia makes amino acids. Wu et al. 2006 PLoS Biology 4: e188.
  • 50. Sorcerer II GOS Expedition. Figure 1. Sampling sites. Microbial populations were sampled from locations in the order shown. Samples were collected at intervals of approximately 200 miles (320 km) along the eastern North American coast, through the Gulf of Mexico, and into the equatorial Pacific. Samples 00 and 01 identify sets of sites sampled as part of the Sargasso Sea pilot study [19]. Samples 27 through 36 were sampled off the Galapagos Islands (see inset). Sites shown in gray were not analyzed as part of this study. doi:10.1371/journal.pbio.0050077.g001 … environments as well as a few nonmarine aquatic samples for contrast (Table 1). … for the pilot Sargasso Sea study, 200 l of surface seawater was filtered to isolate microorganisms for metagenomic analysis.
  • 51. Stalking the Fourth Domain in Metagenomic Data: Searching for, Discovering, and Interpreting Novel, Deep Branches in Marker Gene Phylogenetic Trees Dongying Wu1, Martin Wu1,4, Aaron Halpern2,3, Douglas B. Rusch2,3, Shibu Yooseph2,3, Marvin Frazier2,3, J. Craig Venter2,3, Jonathan A. Eisen1* 1 Department of Evolution and Ecology, Department of Medical Microbiology and Immunology, University of California Davis Genome Center, University of California Davis, Davis, California, United States of America, 2 The J. Craig Venter Institute, Rockville, Maryland, United States of America, 3 The J. Craig Venter Institute, La Jolla, California, United States of America, 4 University of Virginia, Charlottesville, Virginia, United States of America Abstract Background: Most of our knowledge about the ancient evolutionary history of organisms has been derived from data associated with specific known organisms (i.e., organisms that we can study directly such as plants, metazoans, and culturable microbes). Recently, however, a new source of data for such studies has arrived: DNA sequence data generated directly from environmental samples. Such metagenomic data has enormous potential in a variety of areas including, as we argue here, in studies of very early events in the evolution of gene families and of species. Methodology/Principal Findings: We designed and implemented new methods for analyzing metagenomic data and used them to search the Global Ocean Sampling (GOS) Expedition data set for novel lineages in three gene families commonly used in phylogenetic studies of known and unknown organisms: small subunit rRNA and the recA and rpoB superfamilies. Though the methods available could not accurately identify very deeply branched ss-rRNAs (largely due to difficulties in making robust sequence alignments for novel rRNA fragments), our analysis revealed the existence of multiple novel branches in the recA and rpoB gene families. 
Analysis of available sequence data likely from the same genomes as these novel recA and rpoB homologs was then used to further characterize the possible organismal source of the novel sequences. Conclusions/Significance: Of the novel recA and rpoB homologs identified in the metagenomic data, some likely come from uncharacterized viruses while others may represent ancient paralogs not yet seen in any cultured organism. A third possibility is that some come from novel cellular lineages that are only distantly related to any organisms for which sequence data is currently available. If there exist any major, but so-far-undiscovered, deeply branching lineages in the tree of life, we suggest that methods such as those described herein currently offer the best way to search for them. Citation: Wu D, Wu M, Halpern A, Rusch DB, Yooseph S, et al. (2011) Stalking the Fourth Domain in Metagenomic Data: Searching for, Discovering, and Interpreting Novel, Deep Branches in Marker Gene Phylogenetic Trees. PLoS ONE 6(3): e18011. doi:10.1371/journal.pone.0018011 Editor: Robert Fleischer, Smithsonian Institution National Zoological Park, United States of America. Received October 25, 2010; Accepted February 20, 2011; Published March 18, 2011. This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. Funding: The development and main work on this project was supported by the National Science Foundation via an “Assembling the Tree of Life” grant (number 0228651) to Jonathan A. Eisen and Naomi Ward. The final work on this project was funded by the Gordon and Betty Moore Foundation (through
  • 52. Stalking the Fourth Domain. Figure 1. Phylogenetic tree of the RecA superfamily. All RecA sequences were grouped into clusters using the Lek algorithm. Representatives of each cluster that contained >2 members were then selected and aligned using MUSCLE. A phylogenetic tree was built from this alignment using PHYML; bootstrap values are based on 100 replicates. The Lek cluster ID precedes each sequence accession ID. Proposed subfamilies in the RecA superfamily are shaded and named on the right. Five of the proposed subfamilies contained only GOS sequences at the time of our initial analysis (RecA-like SAR, Phage SAR1, Phage SAR2, Unknown 1, and Unknown 2) and are highlighted by colored shading. As noted on the tree and in the text, sequences from two Archaea that were released after our initial analysis group in the Unknown 2 subfamily. doi:10.1371/journal.pone.0018011.g001 PLoS ONE | www.plosone.org | March 2011 | Volume 6 | Issue 3 | e18011
  • 53. Five RecA subfamilies were identified as novel (i.e., seen only in metagenomic data) in our initial analyses. GOS metagenome assemblies that encode members of these subfamilies were identified, and the genes neighboring the novel RecAs were characterized. The neighboring gene descriptions are based on the top BLASTP hits against the NRAA database; taxonomy assignments are based on the closest neighbor in phylogenetic trees built from the top NRAA BLASTP hits. doi:10.1371/journal.pone.0018011.t002 Figure 2. The largest assembly from the GOS data that encodes a novel RecA subfamily member (a representative of subfamily Unknown 2). This GOS assembly (ID 1096627390330) encodes 33 annotated genes plus 16 hypothetical proteins, including several with similarity to known archaeal genes (e.g., DNA primase, translation initiation factor 2; Table 2). The arrow indicates a novel recA homolog from the Unknown 2 subfamily (cluster ID 9). doi:10.1371/journal.pone.0018011.g002
  • 54. Stalking the Fourth Domain. Figure 3. Phylogenetic tree of the RpoB superfamily. All RpoB sequences were grouped into clusters using the Lek algorithm. Representatives of each cluster that contained >2 members were then selected and aligned using MUSCLE. A phylogenetic tree was built from this alignment using PHYML; bootstrap values are based on 100 replicates. The Lek cluster ID precedes each sequence accession ID. Proposed subfamilies in the RpoB superfamily are shaded and named on the right. The two novel RpoB clades that contain only GOS sequences are highlighted by the colored panels. doi:10.1371/journal.pone.0018011.g003 Methods. Identification of deeply-branching ss-rRNA sequences. … these 340 sequences were extracted from the European Ribosomal RNA database [66] and then manually curated to remove alignment columns with more than 90% gaps or with poor …