1. Seeking Out the Dark Matter of the Biological Universe & The Need for a Phylogeny-Driven Genomic Encyclopedia
Jonathan A. Eisen
August 11, 2009
CSB 2009
Tuesday, August 11, 2009
8. As of 2002
• At least 40 phyla of bacteria
[Figure: tree of bacterial phyla, based on Hugenholtz, 2002. Phyla shown: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11]
9. As of 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
[Figure: tree of bacterial phyla, based on Hugenholtz, 2002 (same tree as slide 8)]
10. As of 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
[Figure: tree of bacterial phyla, based on Hugenholtz, 2002 (same tree as slide 8)]
11. As of 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Archaea
[Figure: tree of bacterial phyla, based on Hugenholtz, 2002 (same tree as slide 8)]
12. Need for Tree Guidance Well Established
• Common approach within some eukaryotic groups
• Many small projects funded to fill in some bacterial or archaeal gaps
• Phylogenetic gaps in bacterial and archaeal projects commonly lamented in the literature
13.
• NSF-funded Tree of Life Project (Eisen, Ward, Badger, Wu, Wu, et al.)
• A genome from each of eight phyla
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution I: sequence more phyla
[Figure: tree of bacterial phyla (same tree as slide 8)]
15. Bacterial aTOL Project Aims
• Improve resolution of deep branches in the bacterial tree
• Launch biological studies of these phyla
• Leverage data for interpreting environmental surveys
19.
• At least 100 phyla of bacteria
• Genome sequences are mostly from three phyla
• Most phyla with cultured species are sparsely sampled
• Lineages with no cultured taxa are even more poorly sampled
• Solution: use the tree to really fill gaps
[Figure: tree of bacterial phyla (same phyla as slide 8), with a legend marking well-sampled phyla]
21. GEBA Pilot Project Overview
• Identify major branches in the rRNA tree for which no genomes are available
• Identify which lineages have type species available in DSMZ
• Grow more than 200 of these and prepare DNA
• Sequence and finish 100
• Annotate, analyze, and release data
• Assess benefits of tree-guided sequencing
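The slides do not say how lineages were ranked once the gaps in the rRNA tree were identified. One common way to formalize "fill the biggest gaps" is to greedily pick whichever candidate adds the most not-yet-covered branch length (phylogenetic diversity, PD) to the genomes already sequenced. The sketch below is an illustration under that assumption, not necessarily the GEBA procedure: the rooted tree is stored as a hypothetical child-to-(parent, branch length) map, and the tree, node names, and functions are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical rooted tree: each node maps to (parent, length of the branch to its parent).
# The root is the single node whose parent is None.
Tree = Dict[str, Tuple[Optional[str], float]]


def branches_to_root(tree: Tree, tip: str) -> List[str]:
    """Return the nodes whose parent branch lies on the path from `tip` to the root.

    Each branch is identified by its child node, so a set of these names is a set of branches.
    """
    branches = []
    node = tip
    while tree[node][0] is not None:
        branches.append(node)
        node = tree[node][0]
    return branches


def greedy_pd_selection(tree: Tree, candidates: List[str], sequenced: Set[str], k: int) -> List[str]:
    """Greedily pick up to k candidates, each time maximizing newly covered branch length."""
    covered: Set[str] = set()
    for tip in sequenced:
        covered.update(branches_to_root(tree, tip))

    def gain(tip: str) -> float:
        # Branch length this tip would add that no sequenced or already-chosen genome covers.
        return sum(tree[b][1] for b in branches_to_root(tree, tip) if b not in covered)

    remaining = [c for c in candidates if c not in sequenced]
    chosen: List[str] = []
    while remaining and len(chosen) < k:
        best = max(remaining, key=gain)
        chosen.append(best)
        covered.update(branches_to_root(tree, best))
        remaining.remove(best)
    return chosen


# Toy example: clade A already has a sequenced genome (a1); clade B sits on a long unsampled branch.
example_tree: Tree = {
    "root": (None, 0.0),
    "A": ("root", 1.0),
    "a1": ("A", 0.1),
    "a2": ("A", 0.1),
    "B": ("root", 2.0),
    "b1": ("B", 0.5),
    "c1": ("root", 0.3),
}
picks = greedy_pd_selection(example_tree, ["a2", "b1", "c1"], {"a1"}, k=2)
```

With `a1` already sequenced, `a2` adds only its own short branch, so the sketch prefers `b1` (an unsampled deep branch) and then `c1`. This is the intuition behind tree-guided rather than, say, alphabetical or request-driven target selection.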
22. GEBA Pilot Project: Components
• Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen)
• Project management (David Bruce, Lynne Goodwin et al.)
• Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)
• Libraries and DNA (Eileen Dalin et al.)
• Sequencing and closure (Susan Lucas, Alla Lapidus et al.)
• Annotation and data release (Nikos Kyrpides)
• Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Jenna Morgan, Victor Kunin, Marcel Huntemann, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D'haeseleer, Sean Hooper, Iain Anderson)
• Adopt a microbe education project (Cheryl Kerfeld)
• Outreach (David Gilbert)
• $$$ (DOE, Eddy Rubin, Jim Bristow)
23. GEBA Pilot: Selecting Targets
28. GEBA Pilot Target List
[Figure: bar chart of the number of genomes (# of Genomes, 0 to 35) per phylum in the GEBA pilot target list; bacterial groups are prefixed "B:" (e.g., B: Actinobacteria (High GC)) and archaeal groups "A:"]
29. GEBA Current Status
• More than 100 in progress
• GEBA 56 (focus of first paper)
– 34 finished genomes
– Released to IMG-GEBA page, JGI FTP site, and GenBank
• All data are completely open for anyone to use
31. Assess Benefits of GEBA
• All genomes have some value
• But what, if any, is the benefit of tree-guided sequencing over other selection methods?
32. Why Increase Taxonomic Coverage?
• Gene discovery
• Annotation, functional prediction
• Metagenomic analysis
• Mechanisms of diversification
• Species phylogeny and classification
33. GEBA Lesson 1
rRNA Tree of Life Is a Useful Guide for Genome Core Phylogenetic Diversity
40. GEBA Lesson 2
Phylogenetically Guided Selection Can Help Many Aspects of Genome Analysis
41. Annotation Improves
• Conversion of hypotheticals into conserved hypotheticals
• Linking distantly related members of protein families
• Non-homology functional prediction methods
43. Phylogenetic Metagenomics
[Figure: bar chart (y axis 0 to 0.7) over bacterial groups Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Unclassified Proteobacteria, Cyanobacteria, Chlamydiae, Acidobacteria, Bacteroidetes, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi, Chlorobi, and Unclassified Bacteria. Marker genes shown: frr, tsf, pgk, rplL, rplF, rplP, rplT, rplE, infC, rpsI, rplS, rplA, rplB, rplK, rplC, rpsJ, rplN, rplD, rplM, rpsE, rpsS, rpsB, rpsK, rpsC, rpoB, rpsM, pyrG, nusA, dnaG, rpmA, smpB]
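As a hedged illustration of marker-gene-based phylogenetic metagenomics, the sketch below assumes that reads matching each marker gene have already been assigned to a taxonomic group. It converts each marker's assignments into proportions and then averages those proportions across markers, so that a marker with many reads does not dominate. The exact computation behind the slide's chart is not recoverable from this transcript, and the function names and toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def per_marker_proportions(assignments: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Turn each marker gene's list of read-level taxon labels into proportions summing to 1."""
    proportions: Dict[str, Dict[str, float]] = {}
    for marker, taxa in assignments.items():
        if not taxa:  # skip markers with no assigned reads
            continue
        counts = Counter(taxa)
        total = len(taxa)
        proportions[marker] = {taxon: n / total for taxon, n in counts.items()}
    return proportions


def mean_abundance(proportions: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Average each taxon's proportion over markers, counting absence from a marker as 0."""
    taxa = sorted({t for props in proportions.values() for t in props})
    n_markers = len(proportions)
    return {t: sum(p.get(t, 0.0) for p in proportions.values()) / n_markers for t in taxa}


# Toy data: hypothetical taxon assignments for reads from two of the slide's marker genes.
reads = {
    "rpoB": ["Alphaproteobacteria", "Alphaproteobacteria", "Cyanobacteria", "Bacteroidetes"],
    "pyrG": ["Alphaproteobacteria", "Cyanobacteria"],
}
summary = mean_abundance(per_marker_proportions(reads))
```

Keeping the per-marker proportions, and not only the mean, lets you see whether the markers agree. Disagreement between genes is itself informative.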
44. 16S Says Hyphomonas Is in Rhodobacterales
Badger et al. 2005
45. WGT Says It's Related to Caulobacterales
Badger et al. 2005
47. GEBA Lesson 3
We Have Still Only Scratched the Surface of Microbial Diversity
48. Protein Family Rarefaction Curves
• Take a data set of multiple complete genomes
• Identify all protein families using MCL
• Plot # of genomes vs. # of protein families
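The three steps above can be sketched as follows. The sketch assumes protein families have already been computed and are given as a set of family IDs per genome; the slide uses MCL for that step, and this sketch does not run MCL. Because the curve depends on the order in which genomes are added, it averages the cumulative family count over random orderings. The function name, permutation count, and toy data are hypothetical.

```python
import random
from typing import Dict, List, Set


def rarefaction_curve(genome_families: Dict[str, Set[str]], n_permutations: int = 100,
                      seed: int = 0) -> List[float]:
    """Mean number of distinct protein families after adding 1, 2, ..., N genomes."""
    rng = random.Random(seed)  # fixed seed so the curve is reproducible
    order = list(genome_families)
    totals = [0.0] * len(order)
    for _ in range(n_permutations):
        rng.shuffle(order)  # a fresh random order of genomes
        seen: Set[str] = set()
        for i, genome in enumerate(order):
            seen |= genome_families[genome]  # families observed so far
            totals[i] += len(seen)
    return [t / n_permutations for t in totals]


# Toy input: hypothetical genomes mapped to hypothetical family IDs.
families = {
    "genome1": {"fam1", "fam2"},
    "genome2": {"fam2", "fam3"},
    "genome3": {"fam4"},
}
curve = rarefaction_curve(families)
```

If the curve is still climbing when the last genome is added, new genomes are still contributing new protein families.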
54. [Figure: y axis "Number of proteins" (0 to 350,000) vs. x axis "Genome Number" (0 to 80); curves labeled Total Gene Number, S. agalactiae, Enterobacteriaceae, Actinobacteria, and Bacteria from GEBA project]
55. Structural Novelty
• Of the 17,000 protein families in GEBA56, 1,800 are novel in sequence (Wu)
• Structural modeling suggests many are structurally novel too (D'haeseleer)
• 372 are being crystallized by the PSI (Kerfeld)
56. Within-Family Novelty Example: Transporter Profiles
[Figure: number of transporters (0 to 700) per genome, genomes labeled by abbreviated names, broken down by substrate class: inorganic ions; amino acids, nitro compounds and peptides; drugs/toxins; sugars; carboxylates; nucleosides/nucleotides, bases; siderophores; other]
Sebaldella termitidis ATCC 33386 has twice as many sugar PTS transporters as any other genome.
62.
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Most phyla with cultured species are sparsely sampled
• Lineages with no cultured taxa are even more poorly sampled
[Figure: tree of bacterial phyla (same phyla as slide 8), with a legend marking well-sampled phyla, poorly sampled phyla, and phyla with no cultured taxa]
63. Uncultured Lineages: Technical Approaches
• Get into culture
• Enrichment cultures
• If abundant in low-diversity ecosystems
• Flow sorting
• Microbeads
• Microfluidic sorting
• Single-cell amplification
64. GEBA Lesson 4
Need Experiments from Across the Tree of Life Too
65. As of 2002
• At least 40 phyla of bacteria
[Figure: tree of bacterial phyla, based on Hugenholtz, 2002 (same tree as slide 8)]
66. As of 2002
• At least 40 phyla of bacteria
• Experimental studies are mostly from three phyla
[Figure: tree of bacterial phyla, based on Hugenholtz, 2002 (same tree as slide 8)]
67. As of 2002
• At least 40 phyla of bacteria
• Experimental studies are mostly from three phyla
• Some studies in other phyla
[Figure: tree of bacterial phyla, based on Hugenholtz, 2002 (same tree as slide 8)]
68.
• Need experimental studies from across the tree too
[Figure: tree of bacterial phyla (same phyla as slide 8)]
70. GEBA Lesson 5
The Importance of Culture (Collections, That Is)
71. GEBA Biggest Challenge: Getting DNA
• Getting quality DNA is the biggest bottleneck
• Sharing strains is also a bottleneck
• Solution: beg, borrow, and steal
• DSMZ offered to do this for free
• ATCC is doing a small number for a fee
• In discussions with PCC and other collections
73. Quantification Gel of the Genomic DNA Isolated from Microorganisms
[Figure: quantification gel, Conexibacter woesei (DSM 14684T), lanes 1 to 16]
Lane 1: c(λ-Marker) = 15 ng
Lane 2: c(λ-Marker) = 30 ng
Lane 3: c(λ-Marker) = 50 ng
Lane 4: DNA Molecular Weight Marker II (Roche 236250)
Lane 5: DSM 13279, Collinsella stercoris
Lane 6: DSM 43043, Intrasporangium calvum
Lane 7: DSM 18053, Dyadobacter fermentans
Lane 8: DSM 20476, Slackia heliotrinireducens
Lane 9: DSM 18081, Patulibacter minatonensis
Lane 10: DSM 14684, Conexibacter woesei
Lane 11: DSM 11002, Dethiosulfovibrio peptidovorans
Lane 12: DSM 11551, Halogeometricum borinquense
Lane 13: DNA Molecular Weight Marker II (Roche 236250)
Lane 14: c(λ-Marker) = 125 ng
Lane 15: c(λ-Marker) = 250 ng
Lane 16: c(λ-Marker) = 500 ng
Conexibacter woesei (DSM 14684T) was obtained from the German Collection of Microorganisms and Cell Cultures (DSMZ). The genomic DNA was isolated using the Qiagen Genomic 500 DNA Kit (Qiagen 10262). Pulsed-field gel electrophoresis (PFGE) showed the genomic DNA to be 10-250 kb in size, with the bulk of the DNA at 50-250 kb (see attached PFGE image). The DNA concentration is 500 ng/µl as estimated from the gel. Spectrophotometric measurements yielded a DNA concentration of 450 µg/ml; 300 µl of genomic DNA are shipped (150 µg).
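The shipped amount follows from concentration times volume. Since 1 µg/ml equals 1 ng/µl, the spectrophotometric reading of 450 µg/ml is 450 ng/µl. The small helper below (a hypothetical name) checks the arithmetic for both estimates. The gel estimate reproduces the stated 150 µg, while the spectrophotometric value corresponds to 135 µg.

```python
def dna_mass_ug(concentration_ng_per_ul: float, volume_ul: float) -> float:
    """Total DNA mass in micrograms from a concentration in ng/µl and a volume in µl."""
    return concentration_ng_per_ul * volume_ul / 1000.0  # 1000 ng = 1 µg


gel_estimate_ug = dna_mass_ug(500, 300)      # gel estimate of 500 ng/µl
spectro_estimate_ug = dna_mass_ug(450, 300)  # 450 µg/ml == 450 ng/µl
```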
75. SIGS
• The Genomic Standards Consortium
• The GSC is an open-membership working body that formed in September 2005.
• The goal of this international community is to promote mechanisms that standardize the description of genomes and the exchange and integration of genomic data.
• See http://gensc.org/gc_wiki/index.php/Main_Page
77. Additional Lessons
• Completeness matters
• Computational methods need to be more automated
• Need to limit analyses to subsets of all available data
• Need for people to help interpret and study data is increasing, not decreasing
• Sequence is just the beginning
• Need to train more students