1. Do Next Generation Sequencing
approaches provide the answer
for DNA barcoding of plants?
Hannah McPherson, Marlien van der Merwe,
Paul Rymer, Mark Edwards, Maurizio Rossetto
2. Landscape-level studies of the
Australian flora
Species and population
dynamics
Historical and current
processes shaping
distributions and
assemblages of native
trees
Using a range of molecular
tools, life history traits and
modelling
Reproduced from Crisp et al. 2004
3. Next generation sequencing
Exploring new molecular tools and
approaches
NGS to assemble whole chloroplast
genomes
Use of whole chloroplast as a barcode?
4. Technical approach
Full genome shotgun sequencing
Solexa Illumina platform (7Gb/lane)
• 8 labelled paired-end libraries
multiplexed in one lane
• Sub-sampled data from single lanes
No reference sequence
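As a rough illustration of the numbers on this slide, one lane at 7 Gb shared across 8 libraries works out to about 0.875 Gb per library. The sketch below computes that figure and shows one simple way to sub-sample read pairs reproducibly. The even split between libraries, the function names, and the in-memory list of read pairs are assumptions made for illustration, not details of the original workflow.

```python
import random


def yield_per_library(lane_gb, n_libraries):
    """Expected yield per library in Gb, assuming an even split (an idealisation)."""
    return lane_gb / n_libraries


def subsample_pairs(pairs, fraction, seed=0):
    """Keep each read pair with probability `fraction`; mates always stay together."""
    rng = random.Random(seed)  # fixed seed so the same subset can be drawn again
    return [pair for pair in pairs if rng.random() < fraction]


if __name__ == "__main__":
    print(yield_per_library(7.0, 8), "Gb per library")
    # Toy read-pair identifiers (hypothetical), standing in for real FASTQ records.
    toy_pairs = [(f"read{i}/1", f"read{i}/2") for i in range(1000)]
    subset = subsample_pairs(toy_pairs, 0.25, seed=42)
    print(len(subset), "of", len(toy_pairs), "pairs kept")
```

Sampling pairs rather than individual reads keeps the paired-end information intact for downstream assembly.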
5. Sampling
2 locations: Nightcap (N) and Sydney (S)
20 rainforest tree species
4 individuals pooled from each species for each site
6. reality check: sampling from
rainforests
Collecting and identifying samples
Preserving leaf material
DNA extraction
9/20 plants successfully sequenced from
both North and South
7. questions
Can we bioinformatically assemble chloroplast
genomes from whole genomic shotgun
sequencing without a reference?
What levels of variation do we find across a
broad range of species/families?
Can we mine the data for non-chloroplast
regions too?
Is whole/partial chloroplast genome
sequencing a viable option for barcoding?
8. Angiosperm Phylogeny
Model organism tree
Atherospermataceae
Monimiaceae
Lauraceae
Proteaceae
Euphorbiaceae
Urticaceae
Malvaceae
Sapindaceae
Meliaceae
Pittosporaceae
From Angiosperm Phylogeny Website
http://www.mobot.org/MOBOT/Research/APweb/welcome.html
11. assembling chloroplast genomes
Map trimmed reads to whole cp genome of
closest relative available on GenBank (CLC)
• Consensus of N & S
De novo assembly (CLC and Velvet)
• N & S separately
• Local BLAST / cpDNA genome database
Assemble contigs to N & S reference
(Geneious Pro)
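The local BLAST step above can be sketched as a filter over tabular BLAST output: contigs from the de novo assembly that hit a chloroplast genome database are kept for the final assembly. The sketch assumes BLAST+ tabular output with the default twelve columns; the identity and length thresholds, the function name, and the toy hits are hypothetical, and this is not a description of how CLC or Geneious handle the step.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def chloroplast_contigs(blast_tsv, min_identity=80.0, min_length=200):
    """Return IDs of contigs with at least one hit passing both thresholds.

    The thresholds are illustrative defaults, not values from the talk.
    """
    keep = set()
    for row in csv.DictReader(blast_tsv, fieldnames=FIELDS, delimiter="\t"):
        if float(row["pident"]) >= min_identity and int(row["length"]) >= min_length:
            keep.add(row["qseqid"])
    return keep


if __name__ == "__main__":
    # Two made-up hits: contig1 passes, contig2 is too short and too divergent.
    toy = io.StringIO(
        "contig1\tNC_008334\t95.2\t1500\t70\t2\t1\t1500\t100\t1599\t0.0\t2400\n"
        "contig2\tNC_008334\t78.0\t150\t33\t0\t1\t150\t900\t1049\t1e-20\t100\n"
    )
    print(chloroplast_contigs(toy))
```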
14. [Figure: RAxML tree; tip labels in order]
NC_008325 Daucus carota
Pittosporum multiflorum
Toona ciliata
Synoum glandulosum
NC_008334 Citrus sinensis
Diploglottis cunninghamii
Brachychiton acerifolius
NC_008641 Gossypium barbadense
Claoxylon australe
NC_010433 Manihot esculenta
NC_004993 Calycanthus floridus var. glaucus
Cinnamomum oliveri
Wilkiea huegelii
Doryphora sassafras
Aligned with MAFFT
RAxML tree from CIPRES Science Gateway
~40 kbp excluding gaps
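The "excluding gaps" figure refers to the alignment length once gapped columns are dropped. A minimal sketch of that step is shown below, assuming the alignment is already loaded as a mapping from taxon name to aligned sequence and that any column containing a gap in any taxon is removed. The function name and the toy alignment are hypothetical, and the rule actually used for the tree may have differed.

```python
def strip_gap_columns(aligned):
    """Drop every alignment column in which any sequence has a gap ('-')."""
    names = list(aligned)
    seqs = [aligned[name] for name in names]
    if len({len(seq) for seq in seqs}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    # zip(*seqs) walks the alignment column by column.
    kept = [column for column in zip(*seqs) if "-" not in column]
    return {name: "".join(column[i] for column in kept)
            for i, name in enumerate(names)}


if __name__ == "__main__":
    # Toy alignment with made-up sequences.
    toy = {"Synoum": "AC-GT", "Toona": "ACTG-", "Citrus": "A-TGT"}
    ungapped = strip_gap_columns(toy)
    print(ungapped)
    print(len(next(iter(ungapped.values()))), "columns without gaps")
```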
15. quantifying variation
Map trimmed reads to newly constructed
references (assembled contigs)
SNP detection (CLC)
SNP verification
• exploring data
• Sanger sequencing
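To make the SNP detection step concrete, the sketch below flags positions where reads mapped to the new reference consistently disagree with it. It is a deliberately simple stand-in, not the CLC algorithm: the pileup structure, the depth and frequency thresholds, and the function name are all assumptions, and it only reports near-fixed differences rather than intermediate-frequency variants within a pooled sample.

```python
from collections import Counter


def call_snps(reference, pileup, min_depth=10, min_fraction=0.8):
    """Return (position, ref_base, alt_base) where a non-reference base dominates.

    `pileup` maps 0-based reference positions to the string of read bases
    observed at that position. Thresholds are illustrative only.
    """
    snps = []
    for pos in sorted(pileup):
        bases = pileup[pos].upper()
        if len(bases) < min_depth:
            continue  # too little coverage to trust a call
        base, count = Counter(bases).most_common(1)[0]
        ref_base = reference[pos].upper()
        if base != ref_base and count / len(bases) >= min_fraction:
            snps.append((pos, ref_base, base))
    return snps


if __name__ == "__main__":
    # Toy reference and pileup (made-up data).
    ref = "ACGT"
    toy_pileup = {0: "A" * 12, 1: "T" * 11 + "C", 2: "GGG", 3: "T" * 5 + "C" * 5}
    print(call_snps(ref, toy_pileup))
```

Candidate SNPs from a filter like this would still need the verification steps listed on the slide, since mapping errors and low-coverage regions can both produce false calls.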
16. SNP detection
Synoum glandulosum (~140 kbp)
• SNPs between N and S: ~1 in 550 bp
• SNPs within N: ~1 in 2800 bp
• SNPs within S: ~1 in 4500 bp
[Figure: panels labelled reference, Synoum N, Synoum S]
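The densities on this slide translate into approximate SNP counts for the ~140 kbp Synoum glandulosum assembly by simple division. The short calculation below uses only the rounded figures quoted on the slide, so the results are order-of-magnitude estimates; the function and variable names are introduced here for illustration.

```python
def expected_snps(genome_bp, bp_per_snp):
    """Approximate SNP count implied by a density of one SNP per `bp_per_snp` bases."""
    return genome_bp / bp_per_snp


GENOME_BP = 140_000  # approximate assembly length quoted on the slide
BP_PER_SNP = {"between N and S": 550, "within N": 2800, "within S": 4500}

if __name__ == "__main__":
    for label, spacing in BP_PER_SNP.items():
        print(f"{label}: about {expected_snps(GENOME_BP, spacing):.0f} SNPs")
```

That works out to roughly 255 SNPs between the two sites, against about 50 within N and about 31 within S.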
18. data mining
Chloroplast barcoding genes
Universal cpSSR markers
Other data BLAST
The question of coverage
19. chloroplast barcoding loci
[Table: primers (rows) by taxon (columns)]
Taxa: Citrus, Toona, Wilkiea, Daucus, Synoum, Claoxylon, Doryphora, Gossypium, Diploglottis, Pittosporum, Calycanthus, Brachychiton, Cinnamomum
Primers (Vijayan and Tsou 2010):
rbcL a-f F, rbcL a-r R, rbcL 1F, rbcL 724R
accD 1 F, accD 2 F, accD 3 R, accD 4 R
matK 2.1 F, matK 2.1a F, matK X F, matK 3.2 R, matK 5 R, 390 F, 1326 R, matK_1F, matK_1R, matK_2F, matK_2R
rpoB 1 F, rpoB 2 F, rpoB 3 R, rpoB 4 R
rpoC1 1 F, rpoC1 2 F, rpoC1 3 R, rpoC1 4 R
ycf5 1 F, ycf5 2 F, ycf5 3 R, ycf5 4 R
ndhJ 1 F, ndhJ 2 F, ndhJ 3 R, ndhJ 4 R
trnH2 F, psbAF R, trn H (GUG) F, psb A R
atpF F, atpH R
psbK R, psbI R
trnL-c F, trnL-d R, trnL-e F, trnL-f R, trnL-g F, trnL-h R
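One way to check whether a barcoding primer site is present in an assembled contig is an in silico search for the primer and its reverse complement. The sketch below does exact matching only. The contig and primer strings are invented for the example and are not the published primer sequences; real primers with degenerate bases or mismatches would need a more tolerant search.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_primer(contig, primer):
    """Return (strand, 0-based start) for every exact primer match in `contig`."""
    contig, primer = contig.upper(), primer.upper()
    hits = []
    for strand, query in (("+", primer), ("-", reverse_complement(primer))):
        start = contig.find(query)
        while start != -1:
            hits.append((strand, start))
            start = contig.find(query, start + 1)
    return hits


if __name__ == "__main__":
    toy_contig = "GGACGGATTTTTCCGTAA"  # made-up sequence
    toy_primer = "ACGGA"               # made-up primer, not a real barcoding primer
    print(find_primer(toy_contig, toy_primer))
```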
21. data mining
26S: coverage ~35-300
Rpb2: only returned when a sequence is
available in the same family or a sister family;
coverage ~3-5
Resistance genes: good return but
coverage ~2-10
Leafy: no returns
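The coverage values quoted here are mean read depths over each recovered region. A minimal sketch of that calculation follows, assuming alignments are available as half-open (start, end) intervals on the region. The function name and the toy intervals are hypothetical and not taken from the talk.

```python
def mean_coverage(region_length, alignments):
    """Mean depth over a region, given half-open (start, end) aligned intervals.

    Intervals are clipped to the region so overhanging reads do not inflate depth.
    """
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    aligned_bases = 0
    for start, end in alignments:
        start, end = max(start, 0), min(end, region_length)
        aligned_bases += max(end - start, 0)
    return aligned_bases / region_length


if __name__ == "__main__":
    # Three toy reads on a 100 bp region; the second overhangs the end.
    print(mean_coverage(100, [(0, 100), (50, 150), (90, 95)]))
```

At a mean depth of 2 to 5, many positions will have one read or none, which is consistent with the difficulty of recovering low-copy nuclear genes reported on these slides.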
22. data mining
Matches were good
Matches seem to fall in more conserved regions
Single copy nuclear genes present but
low coverage
Some difficulty retrieving regions
depending on available data for BLAST
23. viability for barcoding
Large portion of the chloroplast genome
retrieved and easily assembled even
without a reference
Potential for retrieving other regions with
increased coverage or carefully designed
multiplexing
24. to sum up the story so far
We can assemble large portions of chloroplast
genomes from whole genomic shotgun
sequencing even without a reference
Variation is low and varies from family to
family
Single copy nuclear genes present but low
coverage?
Is whole/partial chloroplast genome
sequencing a viable option for barcoding?
25. acknowledgements
Friends of the Botanic Gardens Trust
Southern Cross University – Robert Henry, Nicole Rice, Stirling Bowen
Evolutionary Ecology team at the Royal
Botanic Gardens Sydney
Emma McIntosh, Alexander Dohms,
Juelian Siow, Ashlee Wakefield