A three-part talk presented at PAG Asia 2019 in Shenzhen: The Digitalization of Ruili Botanical Garden Project: Production, Curation and Re-Use. Presented by Huan Liu (CNGB), Scott Edmunds (GigaScience) and Stephen Tsui (CUHK), 8 June 2019.
PAGAsia19 - The Digitalization of Ruili Botanical Garden Project: Production, Curation and Re-Use
1. The Digitalization of Ruili Botanical Garden Project:
Production, Curation and Re-Use
Stephen Kwok-Wing Tsui, Huan Liu, Scott C Edmunds
2. Part 1. Why? Ruili Botanical Garden
• 1,100 hectares, in Yunnan Province
• Rich biodiversity
• Living Biobank of China National GeneBank
3. • 1st phase of 10KP
• The first digitalized botanical garden
• Shows the biodiversity, phyletic evolution, and interactions between environment, ecosystem, and evolution
• Results of phase 1 published in GigaScience
• Voucher specimens stored at HCNGB (CNGB Herbarium)
DRBG results:
• 1,093 samples
• 1,093 voucher specimens
• 49 orders
• 137 families
• 761 deep-sequenced samples
• 689 vascular species
• 54TB of data
4. Order              Raw bases (Gb)
Alismatales          66.3873
Apiales              70.0075
Araucariales         74.14
Arecales             68.8318
Asparagales          70.3465
Asterales            67.8382
Brassicales          68.474
Buxales              65.44
Caryophyllales       68.6558
Celastrales          75.8133
Commelinales         65.02
Cornales             76.396
Crossosomatales      60.2
Cucurbitales         65.11
Cupressales          73.54
Cyatheales           75.76
Dioscoreales         78.9
Dipsacales           58.6267
Equisetales          67.3
Ericales             68.1109
Fabales              69.9439
Fagales              68.14
Gentianales          70.1155
Gnetales             71.1267
Lamiales             69.3291
Laurales             71.9425
Liliales             71.4133
Magnoliales          69.0988
Malpighiales         68.1842
Malvales             66.2106
Myrtales             70.7924
Oxalidales           68.3533
Pandanales           72.6733
Pinales              61.04
Piperales            63.2533
Poales               69.6407
Polypodiales         68.588
Proteales            69.0733
Ranunculales         67.5644
Rosales              70.0468
Santalales           69.07
Sapindales           70.5628
Saxifragales         70.84
Schizaeales          62.57
Solanales            72.2389
Vitales              65.235
Zingiberales         67.4956
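The per-order figures above can be summarized with a few lines of code. This is a minimal sketch that only transcribes the table: it reports the lowest and highest orders and an unweighted mean across orders. Because the table does not give the number of samples per order, a sample-weighted mean cannot be computed from it, and the summarize helper is a hypothetical name used for illustration.

```python
from statistics import mean

# Raw bases (Gb) per order, transcribed from the table above.
RAW_GB = {
    "Alismatales": 66.3873, "Apiales": 70.0075, "Araucariales": 74.14,
    "Arecales": 68.8318, "Asparagales": 70.3465, "Asterales": 67.8382,
    "Brassicales": 68.474, "Buxales": 65.44, "Caryophyllales": 68.6558,
    "Celastrales": 75.8133, "Commelinales": 65.02, "Cornales": 76.396,
    "Crossosomatales": 60.2, "Cucurbitales": 65.11, "Cupressales": 73.54,
    "Cyatheales": 75.76, "Dioscoreales": 78.9, "Dipsacales": 58.6267,
    "Equisetales": 67.3, "Ericales": 68.1109, "Fabales": 69.9439,
    "Fagales": 68.14, "Gentianales": 70.1155, "Gnetales": 71.1267,
    "Lamiales": 69.3291, "Laurales": 71.9425, "Liliales": 71.4133,
    "Magnoliales": 69.0988, "Malpighiales": 68.1842, "Malvales": 66.2106,
    "Myrtales": 70.7924, "Oxalidales": 68.3533, "Pandanales": 72.6733,
    "Pinales": 61.04, "Piperales": 63.2533, "Poales": 69.6407,
    "Polypodiales": 68.588, "Proteales": 69.0733, "Ranunculales": 67.5644,
    "Rosales": 70.0468, "Santalales": 69.07, "Sapindales": 70.5628,
    "Saxifragales": 70.84, "Schizaeales": 62.57, "Solanales": 72.2389,
    "Vitales": 65.235, "Zingiberales": 67.4956,
}


def summarize(raw):
    """Return (lowest order, highest order, unweighted mean across orders)."""
    lo = min(raw, key=raw.get)
    hi = max(raw, key=raw.get)
    return lo, hi, mean(raw.values())


if __name__ == "__main__":
    lo, hi, avg = summarize(RAW_GB)
    print(f"{len(RAW_GB)} orders; min {lo} ({RAW_GB[lo]} Gb); "
          f"max {hi} ({RAW_GB[hi]} Gb); unweighted mean {avg:.1f} Gb")
```

The table lists 47 orders, so the mean here weights each order equally regardless of how many samples it contains.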
[Figure: bar chart of the orders with more than 10 representatives (">10 orders"): Zingiberales, Alismatales, Ranunculales, Fagales, Apiales, Arecales, Malvales, Solanales, Magnoliales, Polypodiales, Caryophyllales, Myrtales, Ericales, Asparagales, Gentianales, Sapindales, Asterales, Malpighiales, Lamiales, Poales, Rosales, Fabales]
1. 54 Tb of sequencing data, with an average sequencing depth of 60X per species;
2. A reference phylogeny was reconstructed from 78 chloroplast genes for molecular identification and other possible applications;
3. Data are publicly accessible at CNGBdb: https://db.cngb.org/cnsa
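A multi-gene phylogeny such as the 78-chloroplast-gene tree is commonly built from a concatenated alignment ("supermatrix"). The sketch below shows that concatenation step only, assuming each gene has already been aligned separately; it is not the project's pipeline, the tree inference itself is left to a dedicated tool, and the gene and taxon names in the usage are illustrative.

```python
def concatenate(alignments):
    """Build a concatenated (supermatrix) alignment from per-gene alignments.

    alignments: dict mapping gene name -> dict mapping taxon -> aligned sequence.
    Taxa missing a gene are padded with '-' gaps of that gene's alignment length.
    Returns (supermatrix, partitions), where partitions lists (gene, start, end)
    as 1-based inclusive coordinates in the order the genes were concatenated.
    """
    taxa = sorted({t for aln in alignments.values() for t in aln})
    matrix = {t: [] for t in taxa}
    partitions = []
    pos = 1
    for gene in sorted(alignments):
        aln = alignments[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to one length")
        (length,) = lengths
        for t in taxa:
            # Missing gene for this taxon: fill with gaps to keep columns aligned.
            matrix[t].append(aln.get(t, "-" * length))
        partitions.append((gene, pos, pos + length - 1))
        pos += length
    return {t: "".join(parts) for t, parts in matrix.items()}, partitions


if __name__ == "__main__":
    # Illustrative toy alignments; real genes would be full-length alignments.
    toy = {
        "rbcL": {"taxonA": "ATGC", "taxonB": "ATGA"},
        "matK": {"taxonA": "GG-T", "taxonC": "GGAT"},
    }
    supermatrix, parts = concatenate(toy)
    for taxon, seq in supermatrix.items():
        print(taxon, seq)
    print(parts)
```

Keeping the partition coordinates lets downstream tools fit a separate substitution model per gene, which is the usual reason to record them.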
5. Huan Liu et al., GigaScience, 2019
• Survey data from these 761 samples contributed greatly to genome sequencing and assembly
• Assembled and annotated the chloroplast genome of each sample
• Selected 78 coding genes to construct a phylogenetic tree of Ruili Botanical Garden, presenting the biodiversity, phyletic evolution, and interactions between environment, ecosystem, and evolution
Research and Application
8. [Figure legend: EM (ectomycorrhizal), nitrogen fixer, AM (arbuscular mycorrhizal)]
Future Work
• Phylogenetic and molecular evolutionary analysis of the mechanisms controlling symbiosis establishment
• Digitalize the major symbiotic species for different orders of trees
• Unravel the genetic factors driving the symbiotic relationship
• Understand the mechanisms underlying beneficial plant-microbe interactions
9. Part 2: Dissemination
What is the most useful way to share 54TB of unassembled sequencing data?
Scott Edmunds
10. The Ruili data challenge
• >1000 specimens & 54TB of raw sequencing data
• 60GB of short reads (BGISEQ-500), single library: usable?
• High-throughput imaging, building a new herbarium
• Version of record, but species IDs are an evolving target
• How can we maximise discoverability & usability?
11. The approach
• Credit early release of data with Data Note articles
• Have GigaDB repository to bring together and host datasets
• Create individually citable DataCite DOIs
• Curation team organises structure & metadata (with guidance from peer reviewers)
• Papers are static, but GigaDB entries can be updated and linked (via metadata & links/pop-ups)
• GigaDB pages allow widgets for interaction
• Approach previously used for the Rice3K & Avian Phylogenomics projects
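To make the "citable DOI linked to a Data Note" idea concrete, the sketch below assembles a minimal metadata payload in the JSON:API shape used by the DataCite REST API, with the dataset DOI linked to its article through an IsSupplementTo relation. It is a hedged illustration, not GigaDB's actual curation code: the dataset DOI and article DOI come from the summary slide, while the title, creator, publisher, year and landing-page URL are hypothetical placeholders, and no request is sent.

```python
import json


def datacite_payload(doi, title, creators, publisher, year, url, article_doi=None):
    """Assemble a minimal DataCite REST API (JSON:API) payload for one dataset DOI."""
    attributes = {
        "doi": doi,
        "titles": [{"title": title}],
        "creators": [{"name": name} for name in creators],
        "publisher": publisher,
        "publicationYear": year,
        "types": {"resourceTypeGeneral": "Dataset"},
        "url": url,
    }
    if article_doi:
        # Link the dataset to the Data Note article it supplements.
        attributes["relatedIdentifiers"] = [{
            "relatedIdentifier": article_doi,
            "relatedIdentifierType": "DOI",
            "relationType": "IsSupplementTo",
        }]
    return {"data": {"type": "dois", "attributes": attributes}}


if __name__ == "__main__":
    payload = datacite_payload(
        doi="10.5524/100502",                        # dataset DOI from the summary slide
        title="HYPOTHETICAL dataset title",          # placeholder
        creators=["HYPOTHETICAL, Creator"],          # placeholder
        publisher="HYPOTHETICAL publisher name",     # placeholder
        year=2019,                                   # assumption
        url="https://example.org/dataset/100502",    # placeholder landing page
        article_doi="10.1093/gigascience/giz007",    # Data Note DOI from the summary slide
    )
    print(json.dumps(payload, indent=2))
```

Keeping the article link in the dataset record, rather than only in the paper, is what lets the (updatable) database entry point back to the (static) paper.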
16. Does rich metadata increase discoverability? Testing with a randomized controlled trial (RCT)
https://osf.io/wzps8/
17. Added protocols to protocols.io
http://dx.doi.org/10.17504/protocols.io.pzqdp5w
https://www.protocols.io/groups/gigascience-journal
18. Data now available, so can people use it?
B. purpurea = mother, B. variegata = father
19. Part 3. Genomic Analysis of Bauhinia Species
from the Ruili Botanical Garden Project
TSUI Kwok-Wing Stephen
Professor, School of Biomedical Sciences
Programme Director, MSc in Genomics and Bioinformatics
Associate Director, CUHK-BGI Innovative Institute of Trans-omics
Director, Centre for Microbial Genomics and Proteomics
Director, Hong Kong Bioinformatics Centre
20. Hong Kong Floral Emblem: Bauhinia blakeana
Bauhinia blakeana
(洋紫荊)
Hong Kong Emblem
21. Bauhinia blakeana – A Hybrid Plant
Bauhinia blakeana (洋紫荊), the Hong Kong orchid tree, is a hybrid between Bauhinia variegata (宮粉羊蹄甲, male parent) and Bauhinia purpurea (紅花羊蹄甲, female parent).
[Figure: Bauhinia variegata (宮粉羊蹄甲), Bauhinia purpurea (紅花羊蹄甲), Bauhinia blakeana (洋紫荊)]
22. WGS Data from BGI 10K Project
• Organism: vascular plant, Bauhinia variegata
• BioSample: SAMN08770810
• NCBI SRA run: SRR7121897
• Platform: BGISEQ-500
• Insert size: 200 bp
• Read number: 377,370,160 × 2 (paired-end)
• Total read length: 75.47 Gbp
• Coverage depth: 300X
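The run statistics above are internally consistent, and a short calculation shows how. This sketch divides the total bases by the number of reads to infer the mean read length (not stated on the slide; the arithmetic implies about 100 bp), and divides by a genome size to get fold coverage. The 249 Mbp assembly size is taken from the genome statistics slide; sequencing_summary is a hypothetical helper name.

```python
def sequencing_summary(read_pairs, total_bases, genome_size):
    """Infer mean read length and mean fold coverage from run totals."""
    reads = 2 * read_pairs               # paired-end: two reads per pair
    read_length = total_bases / reads    # mean bases per read
    depth = total_bases / genome_size    # mean fold coverage of the genome
    return read_length, depth


read_len, depth = sequencing_summary(377_370_160, 75.47e9, 249e6)
print(f"~{read_len:.0f} bp reads, ~{depth:.0f}X")  # prints "~100 bp reads, ~303X"
```

Using the 245 Mb k-mer estimate instead of the assembly size gives roughly 308X, so the slide's "300X" is a rounded figure either way.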
23. Statistics of WGS data
• The 27-mer spectrum was computed
• First peak (coverage: 63X; heterozygous)
• Second peak (coverage: 127X; homozygous)
• Estimated genome size: 245 Mb
• Estimated heterozygosity: 1.06%
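The genome-size estimate follows the usual k-mer logic: count all non-error k-mers and divide by the depth of the homozygous peak. The sketch below shows that simple estimator on a k-mer depth histogram, assuming low-depth k-mers below a chosen cutoff are sequencing errors. It is not the exact tool or model the authors used, and it does not estimate heterozygosity, which needs a model that fits both peaks (as GenomeScope-style tools do).

```python
def estimate_genome_size(histogram, homozygous_peak, error_cutoff):
    """Estimate haploid genome size from a k-mer depth histogram.

    histogram: dict mapping k-mer depth -> number of distinct k-mers at that depth.
    homozygous_peak: depth of the homozygous (main) peak, e.g. 127 on this slide.
    error_cutoff: depths at or below this are treated as sequencing errors.
    """
    total_kmers = sum(depth * count for depth, count in histogram.items()
                      if depth > error_cutoff)
    return total_kmers / homozygous_peak


# Toy histogram: many depth-1 error k-mers plus a small "genome" signal.
toy_hist = {1: 1000, 5: 10, 10: 100}
print(estimate_genome_size(toy_hist, homozygous_peak=10, error_cutoff=2))
```

With the slide's numbers, a 245 Mb estimate at a 127X homozygous peak corresponds to roughly 3.1 × 10^10 non-error 27-mers.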
24. Statistics of Bauhinia variegata Genome
Genome size          249 Mbp
Scaffold number      30,075
Scaffold N50         20,620 bp
Longest contig       292 kbp
Gap size             1.30 Mbp
Gap number           13,033
BUSCO completeness   92.7%
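Scaffold N50, reported above, is the length L such that scaffolds of length L or longer together hold at least half of all assembled bases. A minimal sketch of that definition follows; the toy lengths in the test are illustrative, not the Bauhinia scaffolds.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no sequences given")


print(n50([2, 3, 4, 5, 6]))  # total 20; 6 + 5 = 11 >= 10, so prints 5
```

Whether N50 is computed with or without gap (N) bases changes the result slightly; assemblers and QC tools differ on this, so the convention should be stated alongside the number.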
25. Annotation of the Bauhinia variegata Genome
Gene prediction
• 31,248 genes
• 724 tRNA genes
• 18 rRNA genes
Repetitive sequences
• Bases masked: 6,525,846 bp (2.67%)
• Simple repeats: 4,899,189 bp (2.00%)
• Low complexity: 1,631,550 bp (0.67%)
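Masked-base percentages like these can be recomputed from a soft-masked assembly, in which repeat-masking tools write masked bases in lower case. The sketch below assumes soft-masking and chooses to exclude N gap characters from both counts; the slide does not say which denominator was used, so its percentages may not match this convention exactly.

```python
def masked_fraction(fasta_lines):
    """Count soft-masked (lower-case) bases in FASTA text.

    Assumes repeats are soft-masked in lower case and that 'N'/'n' gap
    characters are excluded from both the masked count and the total.
    Returns (masked bases, counted bases, percent masked).
    """
    masked = total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip headers and blank lines
        for base in line:
            if base in "Nn":
                continue  # assembly gap, not sequence
            total += 1
            if base.islower():
                masked += 1
    percent = 100.0 * masked / total if total else 0.0
    return masked, total, percent


toy_fasta = [">scaffold_1", "ACGTacgt", "NNnn", ">scaffold_2", "aaTT"]
print(masked_fraction(toy_fasta))  # (6, 12, 50.0)
```

In practice the lines would come from iterating over an open FASTA file, which this function accepts directly.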
27. Ruili Botanical Garden
In summary
• This project provides insight into the feasibility and technical requirements for "planetary-scale" projects such as 10KP and the Earth BioGenome Project (EBP). Projects covering 1,000+ species are achievable with current technology.
• The current data are very usable for gene discovery and for plastid and mitochondrial genome assembly.
• Reference genomes are difficult to produce with data of this quality, but we have demonstrated that species of interest can be studied, making the data very suitable for a short postgraduate or postdoctoral project.
• Species identification and genome improvement (Hi-C) are ongoing, plus root metagenomes: watch this space for updates…
• Use the data and join the project. Help yourself to reprints.
Read the research: https://doi.org/10.1093/gigascience/giz007
Download the data: http://dx.doi.org/10.5524/100502
Take the virtual tour: http://720yunnan.com/tour/a2b8096d43d7226d?scene=scene_d3627cc2a43314d
Editor's Notes
Includes sample metadata (in database only, not DataCite) and cross-species results (gene alignments & trees)
It showed us that people in Hong Kong want to know more about this subject when given the opportunity. Working with the universities here to train students with these data, Stephen Tsui has already had his Master's students at the Chinese University of Hong Kong assemble most of the data.