The Digitalization of Ruili Botanical Garden Project:
Production, Curation and Re-Use
Stephen Kwok-Wing Tsui, Huan Liu, Scott C Edmunds
• 1,100 hectares, in Yunnan Province
• Rich biodiversity
• Living Biobank of China National GeneBank
Ruili Botanical Garden
Part 1. Why?
• 1st Phase of 10KP
• The 1st digitalized botanical garden
• Show the biodiversity and phyletic evolution, and the interactions between environment, ecosystem, and evolution
• Results of phase 1 are published in GigaScience
• Voucher specimens stored at HCNGB (CNGB Herbarium)
DRBG Result
• 1,093 samples
• 1,093 voucher specimens
• 49 orders
• 137 families
• 761 deep-sequenced
• 689 vascular species
• 54 TB data
Order Raw bases (Gb)
Alismatales 66.3873
Apiales 70.0075
Araucariales 74.14
Arecales 68.8318
Asparagales 70.3465
Asterales 67.8382
Brassicales 68.474
Buxales 65.44
Caryophyllales 68.6558
Celastrales 75.8133
Commelinales 65.02
Cornales 76.396
Crossosomatales 60.2
Cucurbitales 65.11
Cupressales 73.54
Cyatheales 75.76
Dioscoreales 78.9
Dipsacales 58.6267
Equisetales 67.3
Ericales 68.1109
Fabales 69.9439
Fagales 68.14
Gentianales 70.1155
Gnetales 71.1267
Lamiales 69.3291
Laurales 71.9425
Liliales 71.4133
Magnoliales 69.0988
Malpighiales 68.1842
Malvales 66.2106
Myrtales 70.7924
Oxalidales 68.3533
Pandanales 72.6733
Pinales 61.04
Piperales 63.2533
Poales 69.6407
Polypodiales 68.588
Proteales 69.0733
Ranunculales 67.5644
Rosales 70.0468
Santalales 69.07
Sapindales 70.5628
Saxifragales 70.84
Schizaeales 62.57
Solanales 72.2389
Vitales 65.235
Zingiberales 67.4956
[Chart labeled ">10 orders". Values shown: 10, 10, 11, 12, 16, 16, 17, 18, 18, 21, 21, 22, 24, 36, 35, 35, 43, 41, 50, 50, 54, 64. Orders shown: Zingiberales, Alismatales, Ranunculales, Fagales, Apiales, Arecales, Malvales, Solanales, Magnoliales, Polypodiales, Caryophyllales, Myrtales, Ericales, Asparagales, Gentianales, Sapindales, Asterales, Malpighiales, Lamiales, Poales, Rosales, Fabales]
1. 54 TB of sequencing data, with an average sequencing depth of 60X per species
2. Reference phylogeny reconstructed with 78 chloroplast genes for molecular identification and other possible applications
3. Data publicly accessible at CNGBdb: https://db.cngb.org/cnsa
Scan and visit the CNGBdb data
Huan Liu et al., GigaScience, 2019
• Survey data from these 761 samples contributed greatly to genome sequencing and assembly
• Assembled and annotated the chloroplast genome of each sample
• Selected 78 coding genes to construct a phylogenetic tree of Ruili Botanical Garden, presenting the biodiversity and phyletic evolution, and the interactions between environment, ecosystem, and evolution
Research and Application
Media Coverage
Research Going On
These high-quality assembled genomes will be further improved by Hi-C analysis
Species                   Total sequences  Total size  Maximum length  Contig N50  Scaffold N50  Contig N90  Scaffold N90  Complete BUSCOs
Khaya senegalensis        59982            385.25 Mb   11.11 Mb        44.03 kb    2.44 Mb       2.4 kb      3.2 kb        95.1%
Swietenia macrophylla     23737            294.77 Mb   15.47 Mb        108.96 kb   5.76 Mb       6.97 kb     8.00 kb       95.2%
Tectona grandis (stLFR)   13404            306.08 Mb   20.77 Mb        88.46 kb    5.84 Mb       15.23 kb    137.0 kb      97.5%
Mesua ferrea              61053            554.39 Mb   7.74 Mb         42.08 kb    746.22 kb     3.84 kb     4.69 kb       93.2%
Ochroma pyramidale        399274           2.27 Gb     7.06 Mb         24.32 kb    316.19 kb     2.68 kb     3.79 kb       89.9%
Dalbergia sissoo          102170           664.2 Mb    34.1 Mb         143.3 kb    7.17 Mb       7.4 kb      9.1 kb        91.0%
Nyssa yunnanensis         288486           1.47 Gb     17.9 Mb         32.3 kb     985.6 kb      2.9 kb      3.6 kb        87.1%
Averrhoa carambola        69402            500 Mb      14.8 Mb         44.8 kb     2.16 Mb       5.4 kb      2.0 kb        92.2%
Clausena lansium          38459            310.5 Mb    15.9 Mb         39.4 kb     2.24 Mb       5.4 kb      6.8 kb        93.6%
Alstonia scholaris        79280            518.6 Mb    9.7 Mb          29.7 kb     1.01 Mb       3.9 kb      5.1 kb        92.5%
Phrynium placentarium     127670           589.8 Mb    5.4 Mb          45.4 kb     610.6 kb      3.1 kb      3.7 kb        89.2%
Dysoxylum binectariferum  167153           746.4 Mb    8.8 Mb          24.7 kb     601 kb        2.1 kb      2.7 kb        93.9%
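For readers less familiar with the N50 and N90 figures reported for these assemblies, the following is a minimal sketch of how such statistics are commonly computed from a list of sequence lengths. The function name `nx_length` and the toy lengths are illustrative assumptions, not the project's actual pipeline.

```python
def nx_length(lengths, fraction):
    """Return the length L such that sequences of length >= L
    together cover at least `fraction` of the total assembly size."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    target = sum(lengths) * fraction
    running = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return min(lengths)


# Toy contig lengths in bp (hypothetical, for illustration only)
toy_contigs = [100, 80, 60, 40, 20]
n50 = nx_length(toy_contigs, 0.5)
n90 = nx_length(toy_contigs, 0.9)
print(n50, n90)
```

A higher N50 means more of the assembly sits in long sequences, which is why the scaffold N50 values are much larger than the contig N50 values for the same species.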
EM (ectomycorrhizal)
Nitrogen fixer
AM (arbuscular mycorrhizal)
Future Work
• Phylogenetic and molecular evolutionary analysis of the factors controlling symbiosis establishment
• Digitalize major symbiotic species for different orders of trees
• Unravel the genetic factors driving the symbiotic relationship
• Understand the mechanisms underlying beneficial plant-microbe interactions
Part 2: Dissemination
What is the most useful way to share 54 TB of unassembled sequencing data?
Scott Edmunds
The Ruili data challenge
• >1000 specimens & 54TB of raw sequencing data
• 60GB short read (BGISEQ 500), single library – usable?
• High-throughput imaging, building new herbarium
• Version of record, but species IDs are an evolving target
• How can we maximise the discoverability & usability?
The approach
• Credit early release of data with Data Note articles
• Have the GigaDB repository bring together and host datasets
• Create individually citable DataCite DOIs
• Curation team organises structure & metadata (with guidance from peer reviewers)
• Papers are static, but GigaDB entries can be updated and linked (via metadata & links/popups)
• GigaDB pages allow widgets for interaction
• Approach used for Rice3K & Avian Phylogenomic project
Top level DOI
http://dx.doi.org/10.5524/100502
Genomic data DOIs
http://dx.doi.org/10.5524/101701
Imaging files
Chloroplast sequence
Digitized images
Link to NCBI BioProject/raw data in SRA
Imaging data DOIs
http://dx.doi.org/10.5524/101294
GSC compliant sample attributes (exact geographic location restricted access)
Digitized images
Rich metadata includes
http://dx.doi.org/10.5524/101294
• GSC compliant sample attributes
• Geographic location/restricted access
• Environment (ENV Ontology)
• Herbarium Voucher number
• Phenotypic info (e.g. height)
• Related NCBI accessions
• Genome size & seq volume/coverage
DataCite Metadata (discoverability)
GigaDB Metadata (reusability)
• Authorship/ORCID details
• Relationship to other datasets
• License
• Title/abstract/date
• Keywords
+ schema.org Metadata (discoverability)
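As a rough illustration of the schema.org layer, the sketch below assembles a minimal `Dataset` record as JSON-LD from the kinds of fields listed above. The helper `build_dataset_jsonld`, the dataset title (borrowed from the talk title), the keyword list, and the license placeholder are all assumptions made for illustration; the markup GigaDB actually emits may differ.

```python
import json


def build_dataset_jsonld(title, doi, creators, keywords, license_url):
    """Build a minimal schema.org Dataset description as a dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": title,
        "identifier": f"https://doi.org/{doi}",
        "creator": [{"@type": "Person", "name": name} for name in creators],
        "keywords": keywords,
        "license": license_url,
    }


record = build_dataset_jsonld(
    title="The Digitalization of Ruili Botanical Garden Project",  # placeholder title
    doi="10.5524/100502",  # top-level DOI from the slides
    creators=["Huan Liu"],
    keywords=["botanical garden", "chloroplast genome"],  # hypothetical keywords
    license_url="https://example.org/license",  # placeholder, not the real license
)
jsonld = json.dumps(record, indent=2)
print(jsonld)
```

Embedding a record like this in a dataset landing page is one way search engines can pick up the title, authorship, and identifier without parsing the page layout.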
Does rich metadata increase discoverability? Testing with an RCT
https://osf.io/wzps8/
Added protocols to protocols.io
http://dx.doi.org/10.17504/protocols.io.pzqdp5w
https://www.protocols.io/groups/gigascience-journal
Data now available, so can people use it?
B. purpurea = mother; B. variegata = father
Part 3. Genomic Analysis of Bauhinia Species from the Ruili Botanical Garden Project
TSUI Kwok-Wing Stephen
Professor, School of Biomedical Sciences
Programme Director, MSc in Genomics and Bioinformatics
Associate Director, CUHK-BGI Innovative Institute of Trans-omics
Director, Centre for Microbial Genomics and Proteomics
Director, Hong Kong Bioinformatics Centre
Hong Kong Flora Emblem - Bauhinia blakeana
Bauhinia blakeana
(洋紫荊)
Hong Kong Emblem
Bauhinia blakeana – A Hybrid Plant
Bauhinia blakeana (洋紫荊, hybrid), the Hong Kong Orchid Tree, is a hybrid between Bauhinia variegata (宮粉羊蹄甲, male) and Bauhinia purpurea (紅花羊蹄甲, female).
Bauhinia variegata
(宮粉羊蹄甲)
Bauhinia purpurea
(紅花羊蹄甲)
Bauhinia blakeana
(洋紫荊)
WGS Data from BGI 10K Project
• Organism: Bauhinia variegata (vascular plant)
• Biosample: SAMN08770810
• NCBI SRA run: SRR7121897
• Platform: BGISEQ-500
• Insert size: 200 bp
• Read number: 377,370,160*2
• Total read length: 75.47 Gbp
• Coverage depth: 300 X
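The read count, total read length, and coverage figures above are mutually consistent under two assumptions that the slide does not state: a read length of 100 bp, and a genome size of about 249 Mbp (the assembly size reported later in this talk). A quick check of the arithmetic:

```python
READ_PAIRS = 377_370_160      # from the slide
READS_PER_PAIR = 2            # paired-end, the "*2" on the slide
READ_LENGTH_BP = 100          # assumption: not stated on the slide
GENOME_SIZE_BP = 249_000_000  # assumption: assembly size reported later in the talk

# Total sequenced bases = number of reads x read length
total_bases = READ_PAIRS * READS_PER_PAIR * READ_LENGTH_BP
# Coverage depth = sequenced bases / genome size
coverage = total_bases / GENOME_SIZE_BP

print(f"Total bases: {total_bases / 1e9:.2f} Gbp")
print(f"Coverage: {coverage:.0f}X")
```

Under these assumptions the total comes to about 75.47 Gbp and the depth to roughly 303X, in line with the rounded 300 X on the slide.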
Statistics of WGS data
• The 27-mer spectrum was computed
• First peak (coverage: 63X; heterozygous)
• Second peak (coverage: 127X; homozygous)
• Estimated genome size: 245 Mb
• Estimated heterozygosity rate: 1.06%
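A common first-pass way to estimate genome size from a k-mer spectrum is to divide the total number of k-mers by the depth of the homozygous peak. The sketch below applies that idea to a small, made-up histogram. The function name, the error cutoff, and the toy numbers are assumptions for illustration; this is not the analysis that produced the 245 Mb estimate, which is not described on the slide.

```python
def estimate_genome_size(histogram, homozygous_peak, min_depth=1):
    """Estimate genome size as (total k-mers) / (homozygous peak depth).

    histogram: dict mapping k-mer depth -> number of distinct k-mers
    min_depth: depths below this are treated as sequencing errors
    """
    total_kmers = sum(
        depth * count
        for depth, count in histogram.items()
        if depth >= min_depth
    )
    return total_kmers / homozygous_peak


# Toy spectrum (hypothetical): error k-mers at depth 1,
# a heterozygous peak at depth 10 and a homozygous peak at depth 20.
toy_hist = {1: 500, 10: 200, 20: 900}
size = estimate_genome_size(toy_hist, homozygous_peak=20, min_depth=3)
print(size)
```

In a diploid genome the heterozygous peak sits at about half the depth of the homozygous peak, which matches the 63X and 127X peaks reported above.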
Statistics of Bauhinia variegata Genome
Genome size 249 Mbp
Scaffold No. 30,075
Scaffold N50 20,620
Longest contig 292 kbp
Gap size 1.30 Mbp
Gap number 13,033
BUSCO Completeness 92.7%
Annotation of the Bauhinia variegata Genome
Gene prediction
• 31,248 genes
• 724 tRNA genes
• 18 rRNA genes
Repetitive sequences
• Bases masked: 6,525,846 bp (2.67%)
• Simple repeats: 4,899,189 bp (2.00%)
• Low complexity: 1,631,550 bp (0.67%)
Acknowledgements
CUHK
HS Kwan
Mandy Tang
Timothy Wang
Jamie Kwok
Angel Wan
BGI
Scott Edmunds
Rob L Davidson
Stephen Tong
Bauhinia Genome community (Crowdfunded)
Ruili Botanical Garden
In summary
• This project provides insight into the feasibility and technical requirements for “planetary-scale” projects such as 10KP and the EBP. 1K+ projects are achievable with current technology.
• Current data are very usable for gene discovery and for plastid and mitochondrial assembly.
• Reference genomes are difficult with this quality of data, but we have demonstrated that species of interest can be studied; very suitable for a short postgrad or postdoc project.
• Species identification and genome improvement (Hi-C) are ongoing. Plus root metagenomes: watch this space for updates…
• Use the data and join the project. Help yourself to reprints.
Read the research: https://doi.org/10.1093/gigascience/giz007
Download the data: http://dx.doi.org/10.5524/100502
Take the virtual tour: http://720yunnan.com/tour/a2b8096d43d7226d?scene=scene_d3627cc2a43314d

PAGAsia19 - The Digitalization of Ruili Botanical Garden Project: Production, Curation and Re-Use

  • 1. The Digitalization of Ruili Botanical Garden Project: Production, Curation and Re-Use. Stephen Kwok-Wing Tsui, Huan Liu, Scott C Edmunds
  • 2. Part 1. Why? Ruili Botanical Garden • 1,100 hectares, in Yunnan Province • Rich biodiversity • Living Biobank of China National GeneBank
  • 3. • 1st phase of 10KP • The 1st digitalized botanical garden • Shows the biodiversity and phyletic evolution, and the interactions between environment, ecosystem, and evolution • Results of phase 1 are published in GigaScience • Voucher specimens stored at HCNGB (CNGB Herbarium) DRBG result: 1,093 samples; 1,093 voucher specimens; 49 orders; 137 families; 761 deep-sequenced; 689 vascular species; 54TB data
  • 4. Raw bases (Gb) by order:
      Order             Raw bases (Gb)
      Alismatales       66.3873
      Apiales           70.0075
      Araucariales      74.14
      Arecales          68.8318
      Asparagales       70.3465
      Asterales         67.8382
      Brassicales       68.474
      Buxales           65.44
      Caryophyllales    68.6558
      Celastrales       75.8133
      Commelinales      65.02
      Cornales          76.396
      Crossosomatales   60.2
      Cucurbitales      65.11
      Cupressales       73.54
      Cyatheales        75.76
      Dioscoreales      78.9
      Dipsacales        58.6267
      Equisetales       67.3
      Ericales          68.1109
      Fabales           69.9439
      Fagales           68.14
      Gentianales       70.1155
      Gnetales          71.1267
      Lamiales          69.3291
      Laurales          71.9425
      Liliales          71.4133
      Magnoliales       69.0988
      Malpighiales      68.1842
      Malvales          66.2106
      Myrtales          70.7924
      Oxalidales        68.3533
      Pandanales        72.6733
      Pinales           61.04
      Piperales         63.2533
      Poales            69.6407
      Polypodiales      68.588
      Proteales         69.0733
      Ranunculales      67.5644
      Rosales           70.0468
      Santalales        69.07
      Sapindales        70.5628
      Saxifragales      70.84
      Schizaeales       62.57
      Solanales         72.2389
      Vitales           65.235
      Zingiberales      67.4956
    Chart (">10 orders"), counts and labels in extracted order: 10, 10, 11, 12, 16, 16, 17, 18, 18, 21, 21, 22, 24, 36, 35, 35, 43, 41, 50, 50, 54, 64; Zingiberales, Alismatales, Ranunculales, Fagales, Apiales, Arecales, Malvales, Solanales, Magnoliales, Polypodiales, Caryophyllales, Myrtales, Ericales, Asparagales, Gentianales, Sapindales, Asterales, Malpighiales, Lamiales, Poales, Rosales, Fabales
    1. 54 Tb of sequencing data with an average sequencing depth of 60X per species
    2. Reference phylogeny was reconstructed with 78 chloroplast genes for molecular identification and other possible applications
    3. Data publicly accessible at CNGBdb: https://db.cngb.org/cnsa (QR code: scan to visit CNGBdb data)
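As a rough sanity check on the totals in slide 4, the sketch below divides the 54 Tb figure by the 761 deep-sequenced samples reported on slide 3 and compares the result with the range of per-order values in the table. It assumes that "Tb" means terabases, that the 761 samples account for all of the data, and that the table values are per-sample averages; none of this is stated on the slides. The constant and function names are illustrative only.

```python
# Figures taken from the slides.
TOTAL_BASES_TB = 54            # slide 4: "54 Tb of sequencing data"
DEEP_SEQUENCED_SAMPLES = 761   # slide 3: "761 deep-sequenced"
TABLE_MIN_GB = 58.6267         # lowest table value (Dipsacales)
TABLE_MAX_GB = 78.9            # highest table value (Dioscoreales)


def mean_yield_gb(total_tb: float, n_samples: int) -> float:
    """Average yield per sample in Gb, assuming 1 Tb = 1000 Gb."""
    return total_tb * 1000 / n_samples


per_sample_gb = mean_yield_gb(TOTAL_BASES_TB, DEEP_SEQUENCED_SAMPLES)
within_table_range = TABLE_MIN_GB <= per_sample_gb <= TABLE_MAX_GB

print(f"Mean yield per sample: {per_sample_gb:.1f} Gb")
print(f"Within the per-order range of the table: {within_table_range}")
```

Under these assumptions the mean comes out at roughly 71 Gb per sample, which sits inside the 58.6 to 78.9 Gb range of the table.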
  • 5. Research and Application (Huan Liu et al., GigaScience, 2019) • Survey data of these 761 samples contributed greatly to the genome sequencing and assembly • Assembled the chloroplast genome of each sample, with annotation • Selected 78 coding genes to construct a phylogenetic tree of Ruili Botanical Garden, presenting the biodiversity and phyletic evolution, and the interactions between environment, ecosystem, and evolution
  • 7. Research Going On. These high quality assembled genomes will be further improved by Hi-C analysis. Per species: total sequences; total size; maximum length; N50 length of contig; N50 length of scaffold; N90 length of contig; N90 length of scaffold; complete BUSCOs
      • Khaya senegalensis: 59982; 385.25 Mb; 11.11 Mb; 44.03 Kb; 2.44 Mb; 2.4 Kb; 3.2 Kb; 95.1%
      • Swietenia macrophylla: 23737; 294.77 Mb; 15.47 Mb; 108.96 Kb; 5.76 Mb; 6.97 Kb; 8.00 Kb; 95.2%
      • Tectona grandis (stLFR): 13404; 306.08 Mb; 20.77 Mb; 88.46 Kb; 5.84 Mb; 15.23 Kb; 137.0 Kb; 97.5%
      • Mesua ferrea: 61053; 554.39 Mb; 7.74 Mb; 42.08 Kb; 746.22 Kb; 3.84 Kb; 4.69 Kb; 93.2%
      • Ochroma pyramidale: 399274; 2.27 Gb; 7.06 Mb; 24.32 Kb; 316.19 Kb; 2.68 Kb; 3.79 Kb; 89.9%
      • Dalbergia sissoo: 102170; 664.2 Mb; 34.1 Mb; 143.3 Kb; 7.17 Mb; 7.4 Kb; 9.1 Kb; 91.0%
      • Nyssa yunnanensis: 288486; 1.47 Gb; 17.9 Mb; 32.3 Kb; 985.6 Kb; 2.9 Kb; 3.6 Kb; 87.1%
      • Averrhoa carambola: 69402; 500 Mb; 14.8 Mb; 44.8 Kb; 2.16 Mb; 5.4 Kb; 2.0 Kb; 92.2%
      • Clausena lansium: 38459; 310.5 Mb; 15.9 Mb; 39.4 Kb; 2.24 Mb; 5.4 Kb; 6.8 Kb; 93.6%
      • Alstonia scholaris: 79280; 518.6 Mb; 9.7 Mb; 29.7 Kb; 1.01 Mb; 3.9 Kb; 5.1 Kb; 92.5%
      • Phrynium placentarium: 127670; 589.8 Mb; 5.4 Mb; 45.4 Kb; 610.6 Kb; 3.1 Kb; 3.7 Kb; 89.2%
      • Dysoxylum binectariferum: 167153; 746.4 Mb; 8.8 Mb; 24.7 Kb; 601 Kb; 2.1 Kb; 2.7 Kb; 93.9%
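For readers unfamiliar with the N50 and N90 figures on slide 7, the sketch below shows the usual definition: sort the sequences from longest to shortest and report the length at which the running total first reaches 50% (or 90%) of the assembly. The slides do not say which tool produced their values, so this is a generic illustration; the function name `nx` and the toy contig lengths are invented for the example.

```python
def nx(lengths, percent):
    """Return the Nx length of an assembly.

    Sequences are taken from longest to shortest; the result is the length
    of the sequence at which the running total first reaches `percent` %
    of the summed length of all sequences.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return length
    raise AssertionError("unreachable for a valid percent")


# Toy data (hypothetical): nine contigs with a combined length of 54.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
n50 = nx(toy_contigs, 50)
n90 = nx(toy_contigs, 90)
print(f"N50 = {n50}, N90 = {n90}")
```

Because N90 requires covering more of the assembly, it can never exceed N50 for the same set of sequences, which is why the N90 values on the slide are much smaller than the N50 values.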
  • 8. Future Work (symbiosis types shown: EM (ectomycorrhizal), nitrogen fixer, AM (arbuscular mycorrhizal)) • Phylogenetic and molecular evolutionary analysis of the control of symbiosis establishment • Digitalize major symbiosis species for different orders of trees • Unravel the genetic factors driving the symbiotic relationship • Understand the mechanisms underlying beneficial plant-microbe interactions
  • 9. Part 2: Dissemination. What is the most useful way to share 54TB of unassembled sequencing data? (Scott Edmunds)
  • 10. The Ruili data challenge • >1000 specimens & 54TB of raw sequencing data • 60GB short read (BGISEQ 500), single library – usable? • High-throughput imaging, building new herbarium • Version-of-record, but species IDs are an evolving target • How can we maximise the discoverability & usability?
  • 11. The approach • Credit early release of data with Data Note articles • Have GigaDB repository to bring together and host datasets • Create individually citable DataCite DOIs • Curation team organises structure & metadata (with guidance from peer reviewers) • Papers are static, but GigaDB entries can be updated and linked (via metadata & links/popups) • GigaDB pages allow widgets for interaction • Approach used for Rice3K & Avian Phylogenomic project
  • 13. Genomic data DOIs (http://dx.doi.org/10.5524/101701): imaging files; chloroplast sequence; digitized images; link to NCBI BioProject/raw data in SRA
  • 14. Imaging data DOIs (http://dx.doi.org/10.5524/101294): GSC compliant sample attributes (exact geographic location restricted access); digitized images
  • 15. Rich metadata includes (http://dx.doi.org/10.5524/101294): GigaDB Metadata (reusability) • GSC compliant sample attributes • Geographic location/restricted access • Environment (ENV Ontology) • Herbarium Voucher number • Phenotypic info (e.g. height) • Related NCBI accessions • Genome size & seq volume/coverage. DataCite Metadata (discoverability) • Authorship/ORCID details • Relationship to other datasets • License • Title/abstract/date • Keywords. Plus schema.org Metadata (discoverability)
  • 16. Does rich metadata increase discoverability? Testing with RCT https://osf.io/wzps8/
  • 17. Added protocols to protocols.io http://dx.doi.org/10.17504/protocols.io.pzqdp5w https://www.protocols.io/groups/gigascience-journal
  • 18. Data now available, so can people use it? B. purpurea = mother; B. variegata = father
  • 19. Part 3. Genomic Analysis of Bauhinia Species from the Ruili Botanical Garden Project. TSUI Kwok-Wing Stephen: Professor, School of Biomedical Sciences; Programme Director, MSc in Genomics and Bioinformatics; Associate Director, CUHK-BGI Innovative Institute of Trans-omics; Director, Centre for Microbial Genomics and Proteomics; Director, Hong Kong Bioinformatics Centre
  • 20. Hong Kong Floral Emblem: Bauhinia blakeana (洋紫荊)
  • 21. Bauhinia blakeana – A Hybrid Plant. Bauhinia blakeana (洋紫荊, hybrid), the Hong Kong Orchid Tree, is a hybrid between Bauhinia variegata (宮粉羊蹄甲, male) and Bauhinia purpurea (紅花羊蹄甲, female).
  • 22. WGS Data from BGI 10K Project • Organism: vascular plants Bauhinia variegata • Biosample: SAMN08770810 • NCBI number: SRR7121897 • Platform: BGISEQ-500 • Insert size: 200 bp • Read number: 377,370,160*2 • Total read length: 75.47 Gbp • Coverage depth: 300 X
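The figures on slide 22 can be cross-checked with simple arithmetic, sketched below. The read length of 100 bp is not given on the slide; it is inferred here from the stated total (75.47 Gbp) divided by the read count, and should be treated as an assumption. The genome size used for the depth calculation is the 249 Mbp assembly size reported on slide 24. All names in the code are illustrative.

```python
READ_PAIRS = 377_370_160       # slide 22: "Read number: 377,370,160*2"
READS_PER_PAIR = 2
READ_LENGTH_BP = 100           # assumption: inferred from the 75.47 Gbp total
GENOME_SIZE_BP = 249_000_000   # slide 24: genome size 249 Mbp


def total_bases(pairs: int, reads_per_pair: int, read_length: int) -> int:
    """Total sequenced bases for fixed-length paired-end reads."""
    return pairs * reads_per_pair * read_length


def coverage_depth(bases: int, genome_size: int) -> float:
    """Average number of times each genome position is sequenced."""
    return bases / genome_size


bases = total_bases(READ_PAIRS, READS_PER_PAIR, READ_LENGTH_BP)
depth = coverage_depth(bases, GENOME_SIZE_BP)

print(f"Total bases: {bases / 1e9:.2f} Gbp")
print(f"Approximate depth: {depth:.0f}X")
```

With these inputs the total matches the 75.47 Gbp on the slide, and the depth comes out slightly above 300X, consistent with the rounded "300 X" figure.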
  • 23. Statistics of WGS data • The 27-mer spectrum was computed • First peak (Coverage: 63X; heterozygous) • Second peak (Coverage: 127X; homozygous) • Estimated genome size: 245Mb • Estimated heterozygous rate: 1.06%
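Slide 23 does not say which tool or formula produced the 245 Mb estimate. A common estimator divides the total number of k-mers (ignoring low-depth k-mers that are likely sequencing errors) by the depth of the homozygous peak. The sketch below implements that estimator on an invented toy histogram, then shows how many 27-mers the slide's figures would imply if that estimator had been used. The function name, the `min_depth` cutoff, and the toy data are all assumptions for illustration.

```python
def estimate_genome_size(histogram, homozygous_peak_depth, min_depth=5):
    """Estimate genome size from a k-mer depth histogram.

    `histogram` maps k-mer depth -> number of distinct k-mers seen at that
    depth. K-mers below `min_depth` are treated as sequencing errors and
    skipped. The estimate is total k-mer occurrences / homozygous peak depth.
    """
    total_kmers = sum(
        depth * count
        for depth, count in histogram.items()
        if depth >= min_depth
    )
    return total_kmers / homozygous_peak_depth


# Toy histogram (hypothetical): an error spike at depth 1, a heterozygous
# peak at 63X and a homozygous peak at 127X, echoing the peak depths on
# the slide but not its real k-mer counts.
toy_histogram = {1: 4000, 63: 254, 127: 874}
toy_size = estimate_genome_size(toy_histogram, homozygous_peak_depth=127)

# If the same estimator produced the slide's 245 Mb, the non-error k-mer
# total would be about genome size x homozygous peak depth.
implied_kmers = 245_000_000 * 127

print(f"Toy genome size estimate: {toy_size:.0f} k-mer positions")
print(f"Implied 27-mer total for 245 Mb at 127X: {implied_kmers:,}")
```

The heterozygous peak sits at about half the homozygous depth because k-mers spanning a heterozygous site are present on only one of the two haplotypes.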
  • 24. Statistics of Bauhinia variegata Genome: genome size 249 Mbp; scaffold no. 30,075; scaffold N50 20,620; longest contig 292 kbp; gap size 1.30 Mbp; gap number 13,033; BUSCO completeness 92.7%
  • 25. Annotation of the Bauhinia variegata Genome. Gene prediction • 31,248 genes • 724 tRNA genes • 18 rRNA genes. Repetitive sequences • Bases masked: 6,525,846 bp (2.67 %) • Simple repeats: 4,899,189 bp (2.00 %) • Low complexity: 1,631,550 bp (0.67 %)
  • 26. Acknowledgements. CUHK: HS Kwan, Mandy Tang, Timothy Wang, Jamie Kwok, Angel Wan. BGI: Scott Edmunds, Rob L Davidson, Stephen Tong. Bauhinia Genome community (crowdfunded)
  • 27. Ruili Botanical Garden In summary • This project provides insight into the feasibility and technical requirements for “planetary-scale” projects such as 10KP and the EBP. 1K+ projects are achievable with current technology • Current data very usable for gene discovery, plastid, and mitochondrial assembly. • Reference genomes difficult with this quality of data, but have demonstrated species of interest can be studied – very suitable for short postgrad or postdoc project. • Species identification and genome improvement (Hi-C) on-going. Plus root metagenomes – watch this space for updates… • Use the data – join the project. Help yourself to reprints. Read the research: https://doi.org/10.1093/gigascience/giz007 Download the data: http://dx.doi.org/10.5524/100502 Take the virtual tour: http://720yunnan.com/tour/a2b8096d43d7226d?scene=scene_d3627cc2a43314d

Editor's Notes

  1. Includes sample metadata (in database only, not DataCite) and cross-species results (gene alignments & trees)
  2. It showed us that people in Hong Kong want to know more about this subject if given the opportunity. We are working with the universities here to train students with this data; Stephen Tsui has already had his Master's students at the Chinese University of Hong Kong assemble most of this data.