Comparative genomics to the rescue
How complete is your plant genome sequence?
Klaas Vandepoele
Ghent University - VIB, Belgium
5th Plant Genomics & Gene Editing Congress
16-17 March 2017, Amsterdam
plaza_genomics
Plant genome sequencing is booming
• New and faster sequencing technologies
• Generating a draft genome sequence has become cheap
• The number of published plant genomes grows exponentially
[Figure: >150 published plant genomes. Credit: Usadel lab]
From read data to knowledge
The basic genome analysis toolkit:
• Genome assembly
• Structural annotation shows where genes are
• Functional annotation tells you what genes do
• Data availability of genome sequence & gene annotation
Facilitate biological discovery
Yet another “draft” plant genome
• What is the quality and completeness of plant genome sequences?
The N50 denotes that at least 50% of the total assembly length is contained in scaffolds of length N50 or longer.
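The definition above translates directly into code. A minimal Python sketch, assuming scaffold lengths are available as a list of integers (for example read from a FASTA index); `n50` is a hypothetical helper, not part of any tool named in this deck.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of scaffold lengths.

    N50 is the largest length L such that scaffolds of length >= L
    together contain at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down, accumulating assembled length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running sum always reaches the total")


# Five scaffolds totalling 300 bp: 100 + 80 = 180 bp already covers half.
print(n50([100, 80, 60, 40, 20]))  # prints 80
```

Sorting dominates the cost, so this stays fast even for assemblies with millions of scaffolds.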
Quality of a genome: what to expect?
Transcript mapping
Transcript mapping: tools & settings
• The transcript mapping score is stable (standard deviation < 1%) in bin sizes of at least 3,000 ESTs
• It is challenging to correctly estimate the assembled gene space, owing to the influence of
  - mapping tools
  - coverage cutoffs
Intron-aware transcript mapping using GMAP
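The mapping score itself is a simple fraction, and the coverage cutoff directly moves it. A minimal Python sketch, assuming the coverage of each transcript's best alignment has already been extracted from the aligner output (for example GMAP; parsing is not shown). `mapping_score` and the coverage values are hypothetical.

```python
from collections.abc import Iterable, Mapping


def mapping_score(coverage: Mapping[str, float],
                  transcripts: Iterable[str],
                  cutoff: float) -> float:
    """Percentage of transcripts whose best alignment covers >= cutoff of their length.

    coverage maps a transcript ID to the fraction (0-1) of that transcript
    covered by its best genome alignment; unaligned transcripts may be absent.
    """
    ids = list(transcripts)
    if not ids:
        raise ValueError("no transcripts given")
    mapped = sum(1 for t in ids if coverage.get(t, 0.0) >= cutoff)
    return 100.0 * mapped / len(ids)


# Hypothetical coverages for four transcripts; t4 did not align at all.
cov = {"t1": 0.98, "t2": 0.85, "t3": 0.55}
ids = ["t1", "t2", "t3", "t4"]
print(mapping_score(cov, ids, cutoff=0.5))  # 75.0
print(mapping_score(cov, ids, cutoff=0.9))  # 25.0
```

The same alignments give 75% or 25% depending only on the cutoff, which is why the cutoff should be reported alongside the score.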
Transcript mapping: library size
• If the libraries contain more than 10,000 ESTs, the EST mapping scores for A. thaliana libraries converge to the same value as for subsampling bins of >10,000 ESTs.
• RNA-Seq de novo assembled transcripts can lead to
  - over-estimation of the expected number of genes (allelic transcripts, splice variants and fragmented transcripts)
  - under-estimation due to the failure to reconstruct low-abundance transcripts
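The subsampling analysis behind the bin-size and library-size observations can be sketched as repeated random draws. This assumes each transcript has already been labelled as mapped or unmapped; `bin_scores`, `score_spread`, the 20 bins and the toy library with 80% mapping are illustrative choices, not the published protocol.

```python
import random
import statistics


def bin_scores(mapped: list[bool], bin_size: int, n_bins: int, seed: int = 0) -> list[float]:
    """Mapping scores (%) for n_bins random subsamples of bin_size transcripts each."""
    if bin_size > len(mapped):
        raise ValueError("bin_size exceeds the number of transcripts")
    rng = random.Random(seed)  # fixed seed keeps the subsampling reproducible
    scores = []
    for _ in range(n_bins):
        sample = rng.sample(mapped, bin_size)  # draw without replacement
        scores.append(100.0 * sum(sample) / bin_size)
    return scores


def score_spread(mapped: list[bool], bin_size: int, n_bins: int = 20) -> float:
    """Standard deviation of the mapping score across subsampling bins."""
    return statistics.stdev(bin_scores(mapped, bin_size, n_bins))


# Hypothetical library: 8,000 of 10,000 transcripts map.
flags = [True] * 8000 + [False] * 2000
for size in (500, 3000, 10000):
    print(size, round(score_spread(flags, size), 3))
```

When a bin spans the whole library every draw is identical and the spread drops to zero; this finite-library effect should be kept in mind when comparing bin sizes close to the library size.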
Estimating gene space completeness along an evolutionary scale
[Figure: expected gene spaces, influenced by within-species and between-species diversity, range from species-specific (transcript mapping) to evolutionarily conserved, shown against a species tree of life. Reference sets: CEGMA (248 genes, single copy), BUSCO (952 genes, single copy), PLAZA coreGF (7k and 3k gene families).]
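Whichever reference set models the conserved gene space (CEGMA, BUSCO or PLAZA coreGF), the core of a completeness score is the fraction of expected entries recovered. The sketch below is a plain, unweighted version, assuming genes have already been assigned to families; it is not claimed to reproduce the published coreGF scoring, and the family IDs are hypothetical.

```python
def completeness(expected: set[str], detected: set[str]) -> tuple[float, set[str]]:
    """Return (percentage of expected families detected, set of missing families)."""
    if not expected:
        raise ValueError("the expected gene space is empty")
    found = expected & detected      # families with at least one detected member
    missing = expected - detected    # families absent from the assembly or gene set
    return 100.0 * len(found) / len(expected), missing


# Hypothetical family IDs; GF9 is detected but not part of the expected set.
score, missing = completeness({"GF1", "GF2", "GF3", "GF4"}, {"GF1", "GF3", "GF9"})
print(score, sorted(missing))  # 50.0 ['GF2', 'GF4']
```

Returning the missing families, not just the percentage, makes it possible to inspect which conserved genes are absent.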
Biases in the expected “conserved” gene space
A diverse set of genome quality metrics
Evaluation
• Arabidopsis and Oryza have consistently high completeness scores
• CEGMA over-estimates completeness
• Lolium: discrepancy between genome and gene set completeness
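Comparing the same reference set in genome mode and in gene set mode is what exposes a case like Lolium. A sketch, assuming completeness percentages are already available per reference set; `flag_discrepancies`, the 5-point threshold and the scores are hypothetical.

```python
def flag_discrepancies(genome: dict[str, float],
                       gene_set: dict[str, float],
                       threshold: float = 5.0) -> dict[str, float]:
    """Return reference sets whose genome completeness exceeds gene set completeness
    by more than `threshold` percentage points, with the size of the gap."""
    flagged = {}
    for name, genome_score in genome.items():
        if name not in gene_set:
            continue  # no gene set score to compare against
        gap = genome_score - gene_set[name]
        # Genes found in the assembly but missing from the annotation
        # point to gene prediction problems rather than assembly gaps.
        if gap > threshold:
            flagged[name] = gap
    return flagged


# Hypothetical scores (%), not taken from any real genome.
genome_scores = {"BUSCO": 92.0, "coreGF": 95.0}
gene_set_scores = {"BUSCO": 90.0, "coreGF": 80.0}
print(flag_discrepancies(genome_scores, gene_set_scores))  # {'coreGF': 15.0}
```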
Improving Lolium gene annotation
2 transcriptomes, aligned with GenomeThreader:
- de novo assembly: 300k
- orthology-guided assembly: 80k
4 proteomes, aligned with GenomeThreader:
- Brachypodium distachyon: 16k
- Oryza sativa: 11k
- Sorghum bicolor: 11k
- Zea mays: 10k
2 annotation sets (# loci):
- Byrne et al. (2015): 28k
- ab initio predictions: 41k
EVM consensus: 39,967 loci
Haas et al. (2008), Gremme et al. (2005), Ruttink et al. (2013)
Updated completeness scores Lolium
[Figure: completeness score (%) of the Byrne et al. (2015) annotation and the EVM consensus, for CEGMA (248, single copy), BUSCO (952, single copy), PLAZA coreGF (7k and 3k gene families) and transcript mapping, shown against a species tree of life.]
>900 new coreGF loci found in the genome!
Evaluation
• Cicer: the EST mapping score is much lower than the BUSCO gene set or coreGF score
More than half of the unmapped sequences are of non-plant origin (mostly from Fusarium oxysporum)
Proper taxonomic binning of expected transcripts is essential!
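The Cicer case suggests restricting the expected transcripts to the right taxon before computing the mapping score. A sketch, assuming each transcript already carries a taxon label from a separate classification step (not shown); `filtered_mapping_score` and the records are invented for illustration.

```python
def filtered_mapping_score(records: list[tuple[str, str, bool]],
                           keep_taxa: frozenset[str] = frozenset({"Viridiplantae"})) -> float:
    """Mapping score (%) over transcripts whose taxon label is in keep_taxa.

    records holds (transcript_id, taxon_label, maps_to_genome) tuples; the taxon
    labels are assumed to come from a separate taxonomic classification step.
    """
    kept = [maps for _tid, taxon, maps in records if taxon in keep_taxa]
    if not kept:
        raise ValueError("no transcripts left after taxonomic filtering")
    return 100.0 * sum(kept) / len(kept)


# Hypothetical EST set contaminated with fungal transcripts.
records = [
    ("est1", "Viridiplantae", True),
    ("est2", "Viridiplantae", True),
    ("est3", "Viridiplantae", False),
    ("est4", "Fungi", False),
    ("est5", "Fungi", False),
]
print(filtered_mapping_score(records, frozenset({"Viridiplantae", "Fungi"})))  # 40.0
print(round(filtered_mapping_score(records), 1))  # 66.7
```

Keeping the contaminants in the denominator drags the score down even though nothing is wrong with the plant assembly itself.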
Guidelines to assess the quality of a new genome sequence
1. Estimate genome size using different methods
2. Define and evaluate the expected gene space based on transcript mapping AND evolutionary conservation
   - Clean transcripts before mapping them
   - Prefer coreGF/BUSCO over CEGMA to model the expected conserved genes
3. Large differences in completeness scores between the genome assembly and the annotated gene set can point to gene prediction issues
4. For cross-species genome comparisons, focus on genomes with complete and contiguous assemblies
Veeckman, E., Ruttink, T., and Vandepoele, K. (2016). Are We There Yet? Reliably Estimating
the Completeness of Plant Genome Sequences. Plant Cell 28, 1759-1768.
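For guideline 1, one widely used sequence-based method estimates genome size from a k-mer depth histogram: total k-mer occurrences divided by the depth of the main coverage peak. A simplified sketch, assuming a histogram such as the two-column output of `jellyfish histo`; `estimate_genome_size` and the `min_depth` error cutoff are assumptions. Heterozygosity, repeats and polyploidy all distort this estimate, which is why comparison with an independent method such as flow cytometry is useful.

```python
def estimate_genome_size(histogram: dict[int, int], min_depth: int = 5) -> float:
    """Estimate genome size (bp) from a k-mer depth histogram.

    histogram maps a depth (how often a k-mer was seen) to the number of
    distinct k-mers observed at that depth. Depths below min_depth are
    treated as sequencing errors and ignored.
    """
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # The most populated depth approximates the k-mer coverage of the genome.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences divided by coverage approximates the genome length.
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth


# Toy histogram: an error peak at depth 1-2 and a coverage peak at depth 10.
hist = {1: 1000, 2: 200, 9: 50, 10: 100, 11: 50}
print(estimate_genome_size(hist))  # 200.0
```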
• Gene family annotation and phylogenetic trees
• Traceable functional annotation (GO/InterPro/MapMan)
• Colinearity and synteny
• Integrative gene orthology inference
→ Highly integrative platform to translate knowledge from model to crop
• 55 species/genomes
• Highly scalable design
• Web-based mobile user interface
• Integrated Workbench for analysis of sets of genes
http://bioinformatics.psb.ugent.be/plaza/
Coverage of gene function information
[Figure: coverage of gene descriptions and Gene Ontology (Biological Process) annotations; blue = primary GO, green = GO projection (orthology + homology).]
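The projected (green) annotations transfer GO terms from annotated genes to their orthologs. A deliberately simple sketch of that idea, assuming ortholog relationships and primary GO terms are given as dictionaries; it takes the union of partner terms and is not PLAZA's actual projection procedure. `project_go` and all gene IDs are hypothetical.

```python
def project_go(orthologs: dict[str, list[str]],
               primary_go: dict[str, set[str]]) -> dict[str, set[str]]:
    """Project GO terms from orthologs onto genes that lack primary GO terms.

    orthologs maps a gene to its orthologs; primary_go maps a gene to the
    GO terms it already carries. Returns only genes that gain projected terms.
    """
    projected = {}
    for gene, partners in orthologs.items():
        if primary_go.get(gene):
            continue  # already annotated: keep the primary terms only
        terms: set[str] = set()
        for partner in partners:
            terms |= primary_go.get(partner, set())  # union of partner annotations
        if terms:
            projected[gene] = terms
    return projected


# Hypothetical gene IDs and annotations.
orthologs = {"cropA": ["modelA", "modelA2"], "cropB": ["modelB"], "modelA": ["cropA"]}
primary_go = {"modelA": {"GO:0009414"}, "modelA2": {"GO:0006950"}, "modelB": set()}
print(project_go(orthologs, primary_go))
```

Here only `cropA` gains terms: `modelA` already has primary GO, and the only ortholog of `cropB` carries no terms to transfer.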
TRAPID: analysis of non-model transcriptomes
• Homology-based ORF detection, including frameshift correction
• Gene family assignment
• Functional annotation based on Gene Ontology and/or protein domains
• Two reference databases: PLAZA 2.5 and OrthoMCL-DB
• Applications
  - Sugar cane, wheat, Crocus sativus, conifers, Coffea arabica, Prunus
  - Dinoflagellates, diatoms, worms, fishes
[Figure: SRA Viridiplantae transcriptomic data]
Van Bel, … & Vandepoele, Genome Biology 2013
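TRAPID detects ORFs through homology, with frameshift correction. For contrast, the simplest homology-free baseline is a longest-ORF search across all six reading frames, sketched below. This is a different, simpler technique: it performs no frameshift correction, and ORFs without a stop codon before the transcript end are ignored. `longest_orf` is a hypothetical helper.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq: str) -> Optional[Tuple[str, int, int]]:
    """Longest ATG-to-stop ORF over all six reading frames.

    Returns (strand, start, end) with 0-based, end-exclusive coordinates on
    the strand's own sequence (for "-", on the reverse complement), or None.
    """
    seq = seq.upper()
    best = None
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3  # include the stop codon
                    if best is None or end - start > best[2] - best[1]:
                        best = (strand, start, end)
                    start = None  # look for the next ATG in this frame
    return best


print(longest_orf("CCATGAAATTTTAGCC"))  # ('+', 2, 14)
```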
Drought Tolerance Conferred to Sugarcane by Association with Gluconacetobacter diazotrophicus: A Transcriptomic View of Hormone Pathways
Vargas et al., PLoS One 2014
Further reading
Veeckman, E., Ruttink, T., and Vandepoele, K. (2016). Are We There Yet? Reliably Estimating the Completeness of Plant Genome Sequences. Plant Cell 28, 1759-1768.
Proost, S., Van Bel, M. … and Vandepoele, K. (2015). PLAZA 3.0: an access point for plant comparative genomics. Nucleic Acids Res 43 (Database issue), D974-D981.
Vandepoele, K. (2017). A Guide to the PLAZA 3.0 Plant Comparative Genomic Database. Methods Mol Biol 1533, 183-200.
Van Bel, M., Proost, S., Van Neste, C., Deforce, D., Van de Peer, Y., and Vandepoele, K. (2013). TRAPID: an efficient online tool for the functional and comparative analysis of de novo RNA-Seq transcriptomes. Genome Biol 14, R134.
Code freely available to efficiently compute coreGF completeness score
Want a free PDF? Check out PLAZA poster
PLAZA 3.0 user statistics (2016)
• >11,000 users (+13%), 370K page views (+30%)
• Users from >95 countries
• Intensively used by
  - academia (>400 citations)
  - industry
Text-mining
Orthology
PLAZA Workbench
• Create a custom gene set (~experiment) using gene identifiers or BLAST
• External/internal gene IDs (e.g. AN3, AT5G28640, GRMZM2G180246_T01)
• The BLAST interface can be used to map sequence data from a non-model species to a reference species present in PLAZA
• A toolbox is available to analyze user-defined gene sets (~experiment)
• 2,132 registered users processed 11,875 Workbench experiments

Genome sequencing,shotgun sequencing.pptx
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
 
CYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxCYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptx
 
Cot curve, melting temperature, unique and repetitive DNA
Cot curve, melting temperature, unique and repetitive DNACot curve, melting temperature, unique and repetitive DNA
Cot curve, melting temperature, unique and repetitive DNA
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
FS P2 COMBO MSTA LAST PUSH past exam papers.
FS P2 COMBO MSTA LAST PUSH past exam papers.FS P2 COMBO MSTA LAST PUSH past exam papers.
FS P2 COMBO MSTA LAST PUSH past exam papers.
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Early Development of Mammals (Mouse and Human).pdf
Early Development of Mammals (Mouse and Human).pdfEarly Development of Mammals (Mouse and Human).pdf
Early Development of Mammals (Mouse and Human).pdf
 
PODOCARPUS...........................pptx
PODOCARPUS...........................pptxPODOCARPUS...........................pptx
PODOCARPUS...........................pptx
 

Comparative genomics to the rescue: How complete is your plant genome sequence?

  • 1. Comparative genomics to the rescue How complete is your plant genome sequence? Klaas Vandepoele Ghent University - VIB, Belgium 5th Plant Genomics & Gene Editing Congress 16-17 March 2017, Amsterdam plaza_genomics
  • 2. Plant genome sequencing is booming: new and faster sequencing technologies; generating a draft genome sequence has become cheap; the number of published plant genomes grows exponentially (>150 published plant genomes; credit: Usadel lab).
  • 3. From read data to knowledge. The basic genome analysis toolkit: genome assembly; structural annotation, which shows where genes are; functional annotation, which tells you what genes do; and availability of the genome sequence and gene annotation. Together these facilitate biological discovery.
  • 4. Yet another “draft” plant genome. What is the quality and completeness of plant genome sequences? The N50 denotes that 50% of the total assembly length is contained in scaffolds of length N50 or longer.
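The N50 definition on slide 4 can be made concrete with a short Python sketch. It assumes the scaffold lengths are already available as a list of positive integers (parsing a FASTA file is not shown). The function name `n50` is illustrative and is not taken from any particular assembly tool.

```python
def n50(scaffold_lengths: list[int]) -> int:
    """Return the N50 of an assembly from its scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least 50% of the total assembly length.
    """
    if not scaffold_lengths:
        raise ValueError("need at least one scaffold length")
    if any(length <= 0 for length in scaffold_lengths):
        raise ValueError("scaffold lengths must be positive")
    total = sum(scaffold_lengths)
    running = 0
    # Walk from the longest scaffold down, accumulating assembled length.
    for length in sorted(scaffold_lengths, reverse=True):
        running += length
        # Stop at the first scaffold where half of the assembly is covered.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: positive lengths always reach 50%")


if __name__ == "__main__":
    # Toy assembly of 300 bp: 100 + 80 = 180 bp >= 150 bp, so N50 is 80.
    print(n50([100, 80, 60, 40, 20]))
```

The 50% threshold is taken relative to the assembly's own total length. For that reason, N50 values of assemblies with very different total sizes are not directly comparable. The related NG50 metric uses the estimated genome size as the denominator instead.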
  • 5. Quality of a genome: what to expect?
  • 6. Quality of a genome: what to expect? Transcript mapping
  • 7. Transcript mapping: tools & settings. The transcript mapping score is stable (standard deviation < 1%) in bins of at least 3,000 ESTs. Correctly estimating the assembled gene space is challenging, because the result is influenced by the mapping tool and by the coverage cutoffs used. Intron-aware transcript mapping using GMAP.
  • 8. Transcript mapping: library size. If the libraries contain more than 10,000 ESTs, the EST mapping scores for A. thaliana libraries converge to the same value as for subsampling bins of >10,000 ESTs. RNA-Seq de novo assembled transcripts can lead to over-estimation of the expected number of genes (allelic transcripts, splice variants and fragmented transcripts). They can also lead to under-estimation when low-abundance transcripts fail to be reconstructed.
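Slides 7 and 8 assess how stable the mapping score is by subsampling EST libraries into bins of a given size. The following minimal Python sketch shows that procedure. It assumes each transcript's aligned coverage (the fraction of its length aligned to the genome) has already been extracted from the aligner output, for example from GMAP; that parsing is not shown. The 90% coverage cutoff, the number of bins and the function names are hypothetical choices for illustration. They are not the settings used in the study.

```python
import random
import statistics


def mapping_score(coverages: list[float], min_coverage: float = 0.9) -> float:
    """Percentage of transcripts whose aligned coverage meets the cutoff."""
    if not coverages:
        raise ValueError("no transcripts given")
    mapped = sum(1 for c in coverages if c >= min_coverage)
    return 100.0 * mapped / len(coverages)


def subsample_scores(
    coverages: list[float],
    bin_size: int,
    n_bins: int = 20,
    min_coverage: float = 0.9,
    seed: int = 0,
) -> tuple[float, float]:
    """Mean and standard deviation of the mapping score over random bins.

    Each bin is drawn without replacement from the full library. A small
    standard deviation indicates the score is stable at that bin size.
    """
    if not 0 < bin_size <= len(coverages):
        raise ValueError("bin size must be between 1 and the library size")
    if n_bins < 2:
        raise ValueError("need at least two bins to estimate a deviation")
    rng = random.Random(seed)  # fixed seed for reproducible subsampling
    scores = [
        mapping_score(rng.sample(coverages, bin_size), min_coverage)
        for _ in range(n_bins)
    ]
    return statistics.mean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    # Simulated library: about 95% of 20,000 transcripts align almost fully.
    sim = random.Random(1)
    library = [1.0 if sim.random() < 0.95 else 0.3 for _ in range(20_000)]
    for size in (500, 3_000, 10_000):
        mean, sd = subsample_scores(library, size)
        print(f"bin size {size:>6}: score {mean:.1f}% (sd {sd:.2f})")
```

In this simulation the standard deviation should shrink as the bin size grows. That is the behaviour the slides use to choose a minimum bin size. Changing `min_coverage` illustrates how strongly the coverage cutoff alone shifts the score.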
  • 9. Estimating gene space completeness along an evolutionary scale: expected gene spaces range from evolutionarily conserved to species-specific and are influenced by between-species and within-species diversity. Approaches shown along the species tree of life: CEGMA (248 single-copy genes), BUSCO (952 single-copy genes), PLAZA coreGF (7k gene families), PLAZA coreGF (3k gene families) and transcript mapping.
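The marker-based approaches on slide 9 (CEGMA, BUSCO, coreGF) share one idea. They define a set of genes or gene families expected in the species and report the fraction that is recovered. The following minimal Python sketch computes that fraction. It assumes the expected set and the detected set have already been produced, for example by a homology search (not shown). Identifiers such as `GF001` are made up. The published coreGF score may weight families differently, so this unweighted version is only illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Completeness:
    score: float               # percentage of expected genes/families recovered
    missing: tuple[str, ...]   # expected identifiers that were not detected


def completeness(expected: set[str], detected: set[str]) -> Completeness:
    """Fraction of an expected conserved gene space recovered in a genome.

    `expected` holds marker gene or gene-family IDs (a CEGMA/BUSCO/coreGF-like
    set). `detected` holds the IDs found in the assembly or gene set.
    Detected IDs outside the expected set are ignored.
    """
    if not expected:
        raise ValueError("expected gene space is empty")
    recovered = expected & detected
    return Completeness(
        score=100.0 * len(recovered) / len(expected),
        missing=tuple(sorted(expected - detected)),
    )


if __name__ == "__main__":
    expected = {"GF001", "GF002", "GF003", "GF004"}
    detected = {"GF001", "GF003", "GF999"}  # GF999 is not a marker family
    print(completeness(expected, detected))
```

Listing the missing identifiers, rather than reporting only the percentage, makes it possible to check later whether absent families are truly missing from the genome or were only missed by the annotation.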
  • 10. Biases in the expected “conserved” gene space
  • 11. A diverse set of genome quality metrics
  • 13. Evaluation: Arabidopsis and Oryza have consistently high completeness scores; CEGMA over-estimates completeness; Lolium shows a discrepancy between genome and gene set completeness.
  • 14. Improving Lolium gene annotation. Evidence: 2 transcriptomes aligned with GenomeThreader (de novo assembly, 300k; orthology-guided assembly, 80k); 4 proteomes aligned with GenomeThreader (Brachypodium distachyon 16k, Oryza sativa 11k, Sorghum bicolor 11k, Zea mays 10k); 2 annotation sets (Byrne et al. (2015), 28k loci; ab initio predictions, 41k loci). EVM consensus: 39,967 loci. Haas et al. (2008), Gremme et al. (2005), Ruttink et al. (2013)
  • 15. Updated completeness scores for Lolium (completeness score, %): the Byrne et al. (2015) annotation compared with the EVM consensus, evaluated with CEGMA (248 single-copy genes), BUSCO (952 single-copy genes), PLAZA coreGF (7k and 3k gene families) and transcript mapping along the species tree of life. More than 900 new coreGF loci were found in the genome!
  • 16. Evaluation: Arabidopsis and Oryza have consistently high completeness scores; CEGMA over-estimates completeness; Lolium shows a discrepancy between genome and gene set completeness; for Cicer, the EST mapping score is much lower than the BUSCO gene set or coreGF score. More than half of the unmapped sequences are of non-plant origin (mostly from Fusarium oxysporum). Proper taxonomic binning of expected transcripts is essential!
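The Cicer case on slide 16 shows why contaminating transcripts must be removed before computing a mapping score. The following minimal Python sketch compares the score with and without taxonomic binning. It assumes each transcript already carries a taxonomic label (for instance from a similarity search against a reference database, not shown) and a flag saying whether it aligned to the assembly. The `Transcript` type, the function name and the toy data are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Transcript:
    name: str
    lineage: str   # hypothetical taxonomic bin, e.g. "Viridiplantae" or "Fungi"
    mapped: bool   # True if the transcript aligned to the genome assembly


def binned_mapping_score(
    transcripts: Iterable[Transcript], keep_lineage: Optional[str] = None
) -> float:
    """Percentage of transcripts that mapped, optionally within one taxon."""
    pool = [
        t for t in transcripts
        if keep_lineage is None or t.lineage == keep_lineage
    ]
    if not pool:
        raise ValueError("no transcripts left after taxonomic filtering")
    return 100.0 * sum(t.mapped for t in pool) / len(pool)


if __name__ == "__main__":
    # Toy library: 8 plant transcripts (7 map) plus 2 fungal contaminants.
    library = [Transcript(f"plant{i}", "Viridiplantae", i < 7) for i in range(8)]
    library += [Transcript(f"fungus{i}", "Fungi", False) for i in range(2)]
    print(binned_mapping_score(library))                   # 70.0
    print(binned_mapping_score(library, "Viridiplantae"))  # 87.5
```

The two unmappable fungal transcripts lower the raw score even though the plant genome itself is unchanged. This mirrors the discrepancy described for Cicer.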
  • 17. Guidelines to assess the quality of a new genome sequence: 1. Estimate genome size using different methods. 2. Define and evaluate the expected gene space based on transcript mapping AND evolutionary conservation: clean the transcripts before mapping them, and prefer coreGF/BUSCO over CEGMA to model expected conserved genes. 3. Large differences in completeness scores between the genome assembly and the annotated gene set can point to gene prediction issues. 4. To perform cross-species genome comparisons, focus on genomes with complete and contiguous assemblies. Veeckman, E., Ruttink, T., and Vandepoele, K. (2016). Are We There Yet? Reliably Estimating the Completeness of Plant Genome Sequences. Plant Cell 28, 1759-1768.
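Guideline 3 can be turned into a simple automated check. The following minimal Python sketch assumes completeness percentages have already been computed, with each method, for both the genome assembly and the annotated gene set. The 5-percentage-point threshold and the example numbers are hypothetical. They are not values from the study.

```python
def flag_discrepancies(
    scores: dict[str, tuple[float, float]], max_gap: float = 5.0
) -> list[tuple[str, float]]:
    """Return methods whose assembly and gene-set completeness differ too much.

    `scores` maps a method name (e.g. "BUSCO") to a pair
    (assembly completeness %, gene set completeness %). A large positive gap
    suggests that genes present in the assembly were missed by gene prediction.
    """
    flagged = []
    for method in sorted(scores):
        assembly, gene_set = scores[method]
        gap = assembly - gene_set
        if abs(gap) > max_gap:
            flagged.append((method, gap))
    return flagged


if __name__ == "__main__":
    # Illustrative numbers only.
    example = {"BUSCO": (95.0, 80.0), "coreGF": (97.0, 95.5)}
    for method, gap in flag_discrepancies(example):
        print(f"{method}: gene set differs from assembly by {gap:.1f} points")
```

The Lolium case on slides 13 to 15 is an example of such a gap: re-annotating the genome recovered coreGF loci that the original gene set lacked.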
  • 18. PLAZA: gene family annotation and phylogenetic trees; traceable functional annotation (GO/InterPro/MapMan); colinearity and synteny; integrative gene orthology inference. A highly integrative platform to translate knowledge from model to crop: 55 species/genomes, highly scalable design, web-based mobile user interface, and an integrated Workbench for the analysis of sets of genes. http://bioinformatics.psb.ugent.be/plaza/
  • 19. Coverage of gene function information: gene descriptions and Gene Ontology (Biological Process) annotations (blue = primary GO; green = GO projection via orthology + homology).
  • 20. TRAPID: analysis of non-model transcriptomes. Homology-based ORF detection, including frameshift correction; gene family assignment; functional annotation based on Gene Ontology and/or protein domains; two reference databases: PLAZA 2.5 and OrthoMCL-DB. Applications: sugar cane, wheat, Crocus sativus, conifers, Coffea arabica, Prunus; dinoflagellates, diatoms, worms, fishes. (Figure: SRA Viridiplantae transcriptomic data.) Van Bel, … & Vandepoele, Genome Biology 2013
  • 21. Drought Tolerance Conferred to Sugarcane by Association with Gluconacetobacter diazotrophicus: A Transcriptomic View of Hormone Pathways. Vargas et al., PLoS One 2014
  • 22. Further reading: Veeckman, E., Ruttink, T., and Vandepoele, K. (2016). Are We There Yet? Reliably Estimating the Completeness of Plant Genome Sequences. Plant Cell 28, 1759-1768. Proost, S., Van Bel, M., … and Vandepoele, K. (2015). PLAZA 3.0: an access point for plant comparative genomics. Nucleic Acids Res. 43 (Database issue), D974-D981. Vandepoele, K. (2017). A Guide to the PLAZA 3.0 Plant Comparative Genomic Database. Methods Mol. Biol. 1533, 183-200. Van Bel, M., Proost, S., Van Neste, C., Deforce, D., Van de Peer, Y., and Vandepoele, K. (2013). TRAPID: an efficient online tool for the functional and comparative analysis of de novo RNA-Seq transcriptomes. Genome Biol. 14, R134. plaza_genomics: code freely available to efficiently compute the coreGF completeness score. Want a free PDF? Check out the PLAZA poster.
  • 23. PLAZA 3.0 user statistics (2016): >11,000 users (+13%), 370K page views (+30%); users from >95 countries; intensively used by academia (>400 citations) and industry.
  • 25. PLAZA Workbench: create a custom gene set (~experiment) using gene identifiers or BLAST; external/internal gene IDs are accepted (e.g. AN3, AT5G28640, GRMZM2G180246_T01); the BLAST interface can be used to map sequence data from a non-model species to a reference species present in PLAZA; a toolbox is available to analyze user-defined gene sets (~experiment). 2,132 registered users processed 11,875 Workbench experiments.