Improving pan-genome annotation using whole genome multiple alignment


Raunak Shrestha
27th October 2011
Source:
Angiuoli SV, Hotopp JC, Salzberg SL, Tettelin H. Improving pan-genome annotation
using whole genome multiple alignment. BMC Bioinformatics. 2011 Jun 30;12:272.

Background
• Describing the genetic diversity of an organism is difficult on the basis of a single reference genome
• Pan-genomes: greater intra-specific genetic variation, even among closely related strains
• To aid gene prediction and annotation, genome sequences of several closely related strains are required
Figure source: http://en.wikipedia.org/wiki/File:Pan-genome-graphics.png

Background
Figure (Schnoes et al., 2009): the change in misannotation over time in the NR database for the 37 families investigated.

Mugsy-Annotator (http://mugsy.sf.net)
• Steps:
  1. Aligning multiple whole genomes
  2. Mapping orthologs among the genomes
  3. Identifying annotation anomalies
• Objectives:
  1. Identifying orthologs
  2. Evaluating the quality of annotated gene structures in prokaryotic genomes

Determining Orthologs
• Identifies orthologs on the basis of whole genome alignment (WGA), sequence position and sequence length
• Expects one segment per organism in the whole genome alignment
• For segmental duplications:
  • Reports a separate ortholog group for each copy only if the whole genome alignment identifies orthologous copies in other genomes
  • Otherwise, the duplication is not recognized and the copies are grouped under a single ortholog group
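The position-based grouping described above can be sketched roughly as follows. This is a simplified illustration, not Mugsy-Annotator's actual algorithm: it assumes each alignment block holds one ungapped segment per genome (so a genome coordinate maps linearly to an alignment column), and the names `Segment`, `Gene`, `group_orthologs` and the `min_cov` threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    genome: str
    start: int  # 0-based, inclusive genome coordinate
    end: int    # exclusive


@dataclass(frozen=True)
class Gene:
    gene_id: str
    genome: str
    start: int
    end: int


def to_columns(gene, seg):
    """Project a gene onto alignment columns (assumes an ungapped segment)."""
    lo = max(gene.start, seg.start) - seg.start
    hi = min(gene.end, seg.end) - seg.start
    return (lo, hi) if hi > lo else None


def group_orthologs(block, genes, min_cov=0.8):
    """Greedily group genes from different genomes whose column intervals overlap."""
    seg_by_genome = {s.genome: s for s in block}  # one segment per genome
    projected = []
    for g in genes:
        seg = seg_by_genome.get(g.genome)
        if seg is None:
            continue
        cols = to_columns(g, seg)
        if cols is not None:
            projected.append((cols, g))
    projected.sort(key=lambda item: item[0])

    groups = []
    for cols, g in projected:
        for grp in groups:
            ref = grp["cols"]
            overlap = min(cols[1], ref[1]) - max(cols[0], ref[0])
            longer = max(cols[1] - cols[0], ref[1] - ref[0])
            # Require similar position and length, and one gene per genome.
            if overlap >= min_cov * longer and g.genome not in grp["genomes"]:
                grp["members"].append(g.gene_id)
                grp["genomes"].add(g.genome)
                break
        else:
            groups.append({"cols": cols, "genomes": {g.genome}, "members": [g.gene_id]})
    return [grp["members"] for grp in groups]


block = [Segment("A", 100, 1100), Segment("B", 500, 1500)]
genes = [
    Gene("a1", "A", 200, 500),
    Gene("b1", "B", 610, 900),
    Gene("a2", "A", 700, 1000),
]
groups = group_orthologs(block, genes)
```

Because the block carries only one segment per genome, a duplicated copy elsewhere in a genome is grouped separately only if it appears in a block of its own, which mirrors the limitation noted on the slide.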

Identification of annotation inconsistencies
• Evaluates start codons, stop codons and translation initiation sites (TIS)
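One way to picture this check is to compare the annotated gene boundaries of an ortholog group in alignment coordinates. The sketch below is illustrative only; the function name `classify_group` and the input layout (gene ID mapped to start and stop alignment columns) are assumptions, not part of Mugsy-Annotator.

```python
from collections import Counter


def classify_group(boundaries):
    """Summarize boundary agreement within one ortholog group.

    boundaries: gene_id -> (start_col, stop_col) in alignment columns.
    """
    starts = Counter(start for start, _ in boundaries.values())
    stops = Counter(stop for _, stop in boundaries.values())
    majority_start = starts.most_common(1)[0][0]
    return {
        "consistent_start": len(starts) == 1,
        "consistent_stop": len(stops) == 1,
        "majority_start": majority_start,
        # Genes whose annotated start differs from the most common one.
        "outliers": sorted(
            gene for gene, (start, _) in boundaries.items() if start != majority_start
        ),
    }


report = classify_group({"a1": (100, 400), "b1": (100, 400), "c1": (130, 400)})
```

Here the three genes share a stop position but `c1` starts 30 columns downstream, so the group would be flagged as having an inconsistent start.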

Data set
• Neisseria meningitidis (Nmen) dataset of 20 genomes
• Nmen verA contained 13 genomes
• Nmen verB contained 7 genomes
• Annotation pipeline differs between Nmen verA and Nmen verB
• A dataset of genomes from 9 other bacterial species from the RefSeq database

Comparison of the groups of orthologs for 20 Nmen genomes
• Among the genes reported exclusively by one method:
  • Intra-genome BLASTP matches predict a large fraction of these genes to be paralogs (40% for Mugsy-Annotator and 60% for OrthoMCL)
  • Some have functional names that indicate transposases
  • Some are hypothetical proteins
• The paper claims that OrthoMCL clusters paralogs and orthologs in a single group

Run Time Performance
• Nmen dataset of 20 genomes
• Mugsy plus Mugsy-Annotator: ~4 h on a single CPU
  • ~2 h for WGA with Mugsy
  • ~2 h for comparing annotations with Mugsy-Annotator
• OrthoMCL consumed ~32 CPU hours, roughly 8 times as much
• The WGA method is computationally efficient and has a significant runtime advantage over BLAST-based OrthoMCL

Consistency of annotated gene structures in several species pan-genomes as reported by Mugsy-Annotator

Improving annotation consistency
• Where TIS are inconsistent, Mugsy-Annotator suggests alternative gene structures that improve annotation consistency
• Strategy: look for a conserved TIS in close proximity to the previously annotated TIS
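The strategy of looking for a conserved TIS near the annotated one could be sketched as a scan over alignment columns. This is a simplified illustration under stated assumptions, not the tool's implementation: it considers the forward strand only, does not check the reading frame or the resulting ORF, and the names `conserved_tis`, `START_CODONS` and the `window` parameter are hypothetical.

```python
from collections import Counter

# Common bacterial start codons (assumed set for this sketch).
START_CODONS = {"ATG", "GTG", "TTG"}


def conserved_tis(rows, annotated, window=30):
    """Find the alignment column with the most widely shared start codon.

    rows: genome -> aligned sequence (equal lengths, '-' for gaps).
    annotated: genome -> alignment column of the annotated start codon.
    Returns (column, number of genomes with a start codon at that column).
    """
    common = Counter(annotated.values()).most_common(1)[0][0]
    length = len(next(iter(rows.values())))
    lo = max(0, min(annotated.values()) - window)
    hi = min(length - 3, max(annotated.values()) + window)

    best = None
    for col in range(lo, hi + 1):
        support = sum(
            1 for seq in rows.values() if seq[col:col + 3].upper() in START_CODONS
        )
        # Prefer more support, then proximity to the most common annotated start.
        key = (support, -abs(col - common))
        if best is None or key > best[0]:
            best = (key, col)
    if best is None:
        raise ValueError("alignment too short to contain a start codon")
    return best[1], best[0][0]


rows = {
    "A": "CCATGAAACCCGGG",
    "B": "CCATGAAACCCGGG",
    "C": "CCATGAAATTGGGG",
}
annotated = {"A": 2, "B": 2, "C": 8}
column, support = conserved_tis(rows, annotated, window=5)
```

In this toy alignment genome C is annotated at a downstream TTG, but all three genomes share an ATG at column 2, so that column would be proposed as the consistent alternative.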

Conclusion
• Aids in identifying and comparing gene content across a pan-genome
• Aids annotation and re-annotation of genes within a pan-genome rather than in a single genome
• The study demonstrates significant variation in annotation that is due primarily to the different bioinformatics approaches used rather than to true biological variation
• Mugsy-Annotator: an efficient, accurate method for finding orthologs within a pan-genome
• Mugsy (WGA approach) is computationally efficient compared to BLAST-based approaches for finding orthologs

Critique
• Mugsy-Annotator requires previously predicted annotations and is therefore not an independent annotation tool
• Mugsy-Annotator still has difficulty resolving segmental duplications and paralogs
• It would have been better if the authors had measured the performance of Mugsy-Annotator on pan-genome datasets with larger evolutionary distances


Editor's Notes

OrthoMCL: a popular BLAST-based clustering method that clusters Reciprocal Best BLAST (RBB) matches between conceptual translations of genes to identify orthologs.
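The reciprocal-best-match idea mentioned in this note can be illustrated with a small sketch. It shows only the RBB pairing step on precomputed hits, not OrthoMCL's clustering; the function name `reciprocal_best_hits` and the `(query, subject, bitscore)` input layout are assumptions made for the example.

```python
def reciprocal_best_hits(hits):
    """Return pairs of genes that are each other's best-scoring match.

    hits: iterable of (query, subject, bitscore) covering both search directions.
    """
    best = {}
    for query, subject, score in hits:
        # Keep only the highest-scoring subject for each query.
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)

    pairs = set()
    for query, (subject, _) in best.items():
        if subject in best and best[subject][0] == query:
            pairs.add(tuple(sorted((query, subject))))
    return sorted(pairs)


hits = [
    ("a1", "b1", 200.0),
    ("a1", "b2", 150.0),
    ("b1", "a1", 198.0),
    ("b2", "a1", 150.0),
    ("a2", "b2", 90.0),
]
pairs = reciprocal_best_hits(hits)
```

Only `a1` and `b1` pick each other as best match; `b2` and `a2` have one-directional best hits and are left unpaired.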
