Improving pan-genome annotation using whole genome multiple alignment
1. Raunak Shrestha
27th October 2011
Source:
Angiuoli SV, Hotopp JC, Salzberg SL, Tettelin H. Improving pan-genome annotation
using whole genome multiple alignment. BMC Bioinformatics. 2011 Jun 30;12:272.
2. Background
• Describing the genetic diversity of an organism is difficult on the basis of a single reference genome
• Pan-genomes
• Greater intra-specific genetic variation even in closely related strains
• To aid gene prediction & annotation, genome sequences of several closely related strains are required
http://en.wikipedia.org/wiki/File:Pan-genome-graphics.png
3. Background
Schnoes et al., 2009
The change in misannotation over time in the NR database for the 37 families investigated.
4. Mugsy-Annotator (http://mugsy.sf.net)
• Steps:
1. Aligning multiple whole genomes,
2. mapping orthologs among the genomes,
3. identifying annotation anomalies
• Objectives:
1) Identifying orthologs and
2) Evaluating the quality of annotated gene structures in prokaryotic genomes.
5. Determining Orthologs
• Identifies orthologs on the basis of whole genome alignment (WGA), sequence position and sequence length.
• Expects one segment per organism in the whole genome alignment.
• For segmental duplications:
• It reports separate ortholog groups for each copy only if the whole genome alignment identifies orthologous copies in other genomes
• If not, it does not recognize the duplication and groups the copies under a single ortholog
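The grouping idea on this slide can be illustrated with a minimal Python sketch. It is not Mugsy-Annotator's actual implementation: it assumes each gene has already been projected onto alignment columns, and the function names, the `min_cov` threshold and the example coordinates are all hypothetical. Genes from different genomes are grouped when their column spans overlap sufficiently, with at most one gene per genome in a group.

```python
def overlap_fraction(a, b):
    """Fraction of the shorter interval that is covered by the overlap of a and b."""
    (a_start, a_end), (b_start, b_end) = a, b
    overlap = min(a_end, b_end) - max(a_start, b_start)
    shorter = min(a_end - a_start, b_end - b_start)
    return max(overlap, 0) / shorter if shorter > 0 else 0.0


def group_orthologs(genes, min_cov=0.5):
    """Greedy grouping of genes by position in a whole genome alignment.

    genes: list of (genome, gene_id, (col_start, col_end)), where the span is
    given in alignment columns (an assumption of this sketch).
    min_cov: hypothetical threshold on the overlap fraction.
    """
    groups = []
    for genome, gene_id, span in sorted(genes, key=lambda g: g[2]):
        for group in groups:
            # One segment per organism: never put two genes of a genome together.
            same_genome = any(g == genome for g, _, _ in group)
            overlaps_all = all(overlap_fraction(span, s) >= min_cov for _, _, s in group)
            if not same_genome and overlaps_all:
                group.append((genome, gene_id, span))
                break
        else:
            groups.append([(genome, gene_id, span)])
    return groups


# Made-up example: two aligned loci in three genomes.
example_genes = [
    ("A", "a1", (0, 300)), ("B", "b1", (10, 310)), ("C", "c1", (5, 290)),
    ("A", "a2", (1000, 1200)), ("B", "b2", (1005, 1200)),
]
ortholog_groups = group_orthologs(example_genes)
```

In this simplified model a duplicated copy only forms its own group if it is aligned to a separate set of columns, which mirrors the limitation for segmental duplications noted above.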
6. Identification of annotation inconsistencies
• Evaluates start codons, stop codons and translation initiation sites (TIS)
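As a rough illustration of this check, the Python sketch below compares the annotated start and stop positions of the genes in one ortholog group. It assumes positions are expressed in alignment columns; the function name, the report format and the example values are hypothetical and are not taken from Mugsy-Annotator.

```python
from collections import Counter


def annotation_report(group):
    """Flag start/stop disagreements within one ortholog group.

    group: dict gene_id -> (start_col, stop_col), both in alignment columns
    (an assumption of this sketch).
    """
    starts = Counter(start for start, _ in group.values())
    stops = Counter(stop for _, stop in group.values())
    majority_start = starts.most_common(1)[0][0]
    majority_stop = stops.most_common(1)[0][0]
    return {
        "consistent_start": len(starts) == 1,
        "consistent_stop": len(stops) == 1,
        # Genes whose annotated boundary differs from the most common one.
        "start_outliers": sorted(g for g, (s, _) in group.items() if s != majority_start),
        "stop_outliers": sorted(g for g, (_, e) in group.items() if e != majority_stop),
    }


# Made-up example: c1 has a different annotated start but the same stop.
report = annotation_report({"a1": (100, 400), "b1": (100, 400), "c1": (130, 400)})
```

A group with a shared stop but differing starts, as in this example, is the typical signature of a TIS disagreement.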
7. Data set
• Neisseria meningitidis (Nmen) dataset of 20 genomes
• Nmen verA contained 13 genomes
• Nmen verB contained 7 genomes
• The annotation pipeline differs between Nmen verA and Nmen verB
• A genome dataset of 9 other bacterial species from the RefSeq database.
8. Comparison of the groups of orthologs for 20 Nmen genomes
• Among the genes reported exclusively by one of the methods:
• Intra-genome BLASTP matches suggest that many of these genes are paralogs (40% for Mugsy-Annotator & 60% for OrthoMCL)
• Some have functional names that indicate transposases
• Some are hypothetical proteins
• The paper claims that OrthoMCL clusters paralogs and orthologs in a single group
9. Run Time Performance
• Nmen dataset of 20 genomes
• Processed on a single CPU in ~4 h
• ~2 h for WGA with Mugsy and
• ~2 h for comparing annotations with Mugsy-Annotator
• OrthoMCL consumed ~32 CPU hours
• The WGA method is computationally efficient and has a significant runtime advantage over BLAST-based OrthoMCL
12. Improving annotation consistency
• In case of inconsistency in TIS, Mugsy-Annotator suggests alternative gene structures that improve annotation consistency
• Strategy -> look for a conserved TIS in close proximity to the previously annotated TIS
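The strategy above can be sketched in Python under strong simplifying assumptions: the ortholog sequences are given as equal-coordinate aligned rows with no gaps in the region scanned, candidate starts must stay in frame with the annotated start, and ATG, GTG and TTG are accepted as start codons. The function name, the `window` parameter and the example sequences are hypothetical; this is not the search Mugsy-Annotator actually performs.

```python
START_CODONS = {"ATG", "GTG", "TTG"}


def conserved_start_columns(aligned_rows, annotated_col, window=30):
    """Find nearby alignment columns where every row begins a start codon.

    aligned_rows: aligned nucleotide strings, one per genome (assumed ungapped
    in the scanned region).
    annotated_col: column of the currently annotated start codon.
    Returns candidate columns, nearest to the annotated start first.
    """
    length = min(len(row) for row in aligned_rows)
    lo = max(0, annotated_col - window)
    hi = min(length - 3, annotated_col + window)
    hits = []
    for col in range(lo, hi + 1):
        if (col - annotated_col) % 3 != 0:
            continue  # stay in the reading frame of the annotated gene
        if all(row[col:col + 3].upper() in START_CODONS for row in aligned_rows):
            hits.append(col)
    return sorted(hits, key=lambda c: abs(c - annotated_col))


# Made-up example: the start annotated at column 9 is not conserved in the
# second row, but a start codon at column 3 is present in all three rows.
rows = [
    "CCCATGAAAATGGGGTTTAAA",
    "CCCATGAAACTGGGGTTTAAA",
    "CCCGTGAAAATGGGGTTTAAA",
]
candidates = conserved_start_columns(rows, annotated_col=9, window=9)
```

A real implementation would also have to handle gaps, strand, and the resulting change in gene length.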
13. Conclusion
• Aids in identifying and comparing gene content across a pan-genome
• Aids annotation and re-annotation of genes within a pan-genome rather than in a single genome
• The study demonstrates significant variation in annotation, primarily due to the different bioinformatics approaches used rather than true biological variation
• Mugsy-Annotator: efficient, accurate method for finding orthologs within a pan-genome
• Mugsy (WGA approach) is computationally efficient compared to BLAST-based approaches for finding orthologs
14. Critique
• Mugsy-Annotator requires pre-predicted annotation information and is therefore not an independent annotation tool
• Mugsy-Annotator still finds it difficult to resolve segmental duplications and paralogs
• It would have been even better if the authors had measured the performance of Mugsy-Annotator on pan-genome datasets with larger evolutionary distances.
OrthoMCL: a popular BLAST-based method that identifies orthologs by clustering Reciprocal Best BLAST (RBB) matches between conceptual translations of genes.
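To make the RBB idea concrete, here is a minimal Python sketch of finding reciprocal best matches between two genomes. It covers only the reciprocal-best step, not OrthoMCL's clustering, and it assumes the BLAST results have already been reduced to a score per (query, subject) pair; the function names and the scores in the example are made up.

```python
def best_hits(scores):
    """Return the best-scoring subject for each query.

    scores: dict (query, subject) -> score, e.g. a BLAST bit score
    (the input format is an assumption of this sketch).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Made-up scores: a3's best hit is b1, but b1's best hit is a1.
a_vs_b = {("a1", "b1"): 500, ("a1", "b2"): 120, ("a2", "b2"): 300, ("a3", "b1"): 200}
b_vs_a = {("b1", "a1"): 490, ("b2", "a2"): 310, ("b2", "a1"): 100}
rbb_pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

Here `a3` is left unpaired because the match is not reciprocal, which is how this criterion filters out many one-directional matches to paralogs.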