Genome Annotation
Karan Veer Singh,
Scientist.
NBAGR, Karnal,
India

The Genome
• The genome contains all the biological information required to build and maintain any given living organism
• The genome contains the organism's molecular history
• Decoding the biological information encoded in these molecules will have an enormous impact on our understanding of biology

Genomics
1. Structural genomics: genetic and physical mapping of genomes.
2. Functional genomics: analysis of gene function (and non-genes).
3. Comparative genomics: comparison of genomes across species.
   • Includes structural and functional genomics.
   • Evolutionary genomics.

Human Genome Project
The Human Genome Project promised to revolutionise medicine and explain every base of our DNA.
Large MEDICAL GENETICS focus:
• Identify variation in the genome that is disease causing
• Determine how individual genes play a role in health and disease

Human Genome Project & Functional Genome
• It cost about 3 billion dollars and took 13 years to complete (1990 to 2003, 2 fewer than the 15 initially planned).
• Approx. 200 Mb still in progress
  – Heterochromatin
  – Repetitive

Genomics & Genome annotation
• The first genome annotation software system was designed in 1995 by Dr. Owen White at The Institute for Genomic Research, which sequenced and analyzed the first genome of a free-living organism to be decoded, the bacterium Haemophilus influenzae
• Genome analysis involves assembling the reads into contigs, then assembling against a reference genome (reference assembly) or by de novo assembly to obtain the complete genome
• Variations such as mutations, SNPs and InDels can then be identified
• The genome is then annotated by structural and functional annotation
• The whole genome is mapped to an image that is easy to understand

Sequence to Annotation
• Input 1 to Genome Viewer: Variant Annotation
• Input 2 to Genome Viewer: Structural Annotation
  – Structural Annotation: AUGUSTUS (version 2.5.5)
• Input 3 to Genome Viewer: Functional Annotation

Genome Annotation
• The process of identifying the locations of genes and coding regions in a genome and determining what those genes do
• Finding the structural elements and attaching their related function to each genome location

Genome Annotation
• Gene structure prediction: identifying elements (introns/exons, CDS, start, stop) in the genome
• Gene function prediction: attaching biological information to these elements, e.g. which protein an exon codes for

Structural annotation
Structural annotation: identification of genomic elements
• Open reading frames and their localisation
• Gene structure
• Coding regions
• Location of regulatory motifs

Functional annotation
Functional annotation: attaching biological information to genomic elements
• Biochemical function
• Biological function
• Regulation involved

Genome annotation - workflow
1. Genome sequence
2. Map repeats: masked or un-masked genome sequence
3. Structural annotation (gene finding): nc-RNAs (tRNA, rRNA), introns, protein-coding genes
4. Functional annotation
5. View in genome viewer

Genome Repeats & features
• Polymorphic between individuals/populations
• Percentage of repetitive sequences in different organisms:

Genome               Genome size (Mb)   % Repeat
Aedes aegypti        1,300              ~70
Anopheles gambiae    260                ~30
Culex pipiens        540                ~50

• Repeat types: microsatellite, minisatellite, tandem repeat, short tandem repeat, SSR

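As a rough illustration of what a short tandem repeat (SSR) looks like to a program, the sketch below scans a sequence with a regular expression. The function name `find_ssrs` and the thresholds (unit length 1 to 6, at least 4 copies) are assumptions made for this example, not values from the slides, and dedicated repeat finders use more careful methods.

```python
import re

def find_ssrs(seq, min_unit=1, max_unit=6, min_repeats=4):
    """Return (start, end, unit, copies) for perfect tandem repeats in seq.

    Coordinates are 0-based with an exclusive end. Thresholds are illustrative.
    """
    # A short unit (captured lazily) followed by at least min_repeats - 1 more copies.
    pattern = re.compile(
        r"([acgt]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1),
        re.IGNORECASE,
    )
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        copies = len(m.group(0)) // len(unit)
        hits.append((m.start(), m.end(), unit.lower(), copies))
    return hits

# A (ca)5 microsatellite embedded in flanking sequence.
example_hits = find_ssrs("ttgg" + "ca" * 5 + "tt")
```

Only perfect repeats are reported here; interrupted or degenerate repeats would need a more tolerant approach.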
Finding repeats as a preliminary to gene prediction
• Repeat discovery
• Homology based approaches
  – Use RepeatMasker to search the genome and mask the sequence

Masked sequence
• Repeatmasked sequence is an artificial construction in which those regions thought to be repetitive are marked with X's
• Widely used to reduce the overhead of subsequent computational analyses and to reduce the impact of TEs (transposable elements) in the final annotation set

>my sequence
atgagcttcgatagcgatcagctagcgatcaggct
actattggcttctctagactcgtctatctctatta
gctatcatctcgatagcgatcagctagcgatcagg
ctactattggcttcgatagcgatcagctagcgatc
aggctactattggcttcgatagcgatcagctagcg
atcaggctactattggctgatcttaggtcttctga
tcttct

>my sequence (repeatmasked)
atgagcttcgatagcgatcagctagcgatcaggct
actattxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxatctcgatagcgatcagctagcgatcagg
ctactattxxxxxxxxxxxxxxxxxxxtagcgatc
aggctactattggcttcgatagcgatcagctagcg
atcaggctxxxxxxxxxxxxxxxxxxxtcttctga
tcttct

Positions/locations are not affected by masking
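
A minimal sketch of why positions are unaffected: masking overwrites bases in place, one character per base, so the sequence length and every coordinate stay the same. The function name `hard_mask` and the 0-based, end-exclusive interval convention are assumptions made for this example.

```python
def hard_mask(seq, repeats, mask_char="x"):
    """Overwrite each (start, end) interval of seq with mask_char.

    Intervals are 0-based with an exclusive end (an assumption of this sketch).
    """
    chars = list(seq)
    for start, end in repeats:
        for i in range(start, end):
            chars[i] = mask_char
    return "".join(chars)

seq = "atgagcttcgatagcgatcagctagcgatcaggct"
masked = hard_mask(seq, [(6, 12)])

# Same length, and unmasked bases keep their original coordinates.
same_length = len(masked) == len(seq)
```

Because nothing is inserted or deleted, gene coordinates predicted on the masked sequence can be used directly on the original sequence.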
Types of Masking: Hard or Soft?
• Sometimes we want to mark up repetitive sequence but not exclude it from downstream analyses. This is achieved using a format known as soft-masking, in which repeats are written in lower case
• Hard-masking replaces the repetitive regions with X's

>my sequence
ATGAGCTTCGATAGCGCATCAGCTAGCGATCAGGC
TACTATTGGCTTCTCTAGACTCGTCTATCTCTATT
AGTATCATCTCGATAGCGATCAGCTAGCGATCAGG
CTACTATTGGCTTCGATAGCGATCAGCTAGCGATC
AGGCTACTATTGGCTTCGATAGCGATCAGCTAGCG
ATCAGGCTACTATTGGCTGATCTTAGGTCTTCTGA
TCTTCT

>my sequence (softmasked)
ATGAGCTTCGATAGCGCATCAGCTAGCGATCAGGC
TACTATTggcttctctagactcgtctatctctatt
agtatcATCTCGATAGCGATCAGCTAGCGATCAGG
CTACTATTggcttcgatagcgatcagcTAGCGATC
AGGCTACTATTggcttcgatagcgatcagcTAGCG
ATCAGGCTACTATTGGCTGATCTTAGGTCTTCTGA
TCTTCT

>my sequence (hardmasked)
atgagcttcgatagcgatcagctagcgatcaggct
actattxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxatctcgatagcgatcagctagcgatcagg
ctactattxxxxxxxxxxxxxxxxxxxtagcgatc
aggctactattggcttcgatagcgatcagctagcg
atcaggctxxxxxxxxxxxxxxxxxxxtcttctga
tcttct
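
A soft-masked sequence keeps all the information needed to produce a hard-masked one, but not the other way round. The sketch below assumes repeats are lower case and everything else upper case; the function names and the choice of `N` as the default mask character are assumptions for illustration (the slides use x).

```python
import re

def soft_to_hard(seq, mask_char="N"):
    """Replace every lower-case (soft-masked) base with mask_char."""
    return re.sub(r"[a-z]", mask_char, seq)

def masked_intervals(seq):
    """Return 0-based, end-exclusive (start, end) spans of lower-case runs."""
    return [(m.start(), m.end()) for m in re.finditer(r"[a-z]+", seq)]

soft = "CTACTATTggcttcgatagcgatcagcTAGCGATC"
hard = soft_to_hard(soft, mask_char="x")
spans = masked_intervals(soft)
```

Keeping the soft-masked file as the master copy lets each downstream tool decide whether to ignore the repeats or to use them.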
Structural annotation
Identification of genomic elements
• Open reading frames and their localization
• Coding regions
• Location of regulatory motifs
• Start/Stop
• Splice sites
• Non-coding regions/RNAs
• Introns

Methods
• Similarity
  – Similarity between sequences, which does not necessarily imply any evolutionary linkage
• Ab initio prediction
  – Prediction of gene structure from first principles using only the genome sequence

Genefinding
• ab initio
• similarity

ab initio prediction
[Figure: signals scanned along the genome: coding potential, ATG and stop codons, splice sites]
Examples: Genefinder, Augustus, Glimmer, SNAP, fgenesh

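To make the "ATG and stop codons" signal concrete, here is a deliberately naive open reading frame scan on the forward strand. It is only a sketch: the function name `find_orfs` and the `min_codons` threshold are assumptions, and the gene finders listed above additionally model coding potential and splice sites.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for forward-strand ORFs running ATG ... stop.

    Coordinates are 0-based; end is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs

orfs = find_orfs("CCATGAAATTTTAGCC")
```

The reverse strand could be covered by running the same scan on the reverse complement; that is left out to keep the sketch short.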
Genefinding - similarity
• Use known coding sequence to define coding regions
  – EST sequences
  – Peptide sequences
• Problem: handling fuzzy alignment regions around splice sites
• Examples: EST2Genome, exonerate, genewise, Augustus, Prodigal

Gene-finding - comparative
• Use two or more genomic sequences to predict genes based on conservation of exon sequences
• Examples: Twinscan and SLAM

Genefinding - non-coding RNA genes
• Non-coding RNA genes can be predicted using knowledge of their structure or by similarity with known examples
• tRNAscan-SE: uses an HMM and covariance model for prediction of tRNA genes
• Rfam: a collection of covariance models (HMM-like profiles that also capture secondary structure) trained against a large number of different RNA genes

Gene-finding omissions
• Alternative isoforms
  – Currently there is no good method for predicting alternative isoforms
  – Only created where supporting transcript evidence is present
• Pseudogenes
  – Each genome project has a fuzzy definition of pseudogenes
  – Badly curated/described across the board
• Promoters
  – Rarely a priority for a genome project
  – Some algorithms exist but are usually not integrated into an annotation set

Practical- structural annotation
Eukaryotes- AUGUSTUS (gene model)

~/Programs/augustus.2.5.5/bin/augustus --strand=both --genemodel=partial
--singlestrand=true --alternatives-from-evidence=true --alternatives-from-sampling=tr
--progress=true --gff3=on --uniqueGeneId=true --species=magnaporthe_grisea
our_genome.fasta >structural_annotation.gff

Prokaryotes: PRODIGAL (codon usage table)
~/Programs/prodigal.v2_60.linux -a protein_file.fa -g 11 -d nucleotide_exon_seq.fa
-f gff -i contigs.fa -o genes_quality.txt -s genes_score.txt -t genome_training_file.txt

Structural Annotation - output
Structural annotation was conducted using AUGUSTUS (version 2.5.5), with magnaporthe_grisea as the species model
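
A quick way to sanity-check a structural annotation file is to count its feature types. The sketch below parses GFF3-style text (nine tab-separated columns, feature type in the third column). The function name `count_features` and the example records are made up for illustration and are not output from the run described above.

```python
from collections import Counter

def count_features(gff_text):
    """Count feature types (column 3) in GFF3-formatted text."""
    counts = Counter()
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        counts[cols[2]] += 1
    return counts

# Hypothetical records, for illustration only.
example_gff = "\n".join([
    "##gff-version 3",
    "contig1\tAUGUSTUS\tgene\t101\t400\t0.95\t+\t.\tID=g1",
    "contig1\tAUGUSTUS\tmRNA\t101\t400\t0.95\t+\t.\tID=g1.t1;Parent=g1",
    "contig1\tAUGUSTUS\tCDS\t101\t250\t0.95\t+\t0\tID=g1.t1.cds;Parent=g1.t1",
    "contig1\tAUGUSTUS\tCDS\t301\t400\t0.95\t+\t0\tID=g1.t1.cds;Parent=g1.t1",
])
counts = count_features(example_gff)
```

For a real file, the text could be read with `open("structural_annotation.gff").read()` and passed to the same function.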

Functional annotation

Functional annotation
[Figure: from genome to function]
Genome → (transcription) → primary transcript → (RNA processing) → processed mRNA (m7G cap, ATG, STOP, poly(A) tail AAAn) → (translation) → polypeptide → (protein folding) → folded protein → (find function) → functional activity, e.g. enzyme activity (A → B)

Functional annotation
Attaching biological information to genomic elements
• Biochemical function
• Biological function
• Regulation and interactions involved
• Expression
• Use the known structural annotation to predict the protein sequence

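Going from a predicted coding sequence to a protein sequence is a codon-by-codon lookup. The sketch below builds the standard genetic code table and translates up to the first stop codon. The function name `translate` is an assumption, and organisms or organelles with other genetic codes would need a different table.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES),
    AMINO_ACIDS,
))

def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X marks ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

peptide = translate("ATGAAATTTTAG")
```

The resulting protein sequences are what is searched against protein databases in homology-based functional annotation.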
Functional annotation – Homology Based
• Predicted exons/CDS/ORFs are searched against non-redundant protein databases (NCBI, SwissProt) to find similarities
• Visually assess the top 5-10 hits to identify whether these have been assigned a function
• Functions are assigned

Functional annotation - Other features
• Other features which can be determined:
  – Signal peptides
  – Transmembrane domains
  – Low complexity regions
  – Various binding sites, glycosylation sites etc.
  – Protein domains
  – Secretome

See http://expasy.org/tools/ for a good list of possible prediction algorithms

Functional annotation - Other features (Ontologies)
• Use of ontologies to annotate gene products
• Gene Ontology (GO)
  – Cellular component
  – Molecular function
  – Biological process

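One simple way to hold GO annotations in a program is to group each gene product's terms by the three GO aspects. In the sketch below the gene identifier `gene_0001` is hypothetical; the three GO terms are real entries chosen only as examples, and the function name `group_by_aspect` is an assumption.

```python
from collections import defaultdict

# (GO identifier, term name, aspect) for a few well-known terms.
GO_TERMS = [
    ("GO:0005634", "nucleus", "cellular_component"),
    ("GO:0003677", "DNA binding", "molecular_function"),
    ("GO:0006355", "regulation of DNA-templated transcription", "biological_process"),
]

def group_by_aspect(terms):
    """Group (go_id, name, aspect) tuples into {aspect: [go_id, ...]}."""
    grouped = defaultdict(list)
    for go_id, _name, aspect in terms:
        grouped[aspect].append(go_id)
    return dict(grouped)

# Hypothetical annotation of one predicted gene product.
annotations = {"gene_0001": group_by_aspect(GO_TERMS)}
```

Using controlled identifiers rather than free-text descriptions makes annotations comparable across genomes.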
Practical - FUNCTIONAL ANNOTATION
Homology Based Method
• Set up a BLAST database (nucleotide/protein)
• BLAST genome.fasta against it for annotations (nucleotide/protein)
• Filter the BLAST hits by E-value, keeping hits at or below the cut-off (<= 0.01), for nucleotide/protein
• Assign functions

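The E-value filtering step can be sketched as follows, assuming BLAST was run with tabular output (`-outfmt 6`), where the 11th column is the E-value and the 12th the bit score. The function name `best_hits` and the example rows are made up for illustration. Lower E-values indicate more significant hits, so hits above the cut-off are discarded.

```python
def best_hits(blast_tab, max_evalue=0.01):
    """Return {query: (subject, evalue, bitscore)} for the best hit per query.

    Expects the 12 standard columns of BLAST tabular output (-outfmt 6).
    """
    best = {}
    for line in blast_tab.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # not significant enough to transfer a function
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best

# Hypothetical hits, for illustration only.
example_tab = "\n".join([
    "g1.t1\tprotA\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-50\t350",
    "g1.t1\tprotB\t60.0\t180\t72\t2\t5\t184\t10\t189\t1e-10\t120",
    "g2.t1\tprotC\t35.0\t50\t30\t1\t1\t50\t1\t50\t0.5\t25",
])
hits = best_hits(example_tab)
```

The function of the retained subject sequence would then be looked up and transferred to the query, ideally after the visual check of the top hits mentioned earlier.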
Functional annotation- output

August 2008, Bioinformatics tools for Comparative Genomics of Vectors

Conclusion
• Annotation accuracy depends on the supporting data available at the time of annotation, so the annotation needs to be updated
• Gene predictions will change over time as new data become available (e.g. at NCBI) that are more similar to the genome than earlier data
• Functional assignments will change over time as new data become available (e.g. characterization of hypothetical proteins)

Genome Viewer
The files that can be visualised:
• Annotation files
• Indel files
• Consensus sequence

Comparative Genomics

Genome View

Short Read track

Thank You
Proudly South Africa powerpoint Thorisha.pptx
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdf
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 

Genome annotation 2013

  • 1. Genome Annotation. Karan Veer Singh, Scientist, NBAGR, Karnal, India
  • 2. The Genome. The genome contains all the biological information required to build and maintain a living organism. The genome also records the organism's molecular history. Decoding the biological information encoded in these molecules will have an enormous impact on our understanding of biology.
  • 3. Genomics. (1) Structural genomics: genetic and physical mapping of genomes. (2) Functional genomics: analysis of gene function (and of non-genes). (3) Comparative genomics: comparison of genomes across species; includes structural and functional genomics, and evolutionary genomics.
  • 4. Human Genome Project. The Human Genome Project promised to revolutionise medicine and explain every base of our DNA. It had a large medical genetics focus: identify variation in the genome that is disease-causing, and determine how individual genes play a role in health and disease.
  • 5. Human Genome Project & Functional Genome. It cost about 3 billion dollars and took 13 years to complete (1990 to 2003, 2 fewer than the 15 initially predicted). Approx. 200 Mb is still in progress: heterochromatin and repetitive regions.
  • 6. Genomics & Genome annotation. The first genome annotation software system was designed in 1995 by Dr. Owen White at The Institute for Genomic Research, the team that sequenced and analyzed the first genome of a free-living organism to be decoded, the bacterium Haemophilus influenzae. Genome analysis involves assembling the reads into contigs, then assembling these against a reference genome (reference assembly) or by de novo assembly to obtain the complete genome. Variations such as mutations, SNPs, InDels, etc. can then be identified. The genome is then annotated by structural and functional annotation. The result is a mapping image of the whole genome in an easily understandable form.
  • 8. Input1 to Genome Viewer- Variant Annotation
  • 9. Input2 to Genome Viewer- Structural Annotation. Structural Annotation- AUGUSTUS (version 2.5.5)
  • 10. Input3 to Genome Viewer-Functional Annotation
  • 11. Genome Annotation. The process of identifying the locations of genes and coding regions in a genome and determining what those genes do. It means finding structural elements and attaching their related function to each genome location.
  • 12. Genome Annotation. Gene structure prediction: identifying elements (introns/exons, CDS, start, stop) in the genome. Gene function prediction: attaching biological information to these elements, e.g. which protein an exon codes for.
  • 13. Structural annotation. Identification of genomic elements: open reading frames and their localisation, gene structure, coding regions, location of regulatory motifs.
  • 14. Functional annotation. Attaching biological information to genomic elements: biochemical function, biological function, regulation involved.
  • 15. Genome annotation - workflow. Genome sequence -> Repeats -> Masked or un-masked genome sequence -> Structural annotation (gene finding): nc-RNAs (tRNA, rRNA), introns, protein-coding genes -> Functional annotation -> View in genome viewer
  • 16. Genome Repeats & features. Polymorphic between individuals/populations. Percentage of repetitive sequences in different organisms:
      Genome              Genome size (Mb)   % Repeat
      Aedes aegypti       1,300              ~70
      Anopheles gambiae   260                ~30
      Culex pipiens       540                ~50
    Repeat types: microsatellite, minisatellite, tandem repeat, short tandem repeat, SSR.
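The repeat classes named on this slide (microsatellites, short tandem repeats, SSRs) are short units repeated head to tail. The following is a minimal sketch of how perfect SSRs might be located with a regular-expression scan. The function name `find_ssrs` and its thresholds are illustrative assumptions, not something taken from the slides; dedicated repeat finders handle imperfect repeats and are what a real project would use.

```python
import re

def find_ssrs(seq, min_unit=1, max_unit=6, min_repeats=4):
    """Find perfect tandem repeats (simple sequence repeats).

    Returns a list of (start, end, unit, copies) tuples with 0-based,
    half-open coordinates. Thresholds are illustrative assumptions.
    """
    seq = seq.upper()
    # Lazily capture the shortest unit of min_unit..max_unit bases, then
    # require at least (min_repeats - 1) further copies via a backreference.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        copies = (match.end() - match.start()) // len(unit)
        hits.append((match.start(), match.end(), unit, copies))
    return hits

if __name__ == "__main__":
    # A made-up sequence with an (AC)5 dinucleotide repeat and a (T)6 run.
    for hit in find_ssrs("GGACACACACACTTTTTTGG"):
        print(hit)
```

Because the unit is matched lazily, a run such as ACACACACAC is reported with unit AC rather than ACAC.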
  • 17. Finding repeats as a preliminary to gene prediction. Repeat discovery: homology-based approaches. Use RepeatMasker to search the genome and mask the sequence.
  • 18. Masked sequence. A repeat-masked sequence is an artificial construction in which the regions thought to be repetitive are marked with X's. It is widely used to reduce the overhead of subsequent computational analyses and to reduce the impact of transposable elements (TEs) on the final annotation set. Positions/locations are not affected by masking.
      >my sequence
      atgagcttcgatagcgatcagctagcgatcaggct
      actattggcttctctagactcgtctatctctatta
      gctatcatctcgatagcgatcagctagcgatcagg
      ctactattggcttcgatagcgatcagctagcgatc
      aggctactattggcttcgatagcgatcagctagcg
      atcaggctactattggctgatcttaggtcttctga
      tcttct
      >my sequence (repeatmasked)
      atgagcttcgatagcgatcagctagcgatcaggct
      actattxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      xxxxxxatctcgatagcgatcagctagcgatcagg
      ctactattxxxxxxxxxxxxxxxxxxxtagcgatc
      aggctactattggcttcgatagcgatcagctagcg
      atcaggctxxxxxxxxxxxxxxxxxxxtcttctga
      tcttct
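The point that masking leaves positions untouched can be illustrated with a few lines of code. This is a sketch only: `hard_mask` is a hypothetical helper, and the repeat intervals are assumed to come from some repeat finder as 0-based, half-open (start, end) pairs, which is an assumption about the input rather than any tool's actual output format.

```python
def hard_mask(seq, repeats, mask_char="x"):
    """Overwrite each repeat interval with mask_char.

    `repeats` is an iterable of 0-based, half-open (start, end) pairs
    (an assumed input convention). The sequence length never changes,
    so downstream coordinates stay valid.
    """
    masked = list(seq)
    for start, end in repeats:
        # Clamp to the sequence bounds so a bad interval cannot extend it.
        for i in range(max(start, 0), min(end, len(seq))):
            masked[i] = mask_char
    return "".join(masked)

if __name__ == "__main__":
    seq = "atgagcttcgat"
    masked = hard_mask(seq, [(3, 6)])
    print(masked)                    # same length as the input
    print(len(seq) == len(masked))   # coordinates are preserved
```

Replacing characters one for one, rather than deleting the repeat, is what keeps gene coordinates predicted on the masked sequence valid on the original.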
  • 19. Types of Masking - Hard or Soft? Sometimes we want to mark up repetitive sequence but not exclude it from downstream analyses. This is achieved using a format known as soft-masked.
      >my sequence
      ATGAGCTTCGATAGCGCATCAGCTAGCGATCAGGC
      TACTATTGGCTTCTCTAGACTCGTCTATCTCTATT
      AGTATCATCTCGATAGCGATCAGCTAGCGATCAGG
      CTACTATTGGCTTCGATAGCGATCAGCTAGCGATC
      AGGCTACTATTGGCTTCGATAGCGATCAGCTAGCG
      ATCAGGCTACTATTGGCTGATCTTAGGTCTTCTGA
      TCTTCT
      >my sequence (softmasked)
      ATGAGCTTCGATAGCGCATCAGCTAGCGATCAGGC
      TACTATTggcttctctagactcgtctatctctatt
      agtatcATCTCGATAGCGATCAGCTAGCGATCAGG
      CTACTATTggcttcgatagcgatcagcTAGCGATC
      AGGCTACTATTggcttcgatagcgatcagcTAGCG
      ATCAGGCTACTATTGGCTGATCTTAGGTCTTCTGA
      TCTTCT
      >my sequence (hardmasked)
      atgagcttcgatagcgatcagctagcgatcaggct
      actattxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      xxxxxxatctcgatagcgatcagctagcgatcagg
      ctactattxxxxxxxxxxxxxxxxxxxtagcgatc
      aggctactattggcttcgatagcgatcagctagcg
      atcaggctxxxxxxxxxxxxxxxxxxxtcttctga
      tcttct
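Since a soft-masked sequence keeps the bases and only changes their case, it can be converted to a hard-masked one, or read back into repeat coordinates, at any time. A short sketch under that assumption follows; the helper names `soft_to_hard` and `masked_intervals` are hypothetical, and the choice of "X" as mask character simply follows the slides.

```python
import re

def soft_to_hard(seq, mask_char="X"):
    """Turn a soft-masked sequence (repeats in lowercase) into a hard-masked one."""
    return re.sub(r"[a-z]", mask_char, seq)

def masked_intervals(seq):
    """Return 0-based, half-open (start, end) intervals of lowercase runs."""
    return [(m.start(), m.end()) for m in re.finditer(r"[a-z]+", seq)]

if __name__ == "__main__":
    soft = "ACGTacgtACGT"
    print(soft_to_hard(soft))
    print(masked_intervals(soft))
```

The conversion only goes one way: once bases are replaced by the mask character the original sequence cannot be recovered, which is one reason to keep the soft-masked file as the working copy.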
  • 20. Genome annotation - workflow. Genome sequence -> Map repeats -> Masked or un-masked -> Gene finding (structural annotation): nc-RNAs, introns, protein-coding genes -> Functional annotation -> View in genome viewer
  • 21. Structural annotation. Identification of genomic elements: open reading frames and their localization; coding regions; location of regulatory motifs; start/stop; splice sites; non-coding regions/RNAs; introns.
  • 22. Methods. Similarity: similarity between sequences, which does not necessarily imply any evolutionary linkage. Ab initio prediction: prediction of gene structure from first principles using only the genome sequence.
  • 24. Ab initio prediction. Signals read from the genome: coding potential, ATG and stop codons, splice sites. Examples: Genefinder, Augustus, Glimmer, SNAP, fgenesh.
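The simplest of the signals listed here, ATG and stop codons, is enough to enumerate open reading frames. The sketch below is a deliberately naive illustration, not how Augustus or the other named tools work: it scans only the forward strand, ignores introns and coding potential, and the name `find_orfs` and the `min_codons` default are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=30):
    """List ATG-to-stop open reading frames on the forward strand.

    Returns (start, end, frame) tuples, 0-based and half-open, with the
    stop codon included in the interval. Only the first ATG after the
    previous stop is used as the start in each frame.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return sorted(orfs)

if __name__ == "__main__":
    # Made-up toy sequence; a low threshold is used so the short ORF is kept.
    print(find_orfs("CCATGAAATTTTAGGG", min_codons=3))
```

A full gene finder would also scan the reverse complement and combine these signals with splice-site and coding-potential models.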
  • 25. Genefinding - similarity. Use known coding sequence to define coding regions: EST sequences, peptide sequences. A problem is handling fuzzy alignment regions around splice sites. Examples: EST2Genome, exonerate, genewise, Augustus, Prodigal. Gene-finding - comparative. Use two or more genomic sequences to predict genes based on conservation of exon sequences. Examples: Twinscan and SLAM.
  • 27. Genefinding - non-coding RNA genes. Non-coding RNA genes can be predicted using knowledge of their structure or by similarity with known examples. tRNAscan uses an HMM and a covariance model for prediction of tRNA genes. Rfam is a suite of HMMs trained against a large number of different RNA genes.
  • 28. Gene-finding omissions. Alternative isoforms: currently there is no good method for predicting alternative isoforms; they are only created where supporting transcript evidence is present. Pseudogenes: each genome project has a fuzzy definition of pseudogenes; badly curated/described across the board. Promoters: rarely a priority for a genome project; some algorithms exist but are usually not integrated into an annotation set.
  • 29. Practical - structural annotation.
    Eukaryotes - AUGUSTUS (gene model):
      ~/Programs/augustus.2.5.5/bin/augustus --strand=both --genemodel=partial --singlestrand=true --alternatives-from-evidence=true --alternatives-from-sampling=tr --progress=true --gff3=on --uniqueGeneId=true --species=magnaporthe_grisea our_genome.fasta > structural_annotation.gff
    Prokaryotes - PRODIGAL (codon usage table):
      ~/Programs/prodigal.v2_60.linux -a protein_file.fa -g 11 -d nucleotide_exon_seq.fa -f gff -i contigs.fa -o genes_quality.txt -s genes_score.txt -t genome_training_file.txt
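Both commands above write GFF output, which is a tab-separated text format with nine columns (sequence ID, source, feature type, start, end, score, strand, phase, attributes). As a quick sanity check on a structural annotation, the feature types can be tallied. The sketch below assumes a file such as the `structural_annotation.gff` produced above; the helper name `count_gff_features` and the example rows are hypothetical and do not reproduce real AUGUSTUS output.

```python
from collections import Counter

def count_gff_features(lines):
    """Count GFF records per feature type (column 3)."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/directive lines such as "##gff-version 3".
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF record
        counts[cols[2]] += 1
    return counts

# Hypothetical records, for illustration only.
example = [
    "##gff-version 3",
    "contig1\tAUGUSTUS\tgene\t1\t900\t0.9\t+\t.\tID=g1",
    "contig1\tAUGUSTUS\ttranscript\t1\t900\t0.9\t+\t.\tID=g1.t1;Parent=g1",
    "contig1\tAUGUSTUS\tCDS\t1\t300\t0.9\t+\t0\tID=g1.t1.cds;Parent=g1.t1",
    "contig1\tAUGUSTUS\tCDS\t400\t900\t0.9\t+\t0\tID=g1.t1.cds;Parent=g1.t1",
]

if __name__ == "__main__":
    print(count_gff_features(example))
    # For a real file one would pass the open file handle instead:
    # with open("structural_annotation.gff") as handle:
    #     print(count_gff_features(handle))
```

Because the function accepts any iterable of lines, the same code works on a list in memory and on an open file.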
  • 30. Structural Annotation - output. Structural annotation conducted using AUGUSTUS (version 2.5.5), with Magnaporthe_grisea as the genome model
  • 33. Functional annotation. Genome -> (transcription) -> primary transcript -> (RNA processing) -> processed mRNA (m7G cap, ATG ... STOP, poly-A tail AAAn) -> (translation) -> polypeptide -> (protein folding) -> folded protein -> find function: enzyme activity (A -> B), functional activity.
  • 34. Functional annotation. Attaching biological information to genomic elements: biochemical function, biological function, regulation and interactions involved, expression. Utilize the known structural annotation to predict the protein sequence.
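Going from a structural annotation to a predicted protein sequence amounts to translating each annotated coding sequence. A minimal sketch follows, assuming the standard genetic code and a CDS that is already spliced and in frame; the name `translate` is illustrative, and a project would normally rely on an established library for this step.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(cds):
    """Translate an in-frame coding sequence up to the first stop codon.

    Codons containing ambiguous bases are reported as 'X'.
    """
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

if __name__ == "__main__":
    print(translate("ATGAAATTTTAG"))  # Met-Lys-Phe followed by a stop codon
```

The resulting protein strings are what a homology search against a protein database would take as queries.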
  • 35. Functional annotation - Homology Based. Predicted exons/CDS/ORFs are searched against a non-redundant protein database (NCBI, SwissProt) to find similarities. The top 5-10 hits are visually assessed to identify whether these have been assigned a function. Functions are then assigned.
  • 36. Functional annotation - Other features. Other features which can be determined: signal peptides; transmembrane domains; low-complexity regions; various binding sites, glycosylation sites, etc.; protein domains; secretome. See http://expasy.org/tools/ for a good list of possible prediction algorithms.
  • 37. Functional annotation - Other features (Ontologies). Use of ontologies to annotate gene products. Gene Ontology (GO): cellular component, molecular function, biological process.
  • 38. Practical - Functional annotation. Homology-based method: set up a BLAST database (nucleotide/protein); BLAST genome.fasta against it for annotations (nucleotide/protein); filter the BLAST hits by E-value, keeping hits at or below the cutoff (0.01) (nucleotide/protein); assign functions.
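The filtering step can be sketched as follows, assuming the BLAST results were saved in the 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score). The function name `best_hits`, the cutoff default, and the example rows with their identifiers are all hypothetical.

```python
def best_hits(lines, max_evalue=0.01):
    """Keep, per query, the highest-scoring hit with E-value <= max_evalue.

    Returns {query_id: (subject_id, evalue, bitscore)}.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if evalue > max_evalue:
            continue  # a larger E-value means a weaker, less significant hit
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best

# Hypothetical tabular BLAST rows, for illustration only.
example_hits = [
    "g1.t1\tsp|P1\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t250",
    "g1.t1\tsp|P2\t60.0\t180\t72\t0\t1\t180\t5\t184\t1e-20\t120",
    "g2.t1\tsp|P3\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t25",
]

if __name__ == "__main__":
    for query, hit in best_hits(example_hits).items():
        print(query, hit)
```

The function of the retained subject sequence is then a candidate annotation for the query gene; queries with no hit under the cutoff (g2.t1 in this toy input) stay unannotated or are labelled as hypothetical proteins.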
  • 39. Functional annotation - output. August 2008, Bioinformatics tools for Comparative Genomics of Vectors
  • 40. Conclusion. Annotation accuracy depends on the supporting data available at the time of annotation; updating the information is necessary. Gene predictions will change over time as new data become available (e.g. in NCBI) that are more similar than the previous ones. Functional assignments will change over time as new data become available (e.g. characterization of hypothetical proteins).
  • 42. Genome Viewer. The files that can be visualised: annotation files, indel files, consensus sequence, comparative genomics.

Editor's Notes

  1. Try to describe genome annotation as a process. Emphasize the ongoing nature of annotation: there is no real end point to the annotation process (only artificially defined ones). It is best to think of this as a 'best guess' annotation.
  2. Softmasking