FBW 09-12-2010 Wim Van Criekinge
Course Contents: Bioinformatics
Gene Prediction, HMM & ncRNA
UNKNOWN PROTEIN SEQUENCE
BASIC INFORMATION COMES FROM SEQUENCE
Additional analysis of protein sequences
FINDING CONSERVED PATTERNS IN PROTEIN SEQUENCES
PATTERNS
PROFILES
HIDDEN MARKOV MODELS (HMM)
[Figure labels: HMM, Sequence]
Gene Prediction, HMM & ncRNA
What is an ontology?
Why Create Ontologies?
Summary
The Three Ontologies
DAG Structure: a directed acyclic graph in which each child may have one or more parents
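A minimal sketch of what "each child may have one or more parents" means in practice. The term names and the `PARENTS` map below are illustrative assumptions, not actual ontology content; the point is only that collecting the ancestors of a term has to follow every parent link, not a single path as in a tree.

```python
from collections import deque

# Hypothetical child -> parents map; the term names are illustrative only.
PARENTS = {
    "molecular_function": [],
    "binding": ["molecular_function"],
    "catalytic activity": ["molecular_function"],
    "nucleic acid binding": ["binding"],
    "nuclease activity": ["catalytic activity"],
    # A child with two parents: this is what makes the graph a DAG, not a tree.
    "DNA-binding nuclease": ["nucleic acid binding", "nuclease activity"],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upwards."""
    seen = set()
    queue = deque(parents[term])
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(parents[node])
    return seen


if __name__ == "__main__":
    print(sorted(ancestors("DNA-binding nuclease")))
```

Because annotations to a child term implicitly apply to all of its ancestors, a traversal like this is the usual first step before counting genes per term.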
Example - Molecular Function
Example - Biological Process
Example - Cellular Location
AmiGO browser
GO: Applications
Gene Prediction, HMM & ncRNA
Computational Gene Finding. Problem: given a very long DNA sequence, identify coding regions (including intron splice sites) and their predicted protein sequences
Computational Gene Finding: Eukaryotic gene structure
Computational Gene Finding
Genefinder
 
GENE STRUCTURE INFORMATION - POSITION ON PHYSICAL MAP: this gene structure corresponds to the position on the physical map
GENE STRUCTURE INFORMATION - POSITION: this gene structure relates to the position
GENE STRUCTURE INFORMATION - PREDICTED GENE STRUCTURE: this gene structure relates to the predicted gene structures. Boxes are exons; thin lines (or springs) are introns
Find the open reading frames
GAAAAAGCTCCTGCCCAATCTGAAATGGTTAGCCTATCTTTCCACCGT
Any sequence has 3 potential reading frames (+1, +2, +3). Its reverse complement also has 3 potential reading frames (-1, -2, -3), giving 6 possible reading frames.
The triplet, non-punctuated nature of the genetic code helps us out: of the 64 potential codons, 61 code for amino acids and 3 are stop codons (TGA, TAA, TAG). In random sequence, approximately 1 in 21 codons (3/64) will be a stop.
+1: E K A P A Q S E M V S L S F H R
+2: K K L L P N L K W L A Y L S T
+3: K S S C P I * N G * P I F P P
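The three forward translations on this slide can be reproduced with a short six-frame translation. This is a sketch using only the standard genetic code; the function names are illustrative choices, and a real ORF finder would additionally look for start codons and apply a minimum length.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Complement every base, then reverse the strand."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; '*' marks a stop codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Map frame number (+1..+3, -1..-3) to its translation."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    seq = "GAAAAAGCTCCTGCCCAATCTGAAATGGTTAGCCTATCTTTCCACCGT"
    for frame, protein in sorted(six_frames(seq).items(), reverse=True):
        print(f"{frame:+d}: {protein}")
```

Frames +1 and +2 contain no stop codon in this fragment, while frame +3 is interrupted twice, which is the kind of signal an ORF scan relies on.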
GENE STRUCTURE INFORMATION - OPEN READING FRAMES: this gene structure relates to open reading frames
GENE STRUCTURE INFORMATION - START CODONS: this gene structure represents start codons
Computational Gene Finding: Hexanucleotide frequencies
GENE STRUCTURE INFORMATION - CODING POTENTIAL: this gene structure corresponds to the coding potential
blastn (EST)
For raw DNA sequence analysis, blastx is extremely useful: it translates your DNA sequence in all six reading frames and searches the translations against the protein database. A match (homolog) gives you some idea of the function. One problem is the large number of genome sequences: you will get matches to database entries that were themselves annotated strictly by sequence homology, so you often need some experimental evidence.
GENE STRUCTURE INFORMATION - EST MATCHES: this gene structure relates to EST matches
A new generation of programs (e.g. Glimmer, GeneMark) predicts gene coding sequences based on a non-random repeat pattern; these are actually pretty good. (Borodovsky et al., 1999, Organization of the Prokaryotic Genome (Charlebois, ed.), pp. 11-34)
Computational Gene Finding
GENE STRUCTURE INFORMATION - REPEAT FAMILIES: this gene structure corresponds to repeat families
GENE STRUCTURE INFORMATION - REPEATS: this gene structure relates to repeats
Exon/intron boundaries
Computational Gene Finding: Splice junctions
GENE STRUCTURE INFORMATION - PUTATIVE SPLICE SITES: this gene structure shows putative splice sites
Gene Prediction, HMM & ncRNA
 
Towards profiles (PSSM) with indels (insertions and/or deletions)
Hidden Markov Models: Graphical models of sequences
[Figure: three emitting positions separated by gaps, with transitions labelled continue, insert and delete]
Position 1 emissions: A .1, C .05, D .2, E .08, F .01
Position 2 emissions: A .04, C .1, D .01, E .2, F .02
Position 3 emissions: A .2, C .01, D .05, E .1, F .06
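As a small illustration of how the per-position emission probabilities on this slide are used, the sketch below scores a sequence against the three positions with no insertions or deletions. The function name is an illustrative choice, and treating the three columns as consecutive match states is an assumption; the slide lists only five residues per position, so the columns do not sum to 1 here.

```python
# Emission probabilities copied from the slide, one dict per position.
# Only the residues shown on the slide are included (assumption: anything
# not listed is treated as probability 0 in this toy example).
EMISSIONS = [
    {"A": 0.10, "C": 0.05, "D": 0.20, "E": 0.08, "F": 0.01},
    {"A": 0.04, "C": 0.10, "D": 0.01, "E": 0.20, "F": 0.02},
    {"A": 0.20, "C": 0.01, "D": 0.05, "E": 0.10, "F": 0.06},
]


def ungapped_probability(seq, emissions=EMISSIONS):
    """Product of per-position emission probabilities, with no indels."""
    if len(seq) != len(emissions):
        raise ValueError("sequence length must match the number of positions")
    probability = 1.0
    for residue, column in zip(seq, emissions):
        probability *= column.get(residue, 0.0)
    return probability


if __name__ == "__main__":
    for candidate in ("DEA", "FDC"):
        print(candidate, ungapped_probability(candidate))
```

A profile HMM adds insert and delete states around these columns, so that sequences of other lengths can still be scored against the same profile.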
Hidden Markov Chain
Markov Chain for DNA
Markov chain with begin and end
Markov Models: Graphical models of sequences
Markov Models. Example sequences: 1234, 234, 14, 121214, 2123334
[Figure: states Begin, Emit 1, Emit 2, Emit 3, Emit 4, End]
Hidden Markov Models: Probabilistic Markov Models
[Figure: states Begin, Emit 1, Emit 2, Emit 3, Emit 4, End, with transition probabilities 0.5, 0.5, 0.25, 0.75, 0.9, 0.1, 0.2, 0.8, 1.0]
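A sketch of how such a probabilistic Markov model assigns a probability to a sequence. The arcs below are inferred from the example sequences (1234, 234, 14, 121214, 2123334); which of the slide's probabilities belongs to which arc is not recoverable from the transcript, so the assignment in `TRANSITIONS` is an assumption made for illustration.

```python
# Assumed assignment of the slide's nine probabilities to the nine arcs
# implied by the example sequences. Only the topology is inferred from the
# examples; the pairing of numbers with arcs is a guess.
TRANSITIONS = {
    "Begin": {"1": 0.5, "2": 0.5},
    "1": {"2": 0.25, "4": 0.75},
    "2": {"1": 0.9, "3": 0.1},
    "3": {"3": 0.2, "4": 0.8},
    "4": {"End": 1.0},
}


def sequence_probability(symbols, transitions=TRANSITIONS):
    """Multiply transition probabilities along Begin -> symbols -> End."""
    path = ["Begin", *symbols, "End"]
    probability = 1.0
    for current, following in zip(path, path[1:]):
        # A missing arc means the transition is impossible.
        probability *= transitions.get(current, {}).get(following, 0.0)
    return probability


if __name__ == "__main__":
    for example in ("1234", "234", "14", "121214", "2123334", "43"):
        print(example, sequence_probability(example))
```

Every example sequence from the slide gets a non-zero probability under this topology, while a sequence such as 43 that uses a missing arc gets probability 0.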
Hidden Markov Models: Probabilistic Emission
[Figure: Begin, four emitting states and End, with transition probabilities 0.5, 0.5, 0.25, 0.75, 0.9, 0.1, 0.2, 0.8, 1.0 and emission probabilities A (0.8) B (0.2); B (0.7) C (0.3); C (0.1) D (0.9); C (0.6) A (0.4)]
Hidden Markov Models
Hidden Markov Models: The occasionally dishonest casino
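The slide content for the casino example did not survive extraction, so the sketch below uses the parameters of the standard textbook version (Durbin et al., Biological Sequence Analysis): a fair die, a loaded die that rolls a six half of the time, and rare switches between them. The uniform start distribution is an extra assumption. Viterbi decoding then recovers the most probable sequence of hidden states (F = fair, L = loaded) for a series of rolls.

```python
import math

STATES = ("F", "L")
START = {"F": 0.5, "L": 0.5}  # assumption: either die is equally likely at the start
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
EMIT = {
    "F": {roll: 1 / 6 for roll in "123456"},
    "L": {**{roll: 0.1 for roll in "12345"}, "6": 0.5},
}


def viterbi(rolls):
    """Most probable state path for a string of rolls, computed in log space."""
    if not rolls:
        return ""
    score = {s: math.log(START[s]) + math.log(EMIT[s][rolls[0]]) for s in STATES}
    backpointers = []
    for roll in rolls[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor for state s at this position.
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointers[s] = prev
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][roll])
        backpointers.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    rolls = "315116246446644245311321631164152133625144543631656626566666"
    print(rolls)
    print(viterbi(rolls))
```

Working in log space avoids numerical underflow on long roll sequences; the same recursion is what gene finders apply with exon, intron and intergenic states in place of the two dice.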
Use of Hidden Markov Models
Applications of Hidden Markov Models
Hidden Markov Models Resources
Example: TMHMM. Beyond Kyte-Doolittle …
HMM in protein analysis
 
Hidden Markov model for gene structure
Classic Programs for gene finding
Hidden Markov Models: Gene Finding Software. GENSCAN (not to be confused with GeneScan, a commercial product)
Conservation of Gene Features
Composite Approaches
Gene Prediction: more complex …
[Figure labels: length preference, 5' ss, intcomp, branch, 3' ss]
 
Contents-Schedule
RNA genes
Besides the 6000 protein-coding genes, there are:
140 ribosomal RNA genes
275 transfer RNA genes
40 small nuclear RNA genes
>100 small nucleolar genes?
pRNA in the φ29 rotary packaging motor (Simpson et al. Nature 408:745-750, 2000)
Cartilage-hair hypoplasia mapped to an RNA (Ridanpää et al. Cell 104:195-203, 2001)
The human Prader-Willi critical region (Cavaille et al. PNAS 97:14035-7, 2000)
 
 
 
 
miRNA genes
RNA genes can be hard to detect
UGAGGUAGUAGGUUGUAUAGU: C. elegans let-7; 21 nt (Pasquinelli et al. Nature 408:86-89, 2000)
Often small
Sometimes multicopy and redundant
Often not polyadenylated (not represented in ESTs)
Immune to frameshift and nonsense mutations
No open reading frame, no codon bias
Often evolving rapidly in primary sequence
Lin-4
Let-7 (Pasquinelli et al. Nature 408:86-89, 2000)
Let-7 (lethal-7) was also mapped to a ncRNA gene with a 21-nucleotide product.
The small let-7 RNA is also thought to be a post-transcriptional negative regulator of lin-41 and lin-42.
100% conserved in all bilaterally symmetrical animals (not in jellyfish and sponges).
Sometimes called stRNAs, small temporal RNAs.
 
Two computational analysis problems
 
 
 
 
 
 
Context-free grammars
Basic CFG "production rules":
S -> aS
S -> Sa
S -> aSu
S -> SS
A CFG "derivation":
S -> aS
S -> aaS
S -> aaSS
S -> aagScuS
S -> aagaSucugSc
S -> aagaSaucuggScc
S -> aagacSgaucuggcgSccc
S -> aagacuSgaucuggcgSccc
S -> aagacuuSgaucuggcgaSccc
S -> aagacuucSgaucuggcgacSccc
S -> aagacuucgSgaucuggcgacaSccc
S -> aagacuucggaucuggcgacaccc
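The derivation above also uses pair rules such as gSc and cSg that are not among the four production rules listed, so the sketch below assumes one pair rule per Watson-Crick pair (aSu, uSa, gSc, cSg) plus the single-base and bifurcation rules. It is a Nussinov-style dynamic program, which is not shown on the slides: for a given RNA string it finds the largest number of pair rules that any derivation of that string could use. The function name and the absence of a minimum loop length are simplifying assumptions.

```python
# Pair productions assumed for this sketch: S -> x S y for each Watson-Crick pair.
PAIRS = {("a", "u"), ("u", "a"), ("g", "c"), ("c", "g")}


def max_pairs(seq):
    """Maximum number of nested base pairs in any derivation of seq."""
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = maximum pairs for seq[i..j]; empty spans count as 0.
    best = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # S -> xS and S -> Sx: leave one end unpaired.
            score = max(best[i + 1][j], best[i][j - 1])
            # S -> x S y: pair the two ends around the inner span.
            if (seq[i], seq[j]) in PAIRS:
                inner = best[i + 1][j - 1] if i + 1 <= j - 1 else 0
                score = max(score, inner + 1)
            # S -> SS: split into two independent substructures.
            for k in range(i, j):
                score = max(score, best[i][k] + best[k + 1][j])
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    print(max_pairs("aagacuucggaucuggcgacaccc"))
```

Stochastic context-free grammars extend this idea by attaching a probability to each rule, in the same way that HMMs attach probabilities to transitions and emissions.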
 
 
The power of comparative analysis
Compensatory substitutions that maintain the structure
[Figure: RNA hairpin with stacked base pairs (C G, U A, A U, G C) and the 5' and 3' ends marked]
Evolutionary conservation of RNA molecules can be revealed by identification of compensatory substitutions
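A minimal sketch of how compensatory substitutions can be spotted in an alignment: two columns covary when the bases differ between sequences but every sequence still forms a Watson-Crick pair at those positions. The toy alignment and function names are illustrative assumptions; real methods also allow G-U pairs and use statistics such as mutual information rather than an all-or-nothing test.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def is_compensatory(alignment, i, j):
    """True if columns i and j vary across sequences yet always base-pair."""
    observed = {(seq[i], seq[j]) for seq in alignment}
    return len(observed) > 1 and observed <= WATSON_CRICK


def covarying_pairs(alignment):
    """All column pairs (i, j), i < j, that show compensatory substitutions."""
    length = len(alignment[0])
    return [
        (i, j)
        for i in range(length)
        for j in range(i + 1, length)
        if is_compensatory(alignment, i, j)
    ]


if __name__ == "__main__":
    # Hypothetical aligned RNA sequences of equal length.
    toy_alignment = ["GCAUUGC", "GGAUUCC", "GAAUUUC"]
    print(covarying_pairs(toy_alignment))
```

In the toy alignment, columns 0 and 6 always pair (G with C) but never change, so they carry no covariation signal; columns 1 and 5 change together and are the ones reported.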
Function of ncRNAs
ncRNAs & RNAi
Therapeutic Applications
 

Bioinformatica 08-12-2011-t8-go-hmm

  • 2. FBW 09-12-2010 Wim Van Criekinge
  • 18. DAG Structure. Directed acyclic graph: each child may have one or more parents.
  • 21. Example - Cellular Location
  • 25. Computational Gene Finding. Problem: given a very long DNA sequence, identify coding regions (including intron splice sites) and their predicted protein sequences.
  • 26. Computational Gene Finding. Eukaryotic gene structure.
  • 30. GENE STRUCTURE INFORMATION - POSITION ON PHYSICAL MAP. This gene structure corresponds to the position on the physical map.
  • 33. GENE STRUCTURE INFORMATION - PREDICTED GENE STRUCTURE. This gene structure relates to the predicted gene structures. Boxes are exons; thin lines (or springs) are introns.
  • 34. Find the open reading frames. Example sequence: GAAAAAGCTCCTGCCCAATCTGAAATGGTTAGCCTATCTTTCCACCGT. Any sequence has 3 potential reading frames (+1, +2, +3); its reverse complement also has three (-1, -2, -3), giving 6 possible reading frames. The triplet, non-punctuated nature of the genetic code helps us out: of the 64 potential codons, 61 code for amino acids and 3 are stop codons (TGA, TAA, TAG). In random sequence, approximately 1 in 21 codons will be a stop. Translations of the forward frames: +1: E K A P A Q S E M V S L S F H R; +2: K K L L P N L K W L A Y L S T; +3: K S S C P I * N G * P I F P P.
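The six-frame translation on slide 34 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool used in the lecture: the standard genetic code is assumed, and the names `translate`, `reverse_complement` and `six_frames` are made up for this sketch.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order (assumed table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate from the first base, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def six_frames(dna):
    """Return the translations of frames +1..+3 and -1..-3 ('*' marks a stop)."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


seq = "GAAAAAGCTCCTGCCCAATCTGAAATGGTTAGCCTATCTTTCCACCGT"
frames = six_frames(seq)
for name in ("+1", "+2", "+3", "-1", "-2", "-3"):
    print(name, frames[name])

# Share of the 64 codons that are stops: 3/64, roughly 1 in 21.
stop_fraction = sum(aa == "*" for aa in CODON_TABLE.values()) / len(CODON_TABLE)
print(round(1 / stop_fraction, 1))
```

A long stretch without `*` in one frame is what makes that frame a candidate open reading frame; in random sequence a stop is expected about every 21 codons.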
  • 40. blastn (EST). For raw DNA sequence analysis, blastx is extremely useful: it searches your DNA sequence, translated in all six reading frames, against the protein database. A match (homolog) gives you some idea of the function. One problem is the sheer number of genome sequences: you will get matches to database entries that were themselves identified strictly by sequence homology, so you often still need some experimental evidence.
  • 43. A new generation of programs (e.g. Glimmer, GeneMark) predicts gene coding sequences based on a non-random repeat pattern; they are actually pretty good. Reference: Borodovsky et al., 1999, Organization of the Prokaryotic Genome (Charlebois, ed.), pp. 11-34.
  • 56. Markov chain with begin and end
  • 63. Hidden Markov Models: The occasionally dishonest casino
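The occasionally dishonest casino is usually decoded with the Viterbi algorithm, which recovers the most probable sequence of hidden states (fair or loaded die) from the observed rolls. The transcript does not give the model parameters, so the values below are assumptions taken from the common textbook version of this example (fair die uniform, loaded die rolls a six half the time, switching probabilities 0.05 and 0.10, equal start probabilities). The function name `viterbi` is likewise just a label for this sketch.

```python
import math

STATES = ("F", "L")  # F = fair die, L = loaded die

# Assumed parameters (textbook version of the example, not from the slides).
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
EMIT = {
    "F": {r: 1 / 6 for r in range(1, 7)},
    "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5},
}


def viterbi(rolls):
    """Return the most probable state path (string of F/L) for die rolls 1-6."""
    if not rolls:
        return ""
    # score[s]: best log-probability of any state path that ends in state s.
    score = {s: math.log(START[s]) + math.log(EMIT[s][rolls[0]]) for s in STATES}
    backpointers = []
    for roll in rolls[1:]:
        new_score, pointer = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in s at this position.
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointer[s] = prev
            new_score[s] = (
                score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][roll])
            )
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return "".join(reversed(path))


rolls = [1, 3, 2, 5, 4, 2, 6, 6, 6, 6, 6, 6, 6, 6, 3, 1, 4, 2]
print(viterbi(rolls))
```

Working in log space avoids numerical underflow on long roll sequences, which matters as soon as the same recursion is applied to real DNA or protein sequences.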
  • 68. Example: TMHMM. Beyond Kyte-Doolittle …
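For comparison with the HMM approach of TMHMM, the Kyte-Doolittle baseline that the slide refers to is a sliding-window average of per-residue hydropathy values. The sketch below uses the published Kyte-Doolittle scale; the function name `hydropathy_profile`, the default window of 19 residues and the test peptide are assumptions chosen for illustration.

```python
# Kyte-Doolittle hydropathy scale (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Return the mean hydropathy of every full window along the protein."""
    if window < 1 or window > len(protein):
        return []
    scores = [KD[aa] for aa in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(protein) - window + 1)
    ]


# Hypothetical peptide: a hydrophobic stretch flanked by charged residues.
peptide = "KKKKK" + "LLLLLLLLLL" + "DDDDD"
profile = hydropathy_profile(peptide, window=5)
print([round(x, 2) for x in profile])
```

A window average only scores local hydrophobicity; an HMM such as TMHMM additionally models how helices, loops and their lengths follow one another, which is the sense in which it goes beyond this profile.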
  • 77. Length preference. Figure labels: 5' ss, intcomp, branch, 3' ss.
  • 79. Contents-Schedule. RNA genes. Besides the 6000 protein-coding genes, there are: 140 ribosomal RNA genes; 275 transfer RNA genes; 40 small nuclear RNA genes; >100 small nucleolar RNA genes. Further examples: pRNA in the φ29 rotary packaging motor (Simpson et al. Nature 408:745-750, 2000); cartilage-hair hypoplasia mapped to an RNA (Ridanpää et al. Cell 104:195-203, 2001); the human Prader-Willi critical region (Cavaillé et al. PNAS 97:14035-7, 2000).
  • 84. miRNA genes. RNA genes can be hard to detect. Example: UGAGGUAGUAGGUUGUAUAGU, C. elegans let-7; 21 nt (Pasquinelli et al. Nature 408:86-89, 2000). They are often small; sometimes multicopy and redundant; often not polyadenylated (so not represented in ESTs); immune to frameshift and nonsense mutations; without an open reading frame or codon bias; and often evolving rapidly in primary sequence.
  • 86. Let-7 (Pasquinelli et al. Nature 408:86-89, 2000). Let-7 (lethal-7) was also mapped to a ncRNA gene with a 21-nucleotide product. The small let-7 RNA is also thought to be a post-transcriptional negative regulator of lin-41 and lin-42. It is 100% conserved in all bilaterally symmetrical animals (not in jellyfish and sponges). Such RNAs are sometimes called stRNAs, small temporal RNAs.
  • 95-104. Context-free grammars. Basic CFG "production rules": S -> aS, S -> Sa, S -> aSu, S -> SS. A CFG "derivation", built up one step per slide (the later steps also apply rules that the slide does not list, such as S -> gSc and S -> cSg): S -> aS -> aaS -> aaSS -> aagScuS -> aagaSucugSc -> aagaSaucuggScc -> aagacSgaucuggcgSccc -> aagacuSgaucuggcgSccc -> aagacuuSgaucuggcgaSccc -> aagacuucSgaucuggcgacSccc -> aagacuucgSgaucuggcgacaSccc -> aagacuucggaucuggcgacaccc. [Slide 104 adds a figure labelled with A, C, G, U and *; its layout is not recoverable from the transcript.]
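The derivation on these slides can be replayed mechanically: each step replaces one occurrence of the nonterminal S with the right-hand side of a production. The sketch below is an illustration only. The slides list four productions (S -> aS, S -> Sa, S -> aSu, S -> SS), but the derivation clearly uses further ones, so the rule families here (any base on the left, any base on the right, a base on each side, branching, and an empty rule to end) are assumptions, as are the names `rewrite` and `STEPS`.

```python
BASES = "acgu"

# Assumed grammar: S -> xS | Sx | xSy | SS | (empty), for any bases x, y.
ALLOWED = (
    {x + "S" for x in BASES}
    | {"S" + x for x in BASES}
    | {x + "S" + y for x in BASES for y in BASES}
    | {"SS", ""}
)


def rewrite(form, occurrence, rhs):
    """Replace the given occurrence (0-based) of 'S' in form with rhs."""
    if rhs not in ALLOWED:
        raise ValueError(f"not a production of the assumed grammar: S -> {rhs!r}")
    positions = [i for i, ch in enumerate(form) if ch == "S"]
    pos = positions[occurrence]
    return form[:pos] + rhs + form[pos + 1:]


# (which S to expand, right-hand side), reconstructed from the slide sequence.
STEPS = [
    (0, "aS"), (0, "aS"), (0, "SS"),
    (0, "gSc"), (1, "uS"),
    (0, "aSu"), (1, "gSc"),
    (0, "Sa"), (1, "gSc"),
    (0, "cSg"), (1, "cS"), (1, "gSc"),
    (0, "uS"),
    (0, "uS"), (1, "aS"),
    (0, "cS"), (1, "cS"),
    (0, "gS"), (1, "aS"),
    (0, ""), (0, ""),
]

form = "S"
history = [form]
for occurrence, rhs in STEPS:
    form = rewrite(form, occurrence, rhs)
    history.append(form)

print(form)
```

Productions of the form xSy emit two bases at once, one on each side of S. That is how a context-free grammar can describe the nested base pairs of an RNA stem, which a Markov chain or HMM reading strictly left to right cannot.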
  • 108. Compensatory substitutions that maintain the structure. [Figure: RNA hairpin drawn with 5' and 3' ends and paired bases; the base layout is not recoverable from the transcript.]
  • 109. Evolutionary conservation of RNA molecules can be revealed by identification of compensatory substitutions
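A compensatory substitution can be detected by comparing two aligned sequences at positions that are paired in the structure: both bases differ, yet each sequence still forms a valid pair there. The sketch below is a minimal illustration; the function name `compensatory_pairs`, the decision to count G-U wobble pairs as valid, and the two toy hairpin sequences are all assumptions made for the example, not data from the slides.

```python
# Watson-Crick pairs plus G-U wobble pairs (counting wobble is an assumption).
VALID_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def compensatory_pairs(seq1, seq2, pairs):
    """Return paired positions (i, j) where both bases changed between the
    two aligned sequences while base pairing is preserved in each of them."""
    hits = []
    for i, j in pairs:
        first = (seq1[i], seq1[j])
        second = (seq2[i], seq2[j])
        both_changed = first[0] != second[0] and first[1] != second[1]
        if both_changed and first in VALID_PAIRS and second in VALID_PAIRS:
            hits.append((i, j))
    return hits


# Hypothetical hairpin: a 3-bp stem (positions 0-8, 1-7, 2-6) around a loop.
stem = [(0, 8), (1, 7), (2, 6)]
seq_a = "GCAUUUUGC"
seq_b = "GGAUUUUCC"  # C-G at (1, 7) has become G-C: the pair is kept
print(compensatory_pairs(seq_a, seq_b, stem))
```

Counting such positions across an alignment is the basic signal behind covariation analysis: the primary sequence drifts, but paired positions change together.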