Gene Prediction Using Hidden Markov Models & Recurrent Neural Networks
Ahmed Hani AlGhidani
MSc Student in Computer Science at Cairo University
Research and SDE at RDI Egypt
ahmed.hani@rdi-eg.com
Agenda
• DNA Structure
- Eukaryotic and Prokaryotic Cells
• Gene Prediction Methods
- Empirical Methods
- Ab initio Methods
• Hidden Markov Model
- Existing HMM-based systems
• Recurrent Neural Network
• Other Methods
DNA Structure
DNA Structure (Cont.)
• Prokaryotic Cells
• Most of the DNA is coding
• No Introns
• Promoters
DNA Structure (Cont.)
• Eukaryotic Cells
• Exons (Coding)
• Introns (Non-Coding)
• Acceptors (3’ splice site: the end of an intron, reading 5’ to 3’)
• Donors (5’ splice site: the start of an intron, reading 5’ to 3’)
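As a rough illustration of the donor/acceptor distinction: most eukaryotic introns follow the canonical GT-AG rule, starting with GT at the donor site and ending with AG at the acceptor site. The minimal sketch below only lists candidate positions on the given strand; the function name is hypothetical, and real splice-site predictors score the surrounding context, because GT and AG occur far more often than true splice sites.

```python
def candidate_splice_sites(seq):
    """Return (donors, acceptors) candidate positions under the GT-AG rule.

    Donor candidates: index of the G in each 'GT' (possible intron start).
    Acceptor candidates: index of the G in each 'AG' (possible intron end).
    """
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i + 1 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return donors, acceptors


# Toy exon|intron|exon string: ATG | GTAAGTCTTCAG | GCA
donors, acceptors = candidate_splice_sites("ATGGTAAGTCTTCAGGCA")
print(donors, acceptors)  # [3, 7] [7, 14]
```

Here the true intron spans positions 3 to 14, but the scan also reports a spurious donor at 7 and a spurious acceptor at 7, which is why splice-site detection is treated as a learning problem rather than a string match.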
DNA Structure (Cont.)
• Eukaryotic Cells (cont.)
Gene Prediction
• Identify the exon regions that are translated into amino acids (protein)
Gene Prediction (Cont.)
• Empirical methods are used specifically for prokaryotic cells
• Prokaryotic DNA is mostly coding and has no introns
• Feature-engineering method
• Open Reading Frames (ORFs)
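The ORF idea can be sketched in a few lines: scan each of the three forward reading frames for an ATG start codon and report the stretch up to the first in-frame stop codon (TAA, TAG or TGA). This is a minimal sketch, assuming forward strand only; a practical ORF scanner would also search the reverse complement. `find_orfs` and its `min_codons` threshold are hypothetical names chosen for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return sorted (start, end) pairs of ORFs on the forward strand.

    An ORF starts at an ATG and ends at the first in-frame stop codon;
    `end` is exclusive and includes the stop codon. ORFs with fewer than
    `min_codons` codons before the stop are discarded.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons only.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14)]
```

Because prokaryotic genes are long uninterrupted ORFs, this simple scan already goes a long way there; in eukaryotes, introns break the reading frame, so the same approach fails.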
Gene Prediction (Cont.)
Gene Prediction (Cont.)
• Pros
- Simple and easy to implement
- Works well with prokaryotic DNA because it is relatively simple
• Cons
- Performs poorly on long sequences
- Works poorly with complex DNA such as eukaryotic DNA
Gene Prediction (Cont.)
• Ab initio methods are used for eukaryotic cells
• Depend on statistical methods and computational models
• Feature engineering may also be involved in the computations
• Examples: Hidden Markov Models and Recurrent Neural Networks
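To make "statistical features" concrete, here is a minimal sketch of two features commonly used to separate coding from non-coding DNA: GC content and in-frame codon (3-mer) frequencies, since coding regions show codon-usage bias. The function names are hypothetical, and which features an actual gene finder uses varies by tool.

```python
from collections import Counter
from itertools import product


def gc_content(seq):
    """Fraction of G and C bases in the sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def codon_frequencies(seq, frame=0):
    """Relative frequency of each of the 64 codons read in the given frame."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    counts = Counter(codons)
    total = len(codons) or 1  # avoid division by zero for short inputs
    return {"".join(c): counts["".join(c)] / total
            for c in product("ACGT", repeat=3)}


window = "ATGGCGCTGAAA"
features = [gc_content(window)] + list(codon_frequencies(window).values())
print(len(features))  # 65
```

The resulting fixed-length vector can be fed to any classifier, which is the "feature engineering" step that neural approaches try to avoid.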
Hidden Markov Model
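• The basic idea is Markov chains
• Markov property: the next state depends only on the current state
• A finite set of states
• A transition matrix of state-to-state probabilities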
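A finite state set plus a transition matrix is already enough to score a sequence: under a first-order Markov chain, P(x_1..x_T) = pi(x_1) * A[x_1, x_2] * ... * A[x_{T-1}, x_T]. The sketch below uses the four bases as states; the initial distribution `pi` and matrix `A` are hypothetical toy values, and the sum is done in log space to avoid underflow on long sequences.

```python
import numpy as np

STATES = "ACGT"
INDEX = {s: i for i, s in enumerate(STATES)}

# Hypothetical toy parameters (each row of A sums to 1):
# A[i, j] = P(next = STATES[j] | current = STATES[i])
pi = np.array([0.25, 0.25, 0.25, 0.25])
A = np.array([
    [0.4, 0.2, 0.2, 0.2],
    [0.1, 0.4, 0.4, 0.1],
    [0.1, 0.4, 0.4, 0.1],
    [0.2, 0.2, 0.2, 0.4],
])


def markov_log_prob(seq, pi, A):
    """Log-probability of `seq` under a first-order Markov chain."""
    idx = [INDEX[s] for s in seq]
    logp = np.log(pi[idx[0]])
    for prev, cur in zip(idx, idx[1:]):
        logp += np.log(A[prev, cur])
    return float(logp)


print(markov_log_prob("ACGCG", pi, A))
```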
Hidden Markov Model (Cont.)
Hidden Markov Model (Cont.)
• In practice, the patterns or classes we want to predict often cannot be observed directly
• We need indicators (visible observations) from which to infer the hidden states
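The hidden/visible split is easiest to see by running an HMM as a generator: a hidden state path evolves by the transition matrix, and each hidden state emits a visible base. This is a minimal sketch with two hidden states named after the gene-prediction setting; all probabilities are hypothetical toy values, not estimates from real genomes.

```python
import numpy as np

HIDDEN = ["exon", "intron"]
BASES = "ACGT"

# Hypothetical toy parameters.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1],    # exon   -> exon, intron
              [0.2, 0.8]])   # intron -> exon, intron
B = np.array([[0.20, 0.30, 0.30, 0.20],   # exon emissions over A, C, G, T
              [0.30, 0.20, 0.20, 0.30]])  # intron emissions over A, C, G, T


def sample_hmm(n, pi, A, B, seed=0):
    """Generate n steps: the hidden state path and the bases it emits."""
    rng = np.random.default_rng(seed)
    states, obs = [], []
    s = rng.choice(len(pi), p=pi)
    for _ in range(n):
        states.append(HIDDEN[s])
        obs.append(BASES[rng.choice(len(BASES), p=B[s])])
        s = rng.choice(len(pi), p=A[s])
    return states, "".join(obs)


hidden_path, dna = sample_hmm(20, pi, A, B)
print(dna)          # what a gene finder actually sees
print(hidden_path)  # what it has to infer
```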
Hidden Markov Model (Cont.)
Hidden Markov Model (Cont.)
• Problem 1: Observation Probability Estimation
- Estimate the probability of an observation sequence given the model
• Problem 2: Optimal Hidden State Sequence
- Determine the most likely sequence of hidden states given the observations
• Problem 3: HMM Parameter Estimation
- Find the model parameters that maximize the probability of the observations
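Problem 1 is usually solved with the forward algorithm, which sums over all hidden paths in O(T·N²) time instead of enumerating them. The sketch below rescales the forward variables at each step and accumulates the log of the scale factors so long DNA sequences do not underflow. Parameter names follow the common (pi, A, B) convention; the exon/intron values in the usage lines are hypothetical.

```python
import numpy as np


def forward_log_likelihood(obs, pi, A, B):
    """Problem 1: log P(obs | model) via the scaled forward algorithm.

    obs: sequence of observation indices; pi: (N,), A: (N, N), B: (N, M).
    """
    alpha = pi * B[:, obs[0]]          # alpha_1(i) = pi_i * b_i(o_1)
    log_lik = 0.0
    for t in range(1, len(obs)):
        c = alpha.sum()                # scale factor, kept in log space
        log_lik += np.log(c)
        alpha = (alpha / c) @ A * B[:, obs[t]]
    return log_lik + np.log(alpha.sum())


# Usage with hypothetical two-state (exon/intron) parameters over A, C, G, T:
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.2, 0.3, 0.3, 0.2], [0.3, 0.2, 0.2, 0.3]])
obs = ["ACGT".index(b) for b in "GGCATTA"]
print(forward_log_likelihood(obs, pi, A, B))
```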
Hidden Markov Model (Cont.)
• In Gene Prediction, the observations are
the A, C, G, T sequences, and the hidden
states are Exons, Introns and Other
• Use the training data to set the model
parameters (problem 3) using Baum-
Welch algorithm
• For the given observations, we predict the
states (problem 2) using Viterbi algorithm
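The prediction step (problem 2) can be sketched with a log-space Viterbi decoder: keep the best score for ending in each state, remember which predecessor produced it, then trace back from the best final state. This is a minimal sketch that assumes the parameters are already trained (for example by Baum-Welch) and contain no zero probabilities; the two-state exon/intron values below are hypothetical.

```python
import numpy as np


def viterbi(obs, pi, A, B):
    """Problem 2: most likely hidden state path for `obs` (log-space Viterbi)."""
    T, N = len(obs), len(pi)
    log_A, log_B = np.log(A), np.log(B)
    delta = np.log(pi) + log_B[:, obs[0]]    # best log-score ending in each state
    back = np.zeros((T, N), dtype=int)       # best predecessor for each (t, state)
    for t in range(1, T):
        scores = delta[:, None] + log_A      # scores[i, j]: from state i to state j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[:, obs[t]]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):            # trace predecessors backwards
        path.append(int(back[t, path[-1]]))
    return path[::-1]


# Usage with hypothetical two-state parameters over A, C, G, T:
STATES = ["exon", "intron"]
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.2, 0.3, 0.3, 0.2], [0.3, 0.2, 0.2, 0.3]])
obs = ["ACGT".index(b) for b in "GCGCGCATATAT"]
print([STATES[s] for s in viterbi(obs, pi, A, B)])
```

Real HMM gene finders use many more states (for example separate states per codon position and for splice signals), but decoding works the same way.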
Hidden Markov Model (Cont.)
Hidden Markov Model (Cont.)
Neural Network (Cont.)
• A relatively unexplored area in bioinformatics
• No need for manual feature engineering
• Often outperforms classical machine learning methods
• Inspired by biology (biological neurons)
Neural Network (Cont.)
Recurrent Neural Networks
Recurrent Neural Networks (Cont.)
Recurrent Neural Networks (Cont.)
• Acceptor/donor splice-site experiments
Recurrent Neural Networks (Cont.)
• Exon/intron prediction is still in progress
• Dataset size: 800K sequences
• Sequences are variable-length (not fixed-size)
• LSTM instead of a vanilla RNN
• TensorFlow
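Variable-length sequences are typically handled by one-hot encoding, padding to the longest sequence in the batch, and masking padded steps so they do not change the recurrent state. The deck uses TensorFlow, where this is usually delegated to the framework (for example Keras's `Masking` layer); the sketch below instead spells out the mechanics in plain NumPy with a single LSTM layer. The helper names, gate ordering, hidden size and random weights are all assumptions for illustration, not the author's model.

```python
import numpy as np

BASES = "ACGT"


def one_hot_pad(seqs):
    """One-hot encode DNA strings and zero-pad them to the longest length.

    Returns x (batch, max_len, 4) and a boolean mask (batch, max_len)
    that is True at real positions and False at padding.
    """
    max_len = max(len(s) for s in seqs)
    x = np.zeros((len(seqs), max_len, len(BASES)))
    mask = np.zeros((len(seqs), max_len), dtype=bool)
    for b, s in enumerate(seqs):
        for t, base in enumerate(s.upper()):
            x[b, t, BASES.index(base)] = 1.0
        mask[b, :len(s)] = True
    return x, mask


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_forward(x, mask, W, U, bias):
    """Run one LSTM layer over a padded batch, freezing state on padding.

    W: (len(BASES), 4H) input weights, U: (H, 4H) recurrent weights,
    bias: (4H,). Assumed gate order: input, forget, candidate, output.
    """
    batch, steps, _ = x.shape
    H = U.shape[0]
    h = np.zeros((batch, H))
    c = np.zeros((batch, H))
    outputs = np.zeros((batch, steps, H))
    for t in range(steps):
        z = x[:, t] @ W + h @ U + bias
        i, f, g, o = np.split(z, 4, axis=1)
        c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h_new = sigmoid(o) * np.tanh(c_new)
        m = mask[:, t:t + 1]                 # (batch, 1), broadcast over H
        c = np.where(m, c_new, c)            # keep old state on padded steps
        h = np.where(m, h_new, h)
        outputs[:, t] = h
    return outputs, h


rng = np.random.default_rng(0)
H = 8  # hypothetical hidden size
W = rng.normal(scale=0.1, size=(len(BASES), 4 * H))
U = rng.normal(scale=0.1, size=(H, 4 * H))
bias = np.zeros(4 * H)
x, mask = one_hot_pad(["ATGCGT", "GTAAG", "CAG"])
outputs, last_h = lstm_forward(x, mask, W, U, bias)
print(outputs.shape, last_h.shape)  # (3, 6, 8) (3, 8)
```

Because padded steps leave the state untouched, `last_h` for each sequence equals its output at its own last real base, which is the value a per-sequence classifier would use.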
Other Methods
• Naive Bayes + Statistical Features
• Hidden Markov Model + Support Vector Machine (HMM-SVM)
• Open Reading Frames + Hidden Markov Model
• Open Reading Frames + Statistical Features + Hidden Markov Model