Gene Prediction Using HMMs and RNNs

Gene Prediction Using Hidden Markov Model & Recurrent Neural Network
Ahmed Hani AlGhidani
MSc Student in Computer Science at Cairo University
Research and SDE at RDI Egypt
ahmed.hani@rdi-eg.com

Agenda
• DNA Structure
- Eukaryotic and Prokaryotic Cells
• Gene Prediction Methods
- Empirical Methods
- Ab initio Methods
• Hidden Markov Model
- Existing HMM-based systems
• Recurrent Neural Network
• Other Methods

DNA Structure (Cont.)
• Prokaryotic Cells
• Most of the DNA is coding
• No Introns
• Promoters

• Eukaryotic Cells
• Exons (Coding)
• Introns (Non-Coding)
• Acceptors (end of an intron, reading 5’ to 3’)
• Donors (start of an intron, reading 5’ to 3’)
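
As a rough illustration of donor and acceptor sites, the sketch below lists candidate positions using the canonical GT (donor) and AG (acceptor) dinucleotides at intron boundaries. The function name and the example sequence are hypothetical, and most matches in real DNA are false positives, which is why a statistical model is needed to pick the true sites.

```python
def splice_site_candidates(seq):
    """Return (donors, acceptors): start indices of GT and AG dinucleotides.

    Assumption: only the canonical GT...AG intron boundaries are considered,
    and only the forward strand is scanned.
    """
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return donors, acceptors


# Made-up example sequence, not taken from any dataset.
donors, acceptors = splice_site_candidates("ATGGTAAGTCCCTTTCAGGCC")
print("donor candidates:", donors)
print("acceptor candidates:", acceptors)
```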

• Eukaryotic Cells (cont.)

Gene Prediction
• Identify the exon regions that are translated into amino acids (protein)

Gene Prediction (Cont.)
• Empirical methods are used mainly for prokaryotic cells
• Most of the DNA is coding and there are no introns
• A feature-engineering approach
• Open Reading Frames (ORFs)
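
A minimal sketch of ORF scanning, assuming the standard start codon ATG and stop codons TAA, TAG and TGA, and looking only at the three forward-strand reading frames. The function name and the `min_codons` threshold are illustrative choices, not something the slides specify.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return sorted (start, end) pairs of ORFs on the forward strand.

    `end` is exclusive and includes the stop codon. `min_codons` is the
    minimum number of codons before the stop codon (a hypothetical filter).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


example = "CCATGAAATTTTAGGG"  # made-up sequence
orfs = find_orfs(example)
for start, end in orfs:
    print(start, end, example[start:end])
```

A fuller scanner would also search the reverse complement strand, which this sketch leaves out for brevity.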

• Pros
- Simple and easy to implement
- Work well with prokaryotic DNA because of its simple gene structure
• Cons
- Perform poorly on long sequences
- Perform poorly on complex DNA such as eukaryotic DNA

• Ab initio methods are used for eukaryotic cells
• They depend on statistical methods and computational models
• Feature engineering may be involved in the computations
• Examples: Hidden Markov Models and Recurrent Neural Networks

Hidden Markov Model
• Built on Markov chains: the next state depends only on the current state
• Finite set of states
• Transition matrix
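
To make the Markov chain idea concrete, the sketch below computes the probability of a state sequence from an initial distribution and a transition matrix. The two states and all probability values are invented for illustration only.

```python
def sequence_probability(states, initial, transition):
    """Probability of a state sequence under a first-order Markov chain."""
    prob = initial[states[0]]
    for prev, curr in zip(states, states[1:]):
        prob *= transition[prev][curr]  # each step depends only on the previous state
    return prob


# Hypothetical toy chain: each row of the transition matrix sums to 1.
initial = {"Exon": 0.5, "Intron": 0.5}
transition = {
    "Exon": {"Exon": 0.9, "Intron": 0.1},
    "Intron": {"Exon": 0.2, "Intron": 0.8},
}

p = sequence_probability(["Exon", "Exon", "Intron"], initial, transition)
print(p)  # 0.5 * 0.9 * 0.1
```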

Hidden Markov Model (Cont.)
• In practice, the patterns or classes we want to predict often cannot be observed directly
• We use indicators (visible observations) to infer the hidden states

• Observation Probability Estimation (problem 1)
- Estimate the probability of an observation sequence given the model
• Optimal Hidden State Sequence (problem 2)
- Determine the most likely sequence of hidden states for the observations
• HMM Parameter Estimation (problem 3)
- Find the model parameters that maximize the probability of the observations in the training data

• In gene prediction, the observations are the A, C, G, T sequences, and the hidden states are Exon, Intron and Other
• Use the training data to set the model parameters (problem 3) using the Baum-Welch algorithm
• For the given observations, predict the states (problem 2) using the Viterbi algorithm
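
The decoding step can be sketched as follows: Viterbi in log space over the three hidden states named above. All parameter values are made up for illustration (in practice they would come from training, for example with Baum-Welch), and the assumption that exons are GC-rich and introns AT-rich is only a convenient toy signal.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state path for the observation string."""
    # score[s] = log-probability of the best path so far that ends in state s
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    backpointers = []
    for symbol in obs[1:]:
        new_score, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: score[r] + math.log(trans_p[r][s]))
            pointers[s] = best_prev
            new_score[s] = (score[best_prev] + math.log(trans_p[best_prev][s])
                            + math.log(emit_p[s][symbol]))
        backpointers.append(pointers)
        score = new_score
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy parameters; all are non-zero so the logarithms are defined.
STATES = ["Exon", "Intron", "Other"]
START_P = {"Exon": 0.2, "Intron": 0.1, "Other": 0.7}
TRANS_P = {
    "Exon": {"Exon": 0.8, "Intron": 0.1, "Other": 0.1},
    "Intron": {"Exon": 0.1, "Intron": 0.8, "Other": 0.1},
    "Other": {"Exon": 0.1, "Intron": 0.05, "Other": 0.85},
}
EMIT_P = {
    "Exon": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "Intron": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "Other": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}

path = viterbi("GCGCGCGC", STATES, START_P, TRANS_P, EMIT_P)
print(path)
```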

Neural Network (Cont.)
• Still a largely unexplored area in bioinformatics
• No need for feature engineering
• Often outperforms classical machine learning methods
• Inspired by biology!

Recurrent Neural Networks (Cont.)

Recurrent Neural Networks (Cont.)
• Acceptor/Donor experiments

Recurrent Neural Networks (Cont.)
• Exon/intron experiments are still in progress
• Dataset size is 800K sequences
• Sequences vary in length
• LSTM used instead of a vanilla RNN
• TensorFlow
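
Because the sequences vary in length, they usually need to be encoded and padded before being fed to an LSTM in batches. The sketch below shows one common preparation step in plain Python: one-hot encoding plus zero padding with a mask. It is an assumption about how such data could be prepared, not the pipeline used in these experiments, and the function names are hypothetical.

```python
ALPHABET = "ACGT"


def one_hot(base):
    """Encode a single nucleotide as a 4-element one-hot vector."""
    return [1.0 if base == b else 0.0 for b in ALPHABET]


def pad_batch(sequences):
    """One-hot encode sequences and zero-pad them to the longest length.

    Returns (batch, mask): mask is 1 for real positions and 0 for padding,
    so a recurrent model can ignore the padded time steps.
    """
    max_len = max(len(s) for s in sequences)
    batch, mask = [], []
    for s in sequences:
        encoded = [one_hot(b) for b in s.upper()]
        padding = [[0.0] * len(ALPHABET) for _ in range(max_len - len(s))]
        batch.append(encoded + padding)
        mask.append([1] * len(s) + [0] * (max_len - len(s)))
    return batch, mask


batch, mask = pad_batch(["ACG", "T"])  # made-up sequences
print(mask)
```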

Other Methods
• Naive Bayes + Statistical Features
• Hidden Markov Model Support Vector
Machine (HMM-SVM)
• Open Reading Frames + Hidden Markov
Model
• Open Reading Frames + Statistical
Features + Hidden Markov Model
