Advanced Computing Laboratory
Electrical and Computer Engineering
Seoul National University
Taehoon Lee and Sungroh Yoon
Boosted Categorical Restricted Boltzmann Machine
for Computational Prediction of Splice Junctions
Outline
• Motivation
• Preliminary
• Boosted contrastive divergence
• Categorical restricted Boltzmann machine
• Experiment results
• Conclusion
Motivation
• Deep neural networks (DNNs) show human-level performance on many recognition tasks.
• We focus on class-imbalanced prediction, where there are insufficient samples to represent the true distribution of a minority class.
• Q. How can we learn minor but important features using neural networks?
• We propose a new RBM training method called boosted CD.
• We also devise a regularization term for the sparsity of DNA sequences.
(Figure: negative vs. positive samples; query images from the minority class are easy to misclassify.)
(Splice) Junction Prediction: An Extremely Class-Imbalanced Problem
• Genetic information flows through the gene expression process (DNA → RNA → protein).
• DNA: a sequence of four types of nucleotides (A, G, T, C).
• Gene: a segment of DNA (the basic unit of heredity).
(Figure: a gene is a DNA sequence composed of exons and introns; the GT dinucleotide appears at both true and false boundaries. Of the 76M candidate GT (or AG) sites, only 160K (0.21% of 76M) are true splice sites.)
Previous Work on Junction Prediction
• Two approaches:
  1. Machine learning-based:
     • ANN (Stormo et al., 1982; Noordewier et al., 1990; Brunak et al., 1991),
     • SVM (Degroeve et al., 2005; Huang et al., 2006; Sonnenburg et al., 2007),
     • HMM (Reese et al., 1997; Pertea et al., 2001; Baten et al., 2006).
  2. Sequence alignment-based:
     • TopHat (Trapnell et al., 2010), MapSplice (Wang et al., 2010), RUM (Grant et al., 2011).
• We want to construct a learning model that can boost prediction performance in a way complementary to alignment-based methods.
• We propose a learning model based on (multilayer) RBMs, together with its training scheme.
Related Methodologies
• Training methods for RBMs (see the table below).
• RBMs for categorical values:
  • Softmax input units (Salakhutdinov et al., ICML 2007).
• Class-imbalance problems:
  • See the review by Galar et al. (IEEE Trans. SMC, 2012).

| Method | Description | Training cost | Noise handling | Class-imbalance handling |
|---|---|---|---|---|
| CD (Hinton, Neural Comp. 2002) | Standard and widely used | - | - | - |
| Persistent CD (Tieleman, ICML 2008) | Uses a single persistent Markov chain | | - | - |
| Parallel tempering (Cho et al., IJCNN 2010) | Generates multiple Markov chains simultaneously | | | |
Main Contributions
• A new RBM training method called boosted CD.
• A new penalty term to handle the sparsity of DNA sequences.
• Significant boosts in splicing prediction performance.
• Robustness to high-dimensional class-imbalanced data.
• The ability to detect subtle non-canonical splicing signals.
Restricted Boltzmann Machines
• An RBM is a type of logistic belief network whose structure is a bipartite graph.
• Nodes:
  • Input layer: a binary visible vector 𝒗 ∈ {0, 1}^D.
  • Hidden layer: a binary hidden vector 𝒉 ∈ {0, 1}^F.
• Probability of a configuration (𝒗, 𝒉):
  • P(𝒗, 𝒉) = exp(−E(𝒗, 𝒉)) / Z,
  • E(𝒗, 𝒉) = −𝒃⊤𝒗 − 𝒄⊤𝒉 − 𝒗⊤W𝒉.
• Each node is a stochastic binary unit:
  • P(h_j = 1 | 𝒗) = σ(c_j + Σ_i W_ij v_i), where σ(x) = 1 / (1 + e^(−x)).
• P(𝒉 | 𝒗) can be used as a feature vector.
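As a concrete reference for the definitions above, here is a minimal NumPy sketch of the RBM energy and conditional distributions. The variable names and shapes (W, b, c; D visible units, F hidden units) are our own conventions, not taken from the slides.

```python
import numpy as np

# Minimal RBM sketch. Shapes (our convention): v is a (D,) binary visible
# vector, h is an (F,) binary hidden vector, W is (D, F), b is (D,), c is (F,).

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def energy(v, h, W, b, c):
    # E(v, h) = -b^T v - c^T h - v^T W h
    return -(b @ v) - (c @ h) - (v @ W @ h)

def hidden_probs(v, W, c):
    # P(h_j = 1 | v) = sigmoid(c_j + sum_i W_ij v_i); these can serve as features.
    return sigmoid(c + v @ W)

def visible_probs(h, W, b):
    # P(v_i = 1 | h) = sigmoid(b_i + sum_j W_ij h_j)
    return sigmoid(b + W @ h)
```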
Contrastive Divergence (CD) for Training RBMs
• Train the weights to minimize the negative log-likelihood of the data.
• Run the MCMC chain 𝒗(0), 𝒗(1), …, 𝒗(k) for k steps: 𝒗(0) = 𝒗 → 𝒉(0) → 𝒗(1) → 𝒉(1) → … → 𝒗(k) → 𝒉(k).
• The CD-k update after seeing example 𝒗: the intractable model expectation in the gradient is approximated by the k-step Markov chain, i.e., ΔW ∝ 𝒗(0)𝒉(0)⊤ − 𝒗(k)𝒉(k)⊤.
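A minimal sketch of the CD-k update, reusing the helpers above. The learning rate, the use of sampled binary states inside the chain, and the use of probabilities in the gradient are common conventions, not details taken from the slides.

```python
def gibbs_k(v0, W, b, c, k=1, rng=np.random):
    # Run the chain v(0) -> h(0) -> v(1) -> ... -> v(k); return v(k) and P(h|v(k)).
    ph, v = hidden_probs(v0, W, c), v0
    for _ in range(k):
        h = (rng.random(ph.shape) < ph).astype(float)                     # h ~ P(h|v)
        v = (rng.random(b.shape) < visible_probs(h, W, b)).astype(float)  # v ~ P(v|h)
        ph = hidden_probs(v, W, c)
    return v, ph

def cd_k_update(v0, W, b, c, k=1, lr=0.1, rng=np.random):
    # CD-k: approximate the model expectation with the k-step reconstruction.
    ph0 = hidden_probs(v0, W, c)              # positive phase (data)
    vk, phk = gibbs_k(v0, W, b, c, k, rng)    # negative phase (reconstruction)
    W += lr * (np.outer(v0, ph0) - np.outer(vk, phk))
    b += lr * (v0 - vk)
    c += lr * (ph0 - phk)
```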
Overview of Proposed Methodology
(Figure: overview of the proposed pipeline.)
What Boosting Is
• Boosting is a meta-algorithm that converts weak learners into strong ones.
• Most boosting algorithms iteratively learn weak classifiers with respect to a distribution over the training data and add them to a final strong classifier.
• The main variation among boosting algorithms is the method of weighting training data points and hypotheses.
  • Examples: AdaBoost, LPBoost, TotalBoost, …
(Adapted from lecture notes, UC Irvine CS 271, Fall 2007.)
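For intuition only (this is one round of classic AdaBoost-style reweighting, not the proposed method): misclassified points gain weight, so the next weak learner focuses on them.

```python
def adaboost_reweight(weights, mistakes):
    # mistakes: boolean array, True where the current weak learner erred.
    eps = np.average(mistakes, weights=weights)          # weighted error rate
    alpha = 0.5 * np.log((1.0 - eps) / eps)              # weak learner's vote
    weights = weights * np.exp(alpha * np.where(mistakes, 1.0, -1.0))
    return weights / weights.sum(), alpha                # renormalized distribution
```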
Boosted Contrastive Divergence (1/2)
• Contrastive divergence training loops over all mini-batches and is known to be stable.
• However, for a class-imbalanced distribution, we need to assign higher weights to rare samples so that the Gibbs chains can jump to unseen examples in hardly observed regions.
(Figure: assign higher weights to rare samples and lower weights to ordinary samples, so that the chains reach the hardly observed regions of the data distribution.)
Boosted Contrastive Divergence (2/2)
• If we assign the same weight to all the data, the performance of Gibbs sampling degrades in the regions that are hardly observed.
• Therefore, at every sampling step, we re-weight each observation by the energy of its reconstruction, E(𝒗_n^(k), 𝒉_n^(k)); see the sketch below.
(Figure: relative locations of samples and the corresponding Markov chains under CD, parallel tempering (PT), and the proposed method.)
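A minimal sketch of the reweighting idea: score each observation by the energy of its k-step reconstruction, so that poorly modeled (often rare) samples receive larger weights. The softmax form and the temperature parameter here are our assumptions; the paper's exact weighting scheme may differ.

```python
def boosted_weights(V_k, H_k, W, b, c, temperature=1.0):
    # V_k, H_k: per-sample k-step reconstructions (v_n^(k), h_n^(k)).
    # Higher reconstruction energy -> larger weight (assumed softmax form).
    energies = np.array([energy(v, h, W, b, c) for v, h in zip(V_k, H_k)])
    w = np.exp((energies - energies.max()) / temperature)  # numerically stable
    return w / w.sum()
```

Each sample's CD gradient would then be scaled by its weight before the parameter update, as in the combined sketch after the training-algorithm slide below.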
Categorical Gradient
• For biological sequences, one-hot encoding is widely used (Baldi & Brunak, 2001).
  • A, C, G, and T are encoded as 1000, 0100, 0010, and 0001, respectively.
  • In the encoded binary vectors, 75% of the elements are zero.
• To handle the sparsity of one-hot encoded vectors, we devise a new regularization technique that incorporates prior knowledge of this sparsity (a sparsity term, from which the categorical gradient is derived); see the sketch below.
(Figure: reconstructions with and without the sparsity term.)
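A short sketch of the one-hot encoding and of an illustrative sparsity penalty that pushes each 4-way group of reconstructed units toward the known 1-in-4 activity. The penalty's exact form here is our simplification; the paper derives its own categorical gradient from its sparsity term.

```python
NUC = {'A': 0, 'C': 1, 'G': 2, 'T': 3}

def one_hot(seq):
    # Encode a DNA string as a flat binary vector, 4 bits per nucleotide;
    # by construction, 75% of the entries are zero.
    x = np.zeros((len(seq), 4))
    for i, ch in enumerate(seq):
        x[i, NUC[ch]] = 1.0
    return x.ravel()

def sparsity_penalty_grad(v_recon, target=0.25):
    # Illustrative gradient: penalize each 4-way group whose mean activity
    # deviates from the 1-in-4 prior (our simplification, not the paper's term).
    groups = v_recon.reshape(-1, 4)
    dev = groups.mean(axis=1, keepdims=True) - target
    return np.repeat(dev, 4, axis=1).ravel()
```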
Proposed Training Algorithm
(Figure: the full training algorithm, combining boosted CD with the categorical gradient; a sketch follows.)
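Putting the pieces together, here is a sketch of one training pass that combines the boosted weights with the sparsity gradient. Where exactly the weights and the sparsity term enter the updates is our assumption; the paper's algorithm may differ in these details.

```python
def train_epoch(X, W, b, c, k=1, lr=0.1, lam=0.01, rng=np.random):
    # X: array of one-hot encoded sequences, one row per sample.
    recon = [gibbs_k(v0, W, b, c, k, rng) for v0 in X]   # (v^(k), P(h|v^(k)))
    weights = boosted_weights([v for v, _ in recon],
                              [ph for _, ph in recon],   # probs in place of binary h
                              W, b, c)
    for v0, (vk, phk), w_n in zip(X, recon, weights):
        ph0 = hidden_probs(v0, W, c)
        W += lr * w_n * (np.outer(v0, ph0) - np.outer(vk, phk))   # boosted CD step
        b += lr * (w_n * (v0 - vk) - lam * sparsity_penalty_grad(vk))
        c += lr * w_n * (ph0 - phk)
```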
Results
• Data preparation: real human DNA sequences with known boundary information.
  • GWH dataset: 2-class (boundary or not).
  • UCSC dataset: 3-class (acceptor, donor, or non-boundary).
• Three sets of experiments: effects of the categorical gradient, effects of boosting, and effects on splicing prediction. (A sketch of the candidate-window preparation follows.)
(Figure: the example sequence CGTAGCAGCGATACGTACCGATCGTCACTATCATCGAGGTACGAGAGATCGATCGGCAACG, annotated with true acceptors 1 and 2, true donor 1, a non-canonical true donor, false donor 1, and false acceptor 1.)
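As a hypothetical illustration of the data preparation (the slides do not give the exact procedure), candidate windows could be extracted around every GT/AG dinucleotide and labeled from the known annotations. The window size matches the 200 nt used in the next experiment, but the scanning scheme itself is our assumption.

```python
def candidate_windows(chrom_seq, flank=100):
    # Yield (position, dinucleotide, 200-nt window) for every GT (donor-like)
    # or AG (acceptor-like) site; labels would come from GWH/UCSC annotations.
    for i in range(flank, len(chrom_seq) - flank - 1):
        di = chrom_seq[i:i + 2]
        if di in ('GT', 'AG'):
            yield i, di, chrom_seq[i - flank:i + flank]
```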
Results: Effects of Categorical Gradient
• The proposed method shows the best performance in terms of reconstruction error for both training and testing.
• Compared to the softmax approach, the proposed regularized RBM achieves lower error by slightly sacrificing the probability-sum constraint.
• Setup: chromosome 19 of GWH-donor; sequence length 200 nt (800 dimensions); 500 iterations; learning rate 0.1; L2 decay 0.001.
(Figure: training/testing reconstruction-error curves, annotated "over-fitted" and "best".)
Results: Effects of Boosting
• To simulate a class-imbalanced situation, we randomly dropped samples with different drop rates for different classes (see the sketch below).

| Method | Description | Training cost | Noise handling | Class-imbalance handling |
|---|---|---|---|---|
| CD (Hinton, Neural Comp. 2002) | Standard and widely used | - | - | - |
| Persistent CD (Tieleman, ICML 2008) | Uses a single persistent Markov chain | | - | - |
| Parallel tempering (Cho et al., IJCNN 2010) | Generates multiple Markov chains simultaneously | | | |
| Proposed boosted CD | Re-weights samples | | - | |
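A minimal sketch of the imbalance simulation: drop samples of each class with a class-specific rate. The rates in the comment are illustrative, not the ones used in the experiments.

```python
def simulate_imbalance(X, y, drop_rates, rng=np.random):
    # drop_rates, e.g. {0: 0.0, 1: 0.9}: keep all of class 0 but only
    # ~10% of class 1 (illustrative values).
    keep = np.array([rng.random() >= drop_rates[label] for label in y])
    return X[keep], y[keep]
```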
Results: Improved Performance and Robustness
(Figures: 2-class classification performance, 3-class classification, runtime, insensitivity to sequence lengths, and robustness to negative samples.)
Results: Identification of Non-Canonical Splice Sites
• (Important biological finding) Non-canonical splicing can arise if:
  • introns contain GCA or NAA sequences at their boundaries, or
  • exons include contiguous A's around the boundaries.
• We used 162,951 examples, excluding canonical splice sites.
(Figure: exon/intron boundary motifs.)
Conclusion
• We proposed a new RBM training method, boosted CD with categorical gradients, which improves conventional CD for class-imbalanced data.
• Significant boosts in splicing prediction in terms of accuracy and runtime.
• Increased robustness to high-dimensional class-imbalanced data.
• The proposed scheme can detect subtle non-canonical splicing signals that often cannot be identified by traditional methods.
• Future work: additional validation using various class-imbalanced datasets.
Acknowledgements
• Our lab members.
• Financial support: ICML 2015 travel scholarship.
June 2, 2015
Backup: Comparison with Recurrent Neural Networks (RNNs)
• The proposed DBN showed xx% higher performance in terms of the F1-score.
• RNNs are appropriate for sequence modeling; however, splicing signals often lie far from the boundaries, which makes it hard for an RNN to retain splicing information across the sequence.