Sequence Modeling:
Recurrent and Recursive
Nets (part 2)
M. Sohaib Alam
17 June, 2017
Deep Learning Textbook Study
Meetup Group
Bidirectional RNNs
Motivation:
Sequences where context matters; ideally we have knowledge about the future as well as the past, e.g. speech, handwriting.
h(t): state of the sub-RNN moving forward in time
g(t): state of the sub-RNN moving backward in time
The architecture extends to inputs ranging over n dimensions (involving 2n sub-RNNs); e.g. with 2-D image inputs, 4 sub-RNNs can capture long-range lateral interactions between features, though they are more expensive to train than convolutional neural nets.
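A minimal NumPy sketch of the two sub-RNNs (the parameter tuples W, U, b and the output weights V, c are hypothetical names, not from the slides): the output at each step combines the forward state h(t) with the backward state g(t).

```python
import numpy as np

def rnn_step(W, U, b, h_prev, x):
    # one vanilla RNN step: h(t) = tanh(W h(t-1) + U x(t) + b)
    return np.tanh(W @ h_prev + U @ x + b)

def bidirectional_rnn(xs, fwd, bwd, V, c):
    # fwd, bwd: (W, U, b) parameter tuples of the two sub-RNNs
    T, n_h = len(xs), fwd[0].shape[0]
    hs, gs = [None] * T, [None] * T
    h = np.zeros(n_h)
    for t in range(T):                     # h(t): sub-RNN moving forward in time
        h = rnn_step(*fwd, h, xs[t])
        hs[t] = h
    g = np.zeros(n_h)
    for t in reversed(range(T)):           # g(t): sub-RNN moving backward in time
        g = rnn_step(*bwd, g, xs[t])
        gs[t] = g
    # each output sees both past (via h) and future (via g) context
    return [V @ np.concatenate([hs[t], gs[t]]) + c for t in range(T)]
```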
Encoder-Decoder Sequence-to-Sequence
Architectures
Allow for input and output sequences of different lengths. Applications include speech recognition, machine translation, and question answering.
C: a vector, or sequence of vectors, summarizing the input sequence X = (x(1), …, x(n_x)).
Encoder: input RNN
Decoder: output RNN
Both RNNs are trained jointly to maximize the average of log P(y(1), …, y(n_y) | x(1), …, x(n_x)).
Deep Recurrent Networks
Typically, RNNs can be decomposed into 3
blocks:
- Input-to-hidden
- Hidden-to-hidden
- Hidden-to-output
Basic idea here: introduce depth in each of the
above blocks.
Fig (a): Lower layers transform the raw input into a more appropriate representation
Fig (b): Add extra layers in the recurrence relationship
Fig (c): Mitigate the longer path from t to t+1 (introduced by the extra depth) by adding skip connections
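As one way to make the depth concrete, here is a sketch of a single time step of a deep RNN (hypothetical parameter names; a plain stack of recurrent layers rather than the exact architectures of the figures):

```python
import numpy as np

def deep_rnn_step(layers, hs_prev, x):
    # layers: list of (W, U, b); hs_prev: previous state of each layer
    inp, hs = x, []
    for (W, U, b), h_prev in zip(layers, hs_prev):
        h = np.tanh(W @ h_prev + U @ inp + b)   # recurrence within this layer
        hs.append(h)
        inp = h                                 # this layer's state feeds the layer above
    return hs                                   # the top entry drives the output layer
```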
Recursive Neural Networks
Generalize computational graph from chain to a
tree.
For a sequence of length T, the depth (number of compositions of non-linear operations) can be reduced from O(T) to O(log T) (the simplest way to see this is to solve 2^depth ~ T, assuming a branching factor of 2).
Open question: How to best structure the tree. In
practice, depends on the problem at hand.
Ideally, the learner itself infers and implements
the appropriate structure given the input.
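A sketch of the tree-structured composition under simplifying assumptions (a balanced binary tree and one shared weight matrix W of shape (d, 2d), both illustrative choices): T leaf vectors are combined pairwise, so roughly log2(T) non-linear compositions separate any input from the root.

```python
import numpy as np

def recursive_net(leaves, W, b):
    # leaves: list of T vectors of size d; W has shape (d, 2d)
    nodes = list(leaves)
    while len(nodes) > 1:
        carried = [nodes[-1]] if len(nodes) % 2 else []   # an odd node passes up unchanged
        pairs = zip(nodes[0::2], nodes[1::2])
        nodes = [np.tanh(W @ np.concatenate([l, r]) + b) for l, r in pairs] + carried
    return nodes[0]   # root vector representing the whole sequence, at depth O(log T)
```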
Challenge of Long-Term Dependencies
Basic problem:
- Gradients propagated over several time steps tend to either vanish or explode
We can think of the recurrence relation
h(t) = W^T h(t−1)
as a simple RNN lacking inputs and a non-linear activation function. This can be simplified to
h(t) = (W^t)^T h(0),
so that if W admits an eigendecomposition of the form
W = Q Λ Q^T
with Q an orthogonal matrix, the recurrence further simplifies to
h(t) = Q^T Λ^t Q h(0).
Thus eigenvalues λ_i with |λ_i| < 1 will tend to decay to zero, while those with |λ_i| > 1 will tend to explode, eventually causing any component of h(0) that is not aligned with the largest eigenvector to be discarded.
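A quick numerical illustration (toy eigenvalues 0.5, 0.9, 1.1 chosen arbitrarily) of how repeated application of W scales each component of h(0) by λ_i^t:

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))        # random orthogonal matrix
W = Q @ np.diag([0.5, 0.9, 1.1]) @ Q.T              # eigenvalues 0.5, 0.9, 1.1

h0 = rng.normal(size=3)
for t in (1, 10, 50, 100):
    h_t = np.linalg.matrix_power(W, t) @ h0         # h(t) = W^t h(0) (W symmetric here)
    print(t, np.linalg.norm(h_t))
# the components along eigenvalues 0.5 and 0.9 decay to zero, while the
# component along 1.1 grows like 1.1**t and soon dominates h(t)
```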
Challenge of Long-Term Dependencies
Problem inherent to RNNs. For non-recurrent networks, we can always choose different weights
at different time-steps.
Imagine a scalar weight w getting multiplied by itself once at each time step.
● The product w^t will either vanish or explode, depending on the magnitude of w.
● On the other hand, if every w(t) is independent but identically distributed with mean 0 and variance v, then the state at time t is the product of all the w(t)'s, and the variance of that product is O(v^n).
For non-recurrent deep feedforward networks, we may achieve some desired variance v* by sampling the individual weights with variance (v*)^(1/n), and thus avoid the vanishing and exploding gradient problem.
Open problem: allow an RNN to learn long-term dependencies without vanishing/exploding gradients.
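A small check of both bullet points above, with arbitrary toy numbers (the empirical variance estimate is noisy because the product distribution is heavy-tailed):

```python
import numpy as np

t = 50
for w in (0.9, 1.1):                       # the same scalar weight reused every step
    print(f"w = {w}: w**{t} = {w ** t:.3g}")    # ~5e-3 (vanishes) vs ~117 (explodes)

# independent w(t) with mean 0 and variance v: E[(product of n factors)**2] = v**n,
# so the variance of the product is O(v**n); n is kept small so the simulation is stable
v, n = 1.5, 10
rng = np.random.default_rng(0)
prods = np.prod(rng.normal(0.0, np.sqrt(v), size=(1_000_000, n)), axis=1)
print(prods.var(), v ** n)                 # empirical vs. analytic variance
```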
Echo State Networks
The hidden-to-hidden and input-to-hidden weights are usually the most difficult parameters to learn in an RNN.
Echo State Networks (ESNs): Set recurrent weights such that hidden units capture history of
past inputs, and learn only the output weights.
Liquid State Machines: Same idea, except with spiking (binary-output) neurons instead of the continuous-valued hidden units used in ESNs.
This approach is collectively referred to as reservoir computing (hidden units form a reservoir of
temporal features, capturing different aspects of input history).
Echo State Networks
Spectral radius: the largest absolute value of the eigenvalues of the Jacobian at time t, J(t) = ∂s(t)/∂s(t−1).
Suppose J has an eigenvector v with eigenvalue λ. Suppose further that we back-propagate a gradient vector g back in time, and compare this to back-propagating the perturbed vector g + δv. After n propagation steps the two executions diverge by δ|λ|^n, which grows exponentially large if |λ| > 1 and vanishes if |λ| < 1. (A similar argument applies to forward propagation in a network without any non-linearity.)
Strategy in ESNs is to fix weights to have some bounded spectral radius, such that information is
carried through time but does not explode/vanish.
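A minimal reservoir-computing sketch (the spectral radius of 1.2 and the ridge penalty are illustrative guesses): the input and recurrent weights are fixed at initialization, and only the linear readout is learned.

```python
import numpy as np

def make_reservoir(n_in, n_h, spectral_radius=1.2, seed=0):
    # fixed random weights, rescaled so the recurrent matrix has the given spectral radius
    rng = np.random.default_rng(seed)
    W_in = rng.normal(scale=0.1, size=(n_h, n_in))
    W = rng.normal(size=(n_h, n_h))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    return W_in, W

def run_reservoir(xs, W_in, W):
    # untrained dynamics: hidden units form a reservoir of temporal features
    h, states = np.zeros(W.shape[0]), []
    for x in xs:
        h = np.tanh(W_in @ x + W @ h)
        states.append(h)
    return np.array(states)

def fit_readout(states, targets, ridge=1e-4):
    # learn only the output weights, here by ridge regression on the reservoir states
    H = states
    return np.linalg.solve(H.T @ H + ridge * np.eye(H.shape[1]), H.T @ targets)
```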
Strategies for Multiple Time Scales
Design models that operate at multiple time scales, e.g. some parts operating at fine-grained time
scales, others at more coarse-grained scales.
Adding Skip Connections Through Time:
Add direct connections from variables in the distant past to variables in the present, instead of only from time t to time t+1.
Leaky Units and a Spectrum of Different Time Scales:
Design units with linear self-connections and weights near 1 on those connections. As an analogy, consider accumulating the running average μ(t) of some variable v(t) via
μ(t) = α μ(t−1) + (1 − α) v(t).
When α is close to 1, the running average remembers the past for a long time. Hidden units with such linear self-connections and weights close to 1 can behave similarly.
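As a small illustration of the running-average analogy (the α values are arbitrary):

```python
import numpy as np

def leaky(vs, alpha):
    # mu(t) = alpha * mu(t-1) + (1 - alpha) * v(t): a linear self-connection of weight alpha
    mu, out = 0.0, []
    for v in vs:
        mu = alpha * mu + (1 - alpha) * v
        out.append(mu)
    return np.array(out)

vs = np.concatenate([np.ones(50), np.zeros(200)])   # a pulse, then silence
print(leaky(vs, 0.99)[-1])   # ~0.05: alpha near 1 still remembers the pulse
print(leaky(vs, 0.50)[-1])   # ~0.0: a small alpha forgets almost immediately
```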
Removing connections:
Remove some length-one connections and replace them with longer connections, so the affected units are forced to operate on a coarser time scale.
LSTM and Other Gated RNNs
As of now, the most effective sequence models used in practical applications are gated RNNs, including the long short-term memory (LSTM) and networks based on the gated recurrent unit (GRU).
Basic idea: create paths through time whose derivatives neither vanish nor explode, by allowing the connection weights to become functions of time. Leaky units allow the network to accumulate information over a long period of time, but there should also be a mechanism for forgetting that information once it becomes irrelevant. Ideally, the network itself decides when to forget.
LSTM
The state unit s_i(t) has a linear self-loop similar to the leaky units described in the previous section. The self-loop weight is controlled by a forget gate unit
f_i(t) = σ( b_i^f + Σ_j U^f_ij x_j(t) + Σ_j W^f_ij h_j(t−1) )
x(t): current input vector
h(t): current hidden layer vector
b^f: forget-gate biases
U^f: forget-gate input weights
W^f: forget-gate recurrent weights
LSTM “cell”
LSTM
The LSTM cell internal state is then updated as follows:
s_i(t) = f_i(t) s_i(t−1) + g_i(t) σ( b_i + Σ_j U_ij x_j(t) + Σ_j W_ij h_j(t−1) )
where
b: biases
U: input weights
W: recurrent weights
g_i(t): external input gate, computed like the forget gate (a sigmoid of the current input and hidden state) but with its own parameters b^g, U^g, W^g
LSTM “cell”
LSTM
The output h_i(t) of the LSTM cell can also be shut off via the output gate q_i(t):
h_i(t) = tanh( s_i(t) ) q_i(t)
q_i(t) = σ( b_i^o + Σ_j U^o_ij x_j(t) + Σ_j W^o_ij h_j(t−1) )
One can choose to use the cell state s_i(t) as an extra input (with its own weight) into the three gates of the i-th unit, as shown in the figure.
LSTMs have been shown to learn long-term dependencies more easily than simple recurrent architectures.
LSTM “cell”
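Putting the gates and the state update together, a minimal NumPy sketch of one LSTM step (grouping hypothetical parameter names in a dict p; the candidate uses the same σ as in the update above, though many implementations use tanh there instead):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, s_prev, p):
    f = sigmoid(p["b_f"] + p["U_f"] @ x + p["W_f"] @ h_prev)   # forget gate f(t)
    g = sigmoid(p["b_g"] + p["U_g"] @ x + p["W_g"] @ h_prev)   # external input gate g(t)
    q = sigmoid(p["b_o"] + p["U_o"] @ x + p["W_o"] @ h_prev)   # output gate q(t)
    s = f * s_prev + g * sigmoid(p["b"] + p["U"] @ x + p["W"] @ h_prev)  # cell state s(t)
    h = np.tanh(s) * q                                          # gated cell output h(t)
    return h, s
```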
Other Gated RNNs
Main difference from the LSTM: a single gating unit simultaneously controls the forgetting factor and the decision to update the state unit.
u: update gate
r: reset gate
Both gates can individually ignore parts of the state vector.
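For comparison, a sketch of one GRU step in its common form (parameter names hypothetical): the single update gate u plays the roles of the LSTM's forget and input gates, and the reset gate r can ignore parts of the previous state.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, p):
    u = sigmoid(p["b_u"] + p["U_u"] @ x + p["W_u"] @ h_prev)        # update gate
    r = sigmoid(p["b_r"] + p["U_r"] @ x + p["W_r"] @ h_prev)        # reset gate
    h_cand = np.tanh(p["b"] + p["U"] @ x + p["W"] @ (r * h_prev))   # candidate state
    return u * h_prev + (1.0 - u) * h_cand                          # convex combination
```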
Optimization for Long-Term Dependencies
Basic problem: Vanishing and exploding gradients when optimizing RNNs over many time steps.
Clipping Gradients: The cost function can have sharp cliffs as a function of the weights/biases, so the gradient direction can change dramatically within a short distance. Solution: reduce the step size in the direction of the gradient if its norm gets too large:
if ||g|| > v:  g ← g v / ||g||
where v is the norm threshold and g is the gradient.
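The same rule in code (a direct transcription of the clipping update above):

```python
import numpy as np

def clip_gradient(g, v):
    # if ||g|| > v, rescale g to norm v while keeping its direction
    norm = np.linalg.norm(g)
    return g * (v / norm) if norm > v else g

g = np.array([30.0, 40.0])            # ||g|| = 50
print(clip_gradient(g, v=5.0))        # -> [3. 4.], norm 5, same direction
```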
Optimization for Long-Term Dependencies
Regularizing to Encourage Information Flow: The previous technique helps with exploding gradients, but not vanishing gradients. Ideally, we would like the back-propagated gradient (∇_h(t) L) ∂h(t)/∂h(t−1) to be as large as ∇_h(t) L, so that it maintains its magnitude as it is back-propagated. We could therefore use the following term as a regularizer to achieve this effect:
Ω = Σ_t ( ‖ (∇_h(t) L) ∂h(t)/∂h(t−1) ‖ / ‖ ∇_h(t) L ‖ − 1 )²