Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations
Krueger et al. In CoRR 2016
Federico Raue
Reading Group at DFKI
27-September-2016
Content
Dropout in Feed-forward Networks
Related Work
Dropout in RNN
Stochastic Depth
Zoneout
Experiments
Sequential Permuted MNIST
Character level – Penn Treebank
Word level – Penn Treebank
Conclusions
Dropout in Feed-forward Networks
Dropout in Feed-forward Networks [1]
[1] N. Srivastava et al. (2014). “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”. In: Journal of Machine Learning Research 15.
Dropout in Feed-forward Networks
Related Work
Dropout in RNN
Train a pseudo-ensemble model [2]
The source network is the parent model
Each sampled model is a child model
Noise process → sample node masks → extract subnetworks
[2] P. Bachman et al. (2014). “Learning with pseudo-ensembles”. In: Advances in Neural Information Processing Systems.
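A minimal sketch of the parent/child view, assuming a one-hidden-layer ReLU network (the slides do not fix an architecture). Each sampled node mask defines one child; deleting the dropped nodes gives the same output as zeroing them inside the parent. The names `sample_child`, `forward` and `p` are illustrative, not taken from Bachman et al.

```python
import numpy as np

def sample_child(W1, W2, p, rng):
    """Sample one child model from a one-hidden-layer parent network.

    W1: (H, D) input-to-hidden weights, W2: (K, H) hidden-to-output weights.
    p: probability of dropping each hidden node (illustrative parameter).
    """
    keep = rng.random(W1.shape[0]) >= p   # node mask: True means the node survives
    # Extracting the subnetwork = deleting the rows/columns of dropped nodes.
    return W1[keep, :], W2[:, keep], keep

def forward(x, W1, W2):
    """ReLU hidden layer followed by a linear output layer."""
    return W2 @ np.maximum(W1 @ x, 0.0)

rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((8, 4)), rng.standard_normal((3, 8))  # parent model
x = rng.standard_normal(4)
children = [sample_child(W1, W2, 0.5, rng) for _ in range(5)]      # pseudo-ensemble
outputs = [forward(x, c1, c2) for c1, c2, _ in children]
```

All children share the parent's weights, so training one child also updates the others; this weight sharing is what makes the ensemble "pseudo".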
Dropout in RNN
Figure: First attempts of dropout in RNNs [3, 4]
Dropout is applied only to the feed-forward connections (up the stack), not to the recurrent connections (forward through time)
[3] V. Pham et al. (2014). “Dropout improves recurrent neural networks for handwriting recognition”. In: Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on. IEEE.
[4] W. Zaremba et al. (2014). “Recurrent neural network regularization”. In: arXiv preprint arXiv:1409.2329.
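Dropout in RNN
Vanilla RNN
$$h_t = f(W_h[x_t, h_{t-1}] + b_h)$$
Vanilla RNN + (recurrent) dropout
$$h_t = f(W_h[x_t, d(h_{t-1})] + b_h)$$
$$d(x) = \begin{cases} \mathrm{mask} \odot x & \text{if training phase} \\ (1 - p)\,x & \text{otherwise} \end{cases}$$
where each entry of mask is drawn from Bernoulli(1 − p), i.e. p is the probability of dropping a unit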
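A minimal NumPy sketch of d(x) and of one recurrent-dropout RNN step. It assumes f = tanh and a fresh mask at every call; `rnn_step` and its arguments are illustrative names.

```python
import numpy as np

def d(x, p, training, rng):
    """Dropout as defined above; p is the probability of dropping a unit."""
    if training:
        mask = (rng.random(x.shape) >= p).astype(x.dtype)  # each entry is 1 with prob. 1 - p
        return mask * x
    return (1.0 - p) * x  # test time: scale by the keep probability

def rnn_step(x_t, h_prev, W_h, b_h, p, training, rng):
    """h_t = f(W_h [x_t, d(h_{t-1})] + b_h), assuming f = tanh."""
    return np.tanh(W_h @ np.concatenate([x_t, d(h_prev, p, training, rng)]) + b_h)

rng = np.random.default_rng(0)
D, H = 2, 3
W_h, b_h = rng.standard_normal((H, D + H)), np.zeros(H)
h = np.zeros(H)
for x_t in rng.standard_normal((4, D)):  # a length-4 input sequence
    h = rnn_step(x_t, h, W_h, b_h, p=0.25, training=True, rng=rng)
```

Scaling by (1 − p) at test time makes the deterministic output equal the expected value of the masked one during training.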
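Dropout in LSTM
$$\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} = \begin{pmatrix} \sigma(W_i[x_t, h_{t-1}] + b_i) \\ \sigma(W_f[x_t, h_{t-1}] + b_f) \\ \sigma(W_o[x_t, h_{t-1}] + b_o) \\ f(W_g[x_t, h_{t-1}] + b_g) \end{pmatrix}$$
$$c_t = f_t \odot c_{t-1} + i_t \odot g_t$$
$$h_t = o_t \odot f(c_t)$$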
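A minimal NumPy sketch of one LSTM step as written above, assuming the activation f(·) is tanh and that W stacks W_i, W_f, W_o, W_g row-wise. The function name `lstm_step` is illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step following the equations above.

    W stacks W_i, W_f, W_o, W_g row-wise: shape (4H, D + H); b has shape (4H,).
    Assumption: the slide's activation f(.) is tanh.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b  # all four pre-activations at once
    i = sigmoid(z[:H])                         # input gate
    f = sigmoid(z[H:2 * H])                    # forget gate
    o = sigmoid(z[2 * H:3 * H])                # output gate
    g = np.tanh(z[3 * H:])                     # candidate cell update
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c
```

With all weights and biases at zero, every gate is 0.5 and g_t = 0, so c_t = 0.5 c_{t−1}; this is a quick check that the slicing is right.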
Dropout in LSTM [5]
$$\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} = \begin{pmatrix} \sigma(W_i[x_t, d(h_{t-1})] + b_i) \\ \sigma(W_f[x_t, d(h_{t-1})] + b_f) \\ \sigma(W_o[x_t, d(h_{t-1})] + b_o) \\ f(W_g[x_t, d(h_{t-1})] + b_g) \end{pmatrix}$$
$$c_t = f_t \odot c_{t-1} + i_t \odot g_t$$
$$h_t = o_t \odot f(c_t)$$
[5] Y. Gal (2015). “A theoretically grounded application of dropout in recurrent neural networks”. In: arXiv preprint arXiv:1512.05287.
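A minimal sketch of this variant, where d(·) acts on h_{t−1} only where it enters the gates. It assumes, as in Gal's variational formulation, that one mask is sampled per sequence and reused at every step; f = tanh and the name `lstm_gal` are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_gal(xs, W, b, H, p, training, rng):
    """LSTM over a sequence with dropout d(.) on h_{t-1} where it enters the gates.

    Assumption: one mask per sequence, reused at every time step.
    p is the drop probability; W has shape (4H, D + H).
    """
    mask = (rng.random(H) >= p).astype(float) if training else np.full(H, 1.0 - p)
    h, c, hs = np.zeros(H), np.zeros(H), []
    for x_t in xs:
        z = W @ np.concatenate([x_t, mask * h]) + b  # [x_t, d(h_{t-1})]
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
        g = np.tanh(z[3 * H:])
        c = f * c + i * g  # the cell state itself is not dropped
        h = o * np.tanh(c)
        hs.append(h)
    return np.stack(hs)

rng = np.random.default_rng(0)
H, D = 3, 2
W, b = rng.standard_normal((4 * H, D + H)), np.zeros(4 * H)
xs = rng.standard_normal((5, D))
out = lstm_gal(xs, W, b, H, p=0.25, training=True, rng=rng)
```

Setting p = 1 removes the recurrent input to the gates entirely, which is equivalent to zeroing the recurrent columns of W.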
Dropout in LSTM [6]
$$\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} = \begin{pmatrix} \sigma(W_i[x_t, h_{t-1}] + b_i) \\ \sigma(W_f[x_t, h_{t-1}] + b_f) \\ \sigma(W_o[x_t, h_{t-1}] + b_o) \\ f(W_g[x_t, h_{t-1}] + b_g) \end{pmatrix}$$
$$c_t = d(f_t \odot c_{t-1} + i_t \odot g_t)$$
$$h_t = o_t \odot f(c_t)$$
[6] T. Moon et al. (2015). “RNNDROP: A novel dropout for RNNs in ASR”. In: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). IEEE.
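A minimal sketch of dropout on the full cell update, c_t = d(f_t ⊙ c_{t−1} + i_t ⊙ g_t). It assumes one mask per sequence (our reading of rnnDrop), f = tanh, and the illustrative name `lstm_rnndrop`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_rnndrop(xs, W, b, H, p, training, rng):
    """LSTM with dropout on the cell update: c_t = d(f_t * c_{t-1} + i_t * g_t).

    Assumption: one mask per sequence, reused at every step; p is the drop probability.
    """
    mask = (rng.random(H) >= p).astype(float) if training else np.full(H, 1.0 - p)
    h, c, hs = np.zeros(H), np.zeros(H), []
    for x_t in xs:
        z = W @ np.concatenate([x_t, h]) + b  # [x_t, h_{t-1}], no dropout here
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
        g = np.tanh(z[3 * H:])
        c = mask * (f * c + i * g)            # d(.) wraps the whole cell update
        h = o * np.tanh(c)
        hs.append(h)
    return np.stack(hs)

rng = np.random.default_rng(0)
H, D = 3, 2
W, b = rng.standard_normal((4 * H, D + H)), np.zeros(4 * H)
xs = rng.standard_normal((5, D))
out = lstm_rnndrop(xs, W, b, H, p=0.25, training=True, rng=rng)
```

Note that with this test-time rule the factor (1 − p) multiplies the cell state at every step, so a contribution written k steps earlier is additionally scaled by (1 − p)^k.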
Dropout in LSTM [7]
$$\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} = \begin{pmatrix} \sigma(W_i[x_t, h_{t-1}] + b_i) \\ \sigma(W_f[x_t, h_{t-1}] + b_f) \\ \sigma(W_o[x_t, h_{t-1}] + b_o) \\ f(W_g[x_t, h_{t-1}] + b_g) \end{pmatrix}$$
$$c_t = f_t \odot c_{t-1} + i_t \odot d(g_t)$$
$$h_t = o_t \odot f(c_t)$$
[7] S. Semeniuta et al. (2016). “Recurrent Dropout without Memory Loss”. In: arXiv preprint arXiv:1603.05118.
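A minimal sketch of dropout on the candidate only, c_t = f_t ⊙ c_{t−1} + i_t ⊙ d(g_t). Because c_{t−1} is never multiplied by the mask, stored memory is not erased; only the new content written at step t is dropped. It assumes a fresh mask at every step, f = tanh, and the illustrative name `lstm_semeniuta`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_semeniuta(xs, W, b, H, p, training, rng):
    """LSTM with dropout only on the candidate: c_t = f_t * c_{t-1} + i_t * d(g_t).

    Assumption: a fresh mask is drawn at every step; p is the drop probability.
    """
    h, c, hs = np.zeros(H), np.zeros(H), []
    for x_t in xs:
        z = W @ np.concatenate([x_t, h]) + b
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
        g = np.tanh(z[3 * H:])
        keep = (rng.random(H) >= p).astype(float) if training else 1.0 - p
        c = f * c + i * (keep * g)  # c_{t-1} itself is never masked
        h = o * np.tanh(c)
        hs.append(h)
    return np.stack(hs)

rng = np.random.default_rng(0)
H, D = 3, 2
W, b = rng.standard_normal((4 * H, D + H)), np.zeros(4 * H)
xs = rng.standard_normal((5, D))
out = lstm_semeniuta(xs, W, b, H, p=0.25, training=True, rng=rng)
```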
Dropout in LSTM – Summary
Stochastic Depth
Stochastic Depth [8]
[8] G. Huang et al. (2016). “Deep networks with stochastic depth”. In: arXiv preprint arXiv:1603.09382.
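The slide shows only a figure. Below is a minimal sketch of the residual update as we understand it from Huang et al.: in training, H_l = ReLU(b_l · f_l(H_{l−1}) + H_{l−1}) with b_l ~ Bernoulli(p_l); at test time b_l is replaced by p_l. The blocks f_l are arbitrary callables here, and the helper names are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def linear_decay(L, p_L=0.5):
    """Survival probabilities p_l = 1 - (l / L) * (1 - p_L) for l = 1..L."""
    return [1.0 - (l / L) * (1.0 - p_L) for l in range(1, L + 1)]

def stochastic_depth_forward(x, blocks, survival, training, rng):
    """Residual network in which each block is randomly skipped during training."""
    h = x
    for f_l, p_l in zip(blocks, survival):
        b_l = float(rng.random() < p_l) if training else p_l
        residual = b_l * f_l(h) if b_l else 0.0  # a skipped block is not even computed
        h = relu(residual + h)                   # the identity path is always kept
    return h

blocks = [lambda h: h + 1.0, lambda h: h + 1.0]  # toy residual branches
rng = np.random.default_rng(0)
y = stochastic_depth_forward(np.zeros(3), blocks, linear_decay(2), True, rng)
```

Zoneout can be read as a per-unit, per-time-step analogue: instead of skipping a whole layer, a single unit skips its update.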
Zoneout
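Zoneout [9]
$$h_t = f(W_h[x_t, d(h_{t-1})] + b_h)$$
$$d(x) = \begin{cases} \mathrm{mask} \odot x & \text{if training phase} \\ (1 - p)\,x & \text{otherwise} \end{cases}$$
Dropout: $\tau_t = p_t \odot \tilde{\tau}_t + (1 - p_t) \odot 0$
Zoneout: $\tau_t = p_t \odot \tilde{\tau}_t + (1 - p_t) \odot 1$
Here $p_t$ is a per-unit binary mask and $\tilde{\tau}_t$ the regular update: where the mask is 0, dropout sets the unit to zero, while zoneout applies the identity (1), so the unit keeps its previous value
[9] D. Krueger et al. (2016). “Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations”. In: arXiv preprint arXiv:1606.01305.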
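The only difference between the two operators is what a masked unit falls back to. A minimal sketch, where `stochastic_update` is an illustrative name and mask[k] = 1 means unit k takes its new value (the p_t above):

```python
import numpy as np

def stochastic_update(h_prev, h_new, mask, mode):
    """Apply a per-unit mask to a state update.

    mask[k] = 1 lets unit k take its new value (p_t above);
    mask[k] = 0 either zeroes the unit (dropout) or keeps h_{t-1} (zoneout).
    """
    if mode == "dropout":
        return mask * h_new + (1 - mask) * 0.0
    if mode == "zoneout":
        return mask * h_new + (1 - mask) * h_prev  # identity instead of zero
    raise ValueError(f"unknown mode: {mode}")

h_prev = np.array([1.0, 2.0, 3.0])
h_new = np.array([10.0, 20.0, 30.0])
mask = np.array([1.0, 0.0, 1.0])
print(stochastic_update(h_prev, h_new, mask, "dropout"))  # [10.  0. 30.]
print(stochastic_update(h_prev, h_new, mask, "zoneout"))  # [10.  2. 30.]
```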
Zoneout
Figure: Zoneout vs Recurrent Dropout
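Again – LSTM equations
$$\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} = \begin{pmatrix} \sigma(W_i[x_t, h_{t-1}] + b_i) \\ \sigma(W_f[x_t, h_{t-1}] + b_f) \\ \sigma(W_o[x_t, h_{t-1}] + b_o) \\ f(W_g[x_t, h_{t-1}] + b_g) \end{pmatrix}$$
$$c_t = f_t \odot c_{t-1} + i_t \odot g_t$$
$$h_t = o_t \odot f(c_t)$$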
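LSTM equations – Zoneout
$$\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} = \begin{pmatrix} \sigma(W_i[x_t, h_{t-1}] + b_i) \\ \sigma(W_f[x_t, h_{t-1}] + b_f) \\ \sigma(W_o[x_t, h_{t-1}] + b_o) \\ f(W_g[x_t, h_{t-1}] + b_g) \end{pmatrix}$$
$$c_t = p_t \odot c_{t-1} + (1 - p_t) \odot (f_t \odot c_{t-1} + i_t \odot g_t)$$
$$h_t = p_t \odot h_{t-1} + (1 - p_t) \odot (o_t \odot f(c_t))$$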
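A minimal sketch of one zoneout LSTM step. Assumptions beyond the slide: f = tanh; separate zoneout probabilities `z_c` and `z_h` for the cell and hidden state (illustrative names; the slide writes a single p_t); the mask is replaced by its expectation at test time, mirroring dropout's (1 − p) rule; and f(c_t) uses the zoned-out c_t, as the slide's equation reads.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def zoneout_lstm_step(x_t, h_prev, c_prev, W, b, z_c, z_h, training, rng):
    """One LSTM step with zoneout on c and h.

    z_c, z_h: probability that a unit keeps its previous value (mask entry = 1).
    Assumption: at test time the mask is replaced by its expectation.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
    g = np.tanh(z[3 * H:])
    if training:
        p_c = (rng.random(H) < z_c).astype(float)  # 1 = preserve c_{t-1}
        p_h = (rng.random(H) < z_h).astype(float)  # 1 = preserve h_{t-1}
    else:
        p_c, p_h = z_c, z_h
    c = p_c * c_prev + (1 - p_c) * (f * c_prev + i * g)
    h = p_h * h_prev + (1 - p_h) * (o * np.tanh(c))
    return h, c

rng = np.random.default_rng(0)
H, D = 3, 2
W, b = rng.standard_normal((4 * H, D + H)), np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.standard_normal((5, D)):  # illustrative zoneout rates below
    h, c = zoneout_lstm_step(x_t, h, c, W, b, z_c=0.5, z_h=0.05, training=True, rng=rng)
```

A zoned-out unit copies its previous value exactly, so along that unit the gradient also passes back unchanged; this is the identity connection mentioned in the conclusions.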
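Zoneout + Recurrent Dropout
$$\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} = \begin{pmatrix} \sigma(W_i[x_t, h_{t-1}] + b_i) \\ \sigma(W_f[x_t, h_{t-1}] + b_f) \\ \sigma(W_o[x_t, h_{t-1}] + b_o) \\ f(W_g[x_t, h_{t-1}] + b_g) \end{pmatrix}$$
$$c_t = f_t \odot c_{t-1} + d(i_t \odot g_t) \quad \text{(recurrent dropout)}$$
$$h_t = \big((1 - p_t) \odot o_t + p_t \odot o_{t-1}\big) \odot f(c_t) \quad \text{(zoneout)}$$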
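A minimal sketch of the combined step. Assumptions beyond the slide: f = tanh; the illustrative parameter names `p_drop` (recurrent-dropout rate on i_t ⊙ g_t) and `z_o` (zoneout rate on the output gate); expectations at test time; and the step returns the effective (mixed) output gate as o_t, so a unit can stay zoned out over several steps.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def zoneout_dropout_lstm_step(x_t, h_prev, c_prev, o_prev, W, b, p_drop, z_o, training, rng):
    """Recurrent dropout on i_t * g_t plus zoneout on the output gate."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
    g = np.tanh(z[3 * H:])
    if training:
        keep = (rng.random(H) >= p_drop).astype(float)  # dropout mask on i_t * g_t
        p_o = (rng.random(H) < z_o).astype(float)       # 1 = reuse o_{t-1}
    else:
        keep, p_o = 1.0 - p_drop, z_o
    c = f * c_prev + keep * (i * g)
    o_eff = (1 - p_o) * o + p_o * o_prev
    h = o_eff * np.tanh(c)
    return h, c, o_eff

rng = np.random.default_rng(0)
H, D = 3, 2
W, b = rng.standard_normal((4 * H, D + H)), np.zeros(4 * H)
h, c, o = np.zeros(H), np.zeros(H), np.zeros(H)
for x_t in rng.standard_normal((5, D)):
    h, c, o = zoneout_dropout_lstm_step(x_t, h, c, o, W, b, 0.25, 0.1, True, rng)
```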
Experiments
Sequential Permuted MNIST (1/3)
Sequential MNIST: the pixels of a digit image are presented to an RNN one at a time, in lexicographic order (left to right, top to bottom)
Permuted sequential MNIST: the pixels are presented in a (fixed) random order
Classification error
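A minimal sketch of how the two input orders can be built from a 28 × 28 image. The permutation is drawn once and shared by every image; the seed and function names are illustrative.

```python
import numpy as np

def to_sequence(image):
    """Row-major flattening: left to right, top to bottom (one pixel per time step)."""
    return np.asarray(image).reshape(-1)

def make_permutation(n_pixels=28 * 28, seed=0):
    """One fixed random permutation, reused for every image (seed is illustrative)."""
    return np.random.default_rng(seed).permutation(n_pixels)

def to_permuted_sequence(image, perm):
    return to_sequence(image)[perm]

image = np.arange(28 * 28).reshape(28, 28)  # stand-in for an MNIST digit
perm = make_permutation()
seq, pseq = to_sequence(image), to_permuted_sequence(image, perm)
```

Both tasks give 784-step sequences; the permutation destroys the local pixel structure, so the model has to carry information over long time spans.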
Sequential Permuted MNIST (2/3)
Sequential Permuted MNIST (3/3)
Penn Treebank Corpus
Character level – Penn Treebank (1/2)
$$\mathrm{BPC} = -\frac{1}{T}\sum_{t=1}^{T} \log_2 P(x_{t+1} \mid y_t)$$
$x_{t+1}$: correct next symbol
$y_t$: output of the model at step $t$
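A minimal sketch of the metric, assuming `probs[t]` holds the probability the model assigned to the correct next character (a hypothetical input format).

```python
import math

def bits_per_character(probs):
    """Average of -log2 P(x_{t+1} | y_t) over a sequence.

    probs[t] is the probability the model gave to the correct next character.
    """
    return -sum(math.log2(p) for p in probs) / len(probs)

print(bits_per_character([0.5, 0.5, 0.5]))  # 1.0
print(bits_per_character([1.0, 0.25]))      # 1.0
```

A model that always assigns probability 0.5 to the right character costs exactly one bit per character; lower is better.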
Character level – Penn Treebank (2/2)
Word level – Penn Treebank (1/2)
$$\text{Perplexity} = 2^{H(p)} = 2^{-\sum_x p(x)\log_2 p(x)}$$
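A minimal sketch of both readings: the slide's 2^{H(p)} for a known distribution, and the empirical version usually reported for a language model, where the average −log₂ of the model's probability for each test word replaces the entropy. Function names and inputs are illustrative.

```python
import math

def perplexity_of_distribution(p):
    """2 ** H(p) for a discrete distribution p (terms with p(x) = 0 contribute 0)."""
    entropy = -sum(px * math.log2(px) for px in p if px > 0)
    return 2 ** entropy

def perplexity_of_model(word_probs):
    """Empirical perplexity: 2 ** (average -log2 q(w_i)) over a test sequence."""
    avg_bits = -sum(math.log2(q) for q in word_probs) / len(word_probs)
    return 2 ** avg_bits

print(perplexity_of_distribution([0.25] * 4))  # 4.0
print(perplexity_of_model([0.1] * 10))         # approximately 10
```

A perplexity of k means the model is, on average, as uncertain as a uniform choice among k words.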
Word level – Penn Treebank (2/2)
Conclusions
Conclusions
Instead of dropping units out (setting them to zero), zoneout randomly preserves their previous activations
Networks trained with zoneout are more robust to changes in the hidden state
The identity connections introduced by zoneout improve the flow of information through the network
Future work: adapt the probability of updating each unit based on the input sequence
References I
Bachman, P. et al. (2014). “Learning with pseudo-ensembles”. In:
Advances in Neural Information Processing Systems,
pp. 3365–3373.
Gal, Y. (2015). “A theoretically grounded application of dropout in
recurrent neural networks”. In: arXiv preprint arXiv:1512.05287.
Huang, G. et al. (2016). “Deep networks with stochastic depth”.
In: arXiv preprint arXiv:1603.09382.
Krueger, D. et al. (2016). “Zoneout: Regularizing RNNs by
Randomly Preserving Hidden Activations”. In: arXiv preprint
arXiv:1606.01305.
Moon, T. et al. (2015). “RNNDROP: A novel dropout for RNNs in ASR”. In: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). IEEE, pp. 65–70.
References II
Pham, V. et al. (2014). “Dropout improves recurrent neural
networks for handwriting recognition”. In: Frontiers in
Handwriting Recognition (ICFHR), 2014 14th International
Conference on. IEEE, pp. 285–290.
Semeniuta, S. et al. (2016). “Recurrent Dropout without Memory
Loss”. In: arXiv preprint arXiv:1603.05118.
Srivastava, N. et al. (2014). “Dropout: A Simple Way to Prevent
Neural Networks from Overfitting”. In: Journal of Machine
Learning Research 15, pp. 1929–1958.
Zaremba, W. et al. (2014). “Recurrent neural network
regularization”. In: arXiv preprint arXiv:1409.2329.

Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
PSYCHIATRIC History collection FORMAT.pptx
PSYCHIATRIC   History collection FORMAT.pptxPSYCHIATRIC   History collection FORMAT.pptx
PSYCHIATRIC History collection FORMAT.pptx
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 

Zoneout

  • 1. Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations. Krueger et al., CoRR 2016. Federico Raue, Reading Group at DFKI, 27 September 2016
  • 2. Content: Dropout in Feed-forward Networks; Related Work (Dropout in RNN, Stochastic Depth); Zoneout; Experiments (Sequential Permuted MNIST, Character level – Penn Treebank, Word level – Penn Treebank); Conclusions
  • 4. Dropout in Feed-forward Networks [1]. [1] N. Srivastava et al. (2014). “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”. In: Journal of Machine Learning Research 15.
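As a minimal pure-Python sketch (plain lists stand in for tensors; `dropout` and `dense_relu` are hypothetical helper names), dropout on the input of a fully connected layer can be written with the convention these slides use: multiply by a random binary mask during training, and scale by the keep probability $1-p$ at test time.

```python
import random

def dropout(x, p, training, rng=random):
    """d(x) = mask * x while training, (1 - p) * x otherwise; p = probability of dropping a unit."""
    if training:
        # Each unit is zeroed with probability p, otherwise passed through unchanged.
        mask = [1.0 if rng.random() >= p else 0.0 for _ in x]
        return [m * xi for m, xi in zip(mask, x)]
    # Test time: nothing is dropped; activations are scaled by the keep probability.
    return [(1.0 - p) * xi for xi in x]

def dense_relu(W, b, x):
    """One fully connected layer with ReLU; W is a list of rows."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bi) for row, bi in zip(W, b)]

W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
b = [0.0, 0.1]
x = [1.0, 2.0, 3.0]
h_train = dense_relu(W, b, dropout(x, p=0.5, training=True, rng=random.Random(0)))
h_test = dense_relu(W, b, dropout(x, p=0.5, training=False))
```

Many frameworks use "inverted" dropout instead, dividing by $1-p$ during training so the network is left untouched at test time; both conventions give the same expected activations.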
  • 7. Dropout in RNN. Dropout trains a pseudo-ensemble model [2]: the source network is the parent model, and each sampled subnetwork is a child model (noise process → sample node masks → extract subnetworks). [2] P. Bachman et al. (2014). “Learning with pseudo-ensembles”. In: Advances in Neural Information Processing Systems.
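To make the parent/child picture concrete: for a single linear unit, averaging the outputs of many sampled children recovers the parent's output with inputs scaled by $1-p$. The sketch below (pure Python, hypothetical helper names) checks this by Monte Carlo; for nonlinear networks the scaled parent is only an approximation of the ensemble average.

```python
import random

def linear(w, x):
    """Linear unit y = w . x (no bias, to keep the example small)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def sample_child_mask(n, p, rng):
    """Sample one child model: a binary mask over the parent's n input units."""
    return [0.0 if rng.random() < p else 1.0 for _ in range(n)]

def ensemble_average(w, x, p, n_children, rng):
    """Monte Carlo average of the outputs of n_children sampled subnetworks."""
    total = 0.0
    for _ in range(n_children):
        mask = sample_child_mask(len(x), p, rng)
        total += linear(w, [m * xi for m, xi in zip(mask, x)])
    return total / n_children

def parent_output(w, x, p):
    """Parent model at test time: every unit kept, inputs scaled by 1 - p."""
    return linear(w, [(1.0 - p) * xi for xi in x])
```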
  • 8-10. Dropout in RNN. Figure: first attempts at dropout in RNNs [3, 4]. Dropout is applied only to the feed-forward connections (up the stack), not to the recurrent connections (forward through time). [3] V. Pham et al. (2014). “Dropout improves recurrent neural networks for handwriting recognition”. In: Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on. IEEE. [4] W. Zaremba et al. (2014). “Recurrent neural network regularization”. In: arXiv preprint arXiv:1409.2329.
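A minimal sketch of this scheme, assuming a stack of vanilla tanh RNN layers stored as hypothetical `(W, U, b)` triples (an illustrative simplification; the cited papers use LSTMs). Only the input arriving from below is masked; the recurrent state $h_{t-1}$ flows through time untouched.

```python
import math
import random

def rnn_layer_step(W, U, b, x, h_prev):
    """One vanilla RNN layer: h_t = tanh(W x_t + U h_{t-1} + b)."""
    return [math.tanh(sum(w * xi for w, xi in zip(W[j], x))
                      + sum(u * hi for u, hi in zip(U[j], h_prev)) + b[j])
            for j in range(len(b))]

def drop(v, p, rng):
    """Training-time dropout: each entry is zeroed with probability p."""
    return [0.0 if rng.random() < p else vi for vi in v]

def stacked_rnn_forward(layers, xs, p, rng):
    """Run a stack of RNN layers over the sequence xs.
    Dropout touches only the vertical input of each layer (x_t for the first layer,
    the lower layer's h_t above it); the recurrent input h_{t-1} is never masked."""
    hs = [[0.0] * len(b) for (_, _, b) in layers]  # h_0 = 0 for every layer
    top = []
    for x in xs:
        below = x
        for l, (W, U, b) in enumerate(layers):
            hs[l] = rnn_layer_step(W, U, b, drop(below, p, rng), hs[l])
            below = hs[l]
        top.append(below)
    return top
```

Even with $p = 1$ (every vertical input dropped) the hidden state still evolves through the recurrent weights, which is exactly the path this scheme leaves unregularized.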
  • 11-12. Dropout in RNN. Vanilla RNN: $h_t = f(W_h[x_t, h_{t-1}] + b_h)$. Vanilla RNN + (recurrent) dropout: $h_t = f(W_h[x_t, d(h_{t-1})] + b_h)$, where $d(x) = \begin{cases} m \odot x & \text{during training} \\ (1-p)\,x & \text{otherwise} \end{cases}$ and $m$ is a binary mask whose entries are 0 with probability $p$.
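The recurrent-dropout update can be sketched in pure Python, assuming $f = \tanh$ and splitting the weight matrix that acts on $[x_t, h_{t-1}]$ into an input block `Wx` and a recurrent block `Wh` (hypothetical names). `d` follows the slide's definition rather than the inverted-dropout convention.

```python
import math
import random

def d(x, p, training, rng):
    """Slide's dropout: mask * x while training, (1 - p) * x otherwise."""
    if training:
        return [0.0 if rng.random() < p else xi for xi in x]
    return [(1.0 - p) * xi for xi in x]

def rnn_step_recurrent_dropout(Wx, Wh, b, x, h_prev, p, training, rng):
    """h_t = tanh(W_h [x_t, d(h_{t-1})] + b_h), with W_h split as [Wx | Wh]."""
    h_in = d(h_prev, p, training, rng)  # dropout on the recurrent connection only
    return [math.tanh(sum(w * xi for w, xi in zip(Wx[j], x))
                      + sum(u * hi for u, hi in zip(Wh[j], h_in)) + b[j])
            for j in range(len(b))]
```

With a fresh mask at every step, repeated masking of $h_{t-1}$ can wipe out information over long sequences; this is the concern the LSTM variants on the following slides address in different ways.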
  • 13. Dropout in LSTM     it ft ot gt     =     σ(Wi [xt, ht] + bi ) σ(Wf [xt, ht] + bf ) σ(Wo[xt, ht] + bo) f (Wg [xt, ht] + bg )     ct = ft ∗ ct−1 + it ∗ gt ht = ot ∗ f (ct)
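  • 15. Dropout in LSTM [5]: dropout on the recurrent input of every gate. $\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} = \begin{pmatrix} \sigma(W_i[x_t, d(h_{t-1})] + b_i) \\ \sigma(W_f[x_t, d(h_{t-1})] + b_f) \\ \sigma(W_o[x_t, d(h_{t-1})] + b_o) \\ f(W_g[x_t, d(h_{t-1})] + b_g) \end{pmatrix}$, $c_t = f_t \odot c_{t-1} + i_t \odot g_t$, $h_t = o_t \odot f(c_t)$. [5] Y. Gal (2015). “A theoretically grounded application of dropout in recurrent neural networks”. In: arXiv preprint arXiv:1512.05287.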
  • 16. Dropout in LSTM [6]: dropout on the cell state. $\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} = \begin{pmatrix} \sigma(W_i[x_t, h_{t-1}] + b_i) \\ \sigma(W_f[x_t, h_{t-1}] + b_f) \\ \sigma(W_o[x_t, h_{t-1}] + b_o) \\ f(W_g[x_t, h_{t-1}] + b_g) \end{pmatrix}$, $c_t = d(f_t \odot c_{t-1} + i_t \odot g_t)$, $h_t = o_t \odot f(c_t)$. [6] T. Moon et al. (2015). “RnnDrop: A novel dropout for RNNs in ASR”. In: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). IEEE.
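  • 17. Dropout in LSTM [7]: dropout on the candidate update only. $\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} = \begin{pmatrix} \sigma(W_i[x_t, h_{t-1}] + b_i) \\ \sigma(W_f[x_t, h_{t-1}] + b_f) \\ \sigma(W_o[x_t, h_{t-1}] + b_o) \\ f(W_g[x_t, h_{t-1}] + b_g) \end{pmatrix}$, $c_t = f_t \odot c_{t-1} + i_t \odot d(g_t)$, $h_t = o_t \odot f(c_t)$. [7] S. Semeniuta et al. (2016). “Recurrent Dropout without Memory Loss”. In: arXiv preprint arXiv:1603.05118.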
  • 18. Dropout in LSTM – Summary
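The three LSTM variants differ only in where the mask is applied. A sketch under these assumptions: $f = \tanh$, weights as plain lists, a mask supplied by the caller, and the slides' convention of no rescaling during training. As we understand Gal (2015) and Moon et al. (2015), their masks are sampled once per sequence and reused at every time step; the function itself simply applies whatever mask it is given. All names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    return [sum(w * vi for w, vi in zip(row, v)) + bi for row, bi in zip(W, b)]

def apply_mask(v, mask):
    return [m * vi for m, vi in zip(mask, v)]

def lstm_step_dropout(params, x, h_prev, c_prev, mask, variant):
    """One training-time LSTM step with the dropout mask placed according to `variant`:
    'gal'       -> mask h_{t-1} before it enters every gate       (slide 15)
    'rnndrop'   -> mask the new cell state c_t                    (slide 16)
    'semeniuta' -> mask only the candidate update g_t             (slide 17)"""
    if variant not in ("gal", "rnndrop", "semeniuta"):
        raise ValueError(f"unknown variant: {variant}")
    h_in = apply_mask(h_prev, mask) if variant == "gal" else h_prev
    xh = x + h_in
    i = [sigmoid(z) for z in affine(*params["i"], xh)]
    f = [sigmoid(z) for z in affine(*params["f"], xh)]
    o = [sigmoid(z) for z in affine(*params["o"], xh)]
    g = [math.tanh(z) for z in affine(*params["g"], xh)]
    if variant == "semeniuta":
        g = apply_mask(g, mask)
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    if variant == "rnndrop":
        c = apply_mask(c, mask)
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c
```

Masking only $g_t$ leaves $f_t \odot c_{t-1}$ intact, so a dropped unit cannot erase the stored cell state; masking $c_t$ directly (as in the `rnndrop` branch) can.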
  • 20. Stochastic Depth [8]. [8] G. Huang et al. (2016). “Deep networks with stochastic depth”. In: arXiv preprint arXiv:1603.09382.
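The slide gives only the citation. As we understand Huang et al., a residual block $y = x + b\,F(x)$ is kept during training with survival probability $p_l$ ($b \sim \mathrm{Bernoulli}(p_l)$) and otherwise reduced to the identity; at test time the residual branch is always used and scaled by $p_l$. A pure-Python sketch that omits the ReLU following the addition in the original architecture (function name hypothetical):

```python
import random

def residual_block_stochastic_depth(x, F, survival_prob, training, rng=random):
    """y = x + b * F(x).
    Training: b ~ Bernoulli(survival_prob); b = 0 turns the block into the identity.
    Test: the residual branch is always applied, scaled by survival_prob."""
    if training:
        if rng.random() < survival_prob:
            return [xi + fi for xi, fi in zip(x, F(x))]
        return list(x)  # block skipped: only the identity connection remains
    return [xi + survival_prob * fi for xi, fi in zip(x, F(x))]
```

Zoneout applies the same idea per unit and per time step: instead of skipping a whole layer, a single hidden unit is skipped and passes its previous value through the identity.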
  • 22-24. Zoneout [9]. Recurrent dropout (recap): $h_t = f(W_h[x_t, d(h_{t-1})] + b_h)$, where $d(x) = m \odot x$ during training and $d(x) = (1-p)\,x$ otherwise. Writing $\tau_t$ for the per-unit hidden-state transition, $\tilde{\tau}_t$ for the usual update, and $p_t$ for a binary mask: Dropout: $\tau_t = p_t \odot \tilde{\tau}_t + (1 - p_t) \odot 0$. Zoneout: $\tau_t = p_t \odot \tilde{\tau}_t + (1 - p_t) \odot 1$, where $1$ is the identity, so a zoned-out unit keeps its previous value instead of being set to zero. [9] D. Krueger et al. (2016). “Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations”. In: arXiv preprint arXiv:1606.01305.
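The two operators differ only in what a masked unit receives. A per-unit sketch in pure Python, where `mask[j] = 1` means unit $j$ takes its usual update (the role of $p_t$ on the slide); the helper names are hypothetical.

```python
import random

def sample_update_mask(n, z, rng=random):
    """Bernoulli mask with P(mask[j] = 0) = z, i.e. z is the drop / zoneout probability."""
    return [0.0 if rng.random() < z else 1.0 for _ in range(n)]

def transition(h_prev, h_candidate, mask, mode):
    """tau_t = mask * tau~_t + (1 - mask) * (0 for dropout, identity for zoneout)."""
    if mode == "dropout":
        # Masked units are set to zero.
        return [m * hc for m, hc in zip(mask, h_candidate)]
    if mode == "zoneout":
        # Masked units copy their previous value h_{t-1}.
        return [m * hc + (1.0 - m) * hp for m, hc, hp in zip(mask, h_candidate, h_prev)]
    raise ValueError(f"unknown mode: {mode}")
```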
  • 25. Zoneout Figure: Zoneout vs Recurrent Dropout
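  • 26. Again – LSTM equations. $\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} = \begin{pmatrix} \sigma(W_i[x_t, h_{t-1}] + b_i) \\ \sigma(W_f[x_t, h_{t-1}] + b_f) \\ \sigma(W_o[x_t, h_{t-1}] + b_o) \\ f(W_g[x_t, h_{t-1}] + b_g) \end{pmatrix}$, $c_t = f_t \odot c_{t-1} + i_t \odot g_t$, $h_t = o_t \odot f(c_t)$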
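  • 27. LSTM equations – Zoneout. $\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} = \begin{pmatrix} \sigma(W_i[x_t, h_{t-1}] + b_i) \\ \sigma(W_f[x_t, h_{t-1}] + b_f) \\ \sigma(W_o[x_t, h_{t-1}] + b_o) \\ f(W_g[x_t, h_{t-1}] + b_g) \end{pmatrix}$, $c_t = p_t \odot c_{t-1} + (1 - p_t) \odot (f_t \odot c_{t-1} + i_t \odot g_t)$, $h_t = p_t \odot h_{t-1} + (1 - p_t) \odot (o_t \odot f(c_t))$. Here a mask entry $p_t = 1$ means the unit is zoned out (keeps its previous value).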
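A training-time sketch of this zoneout LSTM step, following the slide's equations (including $h_t$ computed from the zoned-out $c_t$), with $f = \tanh$ and weights as plain lists. A mask entry of 1 means "zone out". As a generalisation, the sketch accepts separate masks for $c$ and $h$; passing the same list twice reproduces the slide exactly. Names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    return [sum(w * vi for w, vi in zip(row, v)) + bi for row, bi in zip(W, b)]

def zoneout_lstm_step(params, x, h_prev, c_prev, p_c, p_h):
    """c_t = p * c_{t-1} + (1 - p) * (f_t * c_{t-1} + i_t * g_t)
    h_t = p * h_{t-1} + (1 - p) * (o_t * tanh(c_t)); p = 1 keeps the previous value."""
    xh = x + h_prev
    i = [sigmoid(z) for z in affine(*params["i"], xh)]
    f = [sigmoid(z) for z in affine(*params["f"], xh)]
    o = [sigmoid(z) for z in affine(*params["o"], xh)]
    g = [math.tanh(z) for z in affine(*params["g"], xh)]
    c_new = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    c = [p * cp + (1.0 - p) * cn for p, cp, cn in zip(p_c, c_prev, c_new)]
    h_new = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    h = [p * hp + (1.0 - p) * hn for p, hp, hn in zip(p_h, h_prev, h_new)]
    return h, c
```

When both masks are all ones the state is copied unchanged, so along that step information (and gradient) passes through a pure identity path.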
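  • 28. Zoneout + Recurrent Dropout. $\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix} = \begin{pmatrix} \sigma(W_i[x_t, h_{t-1}] + b_i) \\ \sigma(W_f[x_t, h_{t-1}] + b_f) \\ \sigma(W_o[x_t, h_{t-1}] + b_o) \\ f(W_g[x_t, h_{t-1}] + b_g) \end{pmatrix}$, $c_t = f_t \odot c_{t-1} + d(i_t \odot g_t)$ (recurrent dropout), $h_t = ((1 - p_t) \odot o_t + p_t \odot o_{t-1}) \odot f(c_t)$ (zoneout on the output gate)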
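A sketch of the combined step under the same assumptions ($f = \tanh$, caller-supplied training-time masks, plain lists, hypothetical names). Because $h_t$ depends on $o_{t-1}$, the gate value must be carried as extra state. This sketch returns the (possibly preserved) gate so it can be fed back as `o_prev`; that is our reading of the slide, not a detail it specifies.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    return [sum(w * vi for w, vi in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step_zoneout_output_gate(params, x, h_prev, c_prev, o_prev, drop_mask, p):
    """Returns (h_t, c_t, o_t); feed o_t back as o_prev at the next step."""
    xh = x + h_prev
    i = [sigmoid(z) for z in affine(*params["i"], xh)]
    f = [sigmoid(z) for z in affine(*params["f"], xh)]
    o = [sigmoid(z) for z in affine(*params["o"], xh)]
    g = [math.tanh(z) for z in affine(*params["g"], xh)]
    # Recurrent dropout on the update only: c_t = f_t * c_{t-1} + d(i_t * g_t)
    c = [ft * cp + m * it * gt for ft, cp, m, it, gt in zip(f, c_prev, drop_mask, i, g)]
    # Zoneout on the output gate: entries with p = 1 reuse the previous gate value
    o_z = [(1.0 - pt) * ot + pt * op for pt, ot, op in zip(p, o, o_prev)]
    h = [oz * math.tanh(ct) for oz, ct in zip(o_z, c)]
    return h, c, o_z
```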
  • 30. Sequential Permuted MNIST (1/3). Sequential MNIST: the pixels of a digit image are presented to an RNN one at a time, in lexicographic order (left to right, top to bottom). Permuted sequential MNIST: the pixels are presented in a fixed random order. Results: classification error.
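The two input orderings can be sketched as index lists over a flattened image (for MNIST, $28 \times 28 = 784$ time steps per image). The key point is that the permutation is drawn once and shared by all images. The helper names and seed are arbitrary choices for illustration.

```python
import random

def sequential_order(height, width):
    """Pixel indices in lexicographic (row-major) order: left to right, top to bottom."""
    return list(range(height * width))

def fixed_permutation(n, seed):
    """One random permutation, drawn once and reused for every image."""
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    return perm

def to_sequence(image, order):
    """Flatten a 2-D image (list of rows) and emit pixels in the given order, one per time step."""
    flat = [pixel for row in image for pixel in row]
    return [flat[k] for k in order]
```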
  • 34-35. Character level – Penn Treebank. Bits per character: $\mathrm{BPC} = -\frac{1}{T}\sum_{t=1}^{T} \log_2 P(x_{t+1} \mid y_t)$, where $x_{t+1}$ is the correct next symbol and $y_t$ is the model's output at step $t$.
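Given the probability the model assigned to each correct next character, BPC is the average negative base-2 log-probability. A minimal sketch (hypothetical function name):

```python
import math

def bits_per_character(probs_of_correct):
    """Average of -log2 P(x_{t+1} | y_t), where probs_of_correct[t] is the probability
    the model assigned to the character that actually came next."""
    return -sum(math.log2(p) for p in probs_of_correct) / len(probs_of_correct)
```

For example, a model that always spreads its probability uniformly over 50 symbols scores $\log_2 50 \approx 5.64$ BPC.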
  • 36. Word level – Penn Treebank (1/2). Perplexity $= 2^{H(p)} = 2^{-\sum_x p(x) \log_2 p(x)}$
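A sketch of two readings of the formula: the slide's $2^{H(p)}$ for a known distribution, and the usual estimate for a language model on test data, which replaces $H(p)$ with the average negative log2-probability the model assigns to the observed words. Function names are hypothetical.

```python
import math

def entropy_bits(p):
    """H(p) = -sum_x p(x) log2 p(x) for a distribution given as a list of probabilities."""
    return -sum(px * math.log2(px) for px in p if px > 0)

def perplexity_of_distribution(p):
    """Perplexity as defined on the slide: 2^{H(p)}."""
    return 2 ** entropy_bits(p)

def perplexity_of_model(probs_of_correct):
    """Test-set perplexity: 2 raised to the average -log2 probability of the observed words."""
    return 2 ** (-sum(math.log2(q) for q in probs_of_correct) / len(probs_of_correct))
```

A uniform distribution over $k$ words has perplexity exactly $k$, which is why perplexity is often read as an effective vocabulary size.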
  • 37. Word level – Penn Treebank (2/2)
  • 39-40. Conclusions. Instead of dropping units out (setting them to zero), zoneout randomly keeps units at their previous values. Zoneout makes the network more robust to changes in the hidden state. The identity connections introduced by zoneout improve the flow of information through the network. Future work: adapt the per-unit update probabilities based on the input sequence.
  • 41. References I Bachman, P. et al. (2014). “Learning with pseudo-ensembles”. In: Advances in Neural Information Processing Systems, pp. 3365–3373. Gal, Y. (2015). “A theoretically grounded application of dropout in recurrent neural networks”. In: arXiv preprint arXiv:1512.05287. Huang, G. et al. (2016). “Deep networks with stochastic depth”. In: arXiv preprint arXiv:1603.09382. Krueger, D. et al. (2016). “Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations”. In: arXiv preprint arXiv:1606.01305. Moon, T. et al. (2015). “RnnDrop: A novel dropout for RNNs in ASR”. In: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). IEEE, pp. 65–70.
  • 42. References II Pham, V. et al. (2014). “Dropout improves recurrent neural networks for handwriting recognition”. In: Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on. IEEE, pp. 285–290. Semeniuta, S. et al. (2016). “Recurrent Dropout without Memory Loss”. In: arXiv preprint arXiv:1603.05118. Srivastava, N. et al. (2014). “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”. In: Journal of Machine Learning Research 15, pp. 1929–1958. Zaremba, W. et al. (2014). “Recurrent neural network regularization”. In: arXiv preprint arXiv:1409.2329.