VAE-type Deep Generative
Models (Especially RNN + VAE)
Kenta Oono oono@preferred.jp
Preferred Networks Inc.
25th Jun. 2016
Tokyo Webmining @FreakOut
1/34
Notations
• x: observable (visible) variables
• z: latent (hidden) variables
• D = {x1, x2, …, xN}: training dataset
• KL(q || p): KL divergence between two distributions q and p
• θ: parameters of generative model
• φ: parameters of inference model
• pθ: probability distribution modelled by generative model
• qφ: probability distribution modelled by inference model
• N(µ, σ²): Gaussian distribution with mean µ and variance σ²
• Ber(p): Bernoulli Distribution with parameter p
• A := B, B =: A : Define A by B.
• Ex~p[ f (x)] : Expectation of f(x) with respect to x drawn from p. Namely, ∫ f(x) p(x) dx.
2/34
Abbreviations
• NN: Neural Network
• RNN: Recurrent Neural Network
• CNN: Convolutional Neural Network
• ELBO: Evidence Lower BOund
• AE: Auto Encoder
• VAE: Variational Auto Encoder
• LSTM: Long Short-Term Memory
• NLL: Negative Log-Likelihood
3/34
Agenda
• Mathematical formulation of generative models.
• Variational Auto Encoder (VAE)
• Variants of VAE
• RVAE, VRNN, DRAW, Convolutional DRAW, alignDRAW
• Chainer implementation of (Convolutional) DRAW
• Other VAE-like models
• Inverse DRAW, VAE + GAN
• Conclusion
4/34
Generative models and discriminative models
• Discriminative model
• Models p(z | x)
• e.g. SVM, Logistic Regression, etc.
• Generative model ← Todayʼs Topic
• Models p(x, z) or p(x)
• e.g. RBM, HMM, Naïve Bayes Classifier, VAE etc.
5/34
Recent trend of generative models by NN
• Helmholtz machine type ← Todayʼs Topic
• Model p(x, z) as p(z) p(x | z)
• Prepare two NNs: Generative model and Inference model
• Use variational inference and train models to maximize ELBO
• e.g. VAE, ADGM, DRAW, IWAE, VRNN etc.
• Generative Adversarial Network (GAN) type
• Model p(x, z) as p(z) p(x | z)
• Prepare two NNs: Generator and Discriminator
• Train models by solving min-max problem
• e.g. GAN, DCGAN, LAPGAN, f-GAN, InfoGAN etc.
• Auto regressive type
• Model p(x) as Πi p(xi | x1, …, xi-1)
• e.g. Pixel RNN, MADE, NADE etc.
6/34
NN as a probabilistic model
• We assume p(x, z) is parameterized by an NN whose
parameters (e.g. weights and biases) are θ, and denote it by pθ(x, z).
• Training reduces to finding θ that maximizes some objective
function.
7/34
NN as a probabilistic model (example)
• prior: pθ(z) = N(0, 1)
• generation: pθ(x | z) = N(x | µθ(z), σθ²(z))
• µθ and σθ are deterministic NNs that
take z as input and output a scalar
value.
• Although pθ(x | z) is simple, pθ(x) can
represent a complex distribution.
8/34
[Figure: generation pθ(x | z) — z ~ N(0, 1) is fed to deterministic NNs that output µ and σ²; then x ~ N(x | µθ(z), σθ²(z)) is sampled.]
pθ(x) = ∫ pθ(x | z) pθ(z) dz = ∫ N(x | µθ(z), σθ²(z)) pθ(z) dz
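A minimal NumPy sketch of ancestral sampling from this model. The tiny one-hidden-layer networks standing in for µθ and σθ are made up for illustration only; they are not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny networks standing in for the deterministic NNs mu_theta and sigma_theta.
W1, b1 = rng.normal(size=(8, 1)), np.zeros(8)
w_mu, w_ls = rng.normal(size=8), rng.normal(size=8)

def mu_sigma(z):
    h = np.tanh(W1 @ z + b1)
    mu = w_mu @ h
    sigma = np.exp(w_ls @ h)  # exp keeps the standard deviation positive
    return mu, sigma

# Ancestral sampling: z ~ p(z) = N(0, 1), then x ~ p(x | z) = N(mu_theta(z), sigma_theta(z)^2).
z = rng.normal(size=1)
mu, sigma = mu_sigma(z)
x = rng.normal(loc=mu, scale=sigma)
print("z =", z[0], " mu =", mu, " sigma =", sigma, " x =", x)
```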
Difficulty of generative models
• Posterior pθ(z | x) is intractable.
9/34
[Figure: pθ(x | z) (z → x) is easy to sample, but pθ(z | x) (x → z) is intractable.]
pθ(z | x) = pθ(x | z) pθ(z) / pθ(x) (Bayes' Thm.)
= pθ(x | z) pθ(z) / ∫ pθ(x, z') dz'
= pθ(x | z) pθ(z) / ∫ pθ(x | z') pθ(z') dz'
• In typical situations, we cannot
calculate the integral analytically.
• When z' is high-dimensional, the
integral is also difficult to estimate
numerically (e.g. by MCMC).
Variational inference
• Instead of the posterior distribution pθ(z | x),
we consider a set of distributions
{qφ(z | x)}φ∈Φ.
• Φ is some set of parameters.
• In addition to θ, in training we try to find φ such that
qφ(z | x) approximates pθ(z | x) well.
• Choice of qφ(z | x)
• Easy to calculate or be sampled from.
• e.g. Mean field approximation
• e.g. VAE : NN with params. φ
10/34
Note: To fully describe the distribution qφ, we
need to specify qφ(x). Typically we employ the
empirical distribution of the training dataset.
[Figure: the intractable posterior pθ(z | x) of the generative model is approximated by the inference model qφ(z | x).]
Evidence Lower BOund (ELBO)
• Consider single training example x.
11/34
L(x; θ) := log pθ(x)
= log ∫ pθ(x, z) dz
= log ∫ qφ(z | x) [pθ(x, z) / qφ(z | x)] dz
≧ ∫ qφ(z | x) log [pθ(x, z) / qφ(z | x)] dz (Jensen's inequality)
=: L~(x; θ, φ)
The gap L(x; θ) − L~(x; θ, φ) equals KL(qφ(z | x) || pθ(z | x)).
• Instead of L(x; θ), we maximize L~(x; θ, φ)
with respect to θ and φ.
• We call L~ the Evidence Lower BOund (ELBO).
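As a sanity check on the definition, here is a minimal NumPy sketch that estimates L~(x; θ, φ) by Monte Carlo for a toy one-dimensional Gaussian model. The particular q and p used here (µφ, σφ, µθ(z) = 2z, σθ = 0.5) are made-up illustrative values, not anything from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_normal(x, mu, sigma):
    # log N(x | mu, sigma^2), summed over dimensions
    return np.sum(-0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2)

x = np.array([0.7])                               # a single observed data point
mu_q, sigma_q = np.array([0.3]), np.array([0.8])  # assumed q_phi(z | x)
mu_theta = lambda z: 2.0 * z                      # assumed generative mean mu_theta(z)
sigma_theta = 0.5                                 # assumed generative std sigma_theta(z)

# ELBO ~ (1/S) * sum_s [ log p_theta(x, z_s) - log q_phi(z_s | x) ],  z_s ~ q_phi(z | x)
S, elbo = 1000, 0.0
for _ in range(S):
    z = mu_q + sigma_q * rng.normal(size=1)
    log_p_xz = log_normal(x, mu_theta(z), sigma_theta) + log_normal(z, 0.0, 1.0)
    elbo += (log_p_xz - log_normal(z, mu_q, sigma_q)) / S
print("Monte Carlo ELBO estimate:", elbo)
```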
Agenda
• Mathematical formulation of generative models.
• Variational Auto Encoder (VAE)
• Variants of VAE: RNN + VAE
• RVAE, VRNN, DRAW, Convolutional DRAW, alignDRAW
• Chainer implementation of (Convolutional) DRAW
• Other VAE-like models
• Inverse DRAW, VAE + GAN
• Conclusion
12/34
Variational AutoEncoder (VAE)
[Kingma+13]
• Use NN as an inference model.
• Training with backpropagation.
• How to calculate gradient?
• REINFORCE (a.k.a Likelihood Ratio (LR))
• Control Variate
• Reparameterization trick [Kingma+13]
(a.k.a Stochastic Gradient Variational
Bayes (SGVB) [Rezende+14])
13/34
Kingma, D. P., & Welling, M. (2013). Auto-encoding variational bayes. arXiv preprint
arXiv:1312.6114.
Rezende, D. J., Mohamed, S., & Wierstra, D. (2014). Stochastic backpropagation and approximate
inference in deep generative models. arXiv preprint arXiv:1401.4082.
[Figure: x → Encoder (inference model) → z → Decoder (generative model) → x'.]
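A minimal NumPy sketch of the reparameterization trick: instead of sampling z ~ N(µ, σ²) directly, sample ε ~ N(0, 1) and set z = µ + σε, so z becomes a deterministic, differentiable function of the encoder outputs and backpropagation can reach φ. The encoder outputs below are placeholders, not real network outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder outputs of the inference model q_phi(z | x) for one input x.
mu = np.array([0.2, -1.0])
log_var = np.array([-0.5, 0.3])
sigma = np.exp(0.5 * log_var)

# Reparameterization: all randomness is moved into eps ~ N(0, 1).
eps = rng.normal(size=mu.shape)
z = mu + sigma * eps  # z ~ N(mu, sigma^2), written as a function of (mu, log_var, eps)

# Gradients of z w.r.t. the encoder outputs are now well defined:
#   dz/dmu = 1,   dz/dlog_var = 0.5 * sigma * eps
# so the gradient of the ELBO can flow back into the encoder parameters phi.
print(z)
```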
Training Procedure
• The ELBO L~(x; θ, φ) equals Ez~q(z | x)[log p(x | z)] − KL(q(z | x) || p(z))
• 1st term: Reconstruction Loss
• 2nd term: Regularization Loss
14/34
[Figure: training flow]
1. The input x is fed to the inference model qφ (NN + sampling).
2. The inference model tries to make its posterior close to the prior of the generative model; this gives the regularization loss.
3. The latent variable z is passed to the generative model pθ (NN + sampling).
4. The generative model tries to reconstruct the input data; this gives the reconstruction loss.
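A minimal NumPy sketch of the two loss terms for a binary-valued x, assuming a Bernoulli decoder that outputs logits and a diagonal Gaussian posterior, so the KL against N(0, 1) has a closed form. All inputs below are placeholder values standing in for the outputs of the two NNs:

```python
import numpy as np

rng = np.random.default_rng(0)

def vae_loss_terms(x, mu, log_var, x_logit):
    # Reconstruction loss: Bernoulli negative log-likelihood -log p_theta(x | z),
    # written with logits as softplus(logit) - x * logit.
    recon = np.sum(np.logaddexp(0.0, x_logit) - x * x_logit)
    # Regularization loss: KL( N(mu, sigma^2) || N(0, 1) ) in closed form.
    kl = 0.5 * np.sum(mu ** 2 + np.exp(log_var) - log_var - 1.0)
    return recon, kl

x = rng.integers(0, 2, size=4).astype(float)        # toy binary "image"
mu, log_var = rng.normal(size=2), rng.normal(size=2)  # placeholder encoder outputs
x_logit = rng.normal(size=4)                          # placeholder decoder logits

recon, kl = vae_loss_terms(x, mu, log_var, x_logit)
print("reconstruction loss:", recon, " regularization loss:", kl, " negative ELBO:", recon + kl)
```

Minimizing the sum of the two terms over the training set is the same as maximizing the ELBO.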
Generation
• We can generate data points with trained generative models.
15/34
[Figure: z → generative model pθ (NN + sampling) → x'.]
1. Sample z from the prior pθ(z) (e.g. N(0, 1)).
2. Propagate it down through the generative model to obtain x'.
Agenda
• Mathematical formulation of generative models.
• Variational Auto Encoder (VAE)
• Variants of VAE: RNN + VAE
• RVAE, VRNN, DRAW, Convolutional DRAW, alignDRAW
• Chainer implementation of (Convolutional) DRAW
• Misc.
• Inverse DRAW, VAE + GAN
• Conclusion
16/34
Variational Recurrent AutoEncoder (VRAE)
[Fabius+14]
• A modification of VAE in which the two models (the inference model
and the generative model) are replaced with RNNs.
17/34
Fabius, O., & van Amersfoort, J. R. (2014). Variational recurrent auto-
encoders. arXiv preprint arXiv:1412.6581.
[Figure: the encoder RNN reads x1, …, xT into hidden states h1, …, hT, and z is produced from the final state; the decoder RNN starts from h0 computed from z and emits the reconstructions x1', …, xT'.]
Variational RNN (VRNN) [Chung+15]
• Inference and generative
models share the hidden
state h and update it
throughout time. Latent
variable z is sampled from
the state.
18/34
Chung, J., Kastner, K., Dinh, L., Goel, K., Courville, A. C., & Bengio, Y. (2015). A recurrent latent variable
model for sequential data. In Advances in neural information processing systems (pp. 2980-2988).
[Figure: the encoder and decoder RNNs share the recurrent state; at each step t, zt is sampled given ht-1, xt' is generated from (zt, ht-1), and the state is updated to ht using xt and zt.]
DRAW [Gregor+15]
• “Generative model of natural images that operates by
making a large number of small contributions to an additive
canvas using an attention model”.
• Inference and generative models are independent RNNs.
19/34
Gregor, K., Danihelka, I., Graves, A., Rezende, D. J., & Wierstra, D. (2015). DRAW: A
recurrent neural network for image generation. arXiv preprint arXiv:1502.04623.
DRAW without attention [Gregor+15]
20/34
[Figure: DRAW without attention — at each step t, the encoder RNN reads x (and the previous decoder state) and produces zt; the decoder RNN maps zt to a canvas update Δct, which is added to the canvas: ct = ct-1 + Δct. After the final step, the output x' is obtained from σ(cT).]
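A minimal NumPy sketch of the additive-canvas recurrence above. The encoder and decoder steps are stand-in stubs (no learned weights, no real recurrence); the point is only to show how ct accumulates and how x' is read off σ(cT):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

x = rng.integers(0, 2, size=16).astype(float)  # toy binary "image"
T, z_dim = 8, 4

def encoder_step(x, err, h_dec):
    # Stub for the encoder RNN: would output the posterior parameters of z_t.
    return np.zeros(z_dim), np.zeros(z_dim)     # (mu_t, log_var_t)

def decoder_step(z):
    # Stub for the decoder RNN: would output the canvas update (the "write").
    return 0.1 * rng.normal(size=x.shape)

canvas, h_dec = np.zeros_like(x), np.zeros(8)
for t in range(T):
    err = x - sigmoid(canvas)                   # error image: what the canvas still misses
    mu, log_var = encoder_step(x, err, h_dec)
    z = mu + np.exp(0.5 * log_var) * rng.normal(size=z_dim)  # reparameterized sample of z_t
    canvas = canvas + decoder_step(z)           # additive canvas update: c_t = c_{t-1} + dc_t
x_recon = sigmoid(canvas)                       # x' is generated from sigma(c_T)
print(np.round(x_recon, 2))
```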
DRAW [Gregor+15]
21/34
[Figure: DRAW with attention — the same recurrence as above, except that read attention selects a glimpse rt of x for the encoder, and write attention determines where Δct is written on the canvas.]
Convolutional DRAW [Gregor+16]
• A variant of DRAW with the following modifications:
• Linear connections are replaced with convolutions (including the
connections inside the LSTMs).
• The read and write attention mechanisms are removed.
• Instead of the standard Gaussian prior used in DRAW, the prior of
the generative model depends on the decoder's state.
• But the details of the implementation are not fully described in the
paper ...
22/34
Gregor, K., Besse, F., Rezende, D. J., Danihelka, I., & Wierstra, D. (2016).
Towards Conceptual Compression. arXiv preprint arXiv:1604.08772.
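With a learned prior, the regularization term becomes a KL divergence between two diagonal Gaussians (the posterior from the encoder and the prior computed from the decoder state) instead of a KL against N(0, 1). A minimal NumPy sketch of that closed form; the means and log-variances below are placeholders, not real network outputs:

```python
import numpy as np

def kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p):
    # KL( N(mu_q, exp(log_var_q)) || N(mu_p, exp(log_var_p)) ) for diagonal Gaussians
    var_q, var_p = np.exp(log_var_q), np.exp(log_var_p)
    return 0.5 * np.sum(log_var_p - log_var_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

# Placeholder posterior (encoder output) and learned prior (from the decoder state).
mu_e, log_var_e = np.array([0.5, -0.2]), np.array([-1.0, 0.1])
mu_d, log_var_d = np.array([0.3,  0.0]), np.array([-0.5, 0.0])
print("KL(posterior || learned prior) =", kl_diag_gaussians(mu_e, log_var_e, mu_d, log_var_d))
```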
alignDRAW [Mansimov+15]
• Generate image from its caption.
23/34
Mansimov, E., Parisotto, E., Ba, J. L., & Salakhutdinov, R. (2015). Generating images
from captions with attention. arXiv preprint arXiv:1511.02793.
Implementation of convolutional DRAW
with Chainer
24
[Figures: Reconstruction / Generation / Generation (linear connection)]
My implementation of
convolutional DRAW
25/34
[Figure: architecture of the implementation — the encoder takes x (and the canvas error) through an embedding into a convolutional LSTM that outputs (µt^e, σt^e²), from which zt is sampled; the decoder takes zt through an embedding into a convolutional LSTM that outputs (µt^d, σt^d²) and the canvas update Δct; the canvas is updated as ct+1 = ct + Δct and the reconstruction xt+1' is obtained through σ and scored with an NLL loss; an additional input y enters both the encoder and decoder. Layer legend: Convolution, Deconvolution, Linear, Identity, Sampling.]
Agenda
• Mathematical formulation of generative models.
• Variational Auto Encoder (VAE)
• Variants of VAE: RNN + VAE
• RVAE, VRNN, DRAW, Convolutional DRAW, alignDRAW
• Chainer implementation of (Convolutional) DRAW
• Other VAE-like models
• Inverse DRAW, VAE + GAN
• Conclusion
26/34
VAE + GAN [Larsen+15]
• Use the generative model of a VAE as
the generator of a GAN.
27/34
Larsen, A. B. L., Sønderby, S. K., & Winther, O. (2015). Autoencoding beyond
pixels using a learned similarity metric. arXiv preprint arXiv:1512.09300.
Inverse DRAW
https://openai.com/requests-for-research/#inverse-draw
28/34
cf. InfoGAN[Chen+16]
• Make latent variables of GAN interpretable.
29/34
Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. (2016). InfoGAN:
Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets.
arXiv preprint arXiv:1606.03657.
Agenda
• Mathematical formulation of generative models.
• Variational Auto Encoder (VAE)
• Variants of VAE: RNN + VAE
• RVAE, VRNN, DRAW, Convolutional DRAW, alignDRAW
• Chainer implementation of (Convolutional) DRAW
• Other VAE-like models
• Inverse DRAW, VAE + GAN
• Conclusion
30/34
Challenges of VAE-like generative models
• Compared with GANs, the images generated by VAE-like models
are said to be blurry.
• Difficulty of evaluation.
• The following common evaluation criteria can be largely independent
of one another [Theis+15].
• average log-likelihood
• Parzen window estimates
• visual fidelity of samples
• Only a lower bound of the log-likelihood can be evaluated exactly.
• Generation of high-dimensional images is still challenging.
31/34
Theis, L., Oord, A. V. D., & Bethge, M. (2015). A note on the
evaluation of generative models. arXiv preprint arXiv:1511.01844.
Many many topics are not covered today.
• VAE + Gaussian Process
• VAE-DGP, Variational GP, Recurrent GP
• Tighter lower bound of log-likelihood
• Importance Weighted AE
• Generative model with more complex prior distribution
• Hierarchical Variational Model, Auxiliary Deep Generative Model,
Hamiltonian Variational Inference, Normalizing Flow, Gradient Flow,
Inverse Autoregressive Flow
• Automatic Variational Inference
32/34
Related conferences, workshops and blogs
• NIPS 2015
• Advances in Approximate Bayesian Inference (AABI)
• http://approximateinference.org/accepted/
• Black Box Learning and Inference
• http://www.blackboxworkshop.org
• ICLR 2016
• http://www.iclr.cc/doku.php?id=iclr2016:main
• OpenAI
• Blog: Generative Models
• https://openai.com/blog/generative-models/
33/34
Summary
• VAE is a generative model that parameterizes the inference
and generative models with NNs and optimizes them by
maximizing the ELBO of the log-likelihood.
• Recently, many variants of VAE have been proposed, including RVAE,
VRNN, and (Convolutional) DRAW.
• Introduced an implementation of a generative model with
Chainer.
34/34