These slides were created for a NIPS 2016 study meetup.
IAF and related research are briefly explained.
paper:
Diederik P. Kingma et al., "Improving Variational Inference with Inverse Autoregressive Flow", 2016
https://papers.nips.cc/paper/6581-improving-variational-autoencoders-with-inverse-autoregressive-flow
Diagonal/Full Covariance Gaussian Distribution
Diagonal: Efficient but not flexible
q(z|x) = Π_i N(z_i | μ_i(x), σ_i(x))
Full covariance: not efficient and still not flexible (unimodal)
q(z|x) = N(z | μ(x), Σ(x))
(marks: diagonal / full covariance)
1. Computationally cheap to compute and differentiate: ✓ / ✗
2. Computationally cheap to sample from: ✓ / ✗
3. Parallel computation: ✓ / ✗
4. Sufficiently flexible to match the true posterior p(z|x): ✗
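As a rough illustration of the efficiency gap, here is a minimal NumPy sketch (all parameter values are synthetic) of reparameterized sampling from both families. The diagonal case is an O(d) elementwise operation, while the full-covariance case needs a Cholesky factorization, which costs O(d^3):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# Diagonal Gaussian: q(z|x) = prod_i N(z_i | mu_i(x), sigma_i(x)).
# Reparameterized sampling is elementwise -- O(d) and trivially parallel.
mu = rng.normal(size=d)
sigma = np.exp(rng.normal(size=d))      # positive standard deviations
eps = rng.normal(size=d)
z_diag = mu + sigma * eps

# Full-covariance Gaussian: q(z|x) = N(z | mu(x), Sigma(x)).
# Sampling needs a Cholesky factor of Sigma, an O(d^3) operation.
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)         # a synthetic positive-definite covariance
L = np.linalg.cholesky(Sigma)           # Sigma = L @ L.T
z_full = mu + L @ eps

print(z_diag.shape, z_full.shape)
```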
Change of Variables based methods
Transform q(z_0|x) into a more powerful distribution q(z_T|x) via sequential application of change of variables:
z_t = f_t(z_{t−1})
q(z_t|x) = q(z_{t−1}|x) |det(∂f_t(z_{t−1})/∂z_{t−1})|^{−1}
⇒ log q(z_T|x) = log q(z_0|x) − Σ_{t=1}^{T} log |det(∂f_t(z_{t−1})/∂z_{t−1})|
• NICE
L. Dinh et al., "NICE: Non-linear Independent Components Estimation", 2014
• Normalizing Flow
D. J. Rezende et al., "Variational Inference with Normalizing Flows", ICML 2015
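The log-density update above can be checked on a toy 1-D flow where the transformed density is known in closed form; the sketch below (not from the paper) applies f(z) = 2z + 1 to a standard Gaussian, so the result must match N(1, 4) exactly:

```python
import numpy as np

# Change of variables on a 1-D example with a known answer:
# z_1 = f(z_0) = 2*z_0 + 1 applied to z_0 ~ N(0, 1) gives z_1 ~ N(1, 4).
def log_q0(z):                      # log N(z | 0, 1)
    return -0.5 * (z**2 + np.log(2 * np.pi))

def f(z):
    return 2.0 * z + 1.0

def log_abs_det_jac(z):             # log |df/dz| = log 2, constant here
    return np.log(2.0) * np.ones_like(z)

z0 = np.array([0.3, -1.2, 2.0])
z1 = f(z0)

# log q(z_1) = log q(z_0) - log |det df/dz_0|
log_q1 = log_q0(z0) - log_abs_det_jac(z0)

# Closed-form density of N(1, 4) evaluated at z_1:
exact = -0.5 * ((z1 - 1.0) ** 2 / 4.0 + np.log(2 * np.pi * 4.0))
print(np.allclose(log_q1, exact))   # True
```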
Normalizing Flow
Transformation via
z_t = z_{t−1} + u_t f_t(w_t^T z_{t−1} + b_t)
Key Features
- Jacobian determinants are cheap to compute
Drawbacks
- Information goes through single bottleneck
1. Computationally cheap to compute and differentiate ✓
2. Computationally cheap to sample from ✓
3. Parallel computation ✗
4. Sufficiently flexible to match the true posterior p(z|x) ✗
[Figure: planar-flow diagram — every coordinate of z_{t−1} must pass through the single scalar bottleneck w_t^T z_{t−1} + b_t before u_t f_t(w_t^T z_{t−1} + b_t) is added to produce z_t]
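A minimal sketch of one planar-flow step with its log-determinant (parameter values are arbitrary, and the usual invertibility constraint on u is omitted for brevity); the analytic log-det is cross-checked against a brute-force numerical Jacobian:

```python
import numpy as np

def planar_flow(z, u, w, b):
    """One planar-flow step z_t = z + u * tanh(w.z + b) and its log |det J|.

    The Jacobian is I + u psi(z)^T with psi(z) = tanh'(w.z + b) * w, so by
    the matrix determinant lemma det J = 1 + tanh'(w.z + b) * (u.w) -- an
    O(d) computation despite the d x d Jacobian.
    """
    a = w @ z + b                        # the scalar bottleneck
    z_new = z + u * np.tanh(a)
    h_prime = 1.0 - np.tanh(a) ** 2
    log_det = np.log(np.abs(1.0 + h_prime * (u @ w)))
    return z_new, log_det

rng = np.random.default_rng(1)
d = 5
z = rng.normal(size=d)
u, w, b = rng.normal(size=d), rng.normal(size=d), 0.1
z_new, log_det = planar_flow(z, u, w, b)

# Cross-check against a brute-force finite-difference Jacobian.
eps = 1e-6
J = np.empty((d, d))
for j in range(d):
    dz = np.zeros(d); dz[j] = eps
    J[:, j] = (planar_flow(z + dz, u, w, b)[0] - z_new) / eps
print(np.isclose(np.log(np.abs(np.linalg.det(J))), log_det, atol=1e-4))
```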
Hamiltonian Flow / Hamiltonian Variational Inference
ELBO with auxiliary variables y
log p(x) ≥ log p(x) − D_KL(q(z|x) ∥ p(z|x)) − D_KL(q(y|x,z) ∥ r(y|x,z)) =: L(x)
Drawing (y, z) via HMC
(y_t, z_t) ~ HMC(y_t, z_t | y_{t−1}, z_{t−1})
Key Features
- Capability to sample from exact posterior
Drawbacks
- Long mixing time and lower ELBO
1. Computationally cheap to compute and differentiate ✗
2. Computationally cheap to sample from ✗
3. Parallel computation ✗
4. Sufficiently flexible to match the true posterior p(z|x) ✓
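For illustration, here is a minimal HMC transition on a standard Gaussian target — a toy sketch, not the paper's Hamiltonian Variational Inference setup. The momentum plays the role of the auxiliary variable y: each transition resamples y, runs leapfrog dynamics, and applies a Metropolis accept/reject step:

```python
import numpy as np

def hmc_step(z, log_p_grad, rng, step=0.1, n_leap=20):
    """One HMC transition targeting log p(z) = -0.5 ||z||^2 (standard Gaussian)."""
    y = rng.normal(size=z.shape)                # fresh auxiliary momentum
    z_new, y_new = z.copy(), y.copy()
    y_new += 0.5 * step * log_p_grad(z_new)     # leapfrog: half momentum step
    for _ in range(n_leap - 1):
        z_new += step * y_new                   # full position step
        y_new += step * log_p_grad(z_new)       # full momentum step
    z_new += step * y_new
    y_new += 0.5 * step * log_p_grad(z_new)     # final half momentum step
    # Hamiltonian H = -log p(z) + 0.5 ||y||^2; accept with prob min(1, e^{-dH})
    def H(zz, yy):
        return 0.5 * zz @ zz + 0.5 * yy @ yy
    accept = rng.random() < np.exp(H(z, y) - H(z_new, y_new))
    return z_new if accept else z

rng = np.random.default_rng(2)
grad = lambda z: -z                             # grad log N(0, I)
z = np.zeros(3)
samples = []
for _ in range(2000):
    z = hmc_step(z, grad, rng)
    samples.append(z.copy())
samples = np.array(samples[500:])               # drop burn-in (note the mixing cost)
print(np.round(samples.mean(axis=0), 1))        # should be near zero
```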
NICE
Transform only half of z at each step
z_t = (z_t^α, z_t^β) = (z_{t−1}^α, z_{t−1}^β + f_t(x, z_{t−1}^α))
Key Features
- The determinant of the Jacobian, det(∂z_t/∂z_{t−1}), is always 1
Drawbacks
- Limited form of transformation
- Less powerful than Normalizing Flow (next)
1. Computationally cheap to compute and differentiate ✓
2. Computationally cheap to sample from ✓
3. Parallel computation ✗
4. Sufficiently flexible to match the true posterior p(z|x) ✗
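A minimal sketch of one additive coupling step, where f below is a hypothetical stand-in for the learned coupling network (any function works, since the α half passes through unchanged):

```python
import numpy as np

def f(alpha):
    # Hypothetical coupling function; it need not be invertible.
    return np.tanh(alpha * 3.0 + 0.5)

def nice_forward(z):
    """Split z into halves (alpha, beta); keep alpha, shift beta by f(alpha)."""
    d = len(z) // 2
    alpha, beta = z[:d], z[d:]
    return np.concatenate([alpha, beta + f(alpha)])

def nice_inverse(z):
    """Exact inverse: subtract the same shift."""
    d = len(z) // 2
    alpha, beta = z[:d], z[d:]
    return np.concatenate([alpha, beta - f(alpha)])

rng = np.random.default_rng(3)
z = rng.normal(size=6)
z_t = nice_forward(z)

# The Jacobian is unit-triangular, so det = 1: the map reshapes the
# density but never rescales volume -- hence the limited flexibility.
print(np.allclose(nice_inverse(z_t), z))   # True
```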
Autoregressive Flow (proposed)
Autoregressive Flow (∂μ_{t,i}/∂z_{t,j} = ∂σ_{t,i}/∂z_{t,j} = 0 if i ≤ j)
z_{t,i} = μ_{t,i}(z_{t,0:i−1}) + σ_{t,i}(z_{t,0:i−1}) ⊙ z_{t−1,i}
Key features
- Powerful
- Easy to compute: det(∂z_t/∂z_{t−1}) = Π_i σ_{t,i}
Drawbacks
- Difficult to parallelize
1. Computationally cheap to compute and differentiate ✓
2. Computationally cheap to sample from ✓
3. Parallel computation ✗
4. Sufficiently flexible to match the true posterior p(z|x) ✓
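The difficulty of parallelizing shows up directly in code: each output dimension depends on previously computed outputs, so the update is an unavoidable sequential loop. A toy sketch, with mu_fn and sigma_fn as hypothetical stand-ins for the autoregressive networks:

```python
import numpy as np

def mu_fn(prefix):
    # Hypothetical autoregressive shift: depends only on z_{t,0:i-1}.
    return 0.1 * np.sum(prefix)

def sigma_fn(prefix):
    # Hypothetical autoregressive scale, kept positive via exp.
    return np.exp(0.05 * np.sum(prefix))

def af_step(z_prev):
    """z_{t,i} = mu_i(z_{t,0:i-1}) + sigma_i(z_{t,0:i-1}) * z_{t-1,i}."""
    d = len(z_prev)
    z_t = np.empty(d)
    log_det = 0.0
    for i in range(d):                 # sequential: z_t[i] needs z_t[:i]
        s = sigma_fn(z_t[:i])
        z_t[i] = mu_fn(z_t[:i]) + s * z_prev[i]
        log_det += np.log(s)           # Jacobian is triangular with diag sigma_i
    return z_t, log_det

rng = np.random.default_rng(4)
z0 = rng.normal(size=5)
z1, log_det = af_step(z0)
print(z1.shape, log_det)
```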
Inverse Autoregressive Flow (proposed)
Inverting AF (μ_t and σ_t are also autoregressive)
z_t = (z_{t−1} − μ_t(z_{t−1})) / σ_t(z_{t−1})
Key Features
- As powerful as AF
- Easy to compute: det(∂z_t/∂z_{t−1}) = 1 / Π_i σ_{t,i}(z_{t−1})
- Parallelizable
1. Computationally cheap to compute and differentiate ✓
2. Computationally cheap to sample from ✓
3. Parallel computation ✓
4. Sufficiently flexible to match the true posterior p(z|x) ✓
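A minimal sketch of one IAF step. Because μ and σ depend only on z_{t−1}, which is fully known, all dimensions are computed at once; the masked linear maps below are hypothetical stand-ins for the autoregressive networks (the paper uses MADE-style networks):

```python
import numpy as np

d = 5
rng = np.random.default_rng(5)
mask = np.tril(np.ones((d, d)), k=-1)      # strictly lower-triangular mask
W_mu = rng.normal(size=(d, d)) * mask      # mu_i depends on z_{t-1,0:i-1} only
W_s = rng.normal(size=(d, d)) * mask * 0.1

def iaf_step(z_prev):
    """z_t = (z_{t-1} - mu(z_{t-1})) / sigma(z_{t-1}), fully vectorized."""
    mu = W_mu @ z_prev                     # one masked matrix product each --
    sigma = np.exp(W_s @ z_prev)           # no per-dimension loop needed
    z_t = (z_prev - mu) / sigma
    log_det = -np.sum(np.log(sigma))       # det dz_t/dz_{t-1} = 1 / prod sigma_i
    return z_t, log_det

z0 = rng.normal(size=d)
z1, log_det = iaf_step(z0)
print(z1.shape)
```

The contrast with the AF loop is the whole point: the same triangular-Jacobian math, but the known input z_{t−1} makes sampling a single parallel expression.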