From VAE to a New Theory of Deep Learning
杜岳華
Deep Learning is a kind of Representational Learning
picture source (https://www.deeplearningbook.org/)
Representational Learning
[Figure: a feature extractor f maps inputs x1–x5 to features z1–z4, and a classifier g maps those features to the label "woman".]
Representational Learning
Representation Learning: A Review and New Perspectives (https://arxiv.org/abs/1206.5538)
Autoencoder
[Figure: an encoder maps inputs x1–x5 to latent codes z1, z2; a decoder reconstructs x1–x5 from z1, z2.]
Restricted Boltzmann Machines
An unsupervised greedy way to extract features
Invented by:
Smolensky, Paul (1986). Chapter 6: Information Processing in Dynamical Systems: Foundations of Harmony Theory.
Applications:
Dimensionality reduction: Hinton, G. E.; Salakhutdinov, R. R. (2006). Reducing the Dimensionality of Data with Neural Networks. Science.
Classification: Larochelle, H.; Bengio, Y. (2008). Classification using discriminative restricted Boltzmann machines. ICML '08.
Collaborative filtering: Salakhutdinov, R.; Mnih, A.; Hinton, G. (2007). Restricted Boltzmann machines for collaborative filtering. ICML '07.
Feature learning: Coates, Adam; Lee, Honglak; Ng, Andrew Y. (2011). An analysis of single-layer networks in unsupervised feature learning. International Conference on Artificial Intelligence and Statistics (AISTATS).
Restricted Boltzmann Machines
[Figure: visible units x1–x5 connected to hidden units z1, z2.]
A Beginner's Guide to Restricted Boltzmann Machines (RBMs) (https://skymind.ai/wiki/restricted-boltzmann-machine)
Deep Belief Network [Hinton]
A greedy layerwise unsupervised pre-training method
[Figure: stacked RBMs trained layer by layer with weight matrices W1 and W2.]
Deep Belief Network
[Figure: the stacked network using weights W1, W2 and their transposes W1ᵀ, W2ᵀ.]
We need a generative model!
Discriminative model: $p(Y|X)$
Generative model: $p(X, Y)$
Disentangle explanatory generative factors
"to disentangle as many factors as possible, discarding as little information about the data as is practical"
[Figure: autoencoder mapping inputs x1–x5 to latent factors z1, z2 and reconstructing x1–x5.]
Variational Autoencoder
A generative model
[Plate diagram: latent variable z generates observation x, over N samples.]
We hope to learn the generative factors in an unsupervised way.
The factor
[Figure: data points $(x_i, y_i)$ fitted by a line $\hat{y}_i = a x_i + b$, with mean $\hat{y}$ and variance $\sigma^2$.]
The factor
[Plate diagram: x generates y over N samples, governed by parameters $\theta = (\theta_0, \theta_1)$ with $y = \theta_0 + \theta_1 x$.]
To learn latent random variables
[Plate diagram: latent variable z generates observation x over N samples, governed by parameters $\theta$.]
Introduce Bayes' theorem
$p_\theta(z|x) = \dfrac{p_\theta(x|z)\, p_\theta(z)}{p_\theta(x)}$
$p_\theta(x) = \int p_\theta(x|z)\, p_\theta(z)\, dz$
$p_\theta(x)$ is intractable.
Variational inference: use $q_\phi(z|x)$ to approximate $p_\theta(z|x)$.
Kullback–Leibler divergence
Relative entropy; measures the dissimilarity between two distributions.
Use the data distribution $p(X)$ to approximate a theoretical distribution $q(X)$:
$D_{KL}(p(X) \| q(X)) = -\sum_i p(x_i) \log \dfrac{q(x_i)}{p(x_i)}$
1. Asymmetric
2. Not a distance
3. $D_{KL}(p(X) \| q(X)) \ge 0$
4. $D_{KL}(p(X) \| q(X)) = 0 \Leftrightarrow p(X)$ and $q(X)$ are equal
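A minimal numeric sketch of the definition above, using two made-up discrete distributions p and q:

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0                      # terms with p_i = 0 contribute nothing
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = [0.8, 0.2]                        # hypothetical "data" distribution
q = [0.5, 0.5]                        # hypothetical "theoretical" distribution
print(kl_divergence(p, q))                          # >= 0
print(kl_divergence(p, p))                          # == 0 iff the two are equal
print(kl_divergence(p, q) == kl_divergence(q, p))   # False: asymmetric
```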
Formulation
$p_\theta(z|x) = \dfrac{p_\theta(x|z)\, p_\theta(z)}{p_\theta(x)}$
$\arg\min_\phi D_{KL}(q_\phi(z|x) \| p_\theta(z|x))$
[Plate diagrams: the inference model, where $\phi$ governs z inferred from x, alongside the generative model, where $\theta$ governs x generated from z, each over N samples.]
Architecture
[Figure: encoder $q_\phi(z|x)$ with $z = f(x)$ maps x to z; decoder $p_\theta(x|z)$ with $x = g(z)$ maps z back to x.]
Evidence Lower Bound (ELBO)
$D_{KL}(q_\phi(z|x) \| p_\theta(z|x))$
$= \int q(z|x) \log \dfrac{q(z|x)}{p(z|x)}\, dz$
$= \int q(z|x) \log \dfrac{q(z|x)\, p(x)}{p(x, z)}\, dz$
$= \int q(z|x) \log \dfrac{q(z|x)}{p(x, z)}\, dz + \int q(z|x) \log p(x)\, dz$
$= \int q(z|x)\,(\log q(z|x) - \log p(x, z))\, dz + \log p(x)$
$= -E_{q(z|x)}[\log p(x, z) - \log q(z|x)] + \log p(x)$
Evidence Lower Bound (ELBO)
Let $\mathcal{L}(\theta, \phi, x) = E_{q_\phi(z|x)}[\log p_\theta(x, z) - \log q_\phi(z|x)]$
$\mathcal{L}$ is called the (variational) lower bound or evidence lower bound.
$D_{KL}(q_\phi(z|x) \| p_\theta(z|x)) = -\mathcal{L}(\theta, \phi, x) + \log p(x)$
$\log p(x) = D_{KL}(q_\phi(z|x) \| p_\theta(z|x))\downarrow + \mathcal{L}(\theta, \phi, x)\uparrow$
Since $\log p(x)$ is fixed, raising the lower bound $\mathcal{L}$ pushes the KL term down.
Evidence Lower Bound (ELBO)
Encoder: $q_\phi(z|x)$, Decoder: $p_\theta(x|z)$
$p_\theta(z|x) = \dfrac{p_\theta(x|z)\, p_\theta(z)}{p_\theta(x)}$
$\arg\min_\phi D_{KL}(q_\phi(z|x) \| p_\theta(z|x))$
⇓
$\arg\max_{\theta, \phi} \mathcal{L}(\theta, \phi, x)$
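A minimal Monte Carlo sketch of this bound, assuming hypothetical callables sample_q (draws $z \sim q_\phi(z|x)$), log_p_joint ($\log p_\theta(x, z)$), and log_q ($\log q_\phi(z|x)$):

```python
import numpy as np

def elbo_estimate(x, sample_q, log_p_joint, log_q, n_samples=64):
    """Monte Carlo estimate of L(theta, phi, x) = E_q[log p(x, z) - log q(z|x)]."""
    zs = [sample_q(x) for _ in range(n_samples)]            # z_s ~ q_phi(z|x)
    terms = [log_p_joint(x, z) - log_q(z, x) for z in zs]   # log p(x, z_s) - log q(z_s|x)
    return np.mean(terms)
```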
Hypothesis: Gaussian mixture as latent representation
[Figure: latent dimensions z1, z2 modeled with means $\mu_{z_1}, \mu_{z_2}$ and standard deviations $\sigma_{z_1}, \sigma_{z_2}$.]
Encoder and decoder
[Figure: the encoder maps data into the latent space (z1, z2); the decoder maps the latent space back to data.]
How to solve?
Mean field variational approximation
Sampling by Markov chain Monte Carlo
More?
Sampling by MCMC
picture source (https://www.youtube.com/watch?v=OTO1DygELpY)
Stochastic gradient descent?
$\mathcal{L}(\theta, \phi, x) = E_{q_\phi(z|x)}[\log p_\theta(x, z) - \log q_\phi(z|x)]$
$\nabla_\phi \mathcal{L}(\theta, \phi, x) = \nabla_\phi E_{q_\phi(z|x)}[-\log q_\phi(z|x)]$
The expectation is taken over $q_\phi(z|x)$, which itself depends on $\phi$, so the gradient cannot simply be pushed inside the expectation.
Reparameterization trick
[Figure: left, sampling z from $q_\phi(z|x)$ sits between the encoder and decoder and blocks gradients; right, the reparameterized graph, where noise $\epsilon$ sampled from a fixed distribution enters through the $*$ and $+$ nodes as $z = \mu + \sigma \epsilon$, so gradients flow back to the encoder.]
Tutorial on Variational Autoencoders (https://arxiv.org/abs/1606.05908)
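A minimal sketch of the trick for a diagonal Gaussian $q_\phi(z|x)$; the mean and log-variance below stand in for hypothetical encoder outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, logvar):
    """Draw z ~ N(mu, sigma^2) as z = mu + sigma * eps with eps ~ N(0, I).

    The randomness lives in eps, so z is a deterministic function of mu and
    logvar and gradients can flow back into the encoder parameters.
    """
    eps = rng.standard_normal(mu.shape)
    sigma = np.exp(0.5 * logvar)
    return mu + sigma * eps

mu = np.array([0.0, 1.0])        # hypothetical encoder output: mean
logvar = np.array([-1.0, 0.5])   # hypothetical encoder output: log-variance
z = reparameterize(mu, logvar)   # differentiable sample fed to the decoder
```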
Stochastic gradient variational Bayes (SGVB)
See Algorithm 1 in Auto-Encoding Variational Bayes.
Example: variational autoencoder
[Figure: the encoder maps data into the latent space (z1, z2); the decoder maps the latent space back to data.]
Experiments
(a) Learned Frey Face manifold (b) Learned MNIST manifold
β-variational Autoencoder (β-VAE)
Achieves disentangled, explainable generative factors
Figure 6 in β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework
What is the difference between VAE and β-VAE?
VAE: $\arg\max \mathcal{L}(\theta, \phi, x) = E_{q_\phi(z|x)}[\log p_\theta(x|z)] - D_{KL}(q_\phi(z|x) \| p_\theta(z))$
β-VAE: $\arg\max \mathcal{L}(\theta, \phi, x) = E_{q_\phi(z|x)}[\log p_\theta(x|z)] - \beta\, D_{KL}(q_\phi(z|x) \| p_\theta(z))$
Derivation:
$\mathcal{L}(\theta, \phi, x) = E_{q_\phi(z|x)}[\log p_\theta(x, z) - \log q_\phi(z|x)]$
$= \int q_\phi(z|x)\,(\log p_\theta(x, z) - \log q_\phi(z|x))\, dz$
$= \int q_\phi(z|x)\left(\log \dfrac{p_\theta(x, z)}{p_\theta(z)} - \log \dfrac{q_\phi(z|x)}{p_\theta(z)}\right) dz$
$= E_{q_\phi(z|x)}[\log p_\theta(x|z)] - D_{KL}(q_\phi(z|x) \| p_\theta(z))$
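A minimal sketch of the (negated) objective for a diagonal Gaussian encoder with a standard normal prior, where the KL term has the usual closed form; the Bernoulli reconstruction term and the variable names are assumptions for illustration:

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """Loss = reconstruction error + beta * KL(q_phi(z|x) || N(0, I))."""
    eps = 1e-7
    # Bernoulli negative log-likelihood, e.g. for pixel intensities in [0, 1].
    recon = -np.sum(x * np.log(x_recon + eps) + (1 - x) * np.log(1 - x_recon + eps))
    # Closed-form KL between N(mu, diag(sigma^2)) and N(0, I), summed over latent dims.
    kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar))
    return recon + beta * kl   # beta = 1 recovers the plain VAE objective
```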
Why?
A higher $\beta$ encourages learning a disentangled representation.
$E_{q_\phi(z|x)}[\log p_\theta(x|z)]$: encourages learning good representations.
$D_{KL}(q_\phi(z|x) \| p_\theta(z))$: constrains the capacity of $z$.
The information bottleneck method
$\arg\max I(Z; Y) - \beta I(X; Z)$
$I(Z; Y)$: maximize the mutual information between Z and Y.
$I(X; Z)$: discard information from X that is irrelevant to Y.
Learning is about forgetting irrelevant details.
Experiments
Understanding disentangling in β-VAE (https://arxiv.org/abs/1804.03599)
Information Bottleneck Theory
Basic Information Theory
Entropy
Information entropy, Shannon entropy.
Measures the uncertainty of a random variable.
$H(X) = E(I(X)) = -\sum_{i=1}^n p(x_i) \log p(x_i)$
1. Nonnegativity: $H(X) \ge 0$
2. Symmetry: $H(X, Y) = H(Y, X)$
3. If $X$ and $Y$ are independent random variables: $H(X|Y) = H(X)$
Entropy
Forecast: 100% rain, 0% sunny: $-1 \log_2 1 - 0 \log_2 0 = 0 + 0 = 0$
Forecast: 80% rain, 20% sunny: $-0.8 \log_2 0.8 - 0.2 \log_2 0.2 = 0.258 + 0.464 = 0.722$
Forecast: 50% rain, 50% sunny: $-0.5 \log_2 0.5 - 0.5 \log_2 0.5 = 0.5 + 0.5 = 1$
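A short sketch reproducing the forecast numbers above:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits: H(X) = -sum_i p_i * log2(p_i), with 0*log(0) := 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

print(entropy([1.0, 0.0]))   # 0.0    -> no uncertainty
print(entropy([0.8, 0.2]))   # ~0.722
print(entropy([0.5, 0.5]))   # 1.0    -> maximum uncertainty for two outcomes
```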
Entropy
[Figure: binary entropy H(X) versus Pr(X = 1); it is 0 at Pr = 0 or 1 and peaks at 1 bit when Pr(X = 1) = 0.5.]
picture source (https://en.wikipedia.org/wiki/Entropy_(information_theory))
Conditional entropy
Measures how much information is needed to describe the outcome of a random variable Y given that the value of another random variable X is known.
$H(Y|X) = \sum_{x \in X} p(x)\, H(Y|X = x)$
$= -\sum_{x \in X} p(x) \sum_{y \in Y} p(y|x) \log p(y|x)$
$= -\sum_{x \in X,\, y \in Y} p(x, y) \log \dfrac{p(x, y)}{p(x)}$
Mutual information
Measures how much information is obtained about one random variable by observing the other.
$I(X; Y) = H(X) - H(X|Y)$
$= H(Y) - H(Y|X)$
$= H(X) + H(Y) - H(X, Y)$
$= \sum_{x, y} p(x, y) \log \dfrac{p(x, y)}{p(x)\, p(y)}$
1. Nonnegativity: $I(X; Y) \ge 0$
2. Symmetry: $I(X; Y) = I(Y; X)$
Relation to Kullback–Leibler divergence
$I(X; Y) = D_{KL}(p(X, Y) \| p(X)\, p(Y))$
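A small sketch computing mutual information from a joint probability table; the 2×2 joint below is made up for illustration:

```python
import numpy as np

def mutual_information(pxy):
    """I(X;Y) = sum_{x,y} p(x,y) * log2(p(x,y) / (p(x) p(y))), in bits."""
    pxy = np.asarray(pxy, dtype=float)
    px = pxy.sum(axis=1, keepdims=True)   # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)   # marginal p(y)
    mask = pxy > 0
    return np.sum(pxy[mask] * np.log2(pxy[mask] / (px * py)[mask]))

pxy = np.array([[0.4, 0.1],               # hypothetical joint distribution
                [0.1, 0.4]])
print(mutual_information(pxy))                                 # > 0: dependent
print(mutual_information(np.outer([0.5, 0.5], [0.5, 0.5])))    # 0: independent
```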
Relation
[Figure: Venn diagram relating H(X), H(Y), H(X,Y), H(X|Y), H(Y|X), and I(X;Y).]
picture source (https://en.wikipedia.org/wiki/Mutual_information)
Cross entropy
Measures the difference between two distributions.
$H(q, p) = H(q) + D_{KL}(q \| p) = -\sum_x q(x) \log p(x)$
[Figure: $H(q, p)$ decomposed into $H(q)$ plus $D_{KL}(q \| p)$.]
NOTE: the notation can be confused with joint entropy.
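A short check of the identity $H(q, p) = H(q) + D_{KL}(q \| p)$ on two made-up distributions with strictly positive probabilities:

```python
import numpy as np

def entropy(q):
    q = np.asarray(q, dtype=float)
    return -np.sum(q * np.log(q))

def cross_entropy(q, p):
    """H(q, p) = -sum_x q(x) * log p(x)."""
    q, p = np.asarray(q, dtype=float), np.asarray(p, dtype=float)
    return -np.sum(q * np.log(p))

def kl(q, p):
    q, p = np.asarray(q, dtype=float), np.asarray(p, dtype=float)
    return np.sum(q * np.log(q / p))

q, p = [0.7, 0.3], [0.5, 0.5]   # hypothetical distributions
print(np.isclose(cross_entropy(q, p), entropy(q) + kl(q, p)))   # True
```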
Difference between mutual information and cross entropy
Mutual information: measures the information shared between two random variables.
Cross entropy: measures the difference between two distributions.
Data processing inequality (DPI)
Let $X \to Y \to Z$ be a Markov chain. Then $I(X; Y) \ge I(X; Z)$.
The neural network generates a successive Markov chain
Treat each whole layer $T_i$ as a single random variable.
[Figure: the encoder chain $X \to T_1 \to \dots \to T_m$ and the decoder producing $\hat{Y}$.]
$I(X; Y) \ge I(T_1; Y) \ge I(T_2; Y) \ge \dots \ge I(T_m; Y) \ge I(\hat{Y}; Y)$
$H(X) \ge I(X; T_1) \ge I(X; T_2) \ge \dots \ge I(X; T_m) \ge I(X; \hat{Y})$
Codebook and volume
Let
$X$: signal source with a fixed probability measure $p(x)$
$\hat{X}$: quantized codebook
$p(\hat{x}|x)$: a soft partition of $X$, with probability $p(\hat{x}) = \sum_x p(x)\, p(\hat{x}|x)$
What determines the quality of a quantization?
Rate: the average number of bits per message needed to encode the signal.
The information to transmit from $X$ to $\hat{X}$ is bounded from below by $I(X; \hat{X})$.
Rate distortion theory
Bernd Girod: EE368b Image and Video Compression, Rate Distortion Theory no. 1
Lossy compression: lower the bit-rate R by allowing some acceptable distortion D of the signal.
[Figure: rate R versus distortion D curve; lossless coding corresponds to D = 0.]
Rate distortion theory
Bernd Girod: EE368b Image and Video Compression, Rate Distortion Theory no. 2
Types of lossy compression problems:
Given a maximum rate R, minimize the distortion D.
Given a distortion D, minimize the rate R.
These are equivalent constrained optimization problems, often unwieldy due to the constraint.
[Figure: the two problems shown on the rate–distortion plane.]
Rate distortion theory
Define the rate distortion function as
$R(D) = \min I(X; \hat{X})$ subject to $E[d(x, \hat{x})] \le D$
Apply a Lagrange multiplier:
$F(p(\hat{x}|x)) = I(X; \hat{X}) + \beta\, E[d(x, \hat{x})]$
Information bottleneck method
If $X \to \hat{X} \to Y$, then $I(X; \hat{X}) \ge I(X; Y)$.
Information bottleneck:
$\arg\min L = I(X; \hat{X}) - \beta\, I(\hat{X}; Y)$
We want the quantization $\hat{X}$ to capture as much information about $Y$ as possible: a tradeoff between compressing the representation and preserving meaningful information.
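A minimal sketch that evaluates the bottleneck Lagrangian on a discrete toy problem; the joint $p(x, y)$, the soft partition $p(\hat{x}|x)$, and $\beta$ are all made up for illustration:

```python
import numpy as np

def mi(pab):
    """I(A;B) in bits from a joint probability table p(a, b)."""
    pa = pab.sum(axis=1, keepdims=True)
    pb = pab.sum(axis=0, keepdims=True)
    mask = pab > 0
    return np.sum(pab[mask] * np.log2(pab[mask] / (pa * pb)[mask]))

def ib_lagrangian(pxy, p_xhat_given_x, beta):
    """L = I(X; X_hat) - beta * I(X_hat; Y) for discrete X, Y and quantization X_hat."""
    px = pxy.sum(axis=1)                         # p(x)
    p_x_xhat = px[:, None] * p_xhat_given_x      # p(x, x_hat) = p(x) p(x_hat|x)
    p_xhat_y = p_xhat_given_x.T @ pxy            # p(x_hat, y) = sum_x p(x_hat|x) p(x, y)
    return mi(p_x_xhat) - beta * mi(p_xhat_y)

pxy = np.array([[0.3, 0.1],                      # hypothetical joint p(x, y)
                [0.1, 0.3],
                [0.1, 0.1]])
p_xhat_given_x = np.array([[0.9, 0.1],           # hypothetical soft partition p(x_hat|x)
                           [0.1, 0.9],
                           [0.5, 0.5]])
print(ib_lagrangian(pxy, p_xhat_given_x, beta=2.0))
```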
Information bottleneck method
[Figure: a network compressing inputs x1–x5 into a bottleneck z1–z4 and predicting from it.]
Opening the Black Box of Deep Neural Networks via Information
Issues
1. The SGD layer dynamics in the information plane.
2. The effect of the training sample size on the layers.
3. What is the benefit of the hidden layers?
4. What is the final location of the hidden layers?
5. Do the hidden layers form optimal IB representations?
Setup
standard DNN settings
tanh as the activation function
sigmoid function in the final layer
trained with SGD and cross-entropy loss
7 fully connected hidden layers with widths 12-10-7-5-4-3-2 neurons
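A minimal NumPy sketch of a comparable network, using the listed hidden widths; the 12-dimensional input and the single sigmoid output are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

input_dim = 12                           # assumed input dimension
hidden_widths = [12, 10, 7, 5, 4, 3, 2]  # hidden widths from the setup above
dims = [input_dim] + hidden_widths + [1]

weights = [rng.standard_normal((m, n)) / np.sqrt(m) for m, n in zip(dims[:-1], dims[1:])]
biases = [np.zeros(n) for n in dims[1:]]

def forward(x):
    """Return the activations of every layer: tanh hidden layers, sigmoid output."""
    activations = []
    h = x
    for i, (w, b) in enumerate(zip(weights, biases)):
        pre = h @ w + b
        h = 1.0 / (1.0 + np.exp(-pre)) if i == len(weights) - 1 else np.tanh(pre)
        activations.append(h)
    return activations

x = rng.standard_normal((5, input_dim))  # a small batch of random inputs
layer_outputs = forward(x)               # the T_i used for information-plane analysis
```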
Information plane
Given the joint distribution $P(X, Y)$, plot each hidden layer $T$ as the point $(I(X; T), I(T; Y))$ on the information plane.
Applied to the Markov chain of a k-layer DNN, the connected points form a unique information path.
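A rough sketch of estimating one layer's coordinates by discretizing its activations into bins, in the spirit of the binning estimator used in the paper; the synthetic data, the toy rule, and the 30-bin choice are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def discrete_mi(a, b):
    """I(A;B) in bits, estimated from paired samples of discrete values."""
    n = len(a)
    joint, pa, pb = {}, {}, {}
    for pair in zip(a, b):
        joint[pair] = joint.get(pair, 0) + 1
    for (x, y), c in joint.items():
        pa[x] = pa.get(x, 0) + c
        pb[y] = pb.get(y, 0) + c
    return sum(c / n * np.log2((c / n) / ((pa[x] / n) * (pb[y] / n)))
               for (x, y), c in joint.items())

# Synthetic stand-ins: discrete inputs X, labels Y, and one tanh layer activation T.
x = rng.integers(0, 16, size=5000)
y = (x % 2 == 0).astype(int)                                     # toy rule Y = f(X)
t = np.tanh(0.3 * x - 2.0) + 0.1 * rng.standard_normal(5000)     # toy layer activation

t_binned = np.digitize(t, np.linspace(-1, 1, 30))                # 30 bins over [-1, 1]
point = (discrete_mi(x, t_binned), discrete_mi(t_binned, y))     # (I(X;T), I(T;Y))
print(point)
```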
The dynamics of training by stochastic gradient descent
50 different randomized initializations with different randomized training samples
init → 400 epochs → 9000 epochs
The optimization process in the Information Plane (https://www.youtube.com/watch?v=P1A1yNsxMjc)
The two optimization phases in the Information Plane
5% - 45% - 85% training samples
Empirical risk minimization (ERM) phase (fast):
$I_Y$ increases
layers learn the label information while preserving the DPI order
Representation compression phase (slow):
$I_X$ decreases until convergence
layers lose irrelevant information (compression)
The drift and diffusion phases of SGD optimization
Layer weights' gradient distributions
The drift and diffusion phases of SGD optimization
Drift phase:
large gradient mean, small variance (high SNR)
increases $I_Y$ and reduces the empirical error
ERM phase
Diffusion phase:
small gradient mean, large fluctuations (low SNR)
the gradients behave like Gaussian noise; the weights evolve like a Wiener process
compression phase
maximizes the entropy of the weight distribution by adding noise, known as stochastic relaxation
compression by diffusion
attempts to interpret single weights or even single neurons in such networks can be meaningless
The computational benefit of the hidden layers
Train 6 different architectures with 1-6 hidden layers
The computational benefit of the hidden layers
1. Adding hidden layers dramatically reduces the number of training epochs needed for good generalization.
2. The compression phase of each layer is shorter when it starts from a previously compressed layer.
3. The compression is faster for the deeper (narrower and closer to the output) layers.
4. Even wide hidden layers eventually compress in the diffusion phase; adding extra width does not help.
Convergence of the layers to the Information Bottleneck bound
Evolution of the layers with training sample size
[Figure: information plane with I(X;T) on the horizontal axis and I(T;Y) on the vertical axis, showing layer trajectories for 4% up to 84% of the training data.]
With increasing training size, the layers' true label information $I_Y$ (generalization) is pushed up and gets closer to the theoretical IB bound for the rule distribution.
Are our findings general enough?
Hinton's comment
After listening to Tishby's talk, Hinton sent Tishby an email:
"I have to listen to it another 10,000 times to really understand it, but it's very rare nowadays to hear a talk with a really original idea in it that may be the answer to a really major puzzle."
Caution!
No, information bottleneck (probably) doesn't open the "black-box" of deep neural networks
(https://severelytheoretical.wordpress.com/2017/09/28/no-information-bottlenec
black-box-of-deep-neural-networks/)
Tishby's 'Opening the Black Box of Deep Neural Networks via Information' received
(https://www.reddit.com/r/MachineLearning/comments/72eau7/d_tishbys_opening
On the Information Bottleneck Theory of Deep Learning [Harvard University] [ICLR
(https://openreview.net/forum?id=ry_WPG-A-)
Thank you for your attention
Reference
Information Theory of Deep Learning. Naftali Tishby (https://www.youtube.com/watch?v=bLqJHjXihK8)