Metrics for distributions and their applications for generative models (part 1)
Dai Hai Nguyen
Kyoto University
Learning generative models?
[Figure: the distributions $Q$, $P_\theta$ and $P$]
$$\mathrm{Distance}(P_\theta, Q) = \; ?$$
$$Q = \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}$$
Learning generative models?
• Maximum Likelihood Estimation (MLE):
Given training samples $x_1, x_2, \ldots, x_n$, learn a model $p_{\mathrm{model}}(x; \theta)$ from which the training samples are likely to have been generated:
$$\theta^* = \arg\max_\theta \sum_{i=1}^{n} \log p_{\mathrm{model}}(x_i; \theta)$$
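The MLE objective above can be tried on a toy problem. The sketch below is only an illustration: it assumes a one-dimensional Gaussian model with known unit variance, synthetic training data, and a plain grid search over the mean (the names `gaussian_log_likelihood` and `mle_mean_grid` are hypothetical, not from the slides).

```python
import math
import random


def gaussian_log_likelihood(samples, mu, sigma=1.0):
    """Sum of log N(x; mu, sigma^2) over the samples."""
    const = -0.5 * math.log(2.0 * math.pi * sigma ** 2)
    return sum(const - 0.5 * ((x - mu) / sigma) ** 2 for x in samples)


def mle_mean_grid(samples, grid):
    """Return the grid value of mu with the highest log-likelihood."""
    return max(grid, key=lambda mu: gaussian_log_likelihood(samples, mu))


rng = random.Random(0)
samples = [rng.gauss(2.0, 1.0) for _ in range(500)]   # assumed synthetic training data
grid = [i / 100.0 for i in range(0, 401)]              # candidate means in [0, 4]
mu_hat = mle_mean_grid(samples, grid)
sample_mean = sum(samples) / len(samples)
print(mu_hat, sample_mean)
```

For this model the maximizer is the sample mean, so the grid search should land on the grid point closest to it.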
Learning generative models?
• Likelihood-free model
Random input ($Z \sim \mathrm{Uniform}$) → Generator (neural network) → Output
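A likelihood-free model only needs to produce samples. As a rough sketch under stated assumptions (a one-hidden-layer network with fixed random weights, scalar input and output, and the hypothetical helper `make_generator`), the pipeline of random input, network and output could look like this:

```python
import math
import random


def make_generator(hidden=8, seed=0):
    """Build a tiny one-hidden-layer network z -> x with fixed random weights."""
    rng = random.Random(seed)
    w1 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
    b1 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
    w2 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]

    def generator(z):
        # Hidden layer with tanh activation, then a linear output layer.
        h = [math.tanh(w * z + b) for w, b in zip(w1, b1)]
        return sum(v * a for v, a in zip(w2, h))

    return generator


generator = make_generator()
noise_rng = random.Random(1)
z_samples = [noise_rng.uniform(0.0, 1.0) for _ in range(5)]   # Z ~ Uniform(0, 1)
outputs = [generator(z) for z in z_samples]                    # samples from the model
print(outputs)
```

No density is ever evaluated here, which is why a distance between sample sets is needed to train such a model.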
How to measure similarity between $p$ and $q$?
§ Kullback-Leibler (KL) divergence: asymmetric, i.e., $D_{KL}(p \| q) \neq D_{KL}(q \| p)$
$$D_{KL}(p \| q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx$$
§ Jensen-Shannon (JS) divergence: symmetric
$$D_{JS}(p \| q) = \frac{1}{2} D_{KL}\left(p \,\Big\|\, \frac{p + q}{2}\right) + \frac{1}{2} D_{KL}\left(q \,\Big\|\, \frac{p + q}{2}\right)$$
§ Optimal transport (OT):
$$\mathcal{W}(p, q) = \inf_{\gamma \in \Pi(p, q)} E_{(x, y) \sim \gamma}\left[\|x - y\|\right]$$
where $\Pi(p, q)$ is the set of all joint distributions of $(X, Y)$ with marginals $p$ and $q$
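The three measures can be compared on small discrete distributions. This is a minimal sketch, assuming both distributions live on the same sorted one-dimensional support; for the OT term it uses the one-dimensional closed form (sum of absolute CDF differences times the gap between support points) instead of solving the infimum over couplings. All function names are illustrative.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions; infinite if q is 0 where p > 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total


def js(p, q):
    """Jensen-Shannon divergence built from two KL terms against the mixture."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def wasserstein_1d(p, q, support):
    """W1 on a sorted 1-D support: sum of |F_p - F_q| times the gap width."""
    total, cp, cq = 0.0, 0.0, 0.0
    for k in range(len(support) - 1):
        cp += p[k]
        cq += q[k]
        total += abs(cp - cq) * (support[k + 1] - support[k])
    return total


support = [0.0, 1.0, 2.0]
p = [0.5, 0.5, 0.0]
q = [0.25, 0.5, 0.25]
print(kl(p, q), kl(q, p))              # asymmetric: the second value is infinite
print(js(p, q), js(q, p))              # symmetric
print(wasserstein_1d(p, q, support))   # cost of moving 0.25 mass over distance 2
```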
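Many fundamental problems can be cast as quantifying similarity between two distributions
§ Maximum likelihood estimation (MLE) is equivalent to minimizing KL divergence
Suppose we draw $N$ samples $x_i \sim p(x \mid \theta^*)$
The MLE of $\theta$ is
$$\hat{\theta} = \arg\min_\theta \; -\frac{1}{N} \sum_{i=1}^{N} \log p(x_i \mid \theta) \;\approx\; \arg\min_\theta \; -E_{x \sim p(x \mid \theta^*)}\left[\log p(x \mid \theta)\right]$$
where the approximation becomes exact as $N \to \infty$
By definition of the KL divergence:
$$D_{KL}\big(p(x \mid \theta^*) \,\|\, p(x \mid \theta)\big) = E_{x \sim p(x \mid \theta^*)}\left[\log \frac{p(x \mid \theta^*)}{p(x \mid \theta)}\right]$$
$$= E_{x \sim p(x \mid \theta^*)}\left[\log p(x \mid \theta^*)\right] - E_{x \sim p(x \mid \theta^*)}\left[\log p(x \mid \theta)\right]$$
The first term does not depend on $\theta$, so minimizing the KL divergence over $\theta$ is the same as minimizing the second term, which is the MLE objective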
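The equivalence can be checked numerically in a case where the expectations are exact. The sketch below assumes a Bernoulli model with a hypothetical true parameter of 0.3; it verifies that the expected negative log-likelihood and the KL divergence differ by a constant (the entropy of the true distribution) and therefore share the same minimizer.

```python
import math

P_TRUE = 0.3   # assumed true Bernoulli parameter theta*


def expected_nll(theta):
    """-E_{x ~ Bernoulli(theta*)}[log p(x | theta)], computed exactly."""
    return -(P_TRUE * math.log(theta) + (1 - P_TRUE) * math.log(1 - theta))


def kl_true_to_model(theta):
    """KL(Bernoulli(theta*) || Bernoulli(theta))."""
    return (P_TRUE * math.log(P_TRUE / theta)
            + (1 - P_TRUE) * math.log((1 - P_TRUE) / (1 - theta)))


grid = [i / 100.0 for i in range(1, 100)]          # theta in (0, 1)
gaps = [expected_nll(t) - kl_true_to_model(t) for t in grid]
entropy = -(P_TRUE * math.log(P_TRUE) + (1 - P_TRUE) * math.log(1 - P_TRUE))
best_nll = min(grid, key=expected_nll)
best_kl = min(grid, key=kl_true_to_model)
print(best_nll, best_kl, entropy)
```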
Training GAN is equivalent to minimizing JS divergence
§ GAN has two networks, D and G, which play a minimax game
$$\min_G \max_D L(D, G) = E_{x \sim q(x)}\left[\log D(x)\right] + E_{z \sim r(z)}\left[\log(1 - D(G(z)))\right]$$
$$= E_{x \sim q(x)}\left[\log D(x)\right] + E_{x \sim p(x)}\left[\log(1 - D(x))\right]$$
where $p(x)$ and $q(x)$ are the distributions of fake images and real images, respectively, and $r(z)$ is the distribution of the random input $z$
§ Fixing G, the optimal D can be easily obtained:
$$D^*(x) = \frac{q(x)}{p(x) + q(x)}$$
Training GAN is equivalent to minimizing JS divergence (continued)
§ With $p(x)$ the distribution of fake images and $q(x)$ the distribution of real images, substituting the optimal discriminator $D^*(x) = \frac{q(x)}{p(x) + q(x)}$ into $L(D, G)$ gives
$$L(D^*, G) = \int q(x) \log \frac{q(x)}{p(x) + q(x)} \, dx + \int p(x) \log \frac{p(x)}{p(x) + q(x)} \, dx$$
$$= 2 D_{JS}(p \| q) - \log 4$$
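The identity $L(D^*, G) = 2 D_{JS}(p \| q) - \log 4$ can be verified on a finite support. The following sketch uses made-up distributions for the fake and real data and takes $D^*(x) = q(x) / (p(x) + q(x))$ with $q$ the real distribution; it is a numerical check, not a training procedure.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions with q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def js(p, q):
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def gan_value(p_fake, q_real, d):
    """L(D, G) = E_{x~q}[log D(x)] + E_{x~p}[log(1 - D(x))] on a finite support."""
    return sum(qi * math.log(di) + pi * math.log(1.0 - di)
               for pi, qi, di in zip(p_fake, q_real, d))


p_fake = [0.1, 0.2, 0.7]   # assumed generator distribution p
q_real = [0.3, 0.4, 0.3]   # assumed data distribution q
d_star = [qi / (pi + qi) for pi, qi in zip(p_fake, q_real)]
value_at_optimum = gan_value(p_fake, q_real, d_star)
d_other = [0.5, 0.5, 0.5]  # any other discriminator gives a smaller value
print(value_at_optimum, 2.0 * js(p_fake, q_real) - math.log(4.0))
print(gan_value(p_fake, q_real, d_other))
```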
f-divergences
• Divergence between two distributions
$$D_f(q \| p) = \int p(x) \, f\left(\frac{q(x)}{p(x)}\right) dx$$
• f: generator function, convex and with $f(1) = 0$
• Every convex, lower semi-continuous function f has a convex conjugate f* such that:
$$f(x) = \sup_{y \in \mathrm{dom}(f^*)} \{xy - f^*(y)\}$$
f-divergences
• Different generator functions f give different divergences
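Estimating f-divergences from samples
$$D_f(q \| p) = \int p(x) \, f\left(\frac{q(x)}{p(x)}\right) dx$$
$$= \int p(x) \sup_{t \in \mathrm{dom}(f^*)} \left\{ t \, \frac{q(x)}{p(x)} - f^*(t) \right\} dx$$
$$\geq \sup_{T \in \mathcal{T}} \left\{ \int q(x) \, T(x) \, dx - \int p(x) \, f^*(T(x)) \, dx \right\}$$
$$= \sup_{T \in \mathcal{T}} \left\{ E_{x \sim Q}\left[T(x)\right] - E_{x \sim P}\left[f^*(T(x))\right] \right\}$$
The first expectation is estimated with samples from Q and the second with samples from P
Conjugate function of f(x):
$$f^*(x) = \sup_{t \in \mathrm{dom}(f)} \{tx - f(t)\}$$
Some properties:
• $f(x) = \sup_{t \in \mathrm{dom}(f^*)} \{tx - f^*(t)\}$
• $f^{**}(x) = f(x)$
• $f^*(x)$ is always convex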
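As a concrete instance of the bound, take the generator $f(u) = u \log u$, for which $D_f(q \| p)$ is the KL divergence of $q$ from $p$ and the conjugate is $f^*(t) = e^{t-1}$. The sketch below, on an assumed three-point support, checks the conjugate numerically on a grid and shows that the choice $T(x) = f'(q(x)/p(x)) = 1 + \log(q(x)/p(x))$ makes the bound tight, while a constant $T$ gives a looser value.

```python
import math


def f(u):
    """Generator of the KL divergence: f(u) = u log u, with f(1) = 0."""
    return u * math.log(u) if u > 0.0 else 0.0


def f_star(t):
    """Closed-form convex conjugate of f(u) = u log u."""
    return math.exp(t - 1.0)


def f_star_numeric(t, grid):
    """sup_u {t*u - f(u)} approximated by a maximum over a grid of u values."""
    return max(t * u - f(u) for u in grid)


def f_divergence(q, p):
    """D_f(q || p) = sum_x p(x) f(q(x) / p(x))."""
    return sum(pi * f(qi / pi) for pi, qi in zip(p, q))


def variational_bound(q, p, T):
    """E_{x~Q}[T(x)] - E_{x~P}[f*(T(x))] on a finite support."""
    return (sum(qi * ti for qi, ti in zip(q, T))
            - sum(pi * f_star(ti) for pi, ti in zip(p, T)))


u_grid = [i / 1000.0 for i in range(1, 10001)]              # u in (0, 10]
p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]
T_opt = [1.0 + math.log(qi / pi) for pi, qi in zip(p, q)]   # T = f'(q / p)
T_const = [1.0, 1.0, 1.0]
print(f_star_numeric(0.5, u_grid), f_star(0.5))
print(f_divergence(q, p), variational_bound(q, p, T_opt), variational_bound(q, p, T_const))
```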
Training f-divergence GAN
• f-GAN:
$$\min_\theta \max_w F(\theta, w) = E_{x \sim Q}\left[T_w(x)\right] - E_{x \sim P_\theta}\left[f^*(T_w(x))\right]$$
Ref: f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization, NIPS 2016
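It turns out that GAN is a special case of f-divergence minimization
• GAN:
$$\min_\theta \max_w \; E_{x \sim Q}\left[\log D_w(x)\right] + E_{x \sim P_\theta}\left[\log(1 - D_w(x))\right]$$
• f-GAN:
$$\min_\theta \max_w \; E_{x \sim Q}\left[T_w(x)\right] - E_{x \sim P_\theta}\left[f^*(T_w(x))\right]$$
By choosing suitable T and f, for example $T_w(x) = \log D_w(x)$ and $f^*(t) = -\log(1 - e^t)$, f-GAN turns into the original GAN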
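One way to see the reduction is to plug in $T(x) = \log D(x)$ together with the conjugate $f^*(t) = -\log(1 - e^t)$, which is the pairing used for the GAN case in the f-GAN paper. The sketch below evaluates both objectives on an assumed finite support with arbitrary discriminator outputs and checks that they agree.

```python
import math


def f_star_gan(t):
    """Conjugate for the GAN case, defined for t < 0: -log(1 - exp(t))."""
    return -math.log(1.0 - math.exp(t))


def gan_objective(q_real, p_model, d):
    """E_{x~Q}[log D(x)] + E_{x~P_theta}[log(1 - D(x))]."""
    return (sum(qi * math.log(di) for qi, di in zip(q_real, d))
            + sum(pi * math.log(1.0 - di) for pi, di in zip(p_model, d)))


def fgan_objective(q_real, p_model, T):
    """E_{x~Q}[T(x)] - E_{x~P_theta}[f*(T(x))]."""
    return (sum(qi * ti for qi, ti in zip(q_real, T))
            - sum(pi * f_star_gan(ti) for pi, ti in zip(p_model, T)))


q_real = [0.3, 0.4, 0.3]             # assumed data distribution Q
p_model = [0.1, 0.2, 0.7]            # assumed model distribution P_theta
d = [0.9, 0.6, 0.2]                  # arbitrary discriminator outputs in (0, 1)
T = [math.log(di) for di in d]       # T(x) = log D(x)
print(gan_objective(q_real, p_model, d), fgan_objective(q_real, p_model, T))
```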
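1-Wasserstein distance (another option)
§ It seeks a probabilistic coupling $\gamma$:
$$W_1 = \min_{\gamma \in \mathbb{P}} \int_{\mathcal{X} \times \mathcal{Y}} c(x, y) \, \gamma(x, y) \, dx \, dy = \min_{\gamma \in \mathbb{P}} E_{(x, y) \sim \gamma}\left[c(x, y)\right]$$
where $\mathbb{P} = \left\{ \gamma \geq 0, \; \int_{\mathcal{Y}} \gamma(x, y) \, dy = p(x), \; \int_{\mathcal{X}} \gamma(x, y) \, dx = q(y) \right\}$
$c(x, y)$ is the displacement cost from x to y (e.g. Euclidean distance)
§ Also known as the earth mover's distance
§ Can be formulated as a linear program (convex)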
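Kantorovich's formulation of OT
§ In the case of discrete inputs
$$p = \sum_{i=1}^{m} a_i \delta_{x_i}, \qquad q = \sum_{j=1}^{n} b_j \delta_{y_j}$$
§ Couplings:
$$\mathbb{P} = \left\{ P \geq 0, \; P \in \mathbb{R}^{m \times n}, \; P \mathbf{1}_n = a, \; P^\top \mathbf{1}_m = b \right\}$$
§ LP problem: find P
$$P^* = \arg\min_{P \in \mathbb{P}} \langle P, C \rangle$$
where C is the cost matrix, i.e. $C_{ij} = c(x_i, y_j)$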
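For the smallest non-trivial case the linear program can be solved by hand. With $m = n = 2$, the marginal constraints leave a single free entry $t = P_{11}$, so the feasible set is an interval and the linear cost is minimized at one of its ends. The sketch below relies on that observation (it is not a general LP solver, and the points and weights are made up); larger problems would need a proper LP or OT solver.

```python
def coupling_2x2(t, a, b):
    """The unique 2x2 coupling with P[0][0] = t, row sums a and column sums b."""
    return [[t, a[0] - t], [b[0] - t, a[1] - b[0] + t]]


def transport_cost(P, C):
    """Frobenius inner product <P, C>."""
    return sum(P[i][j] * C[i][j] for i in range(2) for j in range(2))


def solve_ot_2x2(a, b, C):
    """Minimise <P, C> over couplings by checking both ends of the feasible interval."""
    t_low, t_high = max(0.0, b[0] - a[1]), min(a[0], b[0])
    candidates = [coupling_2x2(t, a, b) for t in (t_low, t_high)]
    return min(candidates, key=lambda P: transport_cost(P, C))


x, y = [0.0, 1.0], [0.0, 2.0]                 # assumed support points of p and q
a, b = [0.5, 0.5], [0.25, 0.75]               # weights of p and q
C = [[abs(xi - yj) for yj in y] for xi in x]  # C_ij = |x_i - y_j|
P_opt = solve_ot_2x2(a, b, C)
print(P_opt, transport_cost(P_opt, C))
```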
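Why is OT better than KL and JS divergences?
§ OT provides a smooth measure and is more useful than KL and JS: it stays finite and varies continuously even when the supports of the two distributions do not overlap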
§ Example:
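A common illustration (not necessarily the example shown on the original slide) compares a point mass at 0 with a point mass at $\theta$. The sketch below assumes a small grid of positions and uses the one-dimensional CDF formula for $W_1$: KL is infinite and JS is stuck at $\log 2$ for every $\theta \neq 0$, while $W_1 = |\theta|$ shrinks smoothly as the two masses approach each other.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions; infinite if q is 0 where p > 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total


def js(p, q):
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def w1(p, q, support):
    """W1 on a sorted 1-D support: sum of |F_p - F_q| times the gap width."""
    total, cp, cq = 0.0, 0.0, 0.0
    for k in range(len(support) - 1):
        cp += p[k]
        cq += q[k]
        total += abs(cp - cq) * (support[k + 1] - support[k])
    return total


def point_mass(index, size):
    return [1.0 if k == index else 0.0 for k in range(size)]


support = [0.0, 0.5, 1.0, 1.5, 2.0]
p = point_mass(0, len(support))               # delta at 0
rows = []
for k, theta in enumerate(support):
    q = point_mass(k, len(support))           # delta at theta
    rows.append((theta, kl(p, q), js(p, q), w1(p, q, support)))
for row in rows:
    print(row)
```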
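How to apply the 1-Wasserstein distance to GAN?
$$\mathcal{W}(p, q) = \inf_{\gamma \in \Pi(p, q)} E_{(x, y) \sim \gamma}\left[\|x - y\|\right] = \inf_\gamma \langle C, \gamma \rangle$$
$$\text{s.t.} \quad \sum_{j=1}^{n} \gamma_{ij} = p_i, \; i = 1, \ldots, m; \qquad \sum_{i=1}^{m} \gamma_{ij} = q_j, \; j = 1, \ldots, n$$
This is a linear program with a primal and a dual form
Primal: $\min_x c^\top x$ s.t. $Ax = b$, $x \geq 0$
Dual: $\max_y b^\top y$ s.t. $A^\top y \leq c$
where $c = \mathrm{vec}(C) \in \mathbb{R}^{mn}$, $x = \mathrm{vec}(\gamma) \in \mathbb{R}^{mn}$ and $b = [p^\top, q^\top]^\top \in \mathbb{R}^{m+n}$
Writing $y = [f^\top, g^\top]^\top$, the dual becomes
$$\max_{f, g} \; f^\top p + g^\top q \quad \text{s.t.} \quad f_i + g_j \leq C_{ij}, \; i = 1, \ldots, m; \; j = 1, \ldots, n$$
It is easy to see that $f_i = -g_i$, so $|f_i - f_j| \leq 1 \cdot |x_i - y_j|$, i.e. f is 1-Lipschitz
$$\mathcal{W}(p, q) = \sup_{\|f\|_L \leq 1} E_{x \sim p}\left[f(x)\right] - E_{x \sim q}\left[f(x)\right]$$
(Kantorovich-Rubinstein duality)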
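The duality can be checked on a small one-dimensional example. The sketch below is an illustration under assumptions: both distributions sit on the same sorted support, the primal value comes from the one-dimensional CDF formula, and the dual potential is built with slope $-\mathrm{sign}(F_p - F_q)$ on each gap so that it is 1-Lipschitz by construction. The helper names are hypothetical.

```python
def w1_primal(p, q, support):
    """W1 on a sorted 1-D support via the CDF formula."""
    total, cp, cq = 0.0, 0.0, 0.0
    for k in range(len(support) - 1):
        cp += p[k]
        cq += q[k]
        total += abs(cp - cq) * (support[k + 1] - support[k])
    return total


def dual_value(f, p, q):
    """E_{x~p}[f(x)] - E_{x~q}[f(x)] for potential values f on the support."""
    return sum(fk * (pk - qk) for fk, pk, qk in zip(f, p, q))


def optimal_potential(p, q, support):
    """Build a 1-Lipschitz f whose slope is -sign(F_p - F_q) on each gap."""
    f, cp, cq = [0.0], 0.0, 0.0
    for k in range(len(support) - 1):
        cp += p[k]
        cq += q[k]
        slope = -1.0 if cp > cq else 1.0
        f.append(f[-1] + slope * (support[k + 1] - support[k]))
    return f


support = [0.0, 1.0, 2.0, 4.0]
p = [0.5, 0.25, 0.25, 0.0]
q = [0.0, 0.25, 0.25, 0.5]
f_opt = optimal_potential(p, q, support)
f_other = [0.5 * x for x in support]          # another 1-Lipschitz potential
print(w1_primal(p, q, support), dual_value(f_opt, p, q), dual_value(f_other, p, q))
```

Any 1-Lipschitz potential gives a lower bound on the primal value, and the constructed one attains it.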
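Training WGAN
In WGAN, replace the discriminator with $f$ and minimize the 1-Wasserstein distance:
$$\min_\theta \mathcal{W}(p, q_\theta) = \min_\theta \sup_{\|f_w\|_L \leq 1} E_{x \sim p}\left[f_w(x)\right] - E_{z \sim r}\left[f_w(g_\theta(z))\right]$$
Find $w$ (inner maximization), then update $\theta$ (outer minimization)
Ref: Wasserstein GAN, ICML 2017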
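The alternation between finding $w$ and updating $\theta$ can be sketched on a toy problem. Everything here is an assumption made for illustration: the data are one-dimensional Gaussian samples with mean 3, the critic is linear, $f_w(x) = w x$, and is kept 1-Lipschitz by clipping $w$ to $[-1, 1]$, and the generator is a shift, $g_\theta(z) = z + \theta$. It is far simpler than the networks used in the WGAN paper.

```python
import random


def clip(value, bound):
    return max(-bound, min(bound, value))


def train_toy_wgan(real_mean=3.0, steps=400, n_critic=5, batch=64,
                   lr_critic=0.1, lr_gen=0.05, seed=0):
    """Toy 1-D WGAN: critic f_w(x) = w * x with |w| <= 1, generator g_theta(z) = z + theta."""
    rng = random.Random(seed)
    w, theta = 0.0, 0.0
    for _ in range(steps):
        # Find w: gradient ascent on E_{x~p}[f_w(x)] - E_{z~r}[f_w(g_theta(z))].
        for _ in range(n_critic):
            real = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
            fake = [rng.gauss(0.0, 1.0) + theta for _ in range(batch)]
            grad_w = sum(real) / batch - sum(fake) / batch
            w = clip(w + lr_critic * grad_w, 1.0)   # clipping keeps f_w 1-Lipschitz
        # Update theta: the objective's gradient with respect to theta is -w,
        # so a gradient descent step adds lr_gen * w.
        theta += lr_gen * w
    return theta, w


theta_final, w_final = train_toy_wgan()
print(theta_final, w_final)
```

With a linear critic the objective only compares means, so the shift ends up oscillating in a band around the data mean rather than converging exactly; this is a property of the toy setup.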
Thank you for listening
