A Connection Between GAN (Generative Adversarial Networks), IRL (Inverse Reinforcement Learning), and Energy-Based Models
Mabonki0725
May 16, 2017
Contents
GAN (Generative Adversarial Networks)
IRL (Inverse Reinforcement Learning)
2 / 20
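GAN (Generative Adversarial Networks)
A GAN consists of two networks, a Generator and a Discriminator. The Generator produces samples intended to fool the Discriminator, while the Discriminator learns to tell Generator samples from real data.
Figure 1: GAN image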
3 / 20
IRL (Inverse Reinforcement Learning)
Figure 2: IRL image
4 / 20
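IRL
A trajectory $\tau$ is assumed to occur with probability proportional to $\exp(-c_\theta(\tau))$, so that $-c_\theta(\tau)$ acts as its reward:
$$p_\theta(\tau) = \frac{1}{Z(\theta)} \exp(-c_\theta(\tau))$$
$$c_\theta(\tau) = \sum_t c_\theta(x_t, u_t)$$
$$\tau = \{x_1, x_2, \cdots, x_T,\; u_1, u_2, \cdots, u_T\}$$
$-c_\theta(\tau)$: reward; $c_\theta(\tau)$: cost
$x_t$: state $x$ at time $t$
$u_t$: action $u$ at time $t$
$\tau$: trajectory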
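The definitions above can be tried on a tiny finite set of trajectories. The following is a minimal sketch, not the deck's implementation: the quadratic `step_cost`, the three example trajectories, and the helper names are all hypothetical and chosen only to make the numbers easy to follow.

```python
import math


def trajectory_cost(states, actions, step_cost):
    """c_theta(tau) = sum over t of c_theta(x_t, u_t)."""
    return sum(step_cost(x, u) for x, u in zip(states, actions))


def boltzmann(costs):
    """Return (probs, Z) with probs[i] = exp(-costs[i]) / Z."""
    weights = [math.exp(-c) for c in costs]
    Z = sum(weights)
    return [w / Z for w in weights], Z


def step_cost(x, u):
    # Hypothetical quadratic per-step cost, chosen only for illustration.
    return 0.5 * (x * x + u * u)


# Three hypothetical trajectories (states x_1..x_T, actions u_1..u_T), T = 3.
trajectories = [
    ([0.0, 0.0, 0.0], [0.0, 0.0, 0.0]),
    ([0.0, 1.0, 1.0], [1.0, 0.0, 0.0]),
    ([0.0, 2.0, 3.0], [2.0, 1.0, 0.0]),
]
costs = [trajectory_cost(xs, us, step_cost) for xs, us in trajectories]
probs, Z = boltzmann(costs)
for c, p in zip(costs, probs):
    print(f"cost={c:.2f}  p_theta={p:.4f}")
```

Lower-cost trajectories receive higher probability, and the probabilities sum to one because of the division by $Z$.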
5 / 20
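IRL
The lower the cost $c_\theta(\tau)$ of a trajectory $\tau$, that is, the higher its reward $-c_\theta(\tau)$, the more probable the trajectory:
$$p_\theta(\tau) = \frac{1}{Z(\theta)} \exp(-c_\theta(\tau))$$
Figure: $\exp(-x)$, which equals 1 at $x = 0$ and tends to 0 as $x \to \infty$
6 / 20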
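IRL
Approaches to IRL:
Max Entropy: $-\int p_\theta(\tau) \log p_\theta(\tau)\, d\tau$
Gaussian Process
Guided Cost Learning
This deck shows the correspondence between Guided Cost Learning and GAN.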
7 / 20
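Guided Cost Learning for IRL
As in Max Entropy IRL, the loss $L_{cost}(p)$ trains the network $p_\theta$ on the demonstrations $\tau \sim p$.
(Cost of IRL)
$$L_{cost}(p) = E_{\tau\sim p}[-\log p_\theta(\tau)] \tag{1}$$
$$= E_{\tau\sim p}[c_\theta(\tau)] + \log Z(\theta) \tag{2}$$
$$= E_{\tau\sim p}[c_\theta(\tau)] + \log E_{\tau\sim q}\left[\frac{\exp(-c_\theta(\tau))}{q(\tau)}\right] \tag{3}$$
In Max Entropy IRL the partition function $Z(\theta)$ is hard to compute, so it is estimated with samples from a sampler distribution $q$; $q$ is trained with $L_{sampler}(q)$.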
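The step from $\log Z(\theta)$ to an expectation under $q$ is ordinary importance sampling: $Z = \sum_\tau \exp(-c_\theta(\tau)) = E_{\tau\sim q}[\exp(-c_\theta(\tau))/q(\tau)]$. A minimal sketch on a finite trajectory set follows; the costs, the sampler `q`, the sample count, and the seed are hypothetical values for illustration.

```python
import math
import random


def exact_Z(costs):
    """Z = sum over trajectories of exp(-c(tau))."""
    return sum(math.exp(-c) for c in costs)


def importance_Z(costs, q, n_samples, rng):
    """Monte Carlo estimate of Z = E_{tau~q}[exp(-c(tau)) / q(tau)]."""
    idx = rng.choices(range(len(costs)), weights=q, k=n_samples)
    return sum(math.exp(-costs[i]) / q[i] for i in idx) / n_samples


costs = [0.0, 1.5, 3.0, 4.0]   # hypothetical trajectory costs
q = [0.4, 0.3, 0.2, 0.1]       # hypothetical sampler distribution
rng = random.Random(0)         # fixed seed so the run is repeatable

Z_true = exact_Z(costs)
Z_hat = importance_Z(costs, q, 20000, rng)
print(f"exact Z = {Z_true:.4f}, importance estimate = {Z_hat:.4f}")
```

The estimate is unbiased for any `q` that puts positive mass on every trajectory, but its variance depends on how well `q` matches $\exp(-c_\theta(\tau))/Z$.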
8 / 20
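Guided Cost Learning for IRL
For a given cost $c_\theta$, the partition function is $Z = \int \exp(-c_\theta(\tau))\, d\tau$.
The sampler $q(\tau)$ is brought close to $\frac{1}{Z}\exp(-c_\theta(\tau))$ by minimizing the KL divergence between them; the loss $L_{sampler}(q)$ trains the network $q(\tau)$.
(Sampler of IRL)
$$L_{sampler}(q) = KL\left(q(\tau)\,\Big\|\,\frac{1}{Z}\exp(-c_\theta(\tau))\right) \tag{4}$$
$$= \int q(\tau) \log \frac{q(\tau)}{\frac{1}{Z}\exp(-c_\theta(\tau))}\, d\tau \tag{5}$$
$$= E_{\tau\sim q}[c_\theta(\tau)] + E_{\tau\sim q}[\log q(\tau)] + \log Z \tag{6}$$
Guided Cost Learning optimizes $p$ and $q$ alternately:
$L_{cost}(p)$ trains $p_\theta$
$L_{sampler}(q)$ trains $q$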
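The expansion of the KL divergence into $E_{\tau\sim q}[c_\theta(\tau)] + E_{\tau\sim q}[\log q(\tau)] + \log Z$ can be checked numerically. This is a minimal sketch over a finite trajectory set; the costs and the sampler `q` are hypothetical values.

```python
import math


def kl(q, p):
    """KL(q || p) for two distributions over the same finite set."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def sampler_loss(q, costs):
    """E_q[c(tau)] + E_q[log q(tau)], i.e. the KL divergence without the log Z term."""
    return sum(qi * (c + math.log(qi)) for qi, c in zip(q, costs) if qi > 0)


costs = [0.0, 1.5, 3.0, 4.0]   # hypothetical trajectory costs
q = [0.4, 0.3, 0.2, 0.1]       # hypothetical sampler distribution

Z = sum(math.exp(-c) for c in costs)
p_theta = [math.exp(-c) / Z for c in costs]

lhs = kl(q, p_theta)
rhs = sampler_loss(q, costs) + math.log(Z)
print(f"KL = {lhs:.6f}, expanded form = {rhs:.6f}")
```

Because $\log Z$ does not depend on $q$, minimizing the KL divergence over $q$ is the same as minimizing `sampler_loss`.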
9 / 20
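Guided Cost Learning for IRL
In addition to the importance samples drawn from $q(\tau)$, samples are drawn from the demonstrations $p(\tau)$, so that trajectories come from the mixture
$$\mu \sim \frac{1}{2} p(\tau) + \frac{1}{2} q(\tau)$$
Since the density $p(\tau)$ is not available, it is replaced by an estimate $\tilde{p}(\tau)$.
(Cost)
$$L_{cost}(p) = E_{\tau\sim p}[c_\theta(\tau)] + \log E_{\tau\sim\mu}\left[\frac{\exp(-c_\theta(\tau))}{\frac{1}{2}\tilde{p}(\tau) + \frac{1}{2} q(\tau)}\right] \tag{7}$$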
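One way to see what the mixture buys is to compare the variance of the importance weights under $q$ alone and under $\mu$. The sketch below computes both exactly on a finite set. It assumes, purely for illustration, that the demonstrations follow the current model ($p = p_\theta$) and that $\tilde{p} = p$; the costs and the poorly matched sampler `q` are hypothetical.

```python
import math


def weight_stats(costs, sampling, denom):
    """Exact mean and variance of w = exp(-c) / denom when tau ~ sampling."""
    w = [math.exp(-c) / d for c, d in zip(costs, denom)]
    mean = sum(s * wi for s, wi in zip(sampling, w))
    second = sum(s * wi * wi for s, wi in zip(sampling, w))
    return mean, second - mean * mean


costs = [0.0, 1.5, 3.0, 4.0]            # hypothetical trajectory costs
Z = sum(math.exp(-c) for c in costs)
p = [math.exp(-c) / Z for c in costs]   # assumption: demonstrations follow p_theta
q = [0.01, 0.09, 0.30, 0.60]            # hypothetical sampler that rarely visits the low-cost trajectory

# Plain importance sampling: tau ~ q, weight exp(-c) / q.
mean_q, var_q = weight_stats(costs, q, q)

# Mixture sampling: tau ~ mu = p/2 + q/2, weight exp(-c) / (p_tilde/2 + q/2) with p_tilde = p.
mu = [0.5 * pi + 0.5 * qi for pi, qi in zip(p, q)]
mean_mu, var_mu = weight_stats(costs, mu, mu)

print(f"q only : mean={mean_q:.4f} var={var_q:.3f}")
print(f"mixture: mean={mean_mu:.4f} var={var_mu:.3f}")
```

Both estimators have mean $Z$ in this setting, while the mixture weights vary far less because the demonstrations cover the low-cost trajectory that `q` almost never samples.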
10 / 20
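GAN discriminator
Given the data distribution $p(\tau)$ and the generator distribution $q(\tau)$, the optimal GAN Discriminator $D^*$ is
(GAN Discriminator)
$$D^*(\tau) = \frac{p(\tau)}{p(\tau) + q(\tau)} \tag{8}$$
Substituting the energy-based model for $p(\tau)$,
$$p(\tau) = \frac{1}{Z}\exp(-c_\theta(\tau))$$
(GAN Discriminator for $\theta$)
$$D_\theta(\tau) = \frac{\frac{1}{Z}\exp(-c_\theta(\tau))}{\frac{1}{Z}\exp(-c_\theta(\tau)) + q(\tau)} \tag{9}$$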
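Written in the standard form $D_\theta(\tau) = \frac{1}{Z}\exp(-c_\theta(\tau)) \big/ \left(\frac{1}{Z}\exp(-c_\theta(\tau)) + q(\tau)\right)$, the discriminator is a sigmoid of $-c_\theta(\tau) - \log Z - \log q(\tau)$. The sketch below checks that identity numerically; the function names and the input values are hypothetical.

```python
import math


def discriminator(cost, log_Z, q_tau):
    """D_theta(tau) = (exp(-c)/Z) / (exp(-c)/Z + q(tau))."""
    a = math.exp(-cost - log_Z)
    return a / (a + q_tau)


def discriminator_logit(cost, log_Z, q_tau):
    """The same value written as sigmoid(-c - log Z - log q(tau))."""
    s = -cost - log_Z - math.log(q_tau)
    return 1.0 / (1.0 + math.exp(-s))


# Hypothetical values for one trajectory.
cost, log_Z, q_tau = 1.2, 0.4, 0.3
d_ratio = discriminator(cost, log_Z, q_tau)
d_sigmoid = discriminator_logit(cost, log_Z, q_tau)

# When the sampler density equals the model density, the discriminator cannot tell them apart.
q_matched = math.exp(-cost - log_Z)
d_matched = discriminator(cost, log_Z, q_matched)
print(d_ratio, d_sigmoid, d_matched)
```

The logit form makes clear that the only trainable parts of this discriminator are the cost $c_\theta$ and the scalar $\log Z$.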
11 / 20
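GAN discriminator loss
(Loss of Discriminator)
$$L_{discriminator}(D_\theta) = E_{\tau\sim p}[-\log D_\theta(\tau)] + E_{\tau\sim q}[-\log(1 - D_\theta(\tau))] \tag{10}$$
$$= E_{\tau\sim p}\left[-\log \frac{\frac{1}{Z}\exp(-c_\theta(\tau))}{\frac{1}{Z}\exp(-c_\theta(\tau)) + q(\tau)}\right] + E_{\tau\sim q}\left[-\log \frac{q(\tau)}{\frac{1}{Z}\exp(-c_\theta(\tau)) + q(\tau)}\right] \tag{11}$$
This is the loss used to train the Discriminator network.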
12 / 20
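Estimate Z
$$\tilde{\mu}(\tau) = \frac{1}{2Z}\exp(-c_\theta(\tau)) + \frac{1}{2} q(\tau)$$
(Discriminator)
$$L_{discriminator}(D_\theta) = E_{\tau\sim p}[-\log D_\theta(\tau)] + E_{\tau\sim q}[-\log(1 - D_\theta(\tau))] \tag{12}$$
Writing $D_\theta(\tau) = \frac{1}{Z}\exp(-c_\theta(\tau)) / (2\tilde{\mu}(\tau))$ and $1 - D_\theta(\tau) = q(\tau) / (2\tilde{\mu}(\tau))$, and dropping the additive constant $2\log 2$, which does not affect the derivatives:
$$= E_{\tau\sim p}\left[-\log \frac{\frac{1}{Z}\exp(-c_\theta(\tau))}{\tilde{\mu}(\tau)}\right] + E_{\tau\sim q}\left[-\log \frac{q(\tau)}{\tilde{\mu}(\tau)}\right] \tag{13}$$
$$= \log Z + E_{\tau\sim p}[c_\theta(\tau)] + E_{\tau\sim p}[\log \tilde{\mu}(\tau)] - E_{\tau\sim q}[\log q(\tau)] + E_{\tau\sim q}[\log \tilde{\mu}(\tau)] \tag{14}$$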
13 / 20
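Estimate Z
$$L_{discriminator}(D_\theta) = \log Z + E_{\tau\sim p}[c_\theta(\tau)] + E_{\tau\sim p}[\log \tilde{\mu}(\tau)] - E_{\tau\sim q}[\log q(\tau)] + E_{\tau\sim q}[\log \tilde{\mu}(\tau)]$$
The $Z$ that minimizes $L_{discriminator}(D_\theta)$ gives an estimate of the partition function $Z$.
Derivative of the Discriminator loss with respect to $Z$, using $E_{\tau\sim p}[\log \tilde{\mu}] + E_{\tau\sim q}[\log \tilde{\mu}] = 2 E_{\tau\sim\mu}[\log \tilde{\mu}]$ for $\mu = \frac{1}{2}p + \frac{1}{2}q$:
$$\partial_Z L_{discriminator}(D_\theta) = \frac{1}{Z} - E_{\tau\sim\mu}\left[\frac{\frac{1}{Z^2}\exp(-c_\theta(\tau))}{\tilde{\mu}(\tau)}\right] \tag{15}$$
$$\partial_Z L_{discriminator}(D_\theta) = 0 \tag{16}$$
$$Z = E_{\tau\sim\mu}\left[\frac{\exp(-c_\theta(\tau))}{\tilde{\mu}(\tau)}\right] \tag{17}$$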
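The claim that the loss-minimizing $Z$ satisfies $Z = E_{\tau\sim\mu}[\exp(-c_\theta(\tau))/\tilde{\mu}(\tau)]$ can be checked on a finite trajectory set. The sketch below is one possible check, not the deck's procedure: it minimizes the discriminator loss over $\log Z$ by bisection on its slope and then evaluates the importance-sampling expression at that $Z$. The costs, `p`, `q`, and the bisection bounds are hypothetical.

```python
import math


def disc_value(c, q_tau, log_Z):
    """D_theta(tau) = a / (a + q) with a = exp(-c) / Z."""
    a = math.exp(-c - log_Z)
    return a / (a + q_tau)


def disc_loss(costs, p, q, log_Z):
    """E_{tau~p}[-log D] + E_{tau~q}[-log(1 - D)] over a finite trajectory set."""
    total = 0.0
    for c, pi, qi in zip(costs, p, q):
        d = disc_value(c, qi, log_Z)
        total -= pi * math.log(d) + qi * math.log(1.0 - d)
    return total


def loss_slope(costs, p, q, log_Z):
    """d(loss)/d(log Z) = sum_i [p_i (1 - D_i) - q_i D_i]."""
    slope = 0.0
    for c, pi, qi in zip(costs, p, q):
        d = disc_value(c, qi, log_Z)
        slope += pi * (1.0 - d) - qi * d
    return slope


def solve_log_Z(costs, p, q, lo=-20.0, hi=20.0, iters=200):
    """Bisection: the loss is convex in log Z, so its slope crosses zero once."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if loss_slope(costs, p, q, mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def importance_estimate(costs, p, q, log_Z):
    """E_{tau~mu}[exp(-c) / mu_tilde], mu = p/2 + q/2, mu_tilde = exp(-c)/(2Z) + q/2."""
    total = 0.0
    for c, pi, qi in zip(costs, p, q):
        mu = 0.5 * (pi + qi)
        mu_tilde = 0.5 * (math.exp(-c - log_Z) + qi)
        total += mu * math.exp(-c) / mu_tilde
    return total


costs = [0.0, 1.0, 2.0, 3.0]     # hypothetical trajectory costs
p = [0.5, 0.3, 0.15, 0.05]       # hypothetical demonstration distribution
q = [0.25, 0.25, 0.25, 0.25]     # hypothetical sampler distribution

log_Z = solve_log_Z(costs, p, q)
Z = math.exp(log_Z)
Z_is = importance_estimate(costs, p, q, log_Z)
print(f"Z from minimizing the loss = {Z:.6f}, importance expression = {Z_is:.6f}")
```

Note that $\tilde{\mu}$ itself depends on $Z$, so the relation is a fixed-point condition rather than a closed-form formula.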
14 / 20
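Derivative of the Discriminator loss
$$L_{discriminator}(D_\theta) = \log Z + E_{\tau\sim p}[c_\theta(\tau)] + E_{\tau\sim p}[\log \tilde{\mu}(\tau)] - E_{\tau\sim q}[\log q(\tau)] + E_{\tau\sim q}[\log \tilde{\mu}(\tau)]$$
Differentiating $L_{discriminator}(D_\theta)$ with respect to $\theta$:
$$\partial_\theta L_{discriminator}(D_\theta) = E_{\tau\sim p}[\partial_\theta c_\theta(\tau)] - E_{\tau\sim\mu}\left[\frac{\frac{1}{Z}\exp(-c_\theta(\tau))\,\partial_\theta c_\theta(\tau)}{\tilde{\mu}(\tau)}\right] \tag{18}$$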
15 / 20
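Derivative of the IRL cost
From (7), taking $\tilde{p} = p_\theta$ so that $\frac{1}{2}\tilde{p}(\tau) + \frac{1}{2}q(\tau) = \tilde{\mu}(\tau)$:
$$L_{cost}(\theta) = E_{\tau\sim p}[c_\theta(\tau)] + \log E_{\tau\sim\mu}\left[\frac{\exp(-c_\theta(\tau))}{\tilde{\mu}(\tau)}\right]$$
Differentiating the IRL loss $L_{cost}(\theta)$ with respect to $\theta$, treating the importance weight $\tilde{\mu}(\tau)$ as fixed, and substituting (17) for $Z$:
$$\partial_\theta L_{cost}(\theta) = E_{\tau\sim p}[\partial_\theta c_\theta(\tau)] + \partial_\theta \log E_{\tau\sim\mu}\left[\frac{\exp(-c_\theta(\tau))}{\tilde{\mu}(\tau)}\right] \tag{19}$$
$$= E_{\tau\sim p}[\partial_\theta c_\theta(\tau)] - E_{\tau\sim\mu}\left[\frac{\exp(-c_\theta(\tau))\,\partial_\theta c_\theta(\tau)}{\tilde{\mu}(\tau)}\right] \Big/ E_{\tau\sim\mu}\left[\frac{\exp(-c_\theta(\tau))}{\tilde{\mu}(\tau)}\right] \tag{20}$$
$$= E_{\tau\sim p}[\partial_\theta c_\theta(\tau)] - E_{\tau\sim\mu}\left[\frac{\exp(-c_\theta(\tau))\,\partial_\theta c_\theta(\tau)}{\tilde{\mu}(\tau)}\right] \Big/ Z \tag{21}$$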
16 / 20
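Conclusion: IRL cost and GAN discriminator
Derivative of the IRL cost = derivative of the GAN discriminator loss
$$\partial_\theta L_{discriminator}(D_\theta) = E_{\tau\sim p}[\partial_\theta c_\theta(\tau)] - E_{\tau\sim\mu}\left[\frac{\frac{1}{Z}\exp(-c_\theta(\tau))\,\partial_\theta c_\theta(\tau)}{\tilde{\mu}(\tau)}\right]$$
$$\partial_\theta L_{cost}(\theta) = E_{\tau\sim p}[\partial_\theta c_\theta(\tau)] - E_{\tau\sim\mu}\left[\frac{\exp(-c_\theta(\tau))\,\partial_\theta c_\theta(\tau)}{\tilde{\mu}(\tau)}\right] \Big/ Z$$
$$= E_{\tau\sim p}[\partial_\theta c_\theta(\tau)] - E_{\tau\sim\mu}\left[\frac{\frac{1}{Z}\exp(-c_\theta(\tau))\,\partial_\theta c_\theta(\tau)}{\tilde{\mu}(\tau)}\right]$$
$$= \partial_\theta L_{discriminator}(D_\theta)$$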
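The equality of the two gradients can be verified numerically on a finite trajectory set. The sketch below assumes a hypothetical one-parameter linear cost $c_\theta(\tau) = \theta f(\tau)$, so that $\partial_\theta c_\theta(\tau) = f(\tau)$; the features, `p`, `q`, and `theta` are illustrative values. $Z$ is first set to the value satisfying $Z = E_{\tau\sim\mu}[\exp(-c_\theta(\tau))/\tilde{\mu}(\tau)]$, found here by bisection.

```python
import math


def a_values(theta, feats, log_Z):
    """exp(-c_theta(tau)) / Z for the hypothetical linear cost c_theta(tau) = theta * f(tau)."""
    return [math.exp(-theta * f - log_Z) for f in feats]


def disc_loss(theta, feats, p, q, log_Z):
    """E_p[-log D] + E_q[-log(1 - D)] with D = a / (a + q)."""
    total = 0.0
    for a, pi, qi in zip(a_values(theta, feats, log_Z), p, q):
        total -= pi * math.log(a / (a + qi)) + qi * math.log(qi / (a + qi))
    return total


def solve_log_Z(theta, feats, p, q, lo=-20.0, hi=20.0, iters=200):
    """Find Z with Z = E_mu[exp(-c) / mu_tilde], i.e. sum_i (p_i + q_i) D_i = 1."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        s = sum((pi + qi) * a / (a + qi)
                for a, pi, qi in zip(a_values(theta, feats, mid), p, q))
        if s > 1.0:   # D is too large, so Z is too small
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def grad_discriminator(theta, feats, p, q, log_Z):
    """E_p[dc] - E_mu[(1/Z) exp(-c) dc / mu_tilde], with dc = f(tau)."""
    g = sum(pi * f for pi, f in zip(p, feats))
    for a, f, pi, qi in zip(a_values(theta, feats, log_Z), feats, p, q):
        mu, mu_tilde = 0.5 * (pi + qi), 0.5 * (a + qi)
        g -= mu * a * f / mu_tilde
    return g


def grad_cost(theta, feats, p, q, log_Z):
    """E_p[dc] - E_mu[exp(-c) dc / mu_tilde] / E_mu[exp(-c) / mu_tilde], mu_tilde held fixed."""
    num = den = 0.0
    for a, f, pi, qi in zip(a_values(theta, feats, log_Z), feats, p, q):
        mu, mu_tilde = 0.5 * (pi + qi), 0.5 * (a + qi)
        w = mu * math.exp(-theta * f) / mu_tilde
        num += w * f
        den += w
    return sum(pi * f for pi, f in zip(p, feats)) - num / den


feats = [0.0, 1.0, 2.0, 3.0]     # hypothetical trajectory features f(tau)
p = [0.5, 0.3, 0.15, 0.05]       # hypothetical demonstration distribution
q = [0.1, 0.2, 0.3, 0.4]         # hypothetical sampler distribution
theta = 0.7

log_Z = solve_log_Z(theta, feats, p, q)
g_disc = grad_discriminator(theta, feats, p, q, log_Z)
g_cost = grad_cost(theta, feats, p, q, log_Z)
print(f"discriminator gradient = {g_disc:.8f}, cost gradient = {g_cost:.8f}")
```

The two values agree only at the $Z$ that satisfies the fixed-point condition; for other values of `log_Z` the normalizer in `grad_cost` differs from $Z$ and the gradients separate.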
17 / 20
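Conclusion: IRL sampler and GAN generator
IRL sampler
$$L_{sampler}(q) = E_{\tau\sim q}[c_\theta(\tau)] + E_{\tau\sim q}[\log q(\tau)]$$
GAN generator = IRL sampler + constant
$$L_{generator}(q) = E_{\tau\sim q}[\log(1 - D(\tau)) - \log D(\tau)]$$
$$= E_{\tau\sim q}\left[\log \frac{q(\tau)}{\tilde{\mu}(\tau)} - \log \frac{\frac{1}{Z}\exp(-c_\theta(\tau))}{\tilde{\mu}(\tau)}\right]$$
$$= E_{\tau\sim q}[\log q(\tau) + \log Z + c_\theta(\tau)]$$
$$= \log Z + E_{\tau\sim q}[c_\theta(\tau)] + E_{\tau\sim q}[\log q(\tau)]$$
$$= \log Z + L_{sampler}(q)$$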
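The identity $L_{generator}(q) = \log Z + L_{sampler}(q)$ holds for any value of $Z$ once the discriminator has the form $D = \frac{1}{Z}\exp(-c_\theta)/(\frac{1}{Z}\exp(-c_\theta) + q)$. A minimal numerical check on a finite trajectory set follows; the costs, `q`, and `log_Z` are hypothetical values.

```python
import math


def generator_loss(costs, q, log_Z):
    """E_{tau~q}[log(1 - D) - log D] with D = a / (a + q), a = exp(-c) / Z."""
    total = 0.0
    for c, qi in zip(costs, q):
        a = math.exp(-c - log_Z)
        log_d = math.log(a / (a + qi))
        log_one_minus_d = math.log(qi / (a + qi))
        total += qi * (log_one_minus_d - log_d)
    return total


def sampler_loss(costs, q):
    """E_{tau~q}[c(tau)] + E_{tau~q}[log q(tau)]."""
    return sum(qi * (c + math.log(qi)) for c, qi in zip(costs, q))


costs = [0.0, 1.5, 3.0, 4.0]   # hypothetical trajectory costs
q = [0.4, 0.3, 0.2, 0.1]       # hypothetical sampler / generator distribution
log_Z = 0.3                    # any value works; the identity does not require the true Z

l_gen = generator_loss(costs, q, log_Z)
l_samp = sampler_loss(costs, q)
print(f"L_generator = {l_gen:.6f}, log Z + L_sampler = {log_Z + l_samp:.6f}")
```

Since $\log Z$ does not depend on $q$, both losses have the same minimizer over $q$.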
18 / 20
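Conclusion
Both IRL and GAN train a pair of networks:
IRL: $L_{cost}$ trains $p_\theta$ and $L_{sampler}$ trains $q$; the reward is $-c_\theta(\tau)$
GAN: trained with $L_{generator}$ and $L_{discriminator}$; at the optimum $q(\tau) = p(\tau)$
Correspondence between IRL and GAN:
$\partial_\theta L_{cost} = \partial_\theta L_{discriminator}$
$L_{sampler}(q) + \log Z = L_{generator}(q)$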
19 / 20
Proposal
GAN
Figure:
$$p_\theta(\tau) = \frac{1}{Z(\theta)} \exp(-c_\theta(\tau))$$
Z(θ) cθ
20 / 20
