Maximum Entropy Deep Inverse Reinforcement Learning
mabonki0725
January 3, 2018
IRL(Inverse Reinforcement Learning)
Network
Figure: 2 IRS image
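IRL

Each trajectory $\tau$ has a cost $c_\tau(\theta)$, and its probability is
\[ p_\theta(\tau) = \frac{1}{Z(\theta)} \exp(-c_\tau(\theta)) \]
The reward is the negative of the cost, $r_\tau(\theta) = -c_\tau(\theta)$, and is the mean per-step reward along the trajectory:
\[ r_\tau(\theta) = \frac{1}{N} \sum_{t=1}^{N} r_{s_t,a_t}(\theta) \]
\[ \tau = \begin{pmatrix} s_1, s_2, \cdots, s_N \\ a_1, a_2, \cdots, a_N \end{pmatrix}, \qquad c_\tau(\theta) = -r_\tau(\theta) \]
$s_t$: state $s$ at time $t$
$a_t$: action $a$ at time $t$
$\tau$: trajectory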
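The trajectory probability above can be sketched numerically. This is a minimal illustration, not the deck's implementation: it assumes the partition function $Z(\theta)$ is a sum over a small, hypothetical set of candidate trajectories, and the per-step reward values are made up.

```python
import numpy as np

def trajectory_reward(step_rewards):
    """r_tau = (1/N) * sum over t of r_{s_t, a_t}: the mean per-step reward."""
    return float(np.mean(step_rewards))

def trajectory_probabilities(traj_rewards):
    """p(tau) = exp(r_tau) / Z, with Z summed over the given candidate set."""
    r = np.asarray(traj_rewards, dtype=float)
    w = np.exp(r - r.max())  # subtracting the max avoids overflow; Z absorbs it
    return w / w.sum()

# Hypothetical per-step rewards for three candidate trajectories (assumed values).
candidates = [[1.0, 1.0, 1.0], [0.0, 1.0, 0.0], [-1.0, 0.0, -1.0]]
r_tau = [trajectory_reward(c) for c in candidates]
p_tau = trajectory_probabilities(r_tau)
```

Trajectories with a higher mean reward (lower cost) receive exponentially more probability mass, which is the property the rest of the derivation relies on.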
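IRL

Since the reward of a trajectory $\tau$ is the negative of its cost, $r_\tau(\theta) = -c_\tau(\theta)$, the trajectory probability can be written in terms of the reward:
\[ p_\theta(\tau) = \frac{1}{Z(\theta)} \exp(-c_\tau(\theta)) = \frac{1}{Z(\theta)} \exp(r_\tau(\theta)) \]
Figure: $\exp(-x)$ is 1 at $x = 0$ and 0 at $x = \infty$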
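IRL

The log-likelihood of a demonstrated trajectory is
\[ L(\theta) = \log p(\tau(s,a) \mid c) = \log \frac{1}{Z(\theta)} \exp\left( \frac{1}{N} \sum_{s,a}^{N} r_{s,a}(\theta) \right) = \frac{1}{N} \sum_{s,a}^{N} r_{s,a}(\theta) - \log Z(\theta) \]
Differentiating with respect to $\theta$:
\[ \frac{\partial L(\theta)}{\partial \theta} = \left( \frac{1}{N} \sum_{s,a} \frac{\partial}{\partial \theta} r_{s,a}(\theta) \right) - \frac{\partial}{\partial \theta} \log Z(\theta) \]
The $\log Z(\theta)$ term is
\[ \log Z(\theta) = \sum_{\theta} \frac{1}{N} \log \sum_{s,a} \exp r_{s,a}(\theta)\, p(\theta) = \sum_{\theta} \frac{1}{N} \sum_{s,a} r_{s,a}(\theta)\, p(\theta) \]
\[ \frac{\partial}{\partial \theta} \log Z(\theta) = \sum_{\theta} \frac{1}{N} \sum_{s,a} \frac{\partial}{\partial \theta} r_{s,a}(\theta)\, p(\theta) = E_\theta\left[ \frac{1}{N} \sum_{s,a} \frac{\partial}{\partial \theta} r_{s,a}(\theta) \right] \]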
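The final identity, that the gradient of $\log Z(\theta)$ is an expectation of the reward gradient, can also be reached in one step from the definition of the partition function. The sketch below assumes $Z(\theta) = \sum_\tau \exp(r_\tau(\theta))$, a sum over all trajectories; this explicit form is an assumption, since the slides do not spell it out.

```latex
\[
\frac{\partial}{\partial \theta} \log Z(\theta)
  = \frac{1}{Z(\theta)} \sum_{\tau} \exp\bigl(r_\tau(\theta)\bigr)\,
    \frac{\partial r_\tau(\theta)}{\partial \theta}
  = \sum_{\tau} p_\theta(\tau)\, \frac{\partial r_\tau(\theta)}{\partial \theta}
  = E_\theta\!\left[ \frac{1}{N} \sum_{s,a} \frac{\partial}{\partial \theta} r_{s,a}(\theta) \right]
\]
```

The last equality substitutes $r_\tau(\theta) = \frac{1}{N} \sum_{s,a} r_{s,a}(\theta)$, with the inner sum running over the state-action pairs of $\tau$ and the expectation taken over trajectories drawn from $p_\theta$.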
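IRL

With a linear reward $r_{s,a}(\theta) = \theta^T f(s,a)$, where $\theta$ is the weight vector and $f(s,a)$ is the feature vector:
\[ \frac{\partial}{\partial \theta} r_{s,a}(\theta) = f(s,a) \]
\[ \frac{\partial L(\theta)}{\partial \theta} = \left( \frac{1}{N} \sum_{s,a} f(s,a) \right) - \left( \frac{1}{N} \sum_{s,a} E_\theta f(s,a) \right) = E_{s,a} f(s,a) - E_{s,a}[\hat{f}(s,a)], \qquad \hat{f}(s,a) = E_\theta f(s,a) \]
$\theta$ is updated using this gradient.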
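For the linear reward, the gradient is the demonstrated feature mean minus the feature mean expected under the current model. The following is a minimal sketch under loud assumptions: the expectation $E_\theta$ is taken over a small, hypothetical list of candidate trajectories rather than all trajectories, the features are one-hot per state-action pair, and the names `mean_features` and `maxent_gradient` are illustrative only.

```python
import numpy as np

def mean_features(traj, features):
    """(1/N) * sum of f(s, a) along one trajectory of (s, a) index pairs."""
    return np.mean([features[s, a] for s, a in traj], axis=0)

def maxent_gradient(theta, expert_trajs, candidate_trajs, features):
    """dL/dtheta = demonstrated feature mean - model-expected feature mean."""
    f_expert = np.mean([mean_features(t, features) for t in expert_trajs], axis=0)
    f_cand = np.array([mean_features(t, features) for t in candidate_trajs])
    r = f_cand @ theta                 # r_tau = theta^T (mean features of tau)
    p = np.exp(r - r.max())
    p /= p.sum()                       # p_theta(tau) over the candidate set
    return f_expert - p @ f_cand

# Assumed toy problem: 2 states x 2 actions, one-hot features of length 4.
features = np.eye(4).reshape(2, 2, 4)
expert_trajs = [[(0, 0), (1, 0)]]
candidate_trajs = [[(0, 0), (1, 0)], [(0, 1), (1, 1)], [(0, 0), (1, 1)]]

theta = np.zeros(4)
for _ in range(200):                   # plain gradient ascent, step size 0.5
    theta += 0.5 * maxent_gradient(theta, expert_trajs, candidate_trajs, features)
```

Because the gradient vanishes only when the two feature means agree, ascent keeps shifting probability toward the demonstrated trajectory.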
OpenAI Gridword.py
Gridword
Gridword.py
Entropy
16 16
Figure: 16-cell GridWorld pseudo-rewards by SGD
SGD
reward += $\alpha \cdot \frac{\partial L}{\partial \theta}$
Figure: 16-cell GridWorld pseudo-rewards by SGD
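The update rule on this slide can be sketched for a 16-cell grid as follows. This is not the deck's code: it assumes a 4x4 grid with deterministic moves, a horizon of 8 steps, one reward value per cell, an unnormalized trajectory reward (the 1/N factor is dropped, which only rescales the rewards), and a single hypothetical demonstration. With one demonstration the loop is plain gradient ascent; sampling demonstrations per step would make it stochastic.

```python
import numpy as np

SIZE, HORIZON = 4, 8
N_STATES = SIZE * SIZE
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(s, a):
    """Deterministic move; bumping into a wall leaves the agent in place."""
    row, col = divmod(s, SIZE)
    d_row, d_col = MOVES[a]
    row = min(max(row + d_row, 0), SIZE - 1)
    col = min(max(col + d_col, 0), SIZE - 1)
    return row * SIZE + col

NEXT = np.array([[step(s, a) for a in range(4)] for s in range(N_STATES)])

def expected_visits(reward, start=0):
    """Expected per-cell visit counts under p(tau) proportional to exp(sum_t reward[s_t])."""
    # Backward pass: soft (log-sum-exp) value iteration over the finite horizon.
    v = np.zeros(N_STATES)
    policies = []
    for _ in range(HORIZON - 1):
        q = (reward + v)[NEXT]                      # r(s') + V(s') per action
        q_max = q.max(axis=1, keepdims=True)
        v = (q_max + np.log(np.exp(q - q_max).sum(axis=1, keepdims=True))).ravel()
        policies.append(np.exp(q - v[:, None]))     # softmax policy for this step
    policies.reverse()                              # policies[t] acts at time t
    # Forward pass: push the state distribution through the policies.
    d = np.zeros(N_STATES)
    d[start] = 1.0
    visits = d.copy()
    for pi in policies:
        d_next = np.zeros(N_STATES)
        np.add.at(d_next, NEXT, d[:, None] * pi)
        d = d_next
        visits += d
    return visits

# Hypothetical demonstration: along the top row, then down the right column.
demo = [0, 1, 2, 3, 7, 11, 15, 15]
expert_visits = np.bincount(demo, minlength=N_STATES).astype(float)

reward = np.zeros(N_STATES)
alpha = 0.1                                         # assumed step size
for _ in range(200):
    grad = expert_visits - expected_visits(reward)  # dL/dreward per cell
    reward += alpha * grad                          # reward += alpha * dL/dtheta
```

Cells on the demonstrated path end up with higher pseudo-rewards than cells the demonstration never visits.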
Deep Neural Net
Network.grad = $\frac{\partial L}{\partial \theta}$
Figure: 16-cell GridWorld pseudo-rewards by DNN
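When the reward comes from a network, the gradient with respect to the weights follows from the chain rule: the per-cell term $\partial L / \partial r$ is pushed back through the network. The sketch below is an assumption-heavy illustration, not the deck's network: it uses a one-hidden-layer tanh MLP written in NumPy, one-hot cell features, and placeholder visit counts standing in for the expert and model statistics.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES, HIDDEN = 16, 8
features = np.eye(N_STATES)                  # assumed one-hot cell features
params = {
    "W1": rng.normal(0.0, 0.1, (N_STATES, HIDDEN)),
    "b1": np.zeros(HIDDEN),
    "W2": rng.normal(0.0, 0.1, (HIDDEN, 1)),
    "b2": np.zeros(1),
}

def forward(params, x):
    """Reward per cell from a one-hidden-layer tanh network."""
    h = np.tanh(x @ params["W1"] + params["b1"])
    r = (h @ params["W2"] + params["b2"]).ravel()
    return r, h

def backward(params, x, h, dL_dr):
    """Chain rule: propagate dL/dr (one value per cell) to every weight."""
    d_out = dL_dr[:, None]
    grads = {"W2": h.T @ d_out, "b2": d_out.sum(axis=0)}
    d_h = (d_out @ params["W2"].T) * (1.0 - h ** 2)   # derivative of tanh
    grads["W1"] = x.T @ d_h
    grads["b1"] = d_h.sum(axis=0)
    return grads

# Placeholder statistics (hypothetical): expert visits minus model visits.
expert_visits = np.bincount([0, 1, 2, 3, 7, 11, 15, 15], minlength=N_STATES).astype(float)
model_visits = np.full(N_STATES, 0.5)
dL_dr = expert_visits - model_visits

rewards, hidden = forward(params, features)
grads = backward(params, features, hidden, dL_dr)     # plays the role of Network.grad
alpha = 0.01                                          # assumed step size
new_params = {k: params[k] + alpha * grads[k] for k in params}  # gradient ascent
```

In a full training loop the model visit counts would be recomputed from the current network rewards before each update.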

Deep IRL by C language
