[DL輪読会 (DL Reading Group)] State Representation Learning for Reinforcement Learning: Toward Acquiring Better "World Models"

Deep Learning JP

2018/10/26 Deep Learning JP: http://deeplearning.jp/seminar-2/

Tatsuya Matsushima @__tmats__ , Matsuo Lab

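SRL
$a_t \in \mathcal{A}$, $o_t \in \mathcal{O}$, $\tilde{s}_t \in \tilde{\mathcal{S}}$, $s_t \in \mathcal{S}$
[Figure: $a_t$ acting on the transition $\tilde{s}_t \to \tilde{s}_{t+1}$, with observations $o_t$, $o_{t+1}$]
$s_t = \phi(o_{1:t})$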

SRL
$s_t = \phi(o_t; \theta_\phi)$
$\hat{o}_t = \phi^{-1}(s_t; \theta_{\phi^{-1}})$
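
The encoder/decoder pair above can be illustrated with a small sketch. It assumes a linear $\phi$ and $\phi^{-1}$ and a squared-error reconstruction loss; the slide states neither, and the names `encode`, `decode`, and `reconstruction_loss` are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def encode(o_t, theta_phi):
    """s_t = phi(o_t; theta_phi); a linear map is assumed here."""
    return matvec(theta_phi, o_t)


def decode(s_t, theta_phi_inv):
    """o_hat_t = phi^{-1}(s_t; theta_phi_inv); also assumed linear."""
    return matvec(theta_phi_inv, s_t)


def reconstruction_loss(o_t, theta_phi, theta_phi_inv):
    """Squared error between o_t and its reconstruction (assumed loss)."""
    o_hat = decode(encode(o_t, theta_phi), theta_phi_inv)
    return sum((a - b) ** 2 for a, b in zip(o_t, o_hat))


# Toy example: 3-dimensional observation, 2-dimensional state.
theta_phi = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]        # keeps the first two coordinates
theta_phi_inv = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # fills the third with zero
state = encode([0.5, -1.0, 2.0], theta_phi)
loss = reconstruction_loss([0.5, -1.0, 2.0], theta_phi, theta_phi_inv)
```

In this toy case the third coordinate is discarded by the encoder, so the whole loss (4.0) comes from that coordinate.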

SRL
$s_t = \phi(o_t; \theta_\phi)$
$\hat{s}_{t+1} = f(s_t, a_t; \theta_{\mathrm{fwd}})$
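
As a rough illustration of the forward model above, the sketch below scores a prediction $\hat{s}_{t+1}$ against the encoded next observation. It assumes linear $\phi$ and $f$ and a squared-error loss, none of which the slide specifies; `forward_loss` and the toy parameters are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def phi(o_t, theta_phi):
    """s_t = phi(o_t; theta_phi); a linear encoder is assumed here."""
    return matvec(theta_phi, o_t)


def forward_model(s_t, a_t, theta_fwd):
    """s_hat_{t+1} = f(s_t, a_t; theta_fwd); linear in the concatenation [s_t, a_t]."""
    return matvec(theta_fwd, s_t + a_t)


def forward_loss(o_t, a_t, o_next, theta_phi, theta_fwd):
    """Squared error between the predicted and the encoded next state (assumed loss)."""
    s_hat_next = forward_model(phi(o_t, theta_phi), a_t, theta_fwd)
    s_next = phi(o_next, theta_phi)
    return sum((p - q) ** 2 for p, q in zip(s_hat_next, s_next))


theta_phi = [[1.0, 0.0], [0.0, 1.0]]            # identity encoder
theta_fwd = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]  # next state = state + (action, 0)
exact = forward_loss([1.0, 2.0], [0.5], [1.5, 2.0], theta_phi, theta_fwd)
off = forward_loss([1.0, 2.0], [0.5], [2.0, 2.0], theta_phi, theta_fwd)
```

Because the loss depends on $\theta_\phi$ through both $s_t$ and $s_{t+1}$, minimizing it shapes the encoder as well as the transition model.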

SRL
$s_t = \phi(o_t; \theta_\phi)$
$\hat{a}_t = g(s_t, s_{t+1}; \theta_{\mathrm{inv}})$
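
The inverse model above can be sketched in the same style. This is only an illustration under assumptions: a linear $\phi$ and $g$, continuous actions, and a squared-error loss (a classification loss would be the usual choice for discrete actions). The names `inverse_model` and `inverse_loss` are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def phi(o_t, theta_phi):
    """s_t = phi(o_t; theta_phi); a linear encoder is assumed here."""
    return matvec(theta_phi, o_t)


def inverse_model(s_t, s_next, theta_inv):
    """a_hat_t = g(s_t, s_{t+1}; theta_inv); linear in the concatenation [s_t, s_{t+1}]."""
    return matvec(theta_inv, s_t + s_next)


def inverse_loss(o_t, a_t, o_next, theta_phi, theta_inv):
    """Squared error between the predicted and the taken action (assumed loss)."""
    a_hat = inverse_model(phi(o_t, theta_phi), phi(o_next, theta_phi), theta_inv)
    return sum((p - q) ** 2 for p, q in zip(a_hat, a_t))


theta_phi = [[1.0, 0.0], [0.0, 1.0]]   # identity encoder
theta_inv = [[-1.0, 0.0, 1.0, 0.0]]    # action = change in the first state coordinate
a_hat = inverse_model([1.0, 2.0], [1.5, 2.0], theta_inv)
loss = inverse_loss([1.0, 2.0], [0.5], [1.5, 2.0], theta_phi, theta_inv)
```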

SRL
$s_t = \phi(o_t; \theta_\phi)$
$\mathrm{Loss} = \mathcal{L}_{\mathrm{prior}}(s_{1:n}; \theta_\phi \mid c)$

E2C [Watter+ 2015]
$\hat{s}_{t+1} \sim \mathcal{N}(\mu = W s_t + U a_t + V, \sigma)$
[Figure: $s_t$, predicted $\hat{s}_{t+1}$, and $s_{t+1}$]
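
A minimal sketch of the sampling step written on the E2C slide, $\hat{s}_{t+1} \sim \mathcal{N}(W s_t + U a_t + V, \sigma)$. It assumes $\sigma$ is a per-dimension standard deviation with independent dimensions, which the slide does not state; the function names and toy matrices are hypothetical and this is not the E2C implementation itself.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def transition_mean(s_t, a_t, W, U, V):
    """mu = W s_t + U a_t + V, the linear transition in latent space."""
    return [ws + ua + v for ws, ua, v in zip(matvec(W, s_t), matvec(U, a_t), V)]


def sample_next_state(s_t, a_t, W, U, V, sigma, rng):
    """Draw s_hat_{t+1} from a Gaussian around the linear prediction.

    sigma is treated as a per-dimension standard deviation (an assumption).
    """
    mean = transition_mean(s_t, a_t, W, U, V)
    return [rng.gauss(m, sd) for m, sd in zip(mean, sigma)]


W = [[1.0, 0.0], [0.0, 1.0]]
U = [[1.0], [0.0]]
V = [0.0, 0.5]
rng = random.Random(0)  # seeded so the draw is reproducible
mean = transition_mean([1.0, 2.0], [0.5], W, U, V)  # [1.5, 2.5]
sample = sample_next_state([1.0, 2.0], [0.5], W, U, V, [0.1, 0.1], rng)
```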

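ICM [Pathak+ 2017]
$\mathcal{L}_{\mathrm{fwd}}\left(\hat{\phi}(o_{t+1}), \hat{f}(\hat{\phi}(o_t), a_t)\right) = \frac{1}{2} \left\| \hat{f}(\hat{\phi}(o_t), a_t) - \hat{\phi}(o_{t+1}) \right\|_2^2$
$\min_{\theta_P, \theta_I, \theta_F} \left[ -\lambda \, \mathbb{E}_{\pi(s_t; \theta_P)}\left[\textstyle\sum_t r_t\right] + (1 - \beta) \mathcal{L}_{\mathrm{inv}} + \beta \mathcal{L}_{\mathrm{fwd}} \right]$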
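
The two ICM expressions above translate almost directly into code. The sketch below takes the feature vectors, the inverse loss, and the expected return as given numbers; how they are produced by the networks is outside its scope, and the function names are hypothetical.

```python
def icm_forward_loss(pred_next_feat, next_feat):
    """0.5 * squared L2 distance between f_hat(phi_hat(o_t), a_t) and phi_hat(o_{t+1})."""
    return 0.5 * sum((p - q) ** 2 for p, q in zip(pred_next_feat, next_feat))


def icm_objective(expected_return, loss_inv, loss_fwd, lam, beta):
    """Quantity minimized over (theta_P, theta_I, theta_F):

    -lambda * E[sum_t r_t] + (1 - beta) * L_inv + beta * L_fwd
    """
    return -lam * expected_return + (1.0 - beta) * loss_inv + beta * loss_fwd


l_fwd = icm_forward_loss([1.0, 2.0], [0.0, 0.0])  # 0.5 * (1 + 4) = 2.5
total = icm_objective(expected_return=10.0, loss_inv=1.0, loss_fwd=l_fwd,
                      lam=0.1, beta=0.2)
```

Here $\beta$ trades the inverse loss against the forward loss, and $\lambda$ weights the policy's return against both.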

$\min_{G, Q, \mathcal{M}} \max_{D} \; V(G, D) - \lambda I_{\mathrm{VLB}}(G, Q)$

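$\mathcal{L}_{\mathrm{Slowness}}(D, \phi) = \mathbb{E}\left[ \| \Delta s_t \|^2 \right]$
$\mathcal{L}_{\mathrm{Variability}}(D, \phi) = \mathbb{E}\left[ e^{-\| s_{t_1} - s_{t_2} \|} \right]$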
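
A small sketch of how the slowness and variability losses above could be estimated from a trajectory of learned states. It assumes $\Delta s_t = s_{t+1} - s_t$, replaces each expectation with a sample mean (over consecutive steps and over all state pairs, respectively), and needs at least two states; the function names are hypothetical.

```python
import math
from itertools import combinations


def diff(u, v):
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]


def sq_norm(v):
    """Squared Euclidean norm."""
    return sum(x * x for x in v)


def slowness_loss(states):
    """Mean of ||s_{t+1} - s_t||^2 over consecutive states."""
    deltas = [diff(states[t + 1], states[t]) for t in range(len(states) - 1)]
    return sum(sq_norm(d) for d in deltas) / len(deltas)


def variability_loss(states):
    """Mean of exp(-||s_{t1} - s_{t2}||) over all pairs of states."""
    pairs = list(combinations(states, 2))
    return sum(math.exp(-math.sqrt(sq_norm(diff(a, b)))) for a, b in pairs) / len(pairs)


states = [[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]]
slow = slowness_loss(states)     # (25 + 0) / 2 = 12.5
var = variability_loss(states)   # (exp(-5) + exp(-5) + 1) / 3
```

The two terms pull in opposite directions: slowness favors small steps, while variability penalizes states that collapse onto each other.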

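$\mathcal{L}_{\mathrm{Prop}}(D, \phi) = \mathbb{E}\left[ \left( \| \Delta s_{t_2} \| - \| \Delta s_{t_1} \| \right)^2 \,\middle|\, a_{t_1} = a_{t_2} \right]$
$\mathcal{L}_{\mathrm{Rep}}(D, \phi) = \mathbb{E}\left[ e^{-\| s_{t_2} - s_{t_1} \|^2} \, \| \Delta s_{t_2} - \Delta s_{t_1} \|^2 \,\middle|\, a_{t_1} = a_{t_2} \right]$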
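
The proportionality and repeatability losses above both average over pairs of time steps that share the same action. A minimal sketch, assuming $\Delta s_t = s_{t+1} - s_t$, discrete actions compared by equality, and sample means in place of the expectations; `prior_losses` and the toy transitions are hypothetical.

```python
import math
from itertools import combinations


def diff(u, v):
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]


def sq_norm(v):
    """Squared Euclidean norm."""
    return sum(x * x for x in v)


def prior_losses(transitions):
    """Return (L_Prop, L_Rep) for a list of (s_t, a_t, s_next) tuples.

    Only pairs of transitions with equal actions contribute.
    Returns (0.0, 0.0) when no such pair exists.
    """
    prop_terms, rep_terms = [], []
    for (s1, a1, n1), (s2, a2, n2) in combinations(transitions, 2):
        if a1 != a2:
            continue
        d1, d2 = diff(n1, s1), diff(n2, s2)
        # Proportionality: same action, same magnitude of state change.
        prop_terms.append((math.sqrt(sq_norm(d2)) - math.sqrt(sq_norm(d1))) ** 2)
        # Repeatability: same action in nearby states, same state change.
        rep_terms.append(math.exp(-sq_norm(diff(s2, s1))) * sq_norm(diff(d2, d1)))
    if not prop_terms:
        return 0.0, 0.0
    return sum(prop_terms) / len(prop_terms), sum(rep_terms) / len(rep_terms)


transitions = [
    ([0.0, 0.0], "right", [1.0, 0.0]),
    ([0.0, 0.0], "right", [2.0, 0.0]),
    ([5.0, 5.0], "up", [5.0, 6.0]),
]
l_prop, l_rep = prior_losses(transitions)
```

In the toy data only the two "right" transitions form a pair; they start from the same state but move by 1 and 2 units, so both losses evaluate to 1.0.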

E2C [Watter+ 2015]: ✔ ✔ ✔ ✔
World Model [Ha+ 2018]: ✔ ✔ ✔
ICM [Pathak+ 2017]: ✔ ✔ ✔
Causal InfoGAN [Kurutach+ 2018]: ✔ ✔ ✔ ✔
VPN [Oh+ 2017]: ✔ ✔
Robotic Priors [Jonschkowski+ 2015]: ✔ ✔

Robotic Priors [Jonschkowski+ 2015]: slot car racing, 16×16×3, 2 (25)
E2C [Watter+ 2015]: cart-pole, 80×80×3, 8
ICM [Pathak+ 2017]: Mario Bros., 42×42×3, 2 (14)

$\mathrm{KNN\text{-}MSE}(s) = \frac{1}{k} \sum_{s' \in \mathrm{KNN}(s, k)} \| \tilde{s} - \tilde{s}' \|^2$
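
One way to read the KNN-MSE formula above: neighbors are found in the learned state space ($s$), and the error is measured between the corresponding ground-truth states ($\tilde{s}$). The sketch below follows that reading, which is an interpretation of the notation rather than something the slide spells out. It uses Euclidean distance, excludes the query point from its own neighbors, and assumes `k` is at most the number of other points; `knn_mse` is a hypothetical name.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def knn_mse(learned_states, true_states, index, k):
    """KNN-MSE for the state at `index`.

    Neighbors are chosen by distance in the learned space; the returned
    value is the mean squared distance to them in the ground-truth space.
    """
    others = [j for j in range(len(learned_states)) if j != index]
    neighbours = sorted(
        others, key=lambda j: sq_dist(learned_states[index], learned_states[j])
    )[:k]
    return sum(sq_dist(true_states[index], true_states[j]) for j in neighbours) / k


learned = [[0.0], [1.0], [10.0]]
ground_truth = [[0.0], [2.0], [3.0]]
score_k1 = knn_mse(learned, ground_truth, index=0, k=1)  # 4.0
score_k2 = knn_mse(learned, ground_truth, index=0, k=2)  # (4 + 9) / 2 = 6.5
```

A low score means that points close together in the learned space are also close in the true state space.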

12. Why SRL?

17. $\hat{s}_{t+1} = W s_t + U a_t + V$

19. World Model [Ha+ 2018]

20. $l_t$, $\theta_t$, $p_t$

33. S-RL Toolbox

36. Appendix