Dream To Control:
Learning Behaviors by Latent Imagination
LEE, DOHYEON
leadh991114@gmail.com
4/18/2023 딥논읽 (Deep Learning Paper Reading) Seminar - Reinforcement Learning
ICLR 2020 (Oral)
NeurIPS Deep RL Workshop 2019 (Oral)
Contents
1. Introduction
2. Methods
3. Experiments
4. Conclusion
1. RL Comparison
INDEX
Introduction
Methods
Performance
Conclusion
Model-Free RL
• No model
• Learn a value function (and/or policy) from real experience
Model-Based RL
• Learn a model from real experience
• Plan a value function (and/or policy) from simulated experience
RL Comparison, from slides of Sergey Levine
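The contrast above can be made concrete with a small tabular sketch in the spirit of Dyna-Q (a classic textbook algorithm, not something from this deck). With `planning_steps=0` the agent is purely model-free: it learns Q-values only from real transitions. With `planning_steps > 0` it also records each real transition in a learned lookup-table model and replays simulated transitions from that model. The 6-state chain environment, the function names and all hyperparameters are hypothetical choices for illustration.

```python
import random

# Hypothetical 6-state chain: start at state 0, reward 1.0 for reaching state 5.
N_STATES = 6
ACTIONS = (-1, +1)  # move left / move right


def env_step(state, action):
    """Real environment: deterministic chain, episode ends at the right end."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1


def q_update(q, s, a, r, s2, done, alpha=0.5, gamma=0.9):
    """One tabular Q-learning update from a single transition."""
    target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (target - q[(s, a)])


def run(planning_steps, episodes=20, epsilon=0.1, seed=0):
    """planning_steps=0 is purely model-free; >0 adds Dyna-style planning."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    model = {}  # learned model: (s, a) -> (s', r, done), recorded from real experience
    real_steps = 0
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:  # greedy, breaking ties at random
                best = max(q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])
            s2, r, done = env_step(s, a)
            real_steps += 1
            q_update(q, s, a, r, s2, done)   # learn values from real experience
            model[(s, a)] = (s2, r, done)    # learn a model from real experience
            for _ in range(planning_steps):  # plan from simulated experience
                ps, pa = rng.choice(list(model))
                ps2, pr, pdone = model[(ps, pa)]
                q_update(q, ps, pa, pr, ps2, pdone)
            s = s2
    return q, real_steps
```

For example, `run(planning_steps=0)` and `run(planning_steps=20)` use the same number of real episodes; in the second run the goal reward is propagated back along the chain through replayed model transitions rather than only through real steps.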
2. World Model
“Intelligent agents can achieve goals in complex environments
even though they never encounter the exact same situation twice.”
“This ability requires building representations of the world from past
experience that enable generalization to novel situations.”
“World models offer an explicit way to represent an agent’s knowledge
about the world in a parametric model that can make predictions about the
future.”
A World Model, from Scott McCloud’s Understanding Comics.
3. Visual Control
“When sensory inputs are high-dimensional images, latent dynamics models can
abstract observations to predict forward in compact state spaces.”
→ Compared to predictions in image space, latent states have a small memory footprint, which makes it possible to imagine thousands of trajectories in parallel.
“Behaviors can be derived from dynamics models in many ways.”
→ Considering only rewards within a fixed imagination horizon results in shortsighted behaviors.
→ Prior work commonly resorts to derivative-free optimization for robustness.
4. PlaNet
An RL agent that learns the environment dynamics from
images and chooses actions through fast online planning in
latent space.
Learning Latent Dynamics for Planning from Pixels (Danijar Hafner et al., 2019)
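PlaNet's online planning uses the cross-entropy method (CEM): sample candidate action sequences, score each by the sum of rewards the model predicts over a fixed horizon, refit a Gaussian to the best candidates, and execute only the first action before replanning at the next step. The sketch below shows that search loop under stated assumptions: the learned latent model is replaced by a hypothetical 1-D toy dynamics `toy_model`, and the horizon, candidate count, elite size and iteration count are illustrative values, not PlaNet's settings.

```python
import random
import statistics


def toy_model(state, action):
    """Hypothetical stand-in for a learned latent model: returns (next_state, reward)."""
    next_state = state + action
    return next_state, -(next_state - 3.0) ** 2  # reward is highest at state 3.0


def plan_cem(state, horizon=5, candidates=200, top_k=20, iterations=10, seed=0):
    """Cross-entropy method over action sequences; returns the first action to execute."""
    rng = random.Random(seed)
    mean, std = [0.0] * horizon, [2.0] * horizon
    for _ in range(iterations):
        scored = []
        for _ in range(candidates):
            actions = [rng.gauss(m, sd) for m, sd in zip(mean, std)]
            s, total = state, 0.0
            for a in actions:  # imagine the sequence with the model
                s, r = toy_model(s, a)
                total += r     # predicted rewards only, nothing beyond the horizon
            scored.append((total, actions))
        scored.sort(key=lambda item: item[0], reverse=True)
        elite = [acts for _, acts in scored[:top_k]]
        # refit an independent Gaussian per time step to the elite sequences
        mean = [statistics.mean(col) for col in zip(*elite)]
        std = [statistics.pstdev(col) + 1e-6 for col in zip(*elite)]
    return mean[0]  # execute only the first action, then replan at the next step
```

Every environment step repeats `candidates × horizon × iterations` model predictions, and rewards beyond the horizon are ignored; these are the costs and the shortsightedness that Dreamer's learned actor and value model are meant to avoid.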
5. Dreamer
An RL agent that learns long-horizon behaviors from
images purely by latent imagination.
The three processes of the Dreamer agent.
1. The world model is learned from past experience.
2. From predictions of this model, the agent then learns a value network
to predict future rewards and an actor network to select actions.
3. The actor network is used to interact with the environment.
QnA
1. Learning the World Model
Dreamer learns a world model from experience. Using past images 𝑜1 ~ 𝑜3 and actions 𝑎1 ~
𝑎2, it computes a sequence of compact model states (green circles) from which it reconstructs
the images 𝑜1 ~ 𝑜3 and predicts the rewards 𝑟1 ~ 𝑟3. → The world model is taken from PlaNet.
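The world-model training signal can be written per time step as image reconstruction, reward prediction, and a KL term that keeps the posterior state (which has seen the current image) close to the prior state (predicted from the previous state and action only). The sketch below computes those three terms for diagonal Gaussians with plain Python lists. The function names, the unit-variance Gaussian likelihoods for images and rewards, and the `kl_scale` weight are assumptions for illustration; the encoder, decoder and recurrent networks that would produce these numbers are omitted.

```python
import math


def gaussian_kl(mu_q, std_q, mu_p, std_p):
    """KL(q || p) between two diagonal Gaussians, summed over latent dimensions."""
    return sum(
        math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
        for mq, sq, mp, sp in zip(mu_q, std_q, mu_p, std_p)
    )


def unit_gaussian_nll(target, pred):
    """Negative log-likelihood of target under N(pred, 1), summed over dimensions."""
    return sum(0.5 * (t - p) ** 2 + 0.5 * math.log(2 * math.pi) for t, p in zip(target, pred))


def world_model_loss(image, image_pred, reward, reward_pred, posterior, prior, kl_scale=1.0):
    """One time step: image reconstruction + reward prediction + KL(posterior || prior).

    posterior and prior are (mean, std) pairs of lists describing the latent state:
    the posterior has seen the current image, the prior only the previous state and action.
    """
    recon = unit_gaussian_nll(image, image_pred)
    reward_term = unit_gaussian_nll([reward], [reward_pred])
    kl = gaussian_kl(posterior[0], posterior[1], prior[0], prior[1])
    return recon + reward_term + kl_scale * kl
```

Summing this loss over the steps of a sequence (𝑜1 ~ 𝑜3, 𝑟1 ~ 𝑟3 in the figure) gives the quantity the world model minimizes in this sketch; with perfect predictions and identical posterior and prior, only the constant log-normalizers remain.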
2. Learning Behavior in Imagination
Dreamer learns long-sighted behaviors from predicted sequences of model states. It first
learns the long-term value 𝑣2 ~ 𝑣3 of each state, and then predicts actions 𝑎1 ~ 𝑎2 that lead to
high rewards and values by backpropagating them through the state sequence to the actor
network.
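The core mechanism here is that the actor is improved with analytic gradients: imagined rewards are differentiated through the predicted state sequence back to the actor's parameters. The sketch below shows this with forward-mode automatic differentiation via a small dual-number class. It assumes a hypothetical known 1-D dynamics `s' = s + a` in place of a learned latent model, a hypothetical one-parameter linear actor `a = -theta * s`, reward `-s'^2`, and no value function; Dreamer itself uses neural networks, a learned value model and reverse-mode backpropagation.

```python
class Dual:
    """Dual number val + der*eps (eps**2 == 0); der carries d(val)/d(theta)."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __mul__(self, other):
        o = Dual.lift(other)
        return Dual(self.val * o.val, self.val * o.der + self.der * o.val)

    __rmul__ = __mul__


def imagined_return(theta, s0=1.0, horizon=5):
    """Sum of rewards r = -s'^2 over a trajectory imagined with s' = s + a, a = -theta * s."""
    s, total = Dual.lift(s0), Dual(0.0)
    for _ in range(horizon):
        a = -1.0 * theta * s            # actor (hypothetical linear policy)
        s = s + a                       # dynamics (hypothetical, known rather than learned)
        total = total + -1.0 * s * s    # reward
    return total


def train_actor(theta=0.0, lr=0.02, steps=300):
    """Gradient ascent on the imagined return, differentiating through the state sequence."""
    for _ in range(steps):
        ret = imagined_return(Dual(theta, 1.0))  # seed d(theta)/d(theta) = 1
        theta += lr * ret.der
    return theta
```

In this toy problem the imagined return is maximized at `theta = 1` (the actor drives the state to 0 in one step), and `train_actor()` approaches that value without any search over action sequences.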
2. Learning Behavior in Imagination
PlaNet vs Dreamer
• For a given situation in the environment, PlaNet searches for the best action among many predictions for different
action sequences.
• Dreamer side-steps this expensive search by decoupling planning and acting. Once its actor network has been
trained on predicted sequences, it computes the actions for interacting with the environment without additional search.
In addition, Dreamer considers rewards beyond the planning horizon using a value function and leverages
backpropagation for efficient planning.
3. Act in the Environment
The agent encodes the history of the episode to compute the current model state and the next
action to execute in the environment.
4. Explained
↖ PlaNet (details omitted)
4. Explained
1. Transition Model
2. Reward Model
3. Policy
4. Objective
(Imagined Rewards)
Actor-Critic Method
From a paper on deep multi-agent RL (Taiki Fuji et al.)
5. Actor-Critic Model
(Parameterized by 𝜙 and 𝜓, respectively)
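The equations on this slide were images and did not survive extraction. For reference, a reconstruction of the definitions in the Dreamer paper's notation is given below (imagined states 𝑠τ start from a real model state 𝑠t; θ, 𝜙 and 𝜓 parameterize the world model, action model and value model). Treat it as a reconstruction to check against the paper, not a transcription of the slides.

```latex
\begin{aligned}
&\text{Transition model:} && q_\theta(s_\tau \mid s_{\tau-1}, a_{\tau-1}) \\
&\text{Reward model:}     && q_\theta(r_\tau \mid s_\tau) \\
&\text{Policy:}           && q_\phi(a_\tau \mid s_\tau) \\
&\text{Objective:}        && \mathrm{E}_{q_\theta, q_\phi}\Big(\textstyle\sum_{\tau=t}^{\infty} \gamma^{\tau-t} r_\tau\Big) \\
&\text{Action model:}     && a_\tau \sim q_\phi(a_\tau \mid s_\tau) \\
&\text{Value model:}      && v_\psi(s_\tau) \approx \mathrm{E}_{q(\cdot \mid s_\tau)}\Big(\textstyle\sum_{\tau'=\tau}^{t+H} \gamma^{\tau'-\tau} r_{\tau'}\Big)
\end{aligned}
```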
4. Explained
6. Value Estimation
7. Learning Objective
Flow of Actor-Critic Method
From Sergey Levine's Deep RL slides
w/ Imagined Trajectories
↖ Exponentially decaying weights on longer k-step estimates (V_λ)
↖ Rewards beyond k steps estimated with the learned value model (V_N^k)
↖ Actor
↖ Critic
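The value estimates on this slide can be computed directly from one imagined trajectory. The sketch follows the Dreamer paper's definitions: V_N^k sums k imagined rewards and then bootstraps with the learned value at step h = min(τ + k, t + H), and V_λ is an exponentially weighted average of V_N^1 … V_N^H. It also includes an equivalent backward recursion of the kind commonly used in implementations, and a squared-error critic loss. Indices start at t = 0, and `rewards` and `values` are assumed to be plain lists of model predictions; the function names are hypothetical.

```python
def n_step_value(rewards, values, tau, k, gamma):
    """V_N^k(s_tau): k imagined rewards, then the learned value at step h = min(tau + k, H).

    rewards[n] is the predicted reward at imagined step n (n = 0..H-1) and
    values[n] is v_psi(s_n) for n = 0..H, so len(values) == len(rewards) + 1.
    """
    horizon = len(rewards)
    h = min(tau + k, horizon)
    discounted = sum(gamma ** (n - tau) * rewards[n] for n in range(tau, h))
    return discounted + gamma ** (h - tau) * values[h]


def lambda_value(rewards, values, tau, gamma, lam):
    """V_lambda(s_tau): exponentially weighted average of V_N^1 ... V_N^H."""
    horizon = len(rewards)
    mixed = sum(lam ** (n - 1) * n_step_value(rewards, values, tau, n, gamma)
                for n in range(1, horizon))
    return (1 - lam) * mixed + lam ** (horizon - 1) * n_step_value(rewards, values, tau, horizon, gamma)


def lambda_values_recursive(rewards, values, gamma, lam):
    """All V_lambda targets in one backward pass (equivalent recursive form)."""
    horizon = len(rewards)
    targets = [0.0] * horizon
    nxt = values[horizon]
    for tau in reversed(range(horizon)):
        nxt = rewards[tau] + gamma * ((1 - lam) * values[tau + 1] + lam * nxt)
        targets[tau] = nxt
    return targets


def critic_loss(values, targets):
    """Value-model regression toward V_lambda (targets treated as constants)."""
    return sum(0.5 * (v - g) ** 2 for v, g in zip(values, targets))
```

With λ = 0 the target reduces to the one-step estimate r_τ + γ·v(s_{τ+1}); with λ = 1 it becomes the full discounted reward sum bootstrapped only at the horizon. In Dreamer the actor maximizes the sum of these targets by backpropagating through them, while the critic regresses v_𝜓 toward them with the targets held fixed.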
QnA
1. Control Tasks
• Dreamer learns to solve 20 challenging continuous control tasks with
image inputs, 5 of which are displayed here.
The tasks are designed to pose a variety of challenges to the RL agent, including difficult-to-predict
collisions, sparse rewards, chaotic dynamics, small but relevant objects, high degrees
of freedom, and 3D perspectives.
• The visualizations show the same 64x64 images that the agent receives
from the environment.
2. Comparison
Dreamer outperforms the previous best model-free
(D4PG) and model-based (PlaNet) methods on the
benchmark of 20 tasks in terms of final performance,
data efficiency, and computation time.
3. Atari Games
Dreamer learns successful behaviors on Atari games and DeepMind Lab
levels, which feature discrete actions and visually more diverse scenes,
including 3D environments with multiple objects.
QnA
Conclusion
1. Learning behaviors from sequences predicted by world models alone can solve
challenging visual control tasks from image inputs, surpassing the performance
of previous model-free approaches.
2. Dreamer demonstrates that learning behaviors by backpropagating value gradients through
predicted sequences of compact model states is successful and robust, solving a diverse
collection of continuous and discrete control tasks.
My Questions
1. Is there any relation between world model and “common sense” mentioned by Yann LeCun?
2. Is there any evidence about the mechanisms of human prediction and dreaming?
What we see is based on our brain’s prediction of the future
(A. Kitaoka, Kanzen, 2002).
Dreamer Series!
Dreamer Series!
Thank You for Listening!
Learning Behaviors by Latent Imagination
DQN model for Text2image, PixRay

More Related Content

Similar to Dream2Control paper review

HIGHLY SCALABLE, PARALLEL AND DISTRIBUTED ADABOOST ALGORITHM USING LIGHT WEIG...
HIGHLY SCALABLE, PARALLEL AND DISTRIBUTED ADABOOST ALGORITHM USING LIGHT WEIG...HIGHLY SCALABLE, PARALLEL AND DISTRIBUTED ADABOOST ALGORITHM USING LIGHT WEIG...
HIGHLY SCALABLE, PARALLEL AND DISTRIBUTED ADABOOST ALGORITHM USING LIGHT WEIG...ijdpsjournal
 
Software engineering model based smart indoor localization system using deep-...
Software engineering model based smart indoor localization system using deep-...Software engineering model based smart indoor localization system using deep-...
Software engineering model based smart indoor localization system using deep-...TELKOMNIKA JOURNAL
 
Inferring and executing programs for visual reasoning (UPC Reading Group)
Inferring and executing programs for visual reasoning (UPC Reading Group)Inferring and executing programs for visual reasoning (UPC Reading Group)
Inferring and executing programs for visual reasoning (UPC Reading Group)Universitat Politècnica de Catalunya
 
A Parallel Architecture for Multiple-Face Detection Technique Using AdaBoost ...
A Parallel Architecture for Multiple-Face Detection Technique Using AdaBoost ...A Parallel Architecture for Multiple-Face Detection Technique Using AdaBoost ...
A Parallel Architecture for Multiple-Face Detection Technique Using AdaBoost ...Hadi Santoso
 
Memory Efficient Graph Convolutional Network based Distributed Link Prediction
Memory Efficient Graph Convolutional Network based Distributed Link PredictionMemory Efficient Graph Convolutional Network based Distributed Link Prediction
Memory Efficient Graph Convolutional Network based Distributed Link Predictionmiyurud
 
Real-time 3D Object Detection on LIDAR Point Cloud using Complex- YOLO V4
Real-time 3D Object Detection on LIDAR Point Cloud using Complex- YOLO V4Real-time 3D Object Detection on LIDAR Point Cloud using Complex- YOLO V4
Real-time 3D Object Detection on LIDAR Point Cloud using Complex- YOLO V4IRJET Journal
 
FACE PHOTO-SKETCH RECOGNITION USING DEEP LEARNING TECHNIQUES - A REVIEW
FACE PHOTO-SKETCH RECOGNITION USING DEEP LEARNING TECHNIQUES - A REVIEWFACE PHOTO-SKETCH RECOGNITION USING DEEP LEARNING TECHNIQUES - A REVIEW
FACE PHOTO-SKETCH RECOGNITION USING DEEP LEARNING TECHNIQUES - A REVIEWIRJET Journal
 
[RecSys2023] Challenging the Myth of Graph Collaborative Filtering: a Reasone...
[RecSys2023] Challenging the Myth of Graph Collaborative Filtering: a Reasone...[RecSys2023] Challenging the Myth of Graph Collaborative Filtering: a Reasone...
[RecSys2023] Challenging the Myth of Graph Collaborative Filtering: a Reasone...Daniele Malitesta
 
Deep Reinforcement Learning for Visual Navigation
Deep Reinforcement Learning for Visual NavigationDeep Reinforcement Learning for Visual Navigation
Deep Reinforcement Learning for Visual NavigationManish Pandey
 
A Survey of Machine Learning Methods Applied to Computer ...
A Survey of Machine Learning Methods Applied to Computer ...A Survey of Machine Learning Methods Applied to Computer ...
A Survey of Machine Learning Methods Applied to Computer ...butest
 
Pratik ibm-open power-ppt
Pratik ibm-open power-pptPratik ibm-open power-ppt
Pratik ibm-open power-pptVaibhav R
 
Feature Fusion and Classifier Ensemble Technique for Robust Face Recognition
Feature Fusion and Classifier Ensemble Technique for Robust Face RecognitionFeature Fusion and Classifier Ensemble Technique for Robust Face Recognition
Feature Fusion and Classifier Ensemble Technique for Robust Face RecognitionCSCJournals
 
Rapid object detection using boosted cascade of simple features
Rapid object detection using boosted  cascade of simple featuresRapid object detection using boosted  cascade of simple features
Rapid object detection using boosted cascade of simple featuresHirantha Pradeep
 
Graph convolutional neural networks for web-scale recommender systems.pptx
Graph convolutional neural networks for web-scale recommender systems.pptxGraph convolutional neural networks for web-scale recommender systems.pptx
Graph convolutional neural networks for web-scale recommender systems.pptxssuser2624f71
 
IRJET- Mango Classification using Convolutional Neural Networks
IRJET- Mango Classification using Convolutional Neural NetworksIRJET- Mango Classification using Convolutional Neural Networks
IRJET- Mango Classification using Convolutional Neural NetworksIRJET Journal
 

Similar to Dream2Control paper review (20)

ObjectDetection.pptx
ObjectDetection.pptxObjectDetection.pptx
ObjectDetection.pptx
 
HIGHLY SCALABLE, PARALLEL AND DISTRIBUTED ADABOOST ALGORITHM USING LIGHT WEIG...
HIGHLY SCALABLE, PARALLEL AND DISTRIBUTED ADABOOST ALGORITHM USING LIGHT WEIG...HIGHLY SCALABLE, PARALLEL AND DISTRIBUTED ADABOOST ALGORITHM USING LIGHT WEIG...
HIGHLY SCALABLE, PARALLEL AND DISTRIBUTED ADABOOST ALGORITHM USING LIGHT WEIG...
 
Software engineering model based smart indoor localization system using deep-...
Software engineering model based smart indoor localization system using deep-...Software engineering model based smart indoor localization system using deep-...
Software engineering model based smart indoor localization system using deep-...
 
Inferring and executing programs for visual reasoning (UPC Reading Group)
Inferring and executing programs for visual reasoning (UPC Reading Group)Inferring and executing programs for visual reasoning (UPC Reading Group)
Inferring and executing programs for visual reasoning (UPC Reading Group)
 
A Parallel Architecture for Multiple-Face Detection Technique Using AdaBoost ...
A Parallel Architecture for Multiple-Face Detection Technique Using AdaBoost ...A Parallel Architecture for Multiple-Face Detection Technique Using AdaBoost ...
A Parallel Architecture for Multiple-Face Detection Technique Using AdaBoost ...
 
Memory Efficient Graph Convolutional Network based Distributed Link Prediction
Memory Efficient Graph Convolutional Network based Distributed Link PredictionMemory Efficient Graph Convolutional Network based Distributed Link Prediction
Memory Efficient Graph Convolutional Network based Distributed Link Prediction
 
Real-time 3D Object Detection on LIDAR Point Cloud using Complex- YOLO V4
Real-time 3D Object Detection on LIDAR Point Cloud using Complex- YOLO V4Real-time 3D Object Detection on LIDAR Point Cloud using Complex- YOLO V4
Real-time 3D Object Detection on LIDAR Point Cloud using Complex- YOLO V4
 
RL_UpsideDown
RL_UpsideDownRL_UpsideDown
RL_UpsideDown
 
FACE PHOTO-SKETCH RECOGNITION USING DEEP LEARNING TECHNIQUES - A REVIEW
FACE PHOTO-SKETCH RECOGNITION USING DEEP LEARNING TECHNIQUES - A REVIEWFACE PHOTO-SKETCH RECOGNITION USING DEEP LEARNING TECHNIQUES - A REVIEW
FACE PHOTO-SKETCH RECOGNITION USING DEEP LEARNING TECHNIQUES - A REVIEW
 
[RecSys2023] Challenging the Myth of Graph Collaborative Filtering: a Reasone...
[RecSys2023] Challenging the Myth of Graph Collaborative Filtering: a Reasone...[RecSys2023] Challenging the Myth of Graph Collaborative Filtering: a Reasone...
[RecSys2023] Challenging the Myth of Graph Collaborative Filtering: a Reasone...
 
Deep Reinforcement Learning for Visual Navigation
Deep Reinforcement Learning for Visual NavigationDeep Reinforcement Learning for Visual Navigation
Deep Reinforcement Learning for Visual Navigation
 
A Survey of Machine Learning Methods Applied to Computer ...
A Survey of Machine Learning Methods Applied to Computer ...A Survey of Machine Learning Methods Applied to Computer ...
A Survey of Machine Learning Methods Applied to Computer ...
 
RE@Next_final.pptx
RE@Next_final.pptxRE@Next_final.pptx
RE@Next_final.pptx
 
Pratik ibm-open power-ppt
Pratik ibm-open power-pptPratik ibm-open power-ppt
Pratik ibm-open power-ppt
 
Feature Fusion and Classifier Ensemble Technique for Robust Face Recognition
Feature Fusion and Classifier Ensemble Technique for Robust Face RecognitionFeature Fusion and Classifier Ensemble Technique for Robust Face Recognition
Feature Fusion and Classifier Ensemble Technique for Robust Face Recognition
 
Rapid object detection using boosted cascade of simple features
Rapid object detection using boosted  cascade of simple featuresRapid object detection using boosted  cascade of simple features
Rapid object detection using boosted cascade of simple features
 
sibgrapi2015
sibgrapi2015sibgrapi2015
sibgrapi2015
 
Graph convolutional neural networks for web-scale recommender systems.pptx
Graph convolutional neural networks for web-scale recommender systems.pptxGraph convolutional neural networks for web-scale recommender systems.pptx
Graph convolutional neural networks for web-scale recommender systems.pptx
 
OOP in java
OOP in javaOOP in java
OOP in java
 
IRJET- Mango Classification using Convolutional Neural Networks
IRJET- Mango Classification using Convolutional Neural NetworksIRJET- Mango Classification using Convolutional Neural Networks
IRJET- Mango Classification using Convolutional Neural Networks
 

More from taeseon ryu

OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...taeseon ryu
 
3D Gaussian Splatting
3D Gaussian Splatting3D Gaussian Splatting
3D Gaussian Splattingtaeseon ryu
 
Hyperbolic Image Embedding.pptx
Hyperbolic  Image Embedding.pptxHyperbolic  Image Embedding.pptx
Hyperbolic Image Embedding.pptxtaeseon ryu
 
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정taeseon ryu
 
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdfLLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdftaeseon ryu
 
Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories taeseon ryu
 
Packed Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation ExtractionPacked Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation Extractiontaeseon ryu
 
MOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement LearningMOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement Learningtaeseon ryu
 
Scaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language ModelsScaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language Modelstaeseon ryu
 
Visual prompt tuning
Visual prompt tuningVisual prompt tuning
Visual prompt tuningtaeseon ryu
 
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdfvariBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdftaeseon ryu
 
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdfReinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdftaeseon ryu
 
The Forward-Forward Algorithm
The Forward-Forward AlgorithmThe Forward-Forward Algorithm
The Forward-Forward Algorithmtaeseon ryu
 
Towards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural NetworksTowards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural Networkstaeseon ryu
 
BRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive SummarizationBRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive Summarizationtaeseon ryu
 
ProximalPolicyOptimization
ProximalPolicyOptimizationProximalPolicyOptimization
ProximalPolicyOptimizationtaeseon ryu
 

More from taeseon ryu (20)

VoxelNet
VoxelNetVoxelNet
VoxelNet
 
OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...
 
3D Gaussian Splatting
3D Gaussian Splatting3D Gaussian Splatting
3D Gaussian Splatting
 
JetsonTX2 Python
 JetsonTX2 Python  JetsonTX2 Python
JetsonTX2 Python
 
Hyperbolic Image Embedding.pptx
Hyperbolic  Image Embedding.pptxHyperbolic  Image Embedding.pptx
Hyperbolic Image Embedding.pptx
 
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
 
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdfLLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
 
YOLO V6
YOLO V6YOLO V6
YOLO V6
 
Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories
 
Packed Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation ExtractionPacked Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation Extraction
 
MOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement LearningMOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement Learning
 
Scaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language ModelsScaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language Models
 
Visual prompt tuning
Visual prompt tuningVisual prompt tuning
Visual prompt tuning
 
mPLUG
mPLUGmPLUG
mPLUG
 
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdfvariBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
 
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdfReinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
 
The Forward-Forward Algorithm
The Forward-Forward AlgorithmThe Forward-Forward Algorithm
The Forward-Forward Algorithm
 
Towards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural NetworksTowards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural Networks
 
BRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive SummarizationBRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive Summarization
 
ProximalPolicyOptimization
ProximalPolicyOptimizationProximalPolicyOptimization
ProximalPolicyOptimization
 

Recently uploaded

Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 

Recently uploaded (20)

Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Introduction-to-Machine-Learning (1).pptx

Dream2Control paper review

  • 1. Dream To Control: Learning Behaviors by Latent Imagination. LEE, DOHYEON, leadh991114@gmail.com. 4/18/2023, 딥논읽 (Deep Learning Paper Reading) Seminar - Reinforcement Learning. ICLR 2020 (Oral); NeurIPS Deep RL Workshop 2019 (Oral).
  • 2. Contents: 1. Introduction 2. Methods 3. Experiments 4. Conclusion
  • 3. 1. RL Comparison. Model-free RL: no model; learns a value function (and/or policy) directly from real experience. Model-based RL: learns a model from real experience, then plans a value function (and/or policy) from simulated experience. (Figure: RL comparison, from the slides of Sergey Levine.)
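To make the model-based recipe concrete, here is a minimal tabular sketch that is not taken from the slides: a hypothetical 5-state chain plays the role of the real environment, a lookup table is the "learned model", and value iteration on that table is the planning step. The names `env_step`, `collect_experience`, `fit_model`, and `plan_values` are illustrative assumptions; deep model-based agents such as Dreamer replace the table with a learned neural world model.

```python
import random

# Hypothetical toy MDP: a 5-state chain; reaching (or staying in) state 4 pays reward 1.
N_STATES = 5
ACTIONS = (-1, +1)


def env_step(s, a):
    """The 'real' environment, used only to collect experience."""
    s_next = min(max(s + a, 0), N_STATES - 1)
    return s_next, (1.0 if s_next == N_STATES - 1 else 0.0)


def collect_experience(n, seed=0):
    """Random-policy transitions (s, a, r, s') gathered from the real environment."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        s, a = rng.randrange(N_STATES), rng.choice(ACTIONS)
        s_next, r = env_step(s, a)
        data.append((s, a, r, s_next))
    return data


def fit_model(data):
    """Learn a model from real experience (a lookup table suffices: the toy MDP is deterministic)."""
    return {(s, a): (s_next, r) for s, a, r, s_next in data}


def plan_values(model, gamma=0.9, sweeps=100):
    """Value iteration on simulated transitions only; env_step is never called here."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(N_STATES):
            backups = [r + gamma * values[s_next]
                       for (s_m, _a), (s_next, r) in model.items() if s_m == s]
            if backups:
                values[s] = max(backups)
    return values


model = fit_model(collect_experience(200))
values = plan_values(model)
```

`plan_values` only reads `model`, so all planning happens on simulated transitions; a model-free learner would instead update its value estimates directly from the `(s, a, r, s')` tuples.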
  • 4. 2. World Model. “Intelligent agents can achieve goals in complex environments even though they never encounter the exact same situation twice.” “This ability requires building representations of the world from past experience that enable generalization to novel situations.” “World models offer an explicit way to represent an agent’s knowledge about the world in a parametric model that can make predictions about the future.” (Figure: A World Model, from Scott McCloud’s Understanding Comics.)
  • 5. 3. Visual Control. “Sensory inputs are high-dimensional images; latent dynamics models can abstract observations to predict forward in compact state spaces.” → Latent states have a small memory footprint. “Behaviors can be derived from dynamics models in many ways.” → Considering only rewards within a fixed imagination horizon results in shortsighted behaviors. → Prior work commonly resorts to derivative-free optimization for robustness, even though a differentiable world model provides analytic gradients.
  • 6. 4. PlaNet. An RL agent that learns the environment dynamics from images and chooses actions through fast online planning in latent space. (Learning Latent Dynamics for Planning from Pixels, Danijar Hafner et al., 2019.)
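PlaNet's online planner is the cross-entropy method (CEM): sample action sequences, score each by rolling it out in the latent model, refit a Gaussian to the best-scoring ones, repeat, then execute only the first action and replan at the next step. The sketch below assumes a hypothetical 1-D latent state with hand-written `latent_step` and `latent_reward` functions standing in for PlaNet's learned networks; the population sizes are arbitrary and are not the paper's settings.

```python
import random
import statistics


def latent_step(s, a):
    """Hypothetical stand-in for a learned latent transition model (1-D latent state)."""
    return s + a


def latent_reward(s):
    """Hypothetical stand-in for a learned reward model: stay close to 1.0."""
    return -(s - 1.0) ** 2


def cem_plan(s0, horizon=5, candidates=200, elites=20, iterations=5, seed=0):
    """Cross-entropy method over action sequences, evaluated purely in the latent model.

    Returns only the first action of the final mean sequence; the agent would
    call this again at every environment step (model-predictive control).
    """
    rng = random.Random(seed)
    means, stds = [0.0] * horizon, [1.0] * horizon
    for _ in range(iterations):
        scored = []
        for _ in range(candidates):
            actions = [rng.gauss(m, sd) for m, sd in zip(means, stds)]
            s, total = s0, 0.0
            for a in actions:  # imagined rollout, no environment interaction
                s = latent_step(s, a)
                total += latent_reward(s)
            scored.append((total, actions))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        elite = [actions for _, actions in scored[:elites]]
        # Refit the per-step Gaussian belief to the elite sequences.
        for t in range(horizon):
            column = [actions[t] for actions in elite]
            means[t] = statistics.mean(column)
            stds[t] = statistics.pstdev(column) + 1e-6
    return means[0]
```

For this toy problem the optimal plan from `s0 = 0.0` is to move to 1.0 and then stay, so `cem_plan(0.0)` should return a first action close to 1.0. Every call repeats the whole search, which is the per-step cost that Dreamer's actor network avoids.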
  • 7. 5. Dreamer. An RL agent that learns long-horizon behaviors from images purely by latent imagination. The three processes of the Dreamer agent: 1. The world model is learned from past experience. 2. From predictions of this model, the agent then learns a value network to predict future rewards and an actor network to select actions. 3. The actor network is used to interact with the environment.
  • 8. QnA
  • 9. 1. Learning the World Model. Dreamer learns a world model from experience. Using past images o1 to o3 and actions a1 to a2, it computes a sequence of compact model states (green circles in the figure), from which it reconstructs the images o1 to o3 and predicts the rewards r1 to r3. → The world model reuses PlaNet's latent dynamics model (the RSSM).
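The world-model components and their training objective, summarized in the notation of the Dreamer paper as a reading aid (check the paper for the exact statement). The representation model folds each new observation into the state, the transition model predicts the next state without seeing an observation (this is what imagination uses), and the objective combines image and reward log-likelihoods with a KL term, scaled by β, that keeps the two state distributions close.

```latex
\begin{aligned}
&\text{Representation model:} && p_\theta(s_t \mid s_{t-1}, a_{t-1}, o_t) \\
&\text{Observation model:}    && q_\theta(o_t \mid s_t) \\
&\text{Reward model:}         && q_\theta(r_t \mid s_t) \\
&\text{Transition model:}     && q_\theta(s_t \mid s_{t-1}, a_{t-1}) \\[4pt]
&\mathcal{J}_{\mathrm{REC}} \doteq \mathbb{E}_{p}\Big[\textstyle\sum_t \Big(\ln q_\theta(o_t \mid s_t) + \ln q_\theta(r_t \mid s_t)
  - \beta\,\mathrm{KL}\big(p_\theta(s_t \mid s_{t-1}, a_{t-1}, o_t)\,\big\|\,q_\theta(s_t \mid s_{t-1}, a_{t-1})\big)\Big)\Big]
\end{aligned}
```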
  • 10. 2. Learning Behavior in Imagination. Dreamer learns long-sighted behaviors from predicted sequences of model states. It first learns the long-term value (v2 to v3) of each state, and then predicts actions (a1 to a2) that lead to high rewards and values by backpropagating the gradients of these rewards and values through the state sequence to the actor network.
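The core idea, differentiating an imagined return with respect to the actor's parameters through the predicted state sequence, can be shown on a deliberately tiny example. Everything here is a hypothetical stand-in: a deterministic 1-D latent state, the transition `s' = s + a`, a linear actor `a = w*s + b`, a quadratic reward, and a fixed quadratic function playing the learned critic at the last imagined state. The gradient is written out by hand (reverse-mode chain rule) so that no autodiff library is needed.

```python
def imagine(w, b, s0, horizon):
    """Roll the hypothetical linear actor a = w*s + b through the toy transition s' = s + a."""
    states = [s0]
    for _ in range(horizon):
        s = states[-1]
        states.append(s + (w * s + b))
    return states


def reward(s):
    return -(s - 1.0) ** 2        # hypothetical reward model


def value(s):
    return -2.0 * (s - 1.0) ** 2  # hypothetical critic, applied only to the last imagined state


def imagined_objective(w, b, s0=0.0, horizon=5):
    """Sum of imagined rewards plus the critic's estimate at the end of the rollout."""
    states = imagine(w, b, s0, horizon)
    return sum(reward(s) for s in states[1:]) + value(states[-1])


def actor_gradient(w, b, s0=0.0, horizon=5):
    """Backpropagate d(objective)/d(w, b) through the imagined state sequence."""
    states = imagine(w, b, s0, horizon)
    grad_w = grad_b = 0.0
    adjoint = 0.0  # dJ/ds_tau, accumulated backwards in time
    for tau in range(horizon, 0, -1):
        s = states[tau]
        local = -2.0 * (s - 1.0)                # d reward(s_tau) / d s_tau
        if tau == horizon:
            local += -4.0 * (s - 1.0)           # d value(s_H) / d s_H
        adjoint = local + adjoint * (1.0 + w)   # ds_{tau+1} / ds_tau = 1 + w
        grad_w += adjoint * states[tau - 1]     # ds_tau / dw = s_{tau-1}
        grad_b += adjoint                       # ds_tau / db = 1
    return grad_w, grad_b


# A few steps of gradient ascent on the actor parameters.
w, b = 0.0, 0.0
before = imagined_objective(w, b)
for _ in range(20):
    gw, gb = actor_gradient(w, b)
    w, b = w + 1e-3 * gw, b + 1e-3 * gb
after = imagined_objective(w, b)
```

A central finite difference on `imagined_objective` agrees with `actor_gradient`, and the ascent loop raises the imagined objective. Dreamer applies the same principle with neural networks, automatic differentiation, reparameterized stochastic latent states, and λ-return value targets instead of a single terminal value.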
  • 11. 2. Learning Behavior in Imagination. PlaNet vs Dreamer: • For a given situation in the environment, PlaNet searches for the best action among many predictions for different action sequences. • Dreamer sidesteps this expensive search by decoupling planning and acting. Once its actor network has been trained on predicted sequences, it computes the actions for interacting with the environment without additional search. In addition, Dreamer considers rewards beyond the planning horizon using a value function and leverages backpropagation for efficient planning.
  • 12. 3. Act in the Environment. The agent encodes the history of the episode to compute the current model state and the next action to execute in the environment.
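A minimal sketch of that acting loop, assuming hypothetical scalar stand-ins for the learned networks: the state update averages the transition prior `s + a` with the new observation, and the actor is a clipped linear rule. Only the control flow is meant to carry over: one state update and one actor forward pass per environment step, with no search over action sequences.

```python
from dataclasses import dataclass


@dataclass
class LatentAgent:
    """Keeps the model state across steps; all numeric rules are hypothetical stand-ins."""
    state: float = 0.0
    prev_action: float = 0.0

    def update_state(self, obs: float) -> float:
        # Stand-in for the representation model p(s_t | s_{t-1}, a_{t-1}, o_t):
        # average the transition prior (s + a) with the new observation.
        prior = self.state + self.prev_action
        self.state = 0.5 * prior + 0.5 * obs
        return self.state

    def actor(self, s: float) -> float:
        # Stand-in for the action model q_phi(a | s): push the state toward 1.0,
        # clipped to the action range [-1, 1].
        return max(-1.0, min(1.0, 1.0 - s))

    def step(self, obs: float) -> float:
        s = self.update_state(obs)  # fold the newest observation into the history summary
        action = self.actor(s)      # single forward pass, no planning
        self.prev_action = action
        return action


agent = LatentAgent()
first = agent.step(0.0)   # state 0.0 -> action 1.0
second = agent.step(1.0)  # prior 1.0, obs 1.0 -> state 1.0 -> action 0.0
```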
  • 13. 4. Explained. (Figure; the PlaNet part of the diagram is omitted.)
  • 14. 4. Explained. 1. Transition model 2. Reward model 3. Policy 4. Objective (imagined rewards) 5. Actor-critic model (action and value models, parameterized by φ and ψ respectively). (Figure: actor-critic method, from papers on Deep Multi-Agent (Taiki Fuji et al.).)
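For reference, items 1 to 5 written out in the Dreamer paper's notation, as a summary to read alongside the slide (see the paper for the exact statement). Imagination starts from a model state s_t and runs for τ = t, t+1, ...; the policy of item 3 is the action model, which the actor-critic pairs with a value model predicting imagined returns.

```latex
\begin{aligned}
&\text{1. Transition model:} && s_\tau \sim q_\theta(s_\tau \mid s_{\tau-1}, a_{\tau-1}) \\
&\text{2. Reward model:}     && r_\tau \sim q_\theta(r_\tau \mid s_\tau) \\
&\text{3. Policy (action model):} && a_\tau \sim q_\phi(a_\tau \mid s_\tau) \\
&\text{4. Objective:}        && \mathbb{E}_{q_\theta,\, q_\phi}\Big(\textstyle\sum_{\tau=t}^{\infty} \gamma^{\tau-t} r_\tau\Big) \\
&\text{5. Value model:}      && v_\psi(s_\tau) \approx \mathbb{E}_{q(\cdot \mid s_\tau)}\Big(\textstyle\sum_{\tau'=\tau}^{t+H} \gamma^{\tau'-\tau} r_{\tau'}\Big)
\end{aligned}
```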
  • 15. 4. Explained. 6. Value estimation 7. Learning objective. (Figure: flow of the actor-critic method with imagined trajectories, from Sergey Levine's Deep RL slides.) Annotations: ↖ exponentially decaying weights over longer n-step estimates; ↖ rewards beyond k steps estimated with the learned value model; ↖ actor; ↖ critic.
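The value-estimation step can be sketched in plain Python. Following the definitions in the Dreamer paper, V_N^k(s_τ) sums up to k discounted imagined rewards and then bootstraps with the critic at h = min(τ + k, t + H), and V_λ mixes these estimates with weights (1 - λ)λ^(n-1), putting the remaining weight λ^(H-1) on the longest one; this is the exponential weighting annotated on the slide. The function names and the list layout (index 0 is the start state s_t, index H the last imagined state) are assumptions of this sketch, and the backward recursion is a standard equivalent form of the truncated λ-return rather than something stated on the slide.

```python
def n_step_value(rewards, values, tau, k, gamma):
    """V_N^k(s_tau): up to k discounted imagined rewards, then bootstrap with the critic.

    rewards[i] and values[i] belong to imagined state s_i for i = 0..H; the
    bootstrap index is clipped to the last imagined state, h = min(tau + k, H).
    """
    horizon = len(values) - 1
    h = min(tau + k, horizon)
    partial = sum(gamma ** (n - tau) * rewards[n] for n in range(tau, h))
    return partial + gamma ** (h - tau) * values[h]


def lambda_value(rewards, values, tau, gamma, lam):
    """V_lambda(s_tau): exponentially weighted mix of V_N^1 ... V_N^H."""
    horizon = len(values) - 1
    mix = sum((1 - lam) * lam ** (n - 1) * n_step_value(rewards, values, tau, n, gamma)
              for n in range(1, horizon))
    return mix + lam ** (horizon - 1) * n_step_value(rewards, values, tau, horizon, gamma)


def lambda_values_recursive(rewards, values, gamma, lam):
    """The same targets for every tau, computed backwards in one pass:

    V_lambda(s_tau) = r_tau + gamma * ((1 - lam) * v(s_{tau+1}) + lam * V_lambda(s_{tau+1})),
    starting from V_lambda(s_H) = v(s_H).
    """
    horizon = len(values) - 1
    targets = [0.0] * (horizon + 1)
    targets[horizon] = values[horizon]
    for tau in range(horizon - 1, -1, -1):
        targets[tau] = rewards[tau] + gamma * ((1 - lam) * values[tau + 1] + lam * targets[tau + 1])
    return targets
```

The actor is trained to maximize the sum of V_λ over the imagined states (by backpropagating through them), while the critic regresses v_ψ(s_τ) toward V_λ(s_τ) with a squared error. Setting λ = 1 recovers the full discounted return bootstrapped only at the horizon, and λ = 0 recovers the one-step target r_τ + γ v(s_{τ+1}).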
  • 16. QnA
  • 17. 1. Control Tasks. • Dreamer learns to solve 20 challenging continuous control tasks with image inputs, 5 of which are displayed here. The tasks are designed to pose a variety of challenges to the RL agent, including difficult-to-predict collisions, sparse rewards, chaotic dynamics, small but relevant objects, high degrees of freedom, and 3D perspectives. • The visualizations show the same 64x64 images that the agent receives from the environment.
  • 18. 2. Comparison. Dreamer outperforms the previous best model-free (D4PG) and model-based (PlaNet) methods on the benchmark of 20 tasks in terms of final performance, data efficiency, and computation time.
  • 19. 3. Atari Games. Dreamer learns successful behaviors on Atari games and DeepMind Lab levels, which feature discrete actions and visually more diverse scenes, including 3D environments with multiple objects.
  • 20. QnA
  • 21. Conclusion. 1. Learning behaviors from sequences predicted by world models alone can solve challenging visual control tasks from image inputs, surpassing the performance of previous model-free approaches. 2. Dreamer demonstrates that learning behaviors by backpropagating value gradients through predicted sequences of compact model states is successful and robust, solving a diverse collection of continuous and discrete control tasks.
  • 22. My Questions. 1. Is there a relation between world models and the “common sense” discussed by Yann LeCun? 2. Is there any evidence for a mechanism behind human prediction and dreaming? (Figure: “What we see is based on our brain’s prediction of the future,” A. Kitaoka, Kanzen, 2002.)
  • 23. Dreamer Series!
  • 24. Dreamer Series!
  • 25. Thank you for listening! Learning Behaviors by Latent Imagination. (Image: DQN model for Text2image, PixRay.)