These slides present Dreamer, a reinforcement-learning agent that learns a world model from image observations of its experience. Dreamer then learns behaviors by imagining future sequences with the world model and backpropagating value gradients through those imagined sequences. Experiments show that Dreamer outperforms prior model-free and model-based methods on a variety of visual control tasks, demonstrating that behaviors learned purely from latent imagination can solve challenging problems.
1. RL Comparison
Model-Free RL
• No Model
• Learn a value function (and/or policy) from real experience
Model-Based RL
• Learn a model from real experience
• Plan a value function (and/or policy) from simulated experience
RL comparison, from Sergey Levine's slides
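To make the contrast concrete, here is a small self-contained Python toy (my own illustration, not code from the paper or the slides): a model-free Q-learning agent and a model-based Dyna-Q-style agent on a 5-state chain, where the model-based agent additionally plans from simulated transitions.

import random
from collections import defaultdict

N, GAMMA, ALPHA = 5, 0.9, 0.5  # chain length, discount, learning rate

def step(s, a):
    # Toy deterministic chain: a=1 moves right, a=0 moves left; reward at the right end.
    s2 = min(s + 1, N - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == N - 1 else 0.0)

def q_update(Q, s, a, r, s2):
    # One-step Q-learning backup.
    Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in (0, 1)) - Q[(s, a)])

Q_free, Q_based = defaultdict(float), defaultdict(float)
model = {}  # learned deterministic model: (s, a) -> (s2, r)

for episode in range(50):
    s = 0
    for _ in range(20):
        a = random.choice((0, 1))
        s2, r = step(s, a)
        q_update(Q_free, s, a, r, s2)      # model-free: learn from real experience only
        q_update(Q_based, s, a, r, s2)     # model-based: same real update...
        model[(s, a)] = (s2, r)            # ...plus learn a model from real experience...
        for ms, ma in random.sample(list(model), min(5, len(model))):
            ms2, mr = model[(ms, ma)]      # ...and plan from simulated experience
            q_update(Q_based, ms, ma, mr, ms2)
        s = s2

print("Q_free(0, right)  =", round(Q_free[(0, 1)], 3))
print("Q_based(0, right) =", round(Q_based[(0, 1)], 3))

With the extra planning updates, Q_based approaches the optimal values noticeably faster than Q_free for the same amount of real experience, which is the data-efficiency argument for model-based RL.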
2. World Model
“Intelligent agents can achieve goals in complex environments
even though they never encounter the exact same situation twice.”
“This ability requires building representations of the world from past
experience that enable generalization to novel situations.”
“World models offer an explicit way to represent an agent’s knowledge
about the world in a parametric model that can make predictions about the
future.”
A World Model, from Scott McCloud’s Understanding Comics.
3. Visual Control
“While sensory inputs are high-dimensional images, latent dynamics models can
abstract the observations to predict forward in compact state spaces.”
→ latent states have a small memory footprint
“Behaviors can be derived from dynamics models in many ways.”
→ Considering only rewards within a fixed imagination horizon results in shortsighted behaviors
→ Prior work commonly resorts to derivative-free optimization for robustness
4. PlaNet
An RL agent that learns the environment dynamics from
images and chooses actions through fast online planning in
latent space.
Learning Latent Dynamics for Planning from Pixels (Danijar Hafner et al., 2019)
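PlaNet's online planner is the cross-entropy method (CEM), run entirely in latent space. The sketch below is a minimal runnable illustration of that idea; the `transition` and `reward` functions are toy stand-ins for the learned latent dynamics and reward models, not PlaNet's actual networks.

import numpy as np

def transition(s, a):
    # Toy stand-in for the learned latent dynamics model.
    return 0.9 * s + a

def reward(s):
    # Toy stand-in for the learned reward model: prefer latents near 1.0.
    return -np.abs(s - 1.0)

def cem_plan(state, horizon=12, candidates=1000, elites=100, iters=10):
    # Search over action sequences by iteratively refitting a Gaussian to the elites.
    mean, std = np.zeros(horizon), np.ones(horizon)
    for _ in range(iters):
        actions = mean + std * np.random.randn(candidates, horizon)
        s = np.full(candidates, state)
        returns = np.zeros(candidates)
        for t in range(horizon):          # roll the model forward, never the environment
            s = transition(s, actions[:, t])
            returns += reward(s)
        top = actions[np.argsort(returns)[-elites:]]
        mean, std = top.mean(axis=0), top.std(axis=0)
    return mean[0]                        # execute the first action, then re-plan

print("first planned action:", cem_plan(state=0.0))

Because this search is repeated at every environment step, planning dominates the cost of acting; Dreamer's motivation, next, is to remove that search.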
5. Dreamer
An RL agent that learns long-horizon behaviors from images purely by latent imagination.
The three processes of the Dreamer agent.
1. The world model is learned from past experience.
2. From predictions of this model, the agent then learns a value network
to predict future rewards and an actor network to select actions.
3. The actor network is used to interact with the environment.
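The shape of this loop, as a structural Python skeleton (every object and method name below is a hypothetical placeholder for exposition, not the paper's API; the three numbered comments mirror the list above):

def dreamer_iteration(env, world_model, actor, critic, buffer, horizon=15):
    # 1. Learn the world model from past experience.
    batch = buffer.sample()
    world_model.update(batch)             # image reconstruction + reward prediction

    # 2. Learn behaviors from imagined rollouts of the model.
    states = world_model.encode(batch)    # start imagination from real states
    imagined = world_model.imagine(states, actor, horizon)
    critic.update(imagined)               # value network: predict future rewards
    actor.update(imagined)                # actor network: select actions

    # 3. Use the actor network to interact with the environment.
    state = world_model.encode_step(env.observation)
    buffer.add(env.step(actor.act(state)))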
1. Learning the World Model
Dreamer learns a world model from experience. Using past images 𝑜1 ~ 𝑜3 and actions 𝑎1 ~
𝑎2, it computes a sequence of compact model states (green circles) from which it reconstructs
the images 𝑜1 ~ 𝑜3 and predicts the rewards 𝑟1 ~ 𝑟3. → Reuses the world model from PlaNet (a recurrent state-space model, RSSM)
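In code terms, the world-model update combines image reconstruction, reward prediction, and a KL term that keeps the prior transition model close to the posterior states. A minimal PyTorch-flavored sketch, assuming hypothetical `encoder`, `rssm`, `decoder`, and `reward_head` modules with the shown signatures (this mirrors the PlaNet/Dreamer objective in spirit, not the exact implementation):

import torch
import torch.nn.functional as F

def world_model_loss(obs, actions, rewards,
                     encoder, rssm, decoder, reward_head, kl_scale=1.0):
    embed = encoder(obs)                          # o_1..o_T -> embeddings
    post, prior = rssm.observe(embed, actions)    # posterior and prior state distributions
    states = post.rsample()                       # reparameterized compact model states
    recon_loss = F.mse_loss(decoder(states), obs)            # reconstruct o_1..o_T
    reward_loss = F.mse_loss(reward_head(states), rewards)   # predict r_1..r_T
    kl_loss = torch.distributions.kl_divergence(post, prior).mean()
    return recon_loss + reward_loss + kl_scale * kl_loss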
2. Learning Behavior in Imagination
Dreamer learns long-sighted behaviors from predicted sequences of model states. It first
learns the long-term values 𝑣2 ~ 𝑣3 of each state, and then predicts actions 𝑎1 ~ 𝑎2 that lead to
high rewards and values by backpropagating reward and value gradients through the state
sequence to the actor network.
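A minimal sketch of that update, again with hypothetical `world_model`, `actor`, and `critic` modules (the actor is assumed to return a reparameterizable torch distribution). The essential point is that the bootstrapped return stays differentiable through the imagined states, so its gradient reaches the actor; for brevity this uses a plain H-step return rather than the paper's full λ-return.

import torch

def imagine_and_learn(start_states, world_model, actor, critic,
                      actor_opt, critic_opt, horizon=15, gamma=0.99):
    states, s = [], start_states
    for _ in range(horizon):                   # roll out the model with the actor
        a = actor(s).rsample()                 # reparameterized sample keeps gradients
        s = world_model.transition(s, a)
        states.append(s)

    ret = critic(states[-1])                   # value beyond the imagination horizon
    for s in reversed(states):
        ret = world_model.reward(s) + gamma * ret

    actor_opt.zero_grad()
    (-ret.mean()).backward()                   # backprop value gradients to the actor
    actor_opt.step()

    critic_opt.zero_grad()                     # critic regresses toward the return
    value = critic(states[0].detach())
    ((value - ret.detach()) ** 2).mean().backward()
    critic_opt.step()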
PlaNet vs Dreamer
• For a given situation in the environment, PlaNet searches for the best action among many predictions for different
action sequences.
• Dreamer sidesteps this expensive search by decoupling planning and acting. Once its actor network has been
trained on predicted sequences, it computes the actions for interacting with the environment without additional search.
In addition, Dreamer considers rewards beyond the planning horizon using a value function and leverages
backpropagation for efficient planning.
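At decision time the contrast is a one-liner each (`cem_plan` is the planner sketched in the PlaNet section above; `actor` is a placeholder for the trained actor network):

def planet_act(state):
    # PlaNet: search over many imagined action sequences at every single step.
    return cem_plan(state)                 # expensive online search

def dreamer_act(state, actor):
    # Dreamer: the trained actor network amortizes that search away.
    return actor(state).sample()           # one forward pass, no search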
3. Act in the Environment
The agent encodes the history of the episode to compute the current model state and the next
action to execute in the environment.
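A minimal sketch of this interaction loop: the agent carries a recurrent model state across steps, which summarizes the episode history, rather than re-encoding the whole history each time. As before, `encoder`, `rssm`, `actor`, and the `env` interface are hypothetical placeholders.

def run_episode(env, encoder, rssm, actor):
    state, action = rssm.initial_state(), None
    obs, done = env.reset(), False
    while not done:
        state = rssm.posterior(state, action, encoder(obs))  # fold in the newest observation
        action = actor(state).sample()     # next action from the current model state
        obs, reward, done = env.step(action)
    return state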
4. Explained
1. Transition Model
2. Reward Model
3. Policy
4. Objective
(Imagined Rewards)
Actor-Critic Method
From a paper on deep multi-agent RL (Taiki Fuji et al.)
5. Actor-Critic Model
(Parametrized by 𝜙 & 𝜓, respectively)
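For reference, these five components as the Dreamer paper defines them (my transcription of the paper's notation, with θ the world-model parameters, φ the actor, and ψ the critic):

\begin{align*}
&\text{1. Transition model:} && q_\theta(s_\tau \mid s_{\tau-1}, a_{\tau-1}) \\
&\text{2. Reward model:} && q_\theta(r_\tau \mid s_\tau) \\
&\text{3. Policy (action model):} && a_\tau \sim q_\phi(a_\tau \mid s_\tau) \\
&\text{4. Objective (imagined rewards):} && \max \; \mathrm{E}\Big[\textstyle\sum_{\tau=t}^{t+H} \gamma^{\tau-t}\, r_\tau\Big] \\
&\text{5. Value model:} && v_\psi(s_\tau) \approx \mathrm{E}\Big[\textstyle\sum_{\tau'=\tau}^{t+H} \gamma^{\tau'-\tau}\, r_{\tau'}\Big]
\end{align*}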
4. Explained (cont.)
6. Value Estimation
7. Learning Objectives
Flow of the actor-critic method, from Sergey Levine's Deep RL slides
w/ Imagined Trajectories
↖ Exponentially decaying weights over the k-step returns
↖ Rewards beyond k steps are estimated with the learned value model
↖ Actor
↖ Critic
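The formulas the arrows annotate, as defined in the Dreamer paper: the k-step estimate V_N^k bootstraps with the learned value model at horizon h, the λ-return V_λ mixes all k-step estimates with exponentially decaying weights, and the actor and critic optimize it from opposite sides:

\begin{align*}
\mathrm{V}_N^k(s_\tau) &= \mathrm{E}\Big[\textstyle\sum_{n=\tau}^{h-1} \gamma^{n-\tau}\, r_n + \gamma^{h-\tau}\, v_\psi(s_h)\Big], \qquad h = \min(\tau + k,\; t + H) \\
\mathrm{V}_\lambda(s_\tau) &= (1-\lambda) \textstyle\sum_{n=1}^{H-1} \lambda^{n-1}\, \mathrm{V}_N^n(s_\tau) + \lambda^{H-1}\, \mathrm{V}_N^H(s_\tau) \\
\text{Actor:} \quad & \max_\phi \; \mathrm{E}\Big[\textstyle\sum_{\tau=t}^{t+H} \mathrm{V}_\lambda(s_\tau)\Big] \\
\text{Critic:} \quad & \min_\psi \; \mathrm{E}\Big[\textstyle\sum_{\tau=t}^{t+H} \tfrac{1}{2}\, \big\|\, v_\psi(s_\tau) - \mathrm{V}_\lambda(s_\tau) \big\|^2\Big]
\end{align*}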
1. Control Tasks
• Dreamer learns to solve 20 challenging continuous control tasks with
image inputs, 5 of which are displayed here.
The tasks are designed to pose a variety of challenges to the RL agent, including difficult-to-predict collisions, sparse rewards, chaotic dynamics, small but relevant objects, high degrees of freedom, and 3D perspectives.
• The visualizations show the same 64x64 images that the agent receives
from the environment.
2. Comparison
Dreamer outperforms the previous best model-free
(D4PG) and model-based (PlaNet) methods on the
benchmark of 20 tasks in terms of final performance,
data efficiency, and computation time.
3. Atari Games
Dreamer learns successful behaviors on Atari games and DeepMind Lab
levels, which feature discrete actions and visually more diverse scenes,
including 3D environments with multiple objects.
Conclusion
1. Learning behaviors from sequences predicted by world models alone can solve
challenging visual control tasks from image inputs, surpassing the performance
of previous model-free approaches.
2. Dreamer demonstrates that learning behaviors by backpropagating value gradients through
predicted sequences of compact model states is successful and robust, solving a diverse
collection of continuous and discrete control tasks.
My Questions
1. Is there any relation between world models and the “common sense” that Yann LeCun talks about?
2. Is there any evidence for the mechanism of human prediction and dreaming?
What we see is based on our brain’s prediction of the future. (A. Kitaoka, Kanzen, 2002.)
Thank You for Listening!
Learning Behaviors by Latent Imagination
“DQN model” image generated with the text-to-image tool PixRay