3. Abhishek Das, Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra (Facebook Research)
https://arxiv.org/abs/1711.11543
“Embodied Question Answering” (arXiv, 2017)
Overview
This paper proposes the Embodied Question Answering
(EmbodiedQA) task.
The simulator is available on GitHub:
https://github.com/facebookresearch/house3d
Key Point of Proposed Method
Differences from existing QA tasks
1) The state is presented as a first-person view
2) The agent must act in the environment in order to
answer correctly
In the experiments, they use hierarchical RL consisting of
a planner and a controller
- Train the navigation and QA modules separately,
then join the two modules
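The planner/controller split can be illustrated with a toy loop. This is only a sketch of one common way to divide the work, and that division is an assumption not stated in these notes: the planner picks the next action, and the controller decides how many times to repeat it before handing control back. The function names, the action set, and the random choices standing in for trained networks are all hypothetical.

```python
import random

# Hypothetical discrete action set for a first-person navigation agent.
ACTIONS = ("forward", "turn-left", "turn-right", "stop")


def planner(rng):
    """Stand-in for a trained planner: choose the next action.

    A real planner would condition on the first-person view and the question;
    here it samples uniformly, purely for illustration.
    """
    return rng.choice(ACTIONS)


def controller(rng, repeats, max_repeats=4):
    """Stand-in for a trained controller: return True to repeat the action."""
    return repeats < max_repeats and rng.random() < 0.5


def navigate(seed=0, max_steps=50):
    """Run the two-level loop until the planner says stop or steps run out."""
    rng = random.Random(seed)
    trajectory = []
    while len(trajectory) < max_steps:
        action = planner(rng)
        if action == "stop":
            trajectory.append(action)
            break
        repeats = 0
        while True:
            trajectory.append(action)
            repeats += 1
            if len(trajectory) >= max_steps or not controller(rng, repeats):
                break
    return trajectory
```

Calling `navigate(seed=0)` returns the list of executed actions; in a full system the QA module would answer from the observations gathered along that trajectory.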
Main Insights
Design concept of the task
“Long term objective is to make intelligent agents that
can perceive, communicate and act”
- need active perception
- need inference with “common sense”
ex) If asked about a car, the agent tries to go to the garage
- need grounding of symbols in the real world
4. David Ha, Jürgen Schmidhuber
https://arxiv.org/abs/1803.10122
“World Models” (arXiv, 2018)
Overview
This paper proposes learning the dynamics of the environment
and the control of the agent separately in RL settings.
- model the dynamics of the environment using a VAE and
a Gaussian-mixture (mixture density) RNN
- the controller can then be made simpler (with fewer
parameters)
By learning a model of the environment, the agent can
learn policies without interacting with the real environment
(inside a hallucinated dream), and then transfer them to the
real setting.
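The idea of training inside the learned model can be sketched with a toy example. Everything here is hypothetical: `dream_step` is a hand-written stand-in for a learned dynamics model (not a VAE or RNN), the controller has a single weight, and simple random-search hill climbing is swapped in as the optimizer purely for illustration. The point is only that the policy is improved using rollouts from the model, with no calls to a real environment.

```python
import random


def dream_step(z, action):
    """Hypothetical stand-in for a learned dynamics model.

    Returns the next latent state and a reward that is highest when the
    latent stays near zero.
    """
    z_next = 0.9 * z + action
    reward = -abs(z_next)
    return z_next, reward


def rollout_in_dream(weight, z0=1.0, horizon=20):
    """Total reward of a one-parameter linear controller inside the model."""
    z, total = z0, 0.0
    for _ in range(horizon):
        action = weight * z
        z, reward = dream_step(z, action)
        total += reward
    return total


def train_in_dream(seed=0, iterations=200, step=0.1):
    """Random-search hill climbing on the controller weight (an assumption,
    chosen for brevity; any black-box optimizer could be used instead)."""
    rng = random.Random(seed)
    best_w = 0.0
    best_return = rollout_in_dream(best_w)
    for _ in range(iterations):
        candidate = best_w + rng.gauss(0.0, step)
        candidate_return = rollout_in_dream(candidate)
        if candidate_return > best_return:
            best_w, best_return = candidate, candidate_return
    return best_w, best_return
```

After `train_in_dream()` returns, the resulting weight would be evaluated in the real environment to check that the policy transfers.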
Key Point of Proposed Method
Making the controller simpler by dividing the agent into
a “World Model” with an RNN and a controller with a small
number of parameters
- dimensionality reduction with a VAE
- predict the latent representation z using a Gaussian
mixture RNN
- simple controller with a linear model
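A minimal sketch of the linear controller, assuming it maps the concatenation of the VAE latent z and the RNN hidden state h to an action as a = W[z, h] + b. The dimensions below (32 for z, 256 for h, 3 for the action) are assumptions chosen for illustration, and the random weights stand in for trained parameters.

```python
import random

# Assumed sizes for illustration: latent z, RNN hidden state h, action vector.
Z_DIM, H_DIM, A_DIM = 32, 256, 3


def make_controller(rng):
    """Create a weight matrix W (A_DIM x (Z_DIM + H_DIM)) and bias b."""
    n_in = Z_DIM + H_DIM
    W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(A_DIM)]
    b = [0.0] * A_DIM
    return W, b


def act(W, b, z, h):
    """Linear controller: a = W [z, h] + b."""
    x = list(z) + list(h)
    return [
        sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_k
        for row, b_k in zip(W, b)
    ]


def num_params(W, b):
    """Count the trainable parameters of the controller."""
    return sum(len(row) for row in W) + len(b)
```

With these assumed sizes the controller has (32 + 256) * 3 + 3 = 867 parameters, which shows how small it is compared with a large RNN.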
Difference between Previous Work
Large RNNs have high capacity, but in the RL setting
there is a credit assignment problem, so existing methods
tended to use smaller RNNs.
In the proposed method, the model is divided into a
model of the environment and a controller, so a large RNN
can be used for the environment model while only the
small controller is trained from reward.
Main Insights
- First model to achieve the score required to solve the
CarRacing-v0 task
- a task can be solved with a policy trained using only
the learned environment model