Planning in Reinforcement Learning
Yuandong Tian
Research Scientist
Facebook AI Research
AI works in a lot of situations
• Medical
• Translation
• Personalization
• Surveillance
• Object Recognition
• Smart Design
• Speech Recognition
• Board games
What AI still needs to improve
• Very little supervised data
• Complicated environments
• Lots of corner cases
• Exponential space to explore
Examples: Home Robotics, Autonomous Driving, ChatBot, StarCraft, Question Answering, Common Sense
What AI still needs to improve
[Figure: performance vs. efforts, with a human-level reference line. A scary trend of slowing down: initial enthusiasm ("It really works! All in AI!"), then "Man, we need more data", then trying all possible hacks ("How can that be…"), then despair ("No way, it doesn't work").]
We need novel algorithms
Reinforcement Learning
[Diagram: the Agent sends an Action to the Environment and receives back the next State and a Reward.]
[R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction]
Examples: Atari Games, Go, Dota 2, Doom, Quake 3, StarCraft
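The agent-environment loop in the diagram can be sketched in a few lines. `CoinFlipEnv` (guess the next coin flip), its reward rule, and `RandomAgent` are hypothetical stand-ins chosen only to make the loop runnable; they are not part of any library mentioned in these slides.

```python
import random
from typing import Tuple


class CoinFlipEnv:
    """Hypothetical toy environment: reward 1 for guessing the next coin flip."""

    def __init__(self, horizon: int = 10, seed: int = 0) -> None:
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t  # the state is just the step counter in this toy example

    def step(self, action: int) -> Tuple[int, float, bool]:
        coin = self.rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0
        self.t += 1
        done = self.t >= self.horizon
        return self.t, reward, done


class RandomAgent:
    """Hypothetical agent that ignores the state and acts uniformly at random."""

    def __init__(self, seed: int = 1) -> None:
        self.rng = random.Random(seed)

    def act(self, state: int) -> int:
        return self.rng.randint(0, 1)


def run_episode(env: CoinFlipEnv, agent: RandomAgent) -> float:
    """One episode: observe state, pick action, receive (next state, reward)."""
    state = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = agent.act(state)
        state, reward, done = env.step(action)
        total_reward += reward
    return total_reward
```

A learning agent would replace `act` with a policy that is updated from the collected rewards; the loop itself stays the same.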
Why is planning important?
Just one example
AlphaGo Zero
[Diagram: the models generate self-play games as training data, and the training data is used to update the models. Zero human knowledge.]
[Silver et al, Mastering the game of Go without human knowledge, Nature 2017]
AlphaGo Zero Strength
• 3-day version
• 4.9M games, 1600 rollouts/move
• 20-block ResNet
• Defeats AlphaGo Lee
• 40-day version
• 29M games, 1600 rollouts/move
• 40-block ResNet
• Defeats AlphaGo Master by 89:11
ELF: Extensive, Lightweight and Flexible Framework for Game Research
Larry Zitnick, Qucheng Gong, Wenling Shang, Yuxin Wu, Yuandong Tian
https://github.com/facebookresearch/ELF
[Y. Tian et al, ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games, NIPS 2017]
ELF: A simple for-loop
[Diagram: the Agent (Python) sends an Action to the Environment (C++), which returns the State and Reward.]
How ELF works
[Figure: game threads 0 to 7 (C++) are grouped into batches that are passed to Python.]
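The batching idea in the figure, where states from many game instances are gathered into one batch so the Python-side model is called once per batch instead of once per game, can be sketched in a single thread. ELF runs its games concurrently in C++ threads; `ToyGame` and `batched_rollout` are hypothetical names, and the sequential loop is a simplification of that design.

```python
from typing import Callable, List, Sequence


class ToyGame:
    """Hypothetical stand-in for one game instance (one C++ game thread in ELF)."""

    def __init__(self, game_id: int) -> None:
        self.game_id = game_id
        self.state = 0

    def observe(self) -> int:
        return self.state

    def apply(self, action: int) -> None:
        self.state += action


def batched_rollout(
    games: Sequence[ToyGame],
    policy: Callable[[List[int]], List[int]],
    batch_size: int,
    num_steps: int,
) -> int:
    """Advance every game `num_steps` times, querying `policy` on batches of states.

    Returns the number of policy calls, which is the quantity batching reduces.
    """
    calls = 0
    for _ in range(num_steps):
        for start in range(0, len(games), batch_size):
            batch = games[start:start + batch_size]
            states = [g.observe() for g in batch]  # gather a batch of states
            actions = policy(states)               # one model call for the whole batch
            calls += 1
            for game, action in zip(batch, actions):
                game.apply(action)                 # scatter actions back to their games
    return calls
```

With 8 games, a batch size of 4 and 3 steps, the policy is called 6 times instead of 24.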
Distributed ELF
[Diagram: the server, which runs training, sends requests (game params) for evaluation or self-play to many clients and receives experiences back.]
• AlphaGoZero (more synchronization)
• AlphaZero (less synchronization)
Putting AlphaGoZero and AlphaZero into the same framework
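One concrete source of "more" versus "less" synchronization, as described in the AlphaGo Zero and AlphaZero papers: AlphaGo Zero only switches self-play to a new model after it beats the current best in evaluation games (the paper uses a win-rate threshold of 55%), while AlphaZero always self-plays with the latest parameters. The sketch encodes only that choice; `selfplay_model` and its arguments are hypothetical and do not reflect ELF's actual API.

```python
from typing import Literal

Mode = Literal["alphago_zero", "alphazero"]


def selfplay_model(
    mode: Mode,
    best_model: str,
    candidate_model: str,
    candidate_wins: int,
    eval_games: int,
    threshold: float = 0.55,
) -> str:
    """Pick which model the clients should use for self-play (hypothetical helper).

    alphago_zero: gate the candidate behind an evaluation match (more synchronization).
    alphazero:    always use the latest candidate (less synchronization).
    """
    if mode == "alphazero":
        return candidate_model
    if eval_games <= 0:
        return best_model  # no evaluation results yet: keep the current best model
    win_rate = candidate_wins / eval_games
    return candidate_model if win_rate > threshold else best_model
```

The gated mode forces clients to wait for evaluation results before switching models, which is the extra synchronization the slide refers to.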
ELF OpenGo
• System can be trained with 2000 GPUs in 2 weeks.
• Decent performance against professional players and strong bots.
• Extensive ablation analysis
• Decoupled design; the code is highly reusable for other games.
We open-source the code and the pre-trained model for the Go and ML communities.
https://github.com/pytorch/ELF
A simple tutorial is in the experimental branch (tutorial, tutorial_distri).
ELF OpenGo Performance
Vs top professional players: 20-0
Single GPU, 80k rollouts, 50 seconds; the players were offered unlimited thinking time.
Name | ELO (world rank) | Result
Kim Ji-seok | 3590 (#3) | 5-0
Shin Jin-seo | 3570 (#5) | 5-0
Park Yeonghun | 3481 (#23) | 5-0
Choi Cheolhan | 3466 (#30) | 5-0
Vs strong bot (LeelaZero)
[158603eb, 192x15, Apr. 25, 2018]: 980 wins, 18 losses (98.2%)
Vs professional players
Single GPU, 2k rollouts, 27-0 against Taiwanese pros.
Planning is how new knowledge is created
[Figure: a game from start to end, with random moves and meaningful moves marked, and an arrow at where the reward signal is. The bot is already dan level even if the opening doesn't make much sense.]
[Plot: win rate against the bot without planning (from 50% to 100%) vs. the number of planning rollouts.]
Training is almost always constrained by model capacity (which is why the 40-block model beats the 20-block model).
Planning is how new knowledge is created
• Monte Carlo sampling: T-step looking forward
• Tree search
• Temporal Difference (TD) in Reinforcement Learning: one-step looking forward
• Learning a neural network that directly predicts the optimal value/policy
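The difference between T-step (Monte Carlo) and one-step (TD) look-ahead shows up in the value-estimation target computed for each state of an episode. The episode, discount factor, and value table below are made-up numbers for illustration; the functions compute the standard Monte Carlo return and the TD(0) target.

```python
from typing import Dict, List, Tuple

# (state, reward, next_state); next_state "END" marks the terminal state
Transition = Tuple[str, float, str]


def monte_carlo_targets(episode: List[Transition], gamma: float) -> List[float]:
    """T-step look-ahead: each state's target is the full discounted return to the end."""
    targets = []
    g = 0.0
    for _, reward, _ in reversed(episode):
        g = reward + gamma * g
        targets.append(g)
    return list(reversed(targets))


def td0_targets(episode: List[Transition], values: Dict[str, float], gamma: float) -> List[float]:
    """One-step look-ahead: reward plus the discounted current estimate of the next state."""
    return [
        reward + gamma * (0.0 if nxt == "END" else values[nxt])
        for _, reward, nxt in episode
    ]


def td0_update(values: Dict[str, float], state: str, target: float, alpha: float) -> None:
    """Move the estimate for `state` a step of size `alpha` toward the target."""
    values[state] += alpha * (target - values[state])
```

Monte Carlo targets wait for the end of the episode and use only observed rewards; TD(0) targets are available after one step but rely on the current (possibly wrong) value estimates.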
Planning is how game AI is created
[Figure: from the current game situation (Lufei Ruan vs. Yifan Hou, 2010), extensive search/planning evaluates the consequences of candidate moves; the leaves are marked "Black wins" or "White wins".]
• AlphaBeta Pruning + Iterative Deepening
• Monte Carlo Tree Search
• …
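Alpha-beta pruning returns the same value as plain minimax while skipping branches that cannot change the result. A minimal sketch on an explicit, made-up game tree (nested lists; leaves are scores from the maximizing player's point of view). Iterative deepening, which reruns a depth-limited search with increasing depth, is not shown.

```python
import math
from typing import Dict, List, Optional, Union

Tree = Union[int, float, List["Tree"]]


def minimax(node: Tree, maximizing: bool) -> float:
    """Plain minimax: visits every leaf."""
    if not isinstance(node, list):
        return float(node)
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def alphabeta(node: Tree, maximizing: bool, alpha: float = -math.inf,
              beta: float = math.inf, stats: Optional[Dict[str, int]] = None) -> float:
    """Minimax with alpha-beta pruning; `stats["leaves"]` counts evaluated leaves."""
    if not isinstance(node, list):
        if stats is not None:
            stats["leaves"] += 1
        return float(node)
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, stats))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimizer already has a better option elsewhere
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, stats))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the maximizer already has a better option elsewhere
    return value
```

On the tree `[[3, 5], [2, 9], [0, 1]]` both functions return 3, but alpha-beta evaluates only 4 of the 6 leaves: once the second branch shows a 2, its remaining leaf cannot beat the 3 already secured.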
How to plan without a known model?
If you don't have a ground truth dynamics model …
• Only (human/expert) trajectories, no world model
• Limited access to world models
• Cannot restart the model, cannot query any (s, a) pair
• Noisy signals from the world model
• …
If you have a ground truth dynamics model …
• Infinite access to the exact world model
• May query any (s, a) pair
• …
[Diagram: a dynamics model maps the current state and the action to take to the next state.]
Build one
Navigation: how to plan the trajectory to the target?
Yi Wu, Georgia Gkioxari, Yuxin Wu
Build a semantic model
[Figure: task: find "oven". A semantic graph over places and objects (outdoor, living room, sofa, car, chair, dining room, kitchen, oven) forms an incomplete model of the environment.]
[Y. Wu et al, Learning and Planning with a Semantic Model, submitted to ICLR 2019]
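Build a semantic model
Bayesian inference: P(z | Y), given the learning experience Y
[Figure: the semantic graph over outdoor, living room, sofa, car, chair, dining room, kitchen and oven, with edge probabilities such as 0.7, 0.95, 0.8, 0.5, 0.6 and 0.7; next step: "kitchen".]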
LEAPS: LEArning and Planning with a Semantic model
[Figure: the semantic graph around the living room (kitchen, chair, sofa, dining room) with edge variables P(z_{kitchen, living room}), P(z_{sofa, living room}), P(z_{chair, living room}), P(z_{dining, living room}); after an observation, the posterior becomes P(z_{sofa, living room} | Y = 1).]
Planning the trajectory and more exploration
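The posterior P(z | Y) for a single binary edge variable z (for example, whether the sofa is reachable from the living room) follows from Bayes' rule once an observation model is fixed. The detection rates below are hypothetical parameters, not values used by LEAPS.

```python
def edge_posterior(prior: float, observed: bool,
                   p_obs_if_edge: float, p_obs_if_no_edge: float) -> float:
    """Posterior P(z = 1 | Y) for a binary edge z after one binary observation Y.

    p_obs_if_edge    = P(Y = 1 | z = 1), a hypothetical detector hit rate
    p_obs_if_no_edge = P(Y = 1 | z = 0), a hypothetical false-positive rate
    """
    if observed:
        like_edge, like_none = p_obs_if_edge, p_obs_if_no_edge
    else:
        like_edge, like_none = 1.0 - p_obs_if_edge, 1.0 - p_obs_if_no_edge
    evidence = like_edge * prior + like_none * (1.0 - prior)  # P(Y)
    return like_edge * prior / evidence
```

Applying the update repeatedly, feeding each posterior back in as the next prior, makes repeated failures drive an edge probability toward 0 and repeated successes drive it toward 1.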
House3D
SUNCG dataset, 45K scenes, all objects are fully labeled.
https://github.com/facebookresearch/House3D
Observations: RGB image, depth, segmentation mask, top-down map
Learning the Prior between Different Rooms
Test Performance on ConceptNav
Case Study: go to "outdoor"
Room graph: Birth, Outdoor, Living room, Garage. Prior P(z) on the edges: 0.12, 0.38, 0.76, 0.73, 0.28.
• Sub-goal: Outdoor. Failed! In the posterior P(z|E), the 0.76 edge drops to 0.01.
• Sub-goal: Garage. Failed! The 0.38 edge drops to 0.08.
• Sub-goal: Living room. Success: the 0.73 edge rises to 0.99.
• Sub-goal: Outdoor. Success: the 0.12 edge rises to 0.99.
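One way to choose the next sub-goal on the room graph is to follow the path whose edges are jointly most likely to exist, which means maximizing the product of edge probabilities; running Dijkstra on weights -log p does exactly that. This is a sketch rather than the exact LEAPS planner, and which probability belongs to which edge is a hypothetical assignment, since the slides do not label the edges.

```python
import heapq
import math
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]  # undirected: graph[u][v] = P(edge u-v exists)


def undirected(edges: List[Tuple[str, str, float]]) -> Graph:
    graph: Graph = {}
    for u, v, p in edges:
        graph.setdefault(u, {})[v] = p
        graph.setdefault(v, {})[u] = p
    return graph


def most_probable_path(graph: Graph, start: str, goal: str) -> Tuple[List[str], float]:
    """Dijkstra on -log(p): returns the path maximizing the product of edge probabilities."""
    best = {start: 0.0}  # accumulated -log probability
    parent: Dict[str, str] = {}
    frontier = [(0.0, start)]
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node == goal:
            break
        if cost > best.get(node, math.inf):
            continue  # stale queue entry
        for nxt, p in graph[node].items():
            if p <= 0.0:
                continue  # an edge believed absent cannot be traversed
            new_cost = cost - math.log(p)
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                parent[nxt] = node
                heapq.heappush(frontier, (new_cost, nxt))
    if goal not in best:
        return [], 0.0
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1], math.exp(-best[goal])
```

With the hypothetical assignment Birth-Outdoor 0.76, Birth-Garage 0.38, Birth-Living room 0.73, Living room-Outdoor 0.12, Garage-Outdoor 0.28, the planner first heads straight for Outdoor; after that edge drops to 0.01 it routes through Garage, and after Birth-Garage drops to 0.08 it routes through the Living room.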
Improving Existing Plans
• Iterative LQR
• What if the dynamics are nonlinear?
• Compute a new policy, then sample according to the new policy
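Iterative LQR repeatedly linearizes the nonlinear dynamics around the current trajectory and solves a linear-quadratic regulator (LQR) problem to obtain a new policy. The inner LQR step is the backward Riccati recursion below. It is written for a scalar state and control to stay dependency-free; the linearization and the forward pass of iLQR are not shown.

```python
from typing import List, Tuple


def lqr_gains(a: float, b: float, q: float, r: float, qf: float,
              horizon: int) -> Tuple[List[float], List[float]]:
    """Backward Riccati recursion for x_{t+1} = a*x_t + b*u_t with cost
    sum_t (q*x_t^2 + r*u_t^2) + qf*x_T^2. The optimal control is u_t = -K_t * x_t.

    Returns gains [K_0..K_{T-1}] and cost-to-go coefficients [P_0..P_T],
    where P_t * x^2 is the optimal remaining cost from state x at time t.
    """
    p = [0.0] * (horizon + 1)
    k = [0.0] * horizon
    p[horizon] = qf
    for t in range(horizon - 1, -1, -1):
        k[t] = (b * p[t + 1] * a) / (r + b * p[t + 1] * b)
        p[t] = q + a * p[t + 1] * (a - b * k[t])
    return k, p


def rollout(a: float, b: float, gains: List[float], x0: float) -> Tuple[List[float], List[float]]:
    """Simulate the linear system under the feedback policy u_t = -K_t * x_t."""
    xs, us = [x0], []
    for gain in gains:
        u = -gain * xs[-1]
        us.append(u)
        xs.append(a * xs[-1] + b * u)
    return xs, us
```

For a = b = q = r = qf = 1 and one step, K_0 = 0.5 and P_0 = 1.5: starting from x_0 = 1, the control is -0.5, the next state is 0.5, and the total cost 1 + 0.25 + 0.25 equals P_0 * x_0^2.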
Non-differentiable Plans
• Directly predicting combinatorial solutions
• Convex hull with a seq2seq model: [O. Vinyals et al, Pointer Networks, NIPS 2015]
• Scheduling each job to the i-th slot, trained with policy gradient: [H. Mao et al, Resource Management with Deep Reinforcement Learning, ACM Workshop on Hot Topics in Networks, 2016]
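For the convex-hull task, a pointer network outputs the hull as a sequence of indices into its input points. The exact target it learns to imitate can be computed with Andrew's monotone chain algorithm; the sketch returns input indices (the pointer-style output format) in counter-clockwise order, starting from the point with the smallest (x, y). This is the classical reference solver, not the neural model.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull_indices(points: List[Point]) -> List[int]:
    """Andrew's monotone chain; returns indices of hull vertices, counter-clockwise."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    if len(order) <= 2:
        return order

    def half_hull(indices: Sequence[int]) -> List[int]:
        hull: List[int] = []
        for i in indices:
            # drop points that would make a clockwise (or collinear) turn
            while len(hull) >= 2 and cross(points[hull[-2]], points[hull[-1]], points[i]) <= 0:
                hull.pop()
            hull.append(i)
        return hull

    lower = half_hull(order)
    upper = half_hull(list(reversed(order)))
    return lower[:-1] + upper[:-1]  # the last point of each half starts the other half
```

For the square corners plus a center point `[(0, 0), (1, 1), (2, 0), (2, 2), (0, 2)]` the output is `[0, 2, 3, 4]`: the center point (index 1) is excluded.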
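Neural network rewriter
Is it simpler to improve the solution progressively?
At each step t, given the current solution s_t:
• Sample a region g_t ∼ SP(g_t), with g_t ⊂ s_t (Score Predictor)
• Sample a rewriting rule a_t ∼ RS(g_t) (Rule Selector)
• Apply it: s_{t+1} = f(s_t, g_t, a_t)
Architecture: Input Encoder → Score Predictor → Rule Selector
[X. Chen and Y. Tian, Learning to Progressively Plan, submitted to ICLR 2019]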
Job Scheduling
[Figure: jobs 1, 2 and 3 with (T=2, S=1), (T=3, S=2) and (T=1, S=3) are placed on Resource 1 and Resource 2 along a time axis; the schedule is also shown as a graph representation over the jobs, and the objective is the slowdown.]
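Slowdown, the scheduling objective here and in DeepRM, is a job's completion time measured from its arrival, divided by its duration, so 1.0 means the job never waited. The sketch places jobs greedily in a fixed order on one resource and reports the average slowdown. Reading T as duration and S as resource demand, the single-resource setup, the capacity value, and all function names are assumptions for illustration.

```python
from typing import List, Tuple

Job = Tuple[int, int, int]  # (arrival, duration T, resource demand S): hypothetical encoding


def greedy_schedule(jobs: List[Job], capacity: int) -> List[int]:
    """Give each job, in list order, the earliest start time >= its arrival at which
    its demand fits under `capacity` for its whole duration. Returns the start times."""
    usage: List[int] = []  # usage[t] = resource units busy at time step t
    starts = []
    for arrival, duration, demand in jobs:
        if demand > capacity:
            raise ValueError("job demand exceeds capacity")
        start = arrival
        while True:
            needed = start + duration
            if len(usage) < needed:
                usage.extend([0] * (needed - len(usage)))
            if all(usage[t] + demand <= capacity for t in range(start, needed)):
                break
            start += 1
        for t in range(start, start + duration):
            usage[t] += demand
        starts.append(start)
    return starts


def average_slowdown(jobs: List[Job], starts: List[int]) -> float:
    """Mean over jobs of (completion - arrival) / duration."""
    ratios = [(s + d - a) / d for (a, d, _), s in zip(jobs, starts)]
    return sum(ratios) / len(ratios)
```

With the three jobs above arriving at time 0 and a capacity of 3, the given order yields start times [0, 0, 3] and an average slowdown of 2.0, because the short third job waits three steps. Processing the shortest job first lowers the average, which is why the order of placement (the quantity a learned rewriter adjusts) matters.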
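Job Scheduling
[Figure: the Input Encoder (embedding + DAG-LSTM over the job graph) feeds the Score Predictor (FC layers giving per-node scores such as g2: 0.1, g4: 0.7, g5: 0.2), which picks the region g_t; the Rule Selector (FC + softmax) picks the action a_t, turning s_t into s_{t+1}.]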
Job Scheduling (slowdown, lower is better)
#Resources | 2 | 5 | 10 | 20
Shortest Job First | 4.80 | 5.83 | 5.58 | 5.00
Shortest First Search | 4.25 | 5.05 | 5.54 | 4.98
DeepRM | 2.81 | 6.52 | 9.20 | 10.18
Neural Rewriter (Ours) | 2.80 | 3.36 | 4.50 | 4.63
Optim (LB) | 2.57 | 3.02 | 4.08 | 4.26
Earliest Job First (UB) | 11.11 | 13.62 | 22.13 | 24.23
Expression Simplification
[Figure: the Input Encoder (embedding + Tree-LSTM) feeds the Score Predictor (FC), which picks a subtree g_t; the Rule Selector (FC + softmax over rules a1 … a19, here a5: Constant Reduction) picks a_t. Example step: s_t = min(v0, v2) <= v1 - v1 becomes s_{t+1} = min(v0, v2) <= 0.]
Expression Simplification
Method | Expr Len Reduction | Tree Size Reduction
Halide Rule-based Rewriter | 36.13 | 9.68
Heuristic Search | 43.27 | 12.09
Neural Rewriter (Ours) | 46.98 | 13.53
Example
5 ≤ max(v, 3) + 3
→ 5 ≤ max(v + 3, 3 + 3)
→ (5 ≤ v + 3) OR (5 ≤ 3 + 3)
→ (5 ≤ v + 3) OR True
→ True
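The rewriting loop s_{t+1} = f(s_t, g_t, a_t) can be traced on the example above. In this sketch the learned Score Predictor and Rule Selector are replaced by a fixed greedy choice: search the expression tree top-down and apply the first rule that matches. The tuple encoding and the five rules are assumptions chosen to cover this one example; the trace differs slightly from the slide because it folds 3 + 3 into 6 before comparing.

```python
from typing import List, Optional, Tuple, Union

# An expression is an int constant, a bool, a variable name, or (op, left, right).
Expr = Union[int, bool, str, Tuple[str, "Expr", "Expr"]]


def is_const(e: Expr) -> bool:
    return isinstance(e, int) and not isinstance(e, bool)  # bool is a subclass of int


def rewrite_node(e: Expr) -> Optional[Expr]:
    """Apply the first matching rule at the root of `e`, or return None."""
    if not isinstance(e, tuple):
        return None
    op, a, b = e
    if op == "+" and is_const(a) and is_const(b):
        return a + b                                     # constant reduction
    if op == "+" and isinstance(a, tuple) and a[0] == "max":
        return ("max", ("+", a[1], b), ("+", a[2], b))   # max(x, y) + c -> max(x + c, y + c)
    if op == "<=" and is_const(a) and is_const(b):
        return a <= b                                    # evaluate a constant comparison
    if op == "<=" and isinstance(b, tuple) and b[0] == "max":
        return ("or", ("<=", a, b[1]), ("<=", a, b[2]))  # c <= max(x, y) -> c <= x OR c <= y
    if op == "or" and (a is True or b is True):
        return True
    return None


def rewrite_once(e: Expr) -> Tuple[Expr, bool]:
    """Greedy stand-in for choosing g_t and a_t: rewrite the first matching subtree in pre-order."""
    new = rewrite_node(e)
    if new is not None:
        return new, True
    if isinstance(e, tuple):
        op, a, b = e
        a2, changed = rewrite_once(a)
        if changed:
            return (op, a2, b), True
        b2, changed = rewrite_once(b)
        if changed:
            return (op, a, b2), True
    return e, False


def simplify(e: Expr, max_steps: int = 100) -> Tuple[Expr, List[Expr]]:
    """Rewrite until no rule applies; returns the final expression and states s_0, s_1, ..."""
    states = [e]
    for _ in range(max_steps):
        e, changed = rewrite_once(e)
        if not changed:
            break
        states.append(e)
    return e, states
```

Starting from `("<=", 5, ("+", ("max", "v", 3), 3))`, the loop takes five rewrites to reach `True`. A learned rewriter replaces the greedy pre-order choice with scores over subtrees and a distribution over rules.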
Future Directions
• Multi-Agent
• Hierarchical RL (Admiral (General) → Captain → Lieutenant)
• RL Systems
• Model-based RL (Policy Optimization and Model Estimation)
• RL applications
• RL for Optimization
How to do well in Reinforcement Learning?
• Experience with (distributed) systems
• Strong math skills
• Strong coding skills
• Parameter tuning skills
Thanks!

 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 

Último (20)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 

Yuandong Tian at AI Frontiers : Planning in Reinforcement Learning

  • 1. Planning in Reinforcement Learning Yuandong Tian Research Scientist Facebook AI Research
  • 2. AI works in a lot of situations: Medical, Translation, Personalization, Surveillance, Object Recognition, Smart Design, Speech Recognition, Board games
  • 3. What AI still needs to improve: very little supervised data, complicated environments, lots of corner cases, an exponential space to explore. Examples: Home Robotics, Autonomous Driving, ChatBot, StarCraft, Question Answering, Common Sense.
  • 4. What AI still needs to improve [Figure: performance vs. efforts relative to human level, showing "a scary trend of slowing down". Stages: Initial Enthusiasm ("It really works! All in AI!"), "Man, we need more data", Trying all possible hacks ("How can that be…"), Despair ("No way, it doesn't work").]
  • 5. What AI still needs to improve [Figure: performance vs. efforts relative to human level, showing "a scary trend of slowing down". Stages: Initial Enthusiasm ("It really works! All in AI!"), "Man, we need more data", Trying all possible hacks ("How can that be…"), Despair ("No way, it doesn't work").] We need novel algorithms.
  • 6. Reinforcement Learning [Diagram: the Agent sends an Action to the Environment and receives the State and a Reward.] [R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction] Examples: Atari Games, Go, DoTA 2, Doom, Quake 3, StarCraft
  • 7. Why is planning important? Just one example
  • 8. AlphaGo Zero: Self-Replays → Generate Training data → Update Models → repeat. Zero human knowledge. [Silver et al, Mastering the game of Go without human knowledge, Nature 2017]
  • 9. AlphaGo Zero Strength • 3-day version: 4.9M games, 1600 rollouts/move, 20-block ResNet; defeats AlphaGo Lee. • 40-day version: 29M games, 1600 rollouts/move, 40-block ResNet; defeats AlphaGo Master by 89:11.
  • 10. ELF: Extensive, Lightweight and Flexible Framework for Game Research. Larry Zitnick, Qucheng Gong, Wenling Shang, Yuxin Wu, Yuandong Tian. https://github.com/facebookresearch/ELF [Y. Tian et al, ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games, NIPS 2017]
  • 11. ELF: A simple for-loop. [Diagram: the agent-environment loop (State, Action, Reward), with the environment side in C++ and the agent side in Python.]
  • 13. Distributed ELF: Putting AlphaGoZero and AlphaZero into the same framework. [Diagram: a training server sends requests (game params) to many evaluation/self-play clients and receives experiences back.] AlphaGoZero: more synchronization; AlphaZero: less synchronization.
  • 14. ELF OpenGo • The system can be trained with 2000 GPUs in 2 weeks. • Decent performance against professional players and strong bots. • Abundant ablation analysis. • Decoupled design; code highly reusable for other games. We open-source the code and the pretrained model for the Go and ML communities: https://github.com/pytorch/ELF (simple tutorial in the experimental branch: tutorial, tutorial_distri)
  • 15. ELF OpenGo Performance
      Vs top professional players: 20-0 (single GPU, 80k rollouts, 50 seconds; players were offered unlimited thinking time)
        Name (rank)      ELO (world rank)   Result
        Kim Ji-seok      3590 (#3)          5-0
        Shin Jin-seo     3570 (#5)          5-0
        Park Yeonghun    3481 (#23)         5-0
        Choi Cheolhan    3466 (#30)         5-0
      Vs strong bot (LeelaZero [158603eb, 192x15, Apr. 25, 2018]): 980 wins, 18 losses (98.2%)
      Vs professional players: single GPU, 2k rollouts, 27-0 against Taiwanese pros
  • 16. Planning is how new knowledge is created. [Figure, left: moves from game start to game end over training; meaningful moves appear first near the game end, where the reward signal is, while moves near the start are still random. Already dan level even if the opening doesn't make much sense.] [Figure, right: win rate (50% to 100%) against the bot without planning vs. planning #rollouts.] Training is almost always constrained by model capacity (why 40b > 20b).
  • 17. Planning is how new knowledge is created. Temporal Difference (TD) in Reinforcement Learning: learning a neural network that directly predicts the optimal value/policy, using targets from one-step looking forward, T-step looking forward, Monte Carlo sampling, or tree search.
  • 18. Planning is how game AI is created: extensive search/planning from the current game situation to evaluate the consequences (black wins / white wins). [Figure: search tree from Lufei Ruan vs. Yifan Hou (2010).] Methods: AlphaBeta Pruning + Iterative Deepening, Monte Carlo Tree Search, …
  • 19. How to plan without a known model? (A dynamics model maps the current state and the action to take to the next state.) If you don't have a ground-truth dynamics model, build one: • Only (human/expert) trajectories, no world model. • Limited access to world models. • Cannot restart the model, cannot query arbitrary (s, a) pairs. • Noisy signals from the world model. • … If you have a ground-truth dynamics model: • Unlimited access to the exact world model. • May query any (s, a) pair. • …
  • 20. Navigation (Yi Wu, Georgia Gkioxari, Yuxin Wu). [Figure: agent and navigation target.] How to plan the trajectory?
  • 21. Build a semantic model. Task: find "oven". [Figure: semantic graph over outdoor, living room, sofa, car, chair, dining room, kitchen, oven.] An incomplete model of the environment. [Y. Wu et al, Learning and Planning with a Semantic Model, submitted to ICLR 2019]
  • 22. Build a semantic model: Bayesian inference $P(z \mid Y)$ from learning experience $Y$. [Figure: semantic graph over outdoor, living room, sofa, car, chair, dining room, kitchen, oven, with edge probabilities 0.7, 0.95, 0.8, 0.5, 0.6, 0.7.] Next step: "kitchen".
  • 23. LEAPS: LEArning and Planning with a Semantic model. [Figure: semantic graph around "living room" with neighbors kitchen, chair, sofa and dining room, and edge probabilities $P(z_{\text{kitchen, living room}})$, $P(z_{\text{sofa, living room}})$, $P(z_{\text{chair, living room}})$, $P(z_{\text{dining, living room}})$; after observing $Y = 1$, the sofa edge becomes $P(z_{\text{sofa, living room}} \mid Y = 1)$.] Planning the trajectory and more exploration.
  • 24. House3D: SUNCG dataset, 45K scenes, all objects fully labeled. https://github.com/facebookresearch/House3D [Views: RGB image, depth, segmentation mask, top-down map]
  • 25. Learning the Prior between Different Rooms
  • 26. Test Performance on ConceptNav
  • 27. Case Study: go to "outdoor". Prior $P(z)$ over the graph edges (nodes: Birth, Outdoor, Living room, Garage): 0.12, 0.38, 0.76, 0.73, 0.28. Sub-goal: Outdoor.
  • 28. Case Study: go to "outdoor". Sub-goal: Outdoor. Failed! Posterior $P(z \mid E)$: 0.12, 0.38, 0.01, 0.73, 0.28.
  • 29. Case Study: go to "outdoor". Sub-goal: Garage. Failed! Posterior $P(z \mid E)$: 0.12, 0.01, 0.73, 0.28, 0.08.
  • 30. Case Study: go to "outdoor". Sub-goal: Living Room. Success. Posterior $P(z \mid E)$: 0.12, 0.08, 0.01, 0.28, 0.99.
  • 31. Case Study: go to "outdoor". Sub-goal: Outdoor. Success. Posterior $P(z \mid E)$: 0.99, 0.08, 0.01, 0.28, 0.99.
  • 33. Iterative LQR. What if the dynamics are nonlinear? Iterate: compute a new policy, then sample according to the new policy.
  • 34. Non-differentiable Plans • Directly predicting combinatorial solutions: • Convex hull with a seq2seq model [O. Vinyals et al, Pointer Networks, NIPS 2015] • Resource management: schedule the job to the i-th slot, trained with policy gradient [H. Mao et al, Resource Management with Deep Reinforcement Learning, ACM Workshop on Hot Topics in Networks, 2016]
  • 35. Neural network rewriter. Given state $s_t$: sample $g_t \sim SP(g_t)$, $g_t \subset s_t$; sample $a_t \sim RS(g_t)$; then $s_{t+1} = f(s_t, g_t, a_t)$. Components: Input Encoder, Score Predictor (SP), Rule Selector (RS). Is it simpler to improve the solution progressively? [X. Chen and Y. Tian, Learning to Progressively Plan, submitted to ICLR 2019]
  • 36. Job Scheduling [Figure: three jobs (T=2, S=1; T=3, S=2; T=1, S=3) scheduled on Resource 1 and Resource 2 over time, the corresponding graph representation, and the resulting slowdown.]
  • 37. Job Scheduling [Figure: the job graph $s_t$ is embedded by a DAG-LSTM (input encoder); FC layers score candidate nodes (score predictor); an FC + softmax rule selector picks the action $a_t$ for the selected node $g_t$, producing $s_{t+1}$.]
  • 38. Job Scheduling
        #Resources                  2       5       10      20
        Shortest Job First          4.80    5.83    5.58    5.00
        Shortest First Search       4.25    5.05    5.54    4.98
        DeepRM                      2.81    6.52    9.20    10.18
        Neural Rewriter (Ours)      2.80    3.36    4.50    4.63
        Optim (LB)                  2.57    3.02    4.08    4.26
        Earliest Job First (UB)     11.11   13.62   22.13   24.23
  • 39. Expression Simplification [Figure: the expression tree $s_t$ is embedded by a Tree-LSTM input encoder; an FC score predictor selects a subtree $g_t$; an FC + softmax rule selector picks a rewrite rule $a_t$ among a1 … a19 (e.g., a5: Constant Reduction), giving $s_{t+1}$.]
  • 40. Expression Simplification
                                      Expr Len Reduction   Tree Size Reduction
        Halide Rule-based Rewriter    36.13                9.68
        Heuristic Search              43.27                12.09
        Neural Rewriter (Ours)        46.98                13.53
  • 41. Example: 5 ≤ max(v, 3) + 3 → 5 ≤ max(v + 3, 3 + 3) → (5 ≤ v + 3) OR (5 ≤ 3 + 3) → (5 ≤ v + 3) OR True → True
  • 42. Future Directions: Multi-Agent; Hierarchical RL [Figure: chain of command, Admiral (General) → Captain → Lieutenant]; RL Systems; Model-based RL (Policy Optimization, Model Estimation); RL applications (RL for Optimization)
  • 43. How to do well in reinforcement learning? • Experience with (distributed) systems • Strong math skills • Strong coding skills • Parameter-tuning skills
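ELF's "simple for-loop" with dynamic batching (slide 11) can be sketched in a few lines of Python. This is a single-threaded toy, not ELF's API: `ToyGame`, `DynamicBatcher`, `run` and `policy_fn` are hypothetical names, and real ELF runs many game threads in C++. Games that still need an action act as pending requests; each loop iteration takes up to `batch_size` of them and calls the model once for the whole batch.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyGame:
    """Hypothetical game: the state is a counter; the game ends after max_steps."""
    game_id: int
    max_steps: int
    state: int = 0
    steps: int = 0
    history: List[tuple] = field(default_factory=list)

    def done(self) -> bool:
        return self.steps >= self.max_steps

    def step(self, action: int) -> None:
        self.history.append((self.state, action))  # record (state, action) for a replay buffer
        self.state += action
        self.steps += 1


class DynamicBatcher:
    """Collects pending action requests from many games and serves them in batches."""

    def __init__(self, games: List[ToyGame], batch_size: int):
        self.games = games
        self.batch_size = batch_size

    def next_batch(self) -> List[ToyGame]:
        # Every unfinished game is waiting for an action; take up to batch_size of them.
        pending = [g for g in self.games if not g.done()]
        return pending[: self.batch_size]

    def reply(self, batch: List[ToyGame], actions: List[int]) -> None:
        for game, action in zip(batch, actions):
            game.step(action)


def run(batcher: DynamicBatcher, policy_fn: Callable[[List[int]], List[int]]) -> int:
    """The Python side is a plain loop: get a batch, call the model once, reply."""
    model_calls = 0
    while True:
        batch = batcher.next_batch()
        if not batch:
            return model_calls
        actions = policy_fn([g.state for g in batch])  # one batched "forward pass"
        model_calls += 1
        batcher.reply(batch, actions)
```

With 8 games of 3 steps each and `batch_size=4`, the 24 action requests are served by 6 model calls instead of 24.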
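The look-ahead targets on slide 17 can be compared on a toy episode. A minimal Python sketch, assuming a tabular value estimate `values` and a fixed reward list (both hypothetical): `n=1` gives the one-step TD target, larger `n` a T-step target, and `n >= len(rewards)` the Monte Carlo return. `td0_sweeps` repeats one-step updates on a chain whose only reward is at the end, so value appears at the last state first and moves one state toward the start per episode, which mirrors the backward propagation described for ELF OpenGo (slide 16).

```python
from typing import List


def n_step_target(rewards: List[float], values: List[float], t: int, n: int, gamma: float) -> float:
    """n-step TD target for state s_t.

    rewards[k] is the reward received when leaving state s_k; values[k] is the current V(s_k).
    """
    horizon = min(t + n, len(rewards))
    target = 0.0
    for k in range(t, horizon):
        target += gamma ** (k - t) * rewards[k]
    # Bootstrap from the value estimate unless the episode ended within n steps.
    if horizon < len(rewards):
        target += gamma ** (horizon - t) * values[horizon]
    return target


def td0_sweeps(num_states: int, sweeps: int, alpha: float = 1.0, gamma: float = 1.0) -> List[float]:
    """Repeated TD(0) on a chain s_0 -> ... -> s_{N-1} -> terminal, reward 1 only on the last step."""
    values = [0.0] * (num_states + 1)  # extra entry for the terminal state (never bootstrapped)
    rewards = [0.0] * (num_states - 1) + [1.0]
    for _ in range(sweeps):
        # One episode, updating states in the order they are visited.
        for t in range(num_states):
            target = n_step_target(rewards, values, t, 1, gamma)
            values[t] += alpha * (target - values[t])
    return values[:num_states]
```

For example, `td0_sweeps(5, 2)` gives `[0.0, 0.0, 0.0, 1.0, 1.0]`: after two episodes only the last two states have learned that they lead to the reward.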
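Slide 18 names alpha-beta pruning with iterative deepening as a classic planner. A minimal Python sketch on a hypothetical toy game rather than Go (a pile of stones, each move removes 1 to 3, whoever takes the last stone wins): `negamax` scores positions from the side to move, prunes once `alpha >= beta`, and scores unfinished positions at the depth limit as 0, an assumed stand-in for a heuristic evaluation. `iterative_deepening` searches depth 1, 2, ... and tries the previous iteration's best move first.

```python
import math
from typing import List, Optional, Tuple

WIN = 1.0


def moves(pile: int) -> List[int]:
    return [k for k in (1, 2, 3) if k <= pile]


def negamax(pile: int, depth: int, alpha: float, beta: float,
            first: Optional[int] = None) -> Tuple[float, Optional[int]]:
    """Alpha-beta negamax; the score is from the viewpoint of the player to move."""
    if pile == 0:
        return -WIN, None  # the opponent took the last stone, so the player to move has lost
    if depth == 0:
        return 0.0, None  # assumed heuristic: an unresolved position scores 0
    ordered = moves(pile)
    if first in ordered:  # move ordering: try the previous best move first
        ordered.remove(first)
        ordered.insert(0, first)
    best_score, best_move = -math.inf, None
    for k in ordered:
        score = -negamax(pile - k, depth - 1, -beta, -alpha)[0]
        if score > best_score:
            best_score, best_move = score, k
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # cutoff: the opponent already has a better alternative elsewhere
    return best_score, best_move


def iterative_deepening(pile: int, max_depth: int) -> Tuple[float, Optional[int]]:
    best: Tuple[float, Optional[int]] = (0.0, None)
    for depth in range(1, max_depth + 1):
        best = negamax(pile, depth, -math.inf, math.inf, first=best[1])
        if abs(best[0]) == WIN:  # proven win or loss: a deeper search cannot change it
            break
    return best
```

Piles that are multiples of 4 are lost for the side to move, so `iterative_deepening(5, 10)` returns `(1.0, 1)` and `iterative_deepening(4, 10)` returns a score of `-1.0`.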
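The case study on slides 27 to 31 revises $P(z)$ into $P(z \mid E)$ after each sub-goal attempt and then re-plans. The sketch below is not the LEAPS model. It assumes a hypothetical observation model in which a sub-policy reaches a connected neighbor with probability `p_success` and a disconnected one with probability `p_false`, applies Bayes' rule to one edge, and re-plans with Dijkstra on $-\log P(z)$, which finds the path with the highest product of edge probabilities if edges are treated as independent. The graph and numbers below are made up.

```python
import heapq
import math
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def update_edge(prior: float, success: bool, p_success: float = 0.9, p_false: float = 0.01) -> float:
    """Posterior P(z = 1 | outcome) for one edge via Bayes' rule (assumed observation model)."""
    like_yes = p_success if success else 1.0 - p_success  # P(outcome | edge exists)
    like_no = p_false if success else 1.0 - p_false        # P(outcome | no edge)
    evidence = like_yes * prior + like_no * (1.0 - prior)
    return like_yes * prior / evidence


def best_path(probs: Dict[Edge, float], start: str, goal: str) -> Tuple[List[str], float]:
    """Most probable path: Dijkstra on -log P(z) over undirected edges."""
    graph: Dict[str, List[Tuple[str, float]]] = {}
    for (a, b), p in probs.items():
        if p <= 0.0:
            continue  # impossible edge
        cost = -math.log(p)
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    frontier = [(0.0, start, [start])]
    visited = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, math.exp(-cost)  # product of edge probabilities along the path
        if node in visited:
            continue
        visited.add(node)
        for nxt, c in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(frontier, (cost + c, nxt, path + [nxt]))
    return [], 0.0
```

For example, with hypothetical edges P(birth, outdoor) = 0.76, P(birth, garage) = 0.38 and P(garage, outdoor) = 0.73, the planner first heads straight outdoors; one failed attempt lowers that edge to about 0.24 with the default parameters, so the re-plan goes through the garage (0.38 × 0.73 ≈ 0.28).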
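The rewriting loop on slide 35 (pick $g_t$, pick $a_t$, apply $s_{t+1} = f(s_t, g_t, a_t)$) can be illustrated on the expression from slide 41. In this Python sketch the learned score predictor and rule selector are replaced by a fixed greedy choice: the first rule that fires at the first matching subtree, searched top-down. That makes it closer to a rule-based rewriter than to the neural one. The tuple encoding `(op, left, right)` and the four rules are assumptions chosen for illustration.

```python
from typing import Callable, List, Optional, Union

Expr = Union[int, bool, str, tuple]  # a tuple is (op, left, right); a str is a variable


def is_const(x: Expr) -> bool:
    return type(x) is int  # bool is a subclass of int, so exclude it explicitly


def push_add_into_max(e: Expr) -> Optional[Expr]:
    # max(a, b) + c  ->  max(a + c, b + c)
    if isinstance(e, tuple) and e[0] == "+" and isinstance(e[1], tuple) and e[1][0] == "max":
        _, (_, a, b), c = e
        return ("max", ("+", a, c), ("+", b, c))
    return None


def split_le_max(e: Expr) -> Optional[Expr]:
    # x <= max(a, b)  ->  (x <= a) or (x <= b)
    if isinstance(e, tuple) and e[0] == "<=" and isinstance(e[2], tuple) and e[2][0] == "max":
        _, x, (_, a, b) = e
        return ("or", ("<=", x, a), ("<=", x, b))
    return None


def fold_constants(e: Expr) -> Optional[Expr]:
    # c1 + c2 -> int, c1 <= c2 -> bool
    if isinstance(e, tuple) and e[0] in ("+", "<=") and is_const(e[1]) and is_const(e[2]):
        return e[1] + e[2] if e[0] == "+" else e[1] <= e[2]
    return None


def or_true(e: Expr) -> Optional[Expr]:
    if isinstance(e, tuple) and e[0] == "or" and (e[1] is True or e[2] is True):
        return True
    return None


RULES: List[Callable[[Expr], Optional[Expr]]] = [push_add_into_max, split_le_max, fold_constants, or_true]


def rewrite_once(e: Expr) -> Optional[Expr]:
    """Greedy stand-in for region and rule selection; returns None if nothing applies."""
    for rule in RULES:
        new = rule(e)
        if new is not None:
            return new
    if isinstance(e, tuple):
        op, left, right = e
        new_left = rewrite_once(left)
        if new_left is not None:
            return (op, new_left, right)
        new_right = rewrite_once(right)
        if new_right is not None:
            return (op, left, new_right)
    return None


def simplify(e: Expr) -> List[Expr]:
    """Rewrite until no rule applies; return the trajectory s_0, s_1, ..."""
    trace = [e]
    while (nxt := rewrite_once(trace[-1])) is not None:
        trace.append(nxt)
    return trace
```

On `("<=", 5, ("+", ("max", "v", 3), 3))`, i.e. 5 ≤ max(v, 3) + 3, the trajectory follows the steps of slide 41, with 3 + 3 folded to 6 as an extra intermediate step, and ends in `True` after five rewrites. A learned policy would instead choose the subtree and the rule with the score predictor and rule selector.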

Editor's Notes

  1. RL is one of the methods that might come to the rescue. The basic idea of RL is very simple. An agent perceives the state of the environment, takes an action and receives a reward. The environment receives the action, changes its internal state, and the cycle repeats. With virtual environments, we can potentially get an unlimited amount of data to train our models. Deep reinforcement learning has since made substantial progress in many kinds of games, including Atari games, Go, DoTA 2, etc. Then the question is: what is the next step? In this talk, I am going to talk about a few recent works that mainly explore the power of planning in the reinforcement learning setting.
  3. I will start with an example of why planning is important.
  4. We all know that last year DeepMind released a paper in Nature about AlphaGoZero, which learns a superhuman Go bot without any human knowledge. The idea is very simple: starting from randomly initialized models, we use planning algorithms like MCTS to find the best move at each step and save the self-play games into a replay buffer. Then we update the models using the replay buffer, and repeat.
  5. This approach is surprisingly simple, yet it gives very strong performance. After 3 days of training with thousands of TPUs, the model can already beat AlphaGo Lee; after 40 days, it defeats AlphaGo Master.
  6. Inspired by these exciting results, we tried to reproduce AlphaGoZero with our recently published ELF platform. The goal is to understand why such a simple approach yields such strong performance.
  7. ELF is an Extensive, Lightweight and Flexible framework that makes building a practical RL system easy and manageable. To achieve this, ELF puts all the implementation and design details on the C++ side, leaving the Python side a simple for-loop, as described in the textbook. Moreover, each iteration of the Python loop receives a batch of game states for the neural network to operate on, which improves efficiency.
  8. The key idea of ELF is dynamic batching across multiple game instances. In this platform, many games run simultaneously. From time to time, each game issues requests to call the deep learning API (such as PyTorch) to compute the next states and actions, or to store the game history in replay buffers. ELF provides a dynamic batching interface so that these requests are automatically batched for high throughput.
  9. We extended our framework to support the distributed setting, putting AGZ and AZ into the same framework.
  10. We then released ELF OpenGo, which reproduces AZ and AGZ.
  11. The natural question is: what can we learn from it?
  12. One interesting observation from this experiment is that knowledge propagates backwards. Meaningful moves first appear near the end of the game, where the reward signal is. Over iterations, meaningful moves propagate all the way back to the opening. The right figure shows that even with a well-trained model, additional planning makes the bot much stronger. In fact, the strength of the bot is always constrained by model capacity. We are working on an arXiv paper that discusses the training details.
  13. Planning is not only needed for training AGZ/AZ. In general reinforcement learning, planning during training is also very important. This includes not only one-step look-ahead, but also T-step look-ahead and more complicated planning mechanisms like tree search. A large portion of DRL is about distilling the results of planning into neural networks.
  14. Planning is also very important for general game AI. Chess and other games are further important examples.
  15. Traditional