Full Stack Deep Learning
Research Directions
Pieter Abbeel, Sergey Karayev, Josh Tobin
Number of arxiv papers submitted in AI categories
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
[source: http://data-mining.philippe-fournier-viger.com/too-many-machine-learning-papers]
n Few-Shot Learning
n Reinforcement Learning
n Imitation Learning
n Domain Randomization
n Architecture Search
n Unsupervised Learning
Many Exciting Directions in AI
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n Lifelong Learning
n Bias in ML (avoiding)
n Long Horizon Reasoning
n Safe Learning
n Value Alignment
n Planning + Learning
n …
n Sampling of research directions
n Overall research theme
n Research <> Real-World gap
n How to keep up
Outline
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n Few-Shot Learning
n Reinforcement Learning
n Imitation Learning
n Domain Randomization
n Architecture Search
n Unsupervised Learning
Many Exciting Directions in AI
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n Lifelong Learning
n Bias in ML (avoiding)
n Long Horizon Reasoning
n Safe Learning
n Value Alignment
n Planning + Learning
n …
Supervised Learning
n Huge successes
n But requirement: lots of labeled data
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
In Contrast: Humans Need Only One Example
hoverboard onewheelplus
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
What’s this one?
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
What’s this one?
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
What’s this one?
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
What’s this one?
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n We have a strong prior notion of what object categories are like
Why Can We Do This?
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
How to equip a machine with such a prior?
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n Starting observation:
n Computer vision practice:
n Train on ImageNet [Deng et al. ’09]
n Fine-tune on actual task: works really well! [Decaf: Donahue et al. ’14; …]
n Questions:
n How to generalize this to behavior learning?
n And can we explicitly train end-to-end for being maximally ready for efficient fine-tuning?
Model-Agnostic Meta-Learning (MAML)
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Key idea: End-to-end learning of a parameter vector θ that is a good init for fine-tuning on many tasks
Model-Agnostic Meta-Learning (MAML)
[Diagram: MAML training produces pretrained parameters θ; at MAML test time, fine-tuning on the train data for a new task yields finetuned parameters θ^{(i)'}:]
$$\theta^{(i)\prime} = \theta - \alpha\, \nabla_\theta \mathcal{L}^{(i)}_{\text{train}}(\theta)$$
[Finn, Abbeel, Levine ICML 2017]
$$\min_\theta \sum_{\text{tasks } i} \mathcal{L}^{(i)}_{\text{val}}\Big(\theta - \alpha\, \nabla_\theta \mathcal{L}^{(i)}_{\text{train}}(\theta)\Big)$$
Here the inner term θ − α∇θ L^{(i)}_train(θ) is the optimal parameter vector for task i, and θ is the parameter vector being meta-learned.
Model-Agnostic Meta-Learning (MAML)
Pieter Abbeel -- UC Berkeley | Covariant.AI | BerkeleyOpenArms.org
$$\min_\theta \sum_{\text{tasks } i} \mathcal{L}^{(i)}_{\text{val}}\Big(\theta - \alpha\, \nabla_\theta \mathcal{L}^{(i)}_{\text{train}}(\theta)\Big)$$
Few-Shot Learning: Classification
[Diagram: meta-training episodes are built from training classes, meta-testing episodes from held-out classes; each episode contains its own training data and test set]
Omniglot dataset (Lake et al. ’15), Mini-Imagenet dataset (Vinyals et al. ‘16)
diagram adapted from Ravi & Larochelle ‘17
Omniglot Few-Shot Classification
[1] Koch ’15 [2] Vinyals et al. ’16
[3] Edwards & Storkey ’17 [4] Kaiser et al. ’17
Classification Error
Omniglot Dataset: 1200 training classes, 423 test classes
Mini-ImageNet Few-Shot Classification
[1] Vinyals ’16 [2] Ravi & Larochelle ’17
Mini-Imagenet Dataset: 64 training classes, 24 test classes
Classification Error
Meta Learning for Classification
Task distribution: different classification datasets (input: images, output: class labels)
n Hochreiter et al., (2001) Learning to learn using gradient descent
n Younger et al., (2001), Meta learning with back propagation
n Koch et al., (2015) Siamese neural networks for one-shot image recognition
n Santoro et al., (2016) Meta-learning with memory-augmented neural networks
n Vinyals et al., (2016) Matching networks for one shot learning
n Edwards et al., (2016) Towards a Neural Statistician
n Ravi et al., (2017) Optimization as a model for few-shot learning
n Munkhdalai et al., (2017) Meta Networks
n Snell et al., (2017) Prototypical Networks for Few-shot Learning
n Shyam et al., (2017) Attentive Recurrent Comparators
n Finn et al., (2017) Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
n Mehrotra et al., (2017) Generative Adversarial Residual Pairwise Networks for One Shot Learning
n Mishra et al., (2017) Meta-Learning with Temporal Convolutions
n Li et al., (2017) Meta-SGD: Learning to Learn Quickly for Few Shot Learning
n Finn and Levine, (2017) Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm
n Grant et al., (2017) Recasting Gradient-Based Meta-Learning as Hierarchical Bayes
n Raghu et al, (2019) Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML
n Lake et al, (2019) The Omniglot Challenge: a 3-year progress report
n Rajeswaran et al (2019) Meta-learning with implicit gradients
n Wang et al (2019) SimpleShot: Revisiting Nearest-Neighbor Classification for Few-Shot Learning
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Few-Shot Learning for Grading
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
[Tang*, Karayev* et al, 2018 (in prep.)]
Meta Learning for Optimization
Task distribution: different neural networks, weight initializations, and/or different loss functions
n Bengio et al., (1990) Learning a synaptic learning rule
n Naik et al., (1992) Meta-neural networks that learn by learning
n Hochreiter et al., (2001) Learning to learn using gradient descent
n Younger et al., (2001), Meta learning with back propagation
n Andrychowicz et al., (2016) Learning to learn by gradient descent by gradient descent
n Chen et al., (2016) Learning to Learn for Global Optimization of Black Box Functions
n Wichrowska et al., (2017) Learned Optimizers that Scale and Generalize
n Ke et al., (2017) Learning to Optimize Neural Nets
n Wu et al., (2017) Understanding Short-Horizon Bias in Stochastic Meta-Optimization
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Meta Learning for Generative Models
Task distribution: different unsupervised datasets (e.g. collection of images)
n Rezende et al., (2016) One-Shot Generalization in Deep Generative Models
n Edwards et al., (2016) Towards a Neural Statistician
n Bartunov et al., (2016) Fast Adaptation in Generative Models with Generative Matching Networks
n Bornschein et al., (2017) Variational Memory Addressing in Generative Models
n Reed et al., (2017) Few-shot Autoregressive Density Estimation: Towards Learning to Learn Distributions
n Few-Shot Learning
n Reinforcement Learning
n Imitation Learning
n Domain Randomization
n Architecture Search
n Unsupervised Learning
Many Exciting Directions in AI
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n Lifelong Learning
n Bias in ML (avoiding)
n Long Horizon Reasoning
n Safe Learning
n Value Alignment
n Planning + Learning
n …
Reinforcement Learning (RL)
Robot +
Environment
$$\max_\theta \; \mathbb{E}\Big[\sum_{t=0}^{H} R(s_t) \,\Big|\, \pi_\theta\Big]$$
[Image credit: Sutton and Barto 1998] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Reinforcement Learning (RL)
Robot +
Environment
n Compared to supervised learning, additional challenges:
n Credit assignment
n Stability
n Exploration
[Image credit: Sutton and Barto 1998]
$$\max_\theta \; \mathbb{E}\Big[\sum_{t=0}^{H} R(s_t) \,\Big|\, \pi_\theta\Big]$$
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Deep RL Success Stories
DQN Mnih et al, NIPS 2013 / Nature 2015
MCTS Guo et al, NIPS 2014; TRPO Schulman, Levine, Moritz, Jordan, Abbeel, ICML 2015; A3C Mnih et al, ICML 2016; Dueling DQN Wang et al ICML 2016; Double DQN van Hasselt et al, AAAI 2016; Prioritized Experience Replay Schaul et al, ICLR 2016; Bootstrapped DQN Osband et al, 2016; Q-Ensembles Chen et al, 2017; Rainbow Hessel et al, 2017; Accelerated Stooke and Abbeel, 2018; …
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Deep RL Success Stories
AlphaGo Silver et al, Nature 2015
AlphaGoZero Silver et al, Nature 2017
Tian et al, 2016; Maddison et al, 2014; Clark et al, 2015
AlphaZero Silver et al, 2017
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n Super-human agent on Dota2 1v1, enabled by
n Reinforcement learning
n Self-play
n Enough computation
Deep RL Success Stories
[OpenAI, 2017] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Deep RL Success Stories
^ TRPO Schulman et al, 2015 + GAE Schulman et al, 2016
See also: DDPG Lillicrap et al 2015; SVG Heess et al, 2015; Q-Prop Gu et al, 2016; Scaling up ES Salimans et al, 2017; PPO Schulman et al, 2017; Parkour Heess et al, 2017;
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Deep RL Success Stories
[Peng, Abbeel, Levine, van de Panne, 2018]
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Deep RL Success: Dynamic Animation
[Peng, Abbeel, Levine, van de Panne, 2018]
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Deep RL Success Stories
Guided Policy Search Levine*, Finn*, Darrell, Abbeel, JMLR 2016
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n Rigid rods connected by elastic cables
n Controlled by motors that extend / contract cables
n Properties:
n Lightweight
n Low cost
n Capable of withstanding significant impact
n NASA investigates them for space exploration
n Major challenge: control
Tensegrity Robotics: NASA SuperBall
[Geng*, Zhang*, Bruce*, Caluwaerts, Vespignani, Sunspiral, Abbeel, Levine, ICRA 2017] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
[Geng*, Zhang*, Bruce*, Caluwaerts, Vespignani, Sunspiral, Abbeel, Levine, ICRA 2017] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Mastery? Yes
Deep RL (DQN) vs. Human
Score: 18.9 Score: 9.3
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
But how good is the learning?
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Humans vs. DDQN
Black dots: human play
Blue curve: mean of human play
Blue dashed line: ‘expert’ human play
Red dashed lines:
DDQN after 10, 25, 200M frames
(~ 46, 115, 920 hours)
[Tsividis, Pouncy, Xu, Tenenbaum, Gershman, 2017]
Humans after 15 minutes tend to outperform DDQN after 115 hours
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
How to bridge this gap?
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Starting Observations
n TRPO, DQN, A3C, DDPG, PPO, Rainbow, … are fully general RL algorithms
n i.e., for any environment that can be mathematically defined, these algorithms are equally applicable
n Environments encountered in the real world = a tiny, tiny subset of all environments that could be defined (e.g. they all satisfy our universe’s physics)
Can we develop “fast” RL algorithms that take advantage of this?
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Reinforcement Learning
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016]
[Diagram: Agent ⇄ Environment loop, exchanging actions a, states s, rewards r]
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Reinforcement Learning
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016]
[Diagram: RL Agent = RL Algorithm + Policy; the policy interacts with the Environment via actions a, observations o, rewards r]
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Reinforcement Learning
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016]
[Diagram: one RL Agent (RL Algorithm + Policy) per environment, interacting with Environment A, Environment B, … via actions, states, rewards]
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Reinforcement Learning
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016]
[Diagram: one RL Agent (RL Algorithm + Policy) per environment, interacting with Environment A, Environment B, … via actions, observations, rewards]
Traditional RL research:
• Human experts develop the RL algorithm
• After many years, still no RL algorithms nearly as good as humans…
Alternative:
• Could we learn a better RL algorithm?
• Or even learn a better entire agent?
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Meta-Reinforcement Learning
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016]
"Fast” RL
Agent
Environment A
Meta RL
Algorithm
Environment B
…
Meta-training environments
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Meta-Reinforcement Learning
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016]
"Fast” RL
Agent
Environment A
ar,o
Meta RL
Algorithm
Environment B
…
Environment F
Meta-training environments
Testing environments
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Meta-Reinforcement Learning
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016]
"Fast” RL
Agent
Environment A
ar,o
Meta RL
Algorithm
Environment B
…
Environment G
Meta-training environments
Testing environments
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Meta-Reinforcement Learning
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016]
"Fast” RL
Agent
Environment A
ar,o
Meta RL
Algorithm
Environment B
…
Environment H
Meta-training environments
Testing environments
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Formalizing Learning to Reinforcement Learn
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016]
$$\max_\theta \; \mathbb{E}_M \, \mathbb{E}_{\tau^{(k)}_M}\Big[\sum_{k=1}^{K} R\big(\tau^{(k)}_M\big) \,\Big|\, \text{RLagent}_\theta\Big]$$
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Formalizing Learning to Reinforcement Learn
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016]
M: sampled MDP
τ_M^{(k)}: k’th trajectory in MDP M
$$\max_\theta \; \mathbb{E}_M \, \mathbb{E}_{\tau^{(k)}_M}\Big[\sum_{k=1}^{K} R\big(\tau^{(k)}_M\big) \,\Big|\, \text{RLagent}_\theta\Big]$$
Meta-train:
$$\max_\theta \sum_{M \in \mathcal{M}_{\text{train}}} \mathbb{E}_{\tau^{(k)}_M}\Big[\sum_{k=1}^{K} R\big(\tau^{(k)}_M\big) \,\Big|\, \text{RLagent}_\theta\Big]$$
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n RLagent = RNN = generic computation architecture
n different weights in the RNN means different RL algorithm and prior
n different activations in the RNN means different current policy
n meta-train objective can be optimized with an existing (slow) RL algorithm
Representing RLagent_θ
$$\max_\theta \sum_{M \in \mathcal{M}_{\text{train}}} \mathbb{E}_{\tau^{(k)}_M}\Big[\sum_{k=1}^{K} R\big(\tau^{(k)}_M\big) \,\Big|\, \text{RLagent}_\theta\Big]$$
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Evaluation: Multi-Armed Bandits
n Multi-Armed Bandits setting
n Each bandit has its own distribution over pay-outs
n Each episode = choose 1 bandit
n A good RL agent should explore bandits sufficiently, yet also exploit the good/best ones
n Provably (asymptotically) optimal RL algorithms have been invented by humans: Gittins index, UCB1, Thompson sampling, …
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016]
Evaluation: Multi-Armed Bandits
We consider the Bayesian evaluation setting. Some of these prior works also have adversarial guarantees, which we don’t consider here.
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n Task – reward based on target running direction + speed
Evaluation: Locomotion – Half Cheetah
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n Task – reward based on target running direction + speed
n Result of meta-training = a single agent (the “fast RL agent”), which masters each task almost instantly, within the 1st episode
Evaluation: Locomotion – Half Cheetah
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n Task – reward based on target running direction + speed
Evaluation: Locomotion – Ant
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n Task – reward based on target running direction + speed
n Result of meta-training = a single agent (the “fast RL agent”), which masters each task almost instantly, within the 1st episode
Evaluation: Locomotion – Ant
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Evaluation: Visual Navigation
Agent’s view Maze
Agent input: current image
Agent action: straight / 2 degrees left / 2 degrees right
Map just shown for our purposes, but not available to agent
Related work: Mirowski, et al, 2016; Jaderberg et al, 2016; Mnih et al, 2016; Wang et al, 2016
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Evaluation: Visual Navigation
Before learning-to-learn / After learning-to-learn
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016]
Agent input: current image
Agent action: straight / 2 degrees left / 2 degrees right
Map just shown for our purposes, but not available to agent
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016]
Meta-Learning Curves
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n Simple Neural Attentive Meta-Learner (SNAIL)
[Mishra, Rohaninejad, Chen, Abbeel, 2017]
Other Architecture
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n Like RL2 but: replace the LSTM with dilated temporal convolutions (as in WaveNet) + attention
Simple Neural Attentive Meta-Learner
[Wavenet: van den Oord et al, 2016]
[Attention-is-all-you-need: Vaswani et al, 2017]
[Mishra*, Rohaninejad*, Chen, Abbeel, 2017] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Bandits
[Mishra*, Rohaninejad*, Chen, Abbeel, 2017] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Vision-based Navigation in Unknown Maze
[Mishra*, Rohaninejad*, Chen, Abbeel, 2017] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Agent Dropped in New Maze
[Mishra*, Rohaninejad*, Chen, Abbeel, 2017] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Meta Learning for RL
Task distribution: different environments
n Schmidhuber. Evolutionary principles in self-referential learning. (1987)
n Wiering, Schmidhuber. Solving POMDPs with Levin search and EIRA. (1996)
n Schmidhuber, Zhao, Wiering. Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement. (MLJ 1997)
n Schmidhuber, Zhao, Schraudolph. Reinforcement learning with self-modifying policies (1998)
n Zhao, Schmidhuber. Solving a complex prisoner’s dilemma with self-modifying policies. (1998)
n Schmidhuber. A general method for incremental self-improvement and multiagent learning. (1999)
n Singh, Lewis, Barto. Where do rewards come from? (2009)
n Singh, Lewis, Barto. Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective (2010)
n Niekum, Spector, Barto. Evolution of reward functions for reinforcement learning (2011)
n Duan et al., (2016) RL2: Fast Reinforcement Learning via Slow Reinforcement Learning
n Wang et al., (2016) Learning to Reinforcement Learn
n Finn et al., (2017) Model-Agnostic Meta-Learning (MAML)
n Mishra, Rohaninejad et al., (2017) A Simple Neural Attentive Meta-Learner
n Frans et al., (2017) Meta-Learning Shared Hierarchies
n Stadie et al (2018) Some considerations on learning to explore via meta-reinforcement learning
n Gupta et al (2018) Meta-reinforcement learning of structured exploration strategies
n Nagabandi*, Clavera*, et al (2018) Learning to adapt in dynamic, real-world environments through meta-reinforcement learning
n Clavera et al (2018) Model-based reinforcement learning via meta-policy optimization
n Botvinick et al (2019) Reinforcement learning: fast and slow
n Rakelly et al (2019) Efficient off-policy meta-reinforcement learning via probabilistic context variables
n … FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Benchmarks / Environments
VizDoom, DeepMind Lab, MuJoCo, OpenAI Retro Contest
Probably need more / richer environments…
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Meta-World
[Yu, Quillen, He, Julian, Hausman, Finn, Levine, 2019] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n Few-Shot Learning
n Reinforcement Learning
n Imitation Learning
n Domain Randomization
n Architecture Search
n Unsupervised Learning
Many Exciting Directions in AI
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n Lifelong Learning
n Bias in ML (avoiding)
n Long Horizon Reasoning
n Safe Learning
n Value Alignment
n Planning + Learning
n …
Imitation Learning in Robotics
[Abbeel et al. 2008] [Kolter et al. 2008] [Ziebart et al. 2008]
[Schulman et al. 2013] [Finn et al. 2016]
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Imitation Learning
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Imitation Learning
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
One-Shot Imitation Learning
[Duan, Andrychowicz, Stadie, Ho, Schneider, Sutskever, Abbeel, Zaremba, 2017]
One-Shot Imitation Learning
[Duan, Andrychowicz, Stadie, Ho, Schneider, Sutskever, Abbeel, Zaremba, 2017] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
One-Shot Imitation Learning
[Duan, Andrychowicz, Stadie, Ho, Schneider, Sutskever, Abbeel, Zaremba, 2017] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Learning a One-Shot Imitator
One-shot imitator
[Figure credit: Bradly Stadie]
Demo 1 of task i Demo 2 of task i
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Proof-of-concept: Block Stacking
n Each task is specified by a desired final layout
n Example: abcd
n “Place c on top of d, place b on top of c, place a on top of b.”
[Duan, Andrychowicz, Stadie, Ho, Schneider, Sutskever, Abbeel, Zaremba, 2017] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Evaluation
[Duan, Andrychowicz, Stadie, Ho, Schneider, Sutskever, Abbeel, Zaremba, 2017] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n Meta-learning loss:
n Task loss = behavioral cloning loss [Pomerleau ’89, Sammut ’92]
Learning a One-Shot Imitator with MAML
[Finn*, Yu*, Zhang, Abbeel, Levine, 2017] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Object placing from pixels
subset of training objects
held-out test objects
input demo resulting policy
[real-time execution]
(via teleoperation)
Finn*, Yu*, Zhang, Abbeel, Levine CoRL ‘17
n Recall, meta-learning loss:
n Key idea: different loss for “val” and “train”
Can We Learn from Just Video?
[Yu*, Finn*, Xie, Dasari, Zhang, Abbeel, Levine, RSS 2018] Pieter Abbeel – covariant.ai | UC Berkeley | gradescope.com
$$\min_\theta \sum_{\text{tasks } i} \mathcal{L}^{(i)}_{\text{val}}\Big(\theta - \alpha\, \nabla_\theta \mathcal{L}^{(i)}_{\text{train}}(\theta)\Big) \quad \text{with}$$
$$\mathcal{L}_{\text{val}}(\theta) = \sum_t \big\|\pi_\theta(o_t) - a^*_t\big\|^2 \qquad \mathcal{L}_{\text{train}}(\theta) = \sum_t \big\|f_\theta(o_t) - o^*_{t+1}\big\|^2$$
“val” is only needed during meta-training, and continues to assume access to the actions taken
“train” doesn’t require access to the actions taken, so at meta-testing video suffices
n Recall, meta-learning loss:
Can We Learn from Just Video?
Pieter Abbeel – covariant.ai | UC Berkeley | gradescope.com
$$\min_\theta \sum_{\text{tasks } i} \mathcal{L}^{(i)}_{\text{val}}\Big(\theta - \alpha\, \nabla_\theta \mathcal{L}^{(i)}_{\text{train}}(\theta)\Big) \quad \text{with}$$
$$\mathcal{L}_{\text{val}}(\theta) = \sum_t \big\|\pi_\theta(o_t) - a^*_t\big\|^2 \qquad \mathcal{L}_{\text{train}}(\theta) = \sum_t \big\|f_\theta(o_t) - o^*_{t+1}\big\|^2$$
[Yu*, Finn*, Xie, Dasari, Zhang, Abbeel, Levine, RSS 2018]
Can We Learn from Just Video?
[Yu*, Finn*, Xie, Dasari, Zhang, Abbeel, Levine, RSS 2018] Pieter Abbeel – covariant.ai | UC Berkeley | gradescope.com
$$\mathcal{L}_{\text{val}}(\theta) = \sum_t \big\|\pi_\theta(o_t) - a^*_t\big\|^2 \qquad \mathcal{L}_{\text{train}}(\theta) = \sum_t \big\|f_\theta(o_t) - o^*_{t+1}\big\|^2$$
Issue: direct pixel-level prediction is not a great loss function…
Can We Learn from Just Video?
[Yu*, Finn*, Xie, Dasari, Zhang, Abbeel, Levine, RSS 2018] Pieter Abbeel – covariant.ai | UC Berkeley | gradescope.com
$$\mathcal{L}_{\text{val}}(\theta) = \sum_t \big\|\pi_\theta(o_t) - a^*_t\big\|^2 \qquad \mathcal{L}_{\text{train}}(\theta) = \sum_t \mathcal{L}\big(f_\theta(o_t),\, o^*_{t+1}\big)$$
L can be thought of as a (learned) discriminator, as in GANs
One-shot imitation from human video
input human demo resulting policy
[Yu*, Finn*, Xie, Dasari, Zhang, Abbeel, Levine ’18]
n Few-Shot Learning
n Reinforcement Learning
n Imitation Learning
n Domain Randomization
n Architecture Search
n Unsupervised Learning
Many Exciting Directions in AI
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n Lifelong Learning
n Bias in ML (avoiding)
n Long Horizon Reasoning
n Safe Learning
n Value Alignment
n Planning + Learning
n …
Compared to the real world, simulated data collection is…
n Less expensive
n Faster / more scalable
n Less dangerous
n Easier to label
Motivation for Simulation
How can we learn useful real-world skills in the simulator?
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Approach 1 – Use Realistic Simulated Data
Simulation vs. Real world
Carefully match the simulation to the world [1,2,3,4]
Augment simulated data with real data [5,6]
GTA V vs. Real world
[1] Stephen James, Edward Johns. 3D simulation for robot arm control with deep Q-learning (2016)
[2] Johns, Leutenegger, Davison. Deep learning a grasp function for grasping under gripper pose uncertainty (2016)
[3] Mahler et al. Dex-Net 3.0 (2017)
[4] Koenemann et al. Whole-body model-predictive control applied to the HRP-2 humanoid (2015)
[5] Stephan R. Richter, Vibhav Vineet, Stefan Roth, Vladlen Koltun. Playing for data: Ground truth from computer games (2016)
[6] Bousmalis et al. Using simulation and domain adaptation to improve efficiency of robotic grasping (2017)
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Approach 2 – Domain Confusion / Adaptation
Andrei A. Rusu, Matej Vecerik, Thomas Rothörl, Nicolas Heess, Razvan Pascanu, Raia Hadsell. Sim-to-real robot learning from pixels with progressive nets. arXiv:1610.04286, 2016.
Eric Tzeng, Judy Hoffman, Ning Zhang, Kate Saenko, Trevor Darrell. Deep Domain Confusion: Maximizing for Domain Invariance. arXiv:1412.3474, 2014.
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Approach 3 – Domain Randomization
If the model sees enough simulated variation, the real world may look like just the next simulator
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Domain Randomization
n Quadcopter collision avoidance
n ~500 semi-realistic textures, 12 floor plans
n ~40-50% of 1000m trajectories are collision-free
(CAD)²RL: Real Single-Image Flight Without a Single Real Image
[3] Fereshteh Sadeghi and Sergey Levine. (CAD)²RL: Real single-image flight without a single real image. arXiv:1611.04201, 2016.
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Domain Randomization for Pose Estimation
n Precise object pose localization
n 100K images with simple randomly generated textures
[Tobin, Fong, Ray, Schneider, Zaremba, Abbeel, 2017]
How does it work? More Data = Better
[Tobin, Fong, Ray, Schneider, Zaremba, Abbeel, 2017]
More Textures = Better
[Tobin, Fong, Ray, Schneider, Zaremba, Abbeel, 2017]
Pre-Training is not Necessary
[Tobin, Fong, Ray, Schneider, Zaremba, Abbeel, 2017] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Domain Randomization for Grasping
[Tobin, Zaremba, Abbeel, 2017]
Hypothesis: Training on a diverse array of procedurally generated objects can produce comparable performance to training on realistic object meshes.
[see also: Levine et al and Goldberg et al] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Evaluation: How Good Are Random Shapes?
[Tobin, Zaremba, Abbeel, 2017] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Evaluation: Real Robot
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
How about a Full Hand?
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
OpenAI: Dactyl
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Trained with domain randomization
[OpenAI, 2018]
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Trained with domain randomization
[OpenAI, 2019]
n Few-Shot Learning
n Reinforcement Learning
n Imitation Learning
n Domain Randomization
n Architecture Search
n Unsupervised Learning
Many Exciting Directions in AI
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n Lifelong Learning
n Bias in ML (avoiding)
n Long Horizon Reasoning
n Safe Learning
n Value Alignment
n Planning + Learning
n …
Current Practice:
ML Solution =
Data + Computation + ML Expertise
[Slide from: Quoc Le]
Current Practice:
ML Solution =
Data + Computation + ML Expertise
But can we turn this into:
Solution =
Data + 100X Computation
[Slide from: Quoc Le]
Importance of architectures for Vision
● Designing neural network architectures is hard
● A lot of human effort goes into tuning them
● Can we try and learn good architectures automatically?
Canziani et al, 2017
[Slide from: Quoc Le]
Neural Architecture Search
● We can specify the structure and connectivity of a neural network by a configuration string
○ [“Filter Width: 5”, “Filter Height: 3”, “Num Filters: 24”]
● Use an RNN (“Controller”) to generate this string (the “Child Network”)
● Train the Child Network to see how well it performs on a validation set
● Use reinforcement learning to update the Controller based on the accuracy of the Child Network
[Slide from: Quoc Le]
[Diagram: the Controller proposes Child Networks; Child Networks are trained & evaluated; iterate ~20K times to find the most accurate Child Network]
[Slide from: Quoc Le]
Neural Architecture Search
Neural Architecture Search for Convolutional Networks
[Diagram: Controller RNN with an embedding input and a softmax classifier output]
[Slide from: Quoc Le]
Training with REINFORCE
The architecture predicted by the controller RNN is viewed as a sequence of actions; the reward is the accuracy of the architecture on a held-out dataset; the gradient is estimated over a minibatch of models and used to update the parameters of the Controller RNN.
[Slide from: Quoc Le]
Distributed Training
[Slide from: Quoc Le]
Neural Architecture Search for CIFAR-10
[Slide from: Quoc Le]
Neural Architecture Search for Language Modeling
[Slide from: Quoc Le]
Performance of cell on ImageNet
Updated June/2018: AmoebaNet (found by evolution in the same search space) is slightly better
[Slide from: Quoc Le]
Neural Architecture Search
● Key idea: Architecture is a string
● Get an RNN to generate the string, and train the RNN with the reward associated with the string
● This idea can be applied to other components of ML:
○ Equation for an optimizer is a string
○ Activation function is a string, e.g., z = 1/(1 + e^(-x))
[Slide from: Quoc Le]
[Diagram: Data → Data processing → Machine Learning Model; the model is the focus of machine learning research]
[Slide from: Quoc Le]
[Diagram: Data → Data processing → Machine Learning Model; data processing is very important but manually tuned]
[Slide from: Quoc Le]
Data Augmentation
[Slide from: Quoc Le]
[Diagram: the Controller proposes an augmentation policy; models are trained & evaluated with that policy; iterate ~20K times to find the most accurate policy]
Data Augmentation
[Slide from: Quoc Le]
AutoAugment: Example Policy
[Figure: each operation in a sub-policy has a probability of applying and a magnitude]
[Slide from: Quoc Le]
AutoAugment
[Table: models compared under no data augmentation, standard data augmentation, and AutoAugment]
[Slide from: Quoc Le]
ImageNet - Top-5 error rate:
Human (Andrej Karpathy): 5.1
Our best model: 3.5
[Table: models compared under no data augmentation, standard data augmentation, and AutoAugment]
[Slide from: Quoc Le]
References
● Neural Architecture Search with Reinforcement Learning. Barret Zoph and Quoc V. Le. ICLR, 2017
● Learning Transferable Architectures for Scalable Image Recognition. Barret Zoph, Vijay Vasudevan, Jonathon Shlens, Quoc V. Le. CVPR, 2018
● Efficient Neural Architecture Search via Parameter Sharing. Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, Jeff Dean
● AutoAugment: Learning Augmentation Policies from Data. Ekin D. Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, Quoc V. Le. Arxiv, 2018
● Searching for Activation Functions. Prajit Ramachandran, Barret Zoph, Quoc Le. ICLR Workshop, 2018
[Slide from: Quoc Le]
Population-based Auto-Augment, aka Auto-Augment with 1000x less compute
[Daniel Ho, Eric Liang, Ion Stoica, Pieter Abbeel, Xi (Peter) Chen, ICML 2019]
n Few-Shot Learning
n Reinforcement Learning
n Imitation Learning
n Domain Randomization
n Architecture Search
n Unsupervised Learning
Many Exciting Directions in AI
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n Lifelong Learning
n Bias in ML (avoiding)
n Long Horizon Reasoning
n Safe Learning
n Value Alignment
n Planning + Learning
n …
n Supervised learning is limited by amount of labeled data
n Can we use unlabeled data instead?
n Main ideas:
n Learn network that embeds the data
n Later supervised learning on embedding
n (possibly while also fine-tuning the embedding network)
n Learning weights:
n Pretraining of NN
n Auxiliary loss
Unsupervised Learning Rationale
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n Variational AutoEncoders (VAEs)
n Generative Adversarial Networks
n Exact Likelihood Models (autoregressive [PixelCNN, CharRNN], RealNVP, NICE)
n “puzzles” (gray2rgb, where-does-this-patch-go, rotation-prediction, contrastive-predictive-coding, etc…)
Main Families of Models
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Generative Models
n “What I cannot create, I do not understand.”
n Ability to generate data that look real entails some form of understanding
[Radford, Metz & Chintala, ICLR 2016]
Initial “Images”
[Salimans, Goodfellow, Zaremba, Cheung, Radford & Chen, NIPS 2016] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Learning to Generate Images
[Salimans, Goodfellow, Zaremba, Cheung, Radford & Chen, NIPS 2016] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
InfoGAN: Concepts Learning
[Chen, Duan, Houthooft, Schulman, Sutskever & Abbeel, NIPS 2016] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Image Generation
[thispersondoesnotexist.com]
Image Generation
[thispersondoesnotexist.com]
Image Generation
[thispersondoesnotexist.com] Pieter Abbeel -- UC Berkeley | Covariant.AI
Unsupervised Image to Image
Pieter Abbeel -- UC Berkeley | Covariant.AI[CycleGAN: Zhu, Park, Isola & Efros, 2017]
Everybody Dance Now
Pieter Abbeel -- UC Berkeley | Covariant.AI[Everybody Dance Now: Caroline Chan, Shiry Ginosar, Tinghui Zhou, Alyosha Efros (UC Berkeley) 2018]
Neural “Photoshop”
[Brock, Lim, Ritchie & Weston, 2016] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Neural Compression
[Rippel & Bourdev, 2017]
JPEG JPEG2000 WaveOne
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Data-Efficient Image Recognition with Contrastive Predictive Coding
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin[Olivier Henaff, Ali Razavi, Carl Doersch, Ali Eslami, Aaron van den Oord, 2019]
Data-Efficient Image Recognition with Contrastive Predictive Coding
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin[Olivier Henaff, Ali Razavi, Carl Doersch, Ali Eslami, Aaron van den Oord, 2019]
Linear separability in feature space from unsupervised learning
Data-Efficient Image Recognition with Contrastive Predictive Coding
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin[Olivier Henaff, Ali Razavi, Carl Doersch, Ali Eslami, Aaron van den Oord, 2019]
Contrastive-Predictive Coding
Beyond Images: Amazon Review
[Radford et al, 2017]
This product does what it is
supposed to. I always keep
three of these in my kitchen
just in case ever I need a
replacement cord.
Great little item. Hard to put
on the crib without some
kind of embellishment. My
guess is just like the screw
kind of attachment I had.
Pieter Abbeel -- UC Berkeley | Covariant.AI
Prompt: A train carriage containing controlled nuclear materials was stolen in
Cincinnati today. Its whereabouts are unknown.
GPT-2: The incident occurred on the downtown train line, which runs from Covington
and Ashland stations. In an email to Ohio news outlets, the U.S. Department of
Energy said it is working with the Federal Railroad Administration to find the
thief.
‘The theft of this nuclear material will have significant negative consequences on
public and environmental health, our workforce and the economy of our nation,’
said Tom Hicks, the U.S. Energy Secretary, in a statement. ‘Our top priority is to
secure the theft and ensure it doesn’t happen again.’
The stolen material was taken from the University of Cincinnati’s Research
Triangle Park nuclear research site, according to a news release from Department
officials.
The Nuclear Regulatory Commission did not immediately release any information.
According to the release, the U.S. Department of Energy’s Office of Nuclear
Material Safety and Security is leading that team’s investigation.
‘The safety of people, the environment and the nation’s nuclear stockpile is our
highest priority,’ Hicks said. ‘We will get to the bottom of this and make no
excuses.’
Text Generation (OpenAI’s GPT-2)
Pieter Abbeel -- UC Berkeley | Covariant.AI
Prompt: Recycling is good for the world, no, you could not be more
wrong.
GPT-2: Recycling is NOT good for the world. It is bad for the
environment, it is bad for our health, and it is bad for our
economy. I’m not kidding. Recycling is not good for the
environment. It is destructive to the earth and it is a major
contributor to global warming. Recycling is not good for our
health. It contributes to obesity and diseases like heart
disease and cancer. Recycling is bad for our economy. It
increases the cost of a product, and in turn, the price of
everything that is made with that product. Recycling is not good
for our nation. We pay a tremendous price for the privilege of
having the world’s most advanced and efficient recycling system.
Recycling is a huge, colossal waste of time, energy, money, and
resources.
Text Generation (OpenAI’s GPT-2)
Pieter Abbeel -- UC Berkeley | Covariant.AI
Text Generation (OpenAI’s GPT-2)
Pieter Abbeel -- UC Berkeley / OpenAI / Gradescope[Radford et al, 2019]
Unsupervised Sentiment Neuron
[Radford et al, 2017] Pieter Abbeel -- UC Berkeley | Covariant.AI
Benchmarks
Pieter Abbeel -- UC Berkeley | Covariant.AI
Benchmarks
Pieter Abbeel -- UC Berkeley | Covariant.AI
Benchmarks
Pieter Abbeel -- UC Berkeley | Covariant.AI
Scaling
Pieter Abbeel -- UC Berkeley | Covariant.AI
n Few-Shot Learning
n Reinforcement Learning
n Imitation Learning
n Domain Randomization
n Architecture Search
n Unsupervised Learning
Many Exciting Directions in AI
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n Lifelong Learning
n Bias in ML (avoiding)
n Long Horizon Reasoning
n Safe Learning
n Value Alignment
n Planning + Learning
n …
n Sampling of research directions
n Overall research theme
n Research <> Real-World gap
n How to keep up
Outline
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
In Computer Vision...
[Diagram: datasets (how large they are) vs. models (how well they work), with a possibility frontier separating progress from the zone of “not going to happen.”
• 70s - 90s: Lena (10^0; single image) × Hard Coded (edge detection etc., no learning)
• 90s - 2012: Caltech 101 (~10^4 images), Pascal VOC (~10^5 images) × Image Features (SIFT etc., learning linear classifiers on top)
• 2013: ImageNet (~10^6 images) × ConvNets (learn the features, structure hard-coded)
• 2017 (projection): Google/FB, images on the web (~10^9+ images) × CodeGen (learn the weights and the structure)]
[Slide credit: Andrej Karpathy]
Learning-to-Learn
[Diagram: environments (how much they measure / incentivise general intelligence) vs. agents (how impressive they are), with a possibility frontier and a zone of “not going to happen.”
• 70s - 90s: BlocksWorld (SHRDLU etc.) × Hard Coded (LISP programs, no learning)
• 90s - 2012: Cartpole etc. (and bandits, gridworld, ...few toy tasks) × Value Iteration etc. (~discrete MDPs, linear function approximators)
• 2013: MuJoCo/ATARI/Universe (~few dozen envs) × DQN, PG (deep nets, hard-coded various tricks)
• 2016: simple multi-agent envs × RL^2 (learn the RL algorithm, structure fixed)
• Projection: Digital worlds (complex multi-agent envs) and Reality (more multi-agent / non-stationary / real-world-like) × CodeGen (learn structure and learning algorithm; more learning, more compute)]
[Slide adapted from Andrej Karpathy]
Compute Increasing Rapidly
[20-April-2018]
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Compute per Major Result
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Source: OpenAI
Implications for research agenda?
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Key Enablers for AI Progress? DATA, COMPUTE, HUMAN INGENUITY
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
[Diagram: Problem Territory 1 vs. Problem Territory 2; e.g., Deep Learning to Learn]
n Each PhD student approximately uses:
n 4-8 GPU machine (local development)
n 2-5K / month cloud compute credits (big thanks to Google and Amazon!)
n Trending:
n Towards 10-20 GPUs per student (housed at central campus facility)
n Hoping to get to 10-20K / month cloud compute credits (anyone?)
How much compute do we use at Berkeley Robot Learning Lab?
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n Sampling of research directions
n Overall research theme
n Research <> Real-World gap
n How to keep up
Outline
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n Research:
n 0 to 1
n If achieving 70% on a benchmark, move on to the next one
Research <<<<GAP>>>> Real-World
n Real-World:
n Even 90% is not enough!
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n Typical station in factory or warehouse: 500 - 2,000 task completions per hour
n Now, imagine 90% success of the robot
n 500 per hour → 50 failures that need fixing up
n 2,000 per hour → 200 failures that need fixing up
n Fixing up failures typically takes more time than doing the task
n → a robot with 50 failures per hour causes more work than it saves
n Realistically, value is created at 1-2 human interventions per hour
n 500 per hour, max 2 interventions → 99.6% success rate
n 2,000 per hour, max 2 interventions → 99.9% success rate
Real-World Reliability Requirements
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Ok, fine, we need beyond 90%
Pieter Abbeel -- UC Berkeley | Covariant.AI
So what, let’s just:
- larger network
- more data
n can NOT ignore the long tail of real world scenarios
n can NOT ignore that real world always changes
n can NOT ignore it’s important to know when you don’t know
Subtleties in going beyond 90%
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n ImageNet
n 1000 categories
n 1000 example images for each category
Real World: High Variability / Long Tail
n Real world
n Millions of SKUs
n Transparency…
n …
→ significant AI R&D needed to meet real-world demands
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n Deep RL Research
n Simulation
n Train ahead of time
Real World: Always changing
n Real world
n Need to adapt on the fly…
→ significant AI R&D needed to meet real-world demands
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Real World: Need to know when don’t know
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
How feasible today?
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n Sampling of research directions
n Overall research theme
n Research <> Real-World gap
n How to keep up
Outline
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n How to read a paper?
n What papers to read?
n Reading group
How to Keep Up
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
How to Read a Paper?
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
• Read the title, abstract, section headers, and figures
• Try to find slides or a video on the paper (these do not have to be by the authors)
• Read the introduction (Jennifer Widom)
• What is the problem?
• Why is it interesting and important?
• Why is it hard? (e.g., why do naive approaches fail?)
• Why hasn’t it been solved before? (Or, what’s wrong with previously proposed solutions? How does this paper differ?)
• What are the key components of this approach and results? Any limitations?
• Skim the related work.
• Is there any related work you are familiar with?
• If so, how does this paper relate to those works?
• Skim the technical section.
• Where are the novelties?
• What are the assumptions?
• Read the experiments.
• What questions are the experiments answering?
• What questions are the experiments not answering?
• What baselines do they compare against?
• How strong are these baselines?
• Is the experiment methodology sound?
• Do the results support their claims?
• Read the conclusion/discussion.
• Read the technical section.
• Read in “good faith” (i.e., assume the authors are correct)
• Skip over confusing parts that don’t seem fundamental
• If any important part is confusing, see if the material is in class slides or prior papers
• Read the paper as you see fit.
[source: Gregory Kahn]
n How to read a paper?
n What papers to read?
n Reading group
How to Keep Up
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Just read all the papers?
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
[source: https://medium.com/@karpathy/a-peek-at-trends-in-machine-learning-ab8a1085a106]
n Import AI Newsletter (by Jack Clark, communications director at OpenAI)
n Arxiv sanity (by Karpathy)
n Twitter
n jackclarkSF, karpathy, goodfellow_ian, hardmaru, smerity, hillbig, pabbeel, …
n AI/DL Facebook Group: active group where members post anything from news articles to blog posts to general ML questions
n ML Subreddit
What Papers to Read?
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n How to read a paper?
n What papers to read?
n Reading group
How to Keep Up
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
n Even just 1 other person can already save you ~50% of your time
Reading Group
FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
Thank you
pabbeel@cs.berkeley.edu
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

Research Directions - Full Stack Deep Learning

  • 19. Few-Shot Learning: Classification Algorithm: meta-training meta-testing training data test set training classes held-out classes … … … … Omniglot dataset (Lake et al. ’15), Mini-Imagenet dataset (Vinyals et al. ‘16) diagram adapted from Ravi & Larochelle ‘17
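To make the meta-training / meta-testing split above concrete, here is a minimal sketch of N-way K-shot episode construction in Python. Everything here is a toy stand-in: `sample_episode` is a hypothetical helper, and the "dataset" is random 2-D clusters rather than Omniglot or Mini-ImageNet images.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_episode(data_by_class, n_way=5, k_shot=1, k_query=5):
    """Build one N-way K-shot episode: a support set to adapt on and a
    query set to evaluate the adapted model (mirrors the diagram's
    meta-training / meta-testing split)."""
    classes = rng.choice(list(data_by_class), size=n_way, replace=False)
    support, query = [], []
    for label, c in enumerate(classes):
        xs = data_by_class[c]
        idx = rng.choice(len(xs), size=k_shot + k_query, replace=False)
        support += [(xs[i], label) for i in idx[:k_shot]]
        query += [(xs[i], label) for i in idx[k_shot:]]
    return support, query

# Toy "dataset": 20 classes of 2-D Gaussian clusters standing in for images.
data_by_class = {c: rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in range(20)}
support, query = sample_episode(data_by_class)
print(len(support), "support examples,", len(query), "query examples")
```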
  • 20. Omniglot Few-Shot Classification [1] Koch ’15 [2] Vinyals et al. ’16 [3] Edwards & Storkey ’17 [4] Kaiser et al. ’17 Classification Error Omniglot Dataset: 1200 training classes, 423 test classes
  • 21. Mini-ImageNet Few-Shot Classification [1] Vinyals ’16 [2] Ravi & Larochelle ’17 Mini-Imagenet Dataset: 64 training classes, 24 test classes Classification Error
  • 22. Meta Learning for Classification Task distribution: different classification datasets (input: images, output: class labels) n Hochreiter et al., (2001) Learning to learn using gradient descent n Younger et al., (2001), Meta learning with back propagation n Koch et al., (2015) Siamese neural networks for one-shot image recognition n Santoro et al., (2016) Meta-learning with memory-augmented neural networks n Vinyals et al., (2016) Matching networks for one shot learning n Edwards et al., (2016) Towards a Neural Statistician n Ravi et al., (2017) Optimization as a model for few-shot learning n Munkhdalai et al., (2017) Meta Networks n Snell et al., (2017) Prototypical Networks for Few-shot Learning n Shyam et al., (2017) Attentive Recurrent Comparators n Finn et al., (2017) Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks n Mehrotra et al., (2017) Generative Adversarial Residual Pairwise Networks for One Shot Learning n Mishra et al., (2017) Meta-Learning with Temporal Convolutions n Li et al., (2017) Meta-SGD: Learning to Learn Quickly for Few Shot Learning n Finn and Levine, (2017) Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm n Grant et al., (2017) Recasting Gradient-Based Meta-Learning as Hierarchical Bayes FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 23. Meta Learning for Classification Task distribution: different classification datasets (input: images, output: class labels) n Hochreiter et al., (2001) Learning to learn using gradient descent n Younger et al., (2001), Meta learning with back propagation n Koch et al., (2015) Siamese neural networks for one-shot image recognition n Santoro et al., (2016) Meta-learning with memory-augmented neural networks n Vinyals et al., (2016) Matching networks for one shot learning n Edwards et al., (2016) Towards a Neural Statistician n Ravi et al., (2017) Optimization as a model for few-shot learning n Munkhdalai et al., (2017) Meta Networks n Snell et al., (2017) Prototypical Networks for Few-shot Learning n Shyam et al., (2017) Attentive Recurrent Comparators n Finn et al., (2017) Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks n Mehrotra et al., (2017) Generative Adversarial Residual Pairwise Networks for One Shot Learning n Mishra et al., (2017) Meta-Learning with Temporal Convolutions n Li et al., (2017) Meta-SGD: Learning to Learn Quickly for Few Shot Learning n Finn and Levine, (2017) Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm n Grant et al., (2017) Recasting Gradient-Based Meta-Learning as Hierarchical Bayes n Raghu et al, (2019) Rapid learning or feature reuse? towards understanding the effectiveness of maml n Lake et al, (2019) The Omniglot Challenge: a 3-year progress report n Rajeswaran et al (2019) Meta-learning with implicit gradients n Wang et al (2019) SimpleShot: Revisiting Nearest-Neighbor Classification for Few-Shot Learning FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 24. FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 25. Few-Shot Learning for Grading FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin[Tang*, Karayev* et al, 2018 (in prep.)]
  • 26. Meta Learning for Optimization Task distribution: different neural networks, weight initializations, and/or different loss functions n Bengio et al., (1990) Learning a synaptic learning rule n Naik et al., (1992) Meta-neural networks that learn by learning n Hochreiter et al., (2001) Learning to learn using gradient descent n Younger et al., (2001), Meta learning with back propagation n Andrychowicz et al., (2016) Learning to learn by gradient descent by gradient descent n Chen et al., (2016) Learning to Learn for Global Optimization of Black Box Functions n Wichrowska et al., (2017) Learned Optimizers that Scale and Generalize n Ke et al., (2017) Learning to Optimize Neural Nets n Wu et al., (2017) Understanding Short-Horizon Bias in Stochastic Meta-Optimization FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 27. Meta Learning for Generative Models Task distribution: different unsupervised datasets (e.g. collection of images) n Rezende et al., (2016) One-Shot Generalization in Deep Generative Models n Edwards et al., (2016) Towards a Neural Statistician n Bartunov et al., (2016) Fast Adaptation in Generative Models with Generative Matching Networks n Bornschein et al., (2017) Variational Memory Addressing in Generative Models n Reed et al., (2017) Few-shot Autoregressive Density Estimation: Towards Learning to Learn Distributions
  • 28. n Few-Shot Learning n Reinforcement Learning n Imitation Learning n Domain Randomization n Architecture Search n Unsupervised Learning Many Exciting Directions in AI FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin n Lifelong Learning n Bias in ML (avoiding) n Long Horizon Reasoning n Safe Learning n Value Alignment n Planning + Learning n …
  • 29. Reinforcement Learning (RL) Robot + Environment $\max_\theta \mathbb{E}\big[\sum_{t=0}^{H} R(s_t) \,\big|\, \pi_\theta\big]$ [Image credit: Sutton and Barto 1998] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 30. Reinforcement Learning (RL) Robot + Environment n Compared to supervised learning, additional challenges: n Credit assignment n Stability n Exploration [Image credit: Sutton and Barto 1998] $\max_\theta \mathbb{E}\big[\sum_{t=0}^{H} R(s_t) \,\big|\, \pi_\theta\big]$ FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
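The objective above, maximizing expected return over trajectories of the policy, can be optimized with score-function (REINFORCE) gradients. Below is a minimal sketch on a toy two-action problem; the "environment" (reward 1 for choosing action 1) and all names are invented for illustration, and this is the generic estimator rather than any of the specific algorithms cited next.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)  # logits of a 2-action softmax policy pi_theta

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def episode(theta, horizon=20):
    """One rollout in a toy environment where action 1 yields reward 1.
    Returns the return and the accumulated gradient of log pi_theta."""
    grad_logp, ret = np.zeros_like(theta), 0.0
    for _ in range(horizon):
        p = softmax(theta)
        a = rng.choice(2, p=p)
        grad_logp += np.eye(2)[a] - p   # d/dtheta log softmax(theta)[a]
        ret += float(a == 1)            # R(s_t): stand-in reward
    return ret, grad_logp

# REINFORCE: the gradient of E[sum_t R(s_t) | pi_theta] is E[return * grad log pi].
for step in range(200):
    ret, g = episode(theta)
    theta += 0.01 * ret * g             # stochastic gradient ascent

print("P(action 1) after training:", softmax(theta)[1])  # should approach 1
```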
  • 31. Deep RL Success Stories DQN Mnih et al, NIPS 2013 / Nature 2015 MCTS Guo et al, NIPS 2014; TRPO Schulman, Levine, Moritz, Jordan, Abbeel, ICML 2015; A3C Mnih et al, ICML 2016; Dueling DQN Wang et al ICML 2016; Double DQN van Hasselt et al, AAAI 2016; Prioritized Experience Replay Schaul et al, ICLR 2016; Bootstrapped DQN Osband et al, 2016; Q-Ensembles Chen et al, 2017; Rainbow Hessel et al, 2017; Accelerated Stooke and Abbeel, 2018; … FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 32. Deep RL Success Stories AlphaGo Silver et al, Nature 2015 AlphaGoZero Silver et al, Nature 2017 Tian et al, 2016; Maddison et al, 2014; Clark et al, 2015 AlphaZero Silver et al, 2017 FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 33. n Super-human agent on Dota2 1v1, enabled by n Reinforcement learning n Self-play n Enough computation Deep RL Success Stories [OpenAI, 2017] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 34. Deep RL Success Stories ^ TRPO Schulman et al, 2015 + GAE Schulman et al, 2016 See also: DDPG Lillicrap et al 2015; SVG Heess et al, 2015; Q-Prop Gu et al, 2016; Scaling up ES Salimans et al, 2017; PPO Schulman et al, 2017; Parkour Heess et al, 2017; FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 35. Deep RL Success Stories [Peng, Abbeel, Levine, van de Panne, 2018] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 36. Deep RL Success: Dynamic Animation [Peng, Abbeel, Levine, van de Panne, 2018] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 37. Deep RL Success Stories Guided Policy Search Levine*, Finn*, Darrell, Abbeel, JMLR 2016 FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 38. n Rigid rods connected by elastic cables n Controlled by motors that extend / contract cables n Properties: n Lightweight n Low cost n Capable of withstanding significant impact n NASA investigates them for space exploration n Major challenge: control Tensegrity Robotics: NASA SuperBall [Geng*, Zhang*, Bruce*, Caluwaerts, Vespignani, Sunspiral, Abbeel, Levine, ICRA 2017] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 39. [Geng*, Zhang*, Bruce*, Caluwaerts, Vespignani, Sunspiral, Abbeel, Levine, ICRA 2017] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 40. Mastery? Yes Deep RL (DQN) vs. Human Score: 18.9 Score: 9.3 FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 41. But how good is the learning? FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 42. Humans vs. DDQN Black dots: human play Blue curve: mean of human play Blue dashed line: ‘expert’ human play Red dashed lines: DDQN after 10, 25, 200M frames (~ 46, 115, 920 hours) [Tsividis, Pouncy, Xu, Tenenbaum, Gershman, 2017] Humans after 15 minutes tend to outperform DDQN after 115 hours FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 43. How to bridge this gap? FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 44. Starting Observations n TRPO, DQN, A3C, DDPG, PPO, Rainbow, … are fully general RL algorithms n i.e., for any environment that can be mathematically defined, these algorithms are equally applicable n Environments encountered in real world = tiny, tiny subset of all environments that could be defined (e.g. they all satisfy our universe’s physics) Can we develop “fast” RL algorithms that take advantage of this? FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 45. Reinforcement Learning [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016] Agent Environment asr FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 46. Reinforcement Learning [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016] RL Algorithm Policy Environment aor RL Agent FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 47. Reinforcement Learning [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016] RL Algorithm Policy Environment A asr … RL Agent RL Algorithm Policy Environment B asr RL Agent FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 48. Reinforcement Learning [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016] … RL Algorithm Policy Environment A aor RL Algorithm Policy Environment B aor RL Agent RL Agent Traditional RL research: • Human experts develop the RL algorithm • After many years, still no RL algorithms nearly as good as humans… Alternative: • Could we learn a better RL algorithm? • Or even learn a better entire agent? FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 49. Meta-Reinforcement Learning [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016] "Fast” RL Agent Environment A Meta RL Algorithm Environment B … Meta-training environments FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 50. Meta-Reinforcement Learning [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016] "Fast” RL Agent Environment A ar,o Meta RL Algorithm Environment B … Environment F Meta-training environments Testing environments FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 51. Meta-Reinforcement Learning [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016] "Fast” RL Agent Environment A ar,o Meta RL Algorithm Environment B … Environment G Meta-training environments Testing environments FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 52. Meta-Reinforcement Learning [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016] "Fast” RL Agent Environment A ar,o Meta RL Algorithm Environment B … Environment H Meta-training environments Testing environments FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 53. Formalizing Learning to Reinforcement Learn [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016] $\max_\theta \, \mathbb{E}_{M} \, \mathbb{E}_{\tau^{(k)}_M} \Big[ \sum_{k=1}^{K} R(\tau^{(k)}_M) \,\Big|\, \mathrm{RLagent}_\theta \Big]$ FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 54. Formalizing Learning to Reinforcement Learn [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016] $M$: sampled MDP; $\tau^{(k)}_M$: $k$'th trajectory in MDP $M$. $\max_\theta \, \mathbb{E}_{M} \, \mathbb{E}_{\tau^{(k)}_M} \Big[ \sum_{k=1}^{K} R(\tau^{(k)}_M) \,\Big|\, \mathrm{RLagent}_\theta \Big]$ Meta-train: $\max_\theta \sum_{M \in \mathcal{M}_{\mathrm{train}}} \mathbb{E}_{\tau^{(k)}_M} \Big[ \sum_{k=1}^{K} R(\tau^{(k)}_M) \,\Big|\, \mathrm{RLagent}_\theta \Big]$ FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 55. n RLagent = RNN = generic computation architecture n different weights in the RNN means different RL algorithm and prior n different activations in the RNN means different current policy n meta-train objective can be optimized with an existing (slow) RL algorithm Representing $\mathrm{RLagent}_\theta$: $\max_\theta \sum_{M \in \mathcal{M}_{\mathrm{train}}} \mathbb{E}_{\tau^{(k)}_M} \Big[ \sum_{k=1}^{K} R(\tau^{(k)}_M) \,\Big|\, \mathrm{RLagent}_\theta \Big]$ [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
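A sketch of what "RLagent = RNN" means mechanically: at every step the RNN consumes (observation, previous action, reward, done flag), and its hidden state persists across episodes within a task, so the activations can implement a fast RL algorithm. The weights below are random placeholders; in RL² they would be meta-trained with a slow outer RL algorithm, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS, ACT, HID = 4, 2, 16

# Hypothetical RNN weights; meta-training these with a slow RL algorithm
# is what makes running the RNN itself act like a fast RL algorithm.
Wx = rng.normal(0, 0.1, (HID, OBS + ACT + 2))  # input: [obs, prev_action, reward, done]
Wh = rng.normal(0, 0.1, (HID, HID))
Wo = rng.normal(0, 0.1, (ACT, HID))

def step(h, obs, prev_action, reward, done):
    """One RNN step: the hidden state h encodes everything learned so far
    in this task (the 'current policy' lives in the activations)."""
    x = np.concatenate([obs, np.eye(ACT)[prev_action], [reward], [float(done)]])
    h = np.tanh(Wx @ x + Wh @ h)
    logits = Wo @ h
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return h, rng.choice(ACT, p=p)

h = np.zeros(HID)                  # reset ONLY when a new task (MDP) is sampled
obs, prev_a, r, done = np.zeros(OBS), 0, 0.0, False
for t in range(5):                 # episodes within one task share h
    h, prev_a = step(h, obs, prev_a, r, done)
    print("t =", t, "action =", prev_a)
```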
  • 56. Evaluation: Multi-Armed Bandits n Multi-Armed Bandits setting n Each bandit has its own distribution over pay-outs n Each episode = choose 1 bandit n Good RL agent should explore bandits sufficiently, yet also exploit the good/best ones n Provably (asymptotically) optimal RL algorithms have been invented by humans: Gittins index, UCB1, Thompson sampling, … [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
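For reference, here is a minimal Thompson-sampling baseline for Bernoulli bandits, one of the human-designed algorithms listed above; a toy sketch with made-up pay-out probabilities, not the paper's evaluation code.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.5, 0.7])   # unknown pay-out probabilities
wins = np.ones(3)                         # Beta(1,1) prior per arm
losses = np.ones(3)

for t in range(1000):
    # Thompson sampling: draw a mean for each arm from its posterior,
    # then play the arm with the highest draw (explores AND exploits).
    samples = rng.beta(wins, losses)
    arm = int(np.argmax(samples))
    reward = rng.random() < true_means[arm]
    wins[arm] += reward
    losses[arm] += 1 - reward

print("posterior means:", wins / (wins + losses))  # the best arm dominates plays
```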
  • 57. [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] Evaluation: Multi-Armed Bandits We consider Bayesian evaluation setting. Some of these prior works also have adversarial guarantees, which we don’t consider here. FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 58. n Task – reward based on target running direction + speed Evaluation: Locomotion – Half Cheetah [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 59. n Task – reward based on target running direction + speed n Result of meta-training = a single agent (the “fast RL agent”), which masters each task almost instantly within 1st episode Evaluation: Locomotion – Half Cheetah FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 60. n Task – reward based on target running direction + speed Evaluation: Locomotion – Ant [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 61. n Task – reward based on target running direction + speed n Result of meta-training = a single agent (the “fast RL agent”), which masters each task almost instantly within 1st episode Evaluation: Locomotion – Ant FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 62. Evaluation: Visual Navigation Agent’s view Maze Agent input: current image Agent action: straight / 2 degrees left / 2 degrees right Map just shown for our purposes, but not available to agent Related work: Mirowski, et al, 2016; Jaderberg et al, 2016; Mnih et al, 2016; Wang et al, 2016 FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 63. Evaluation: Visual Navigation After learning-to-learnBefore learning-to-learn [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] Agent input: current image Agent action: straight / 2 degrees left / 2 degrees right Map just shown for our purposes, but not available to agent FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 64. [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] Meta-Learning Curves FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 65. n Simple Neural Attentive Meta-Learner (SNAIL) [Mishra, Rohaninejad, Chen, Abbeel, 2017] Other Architecture FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 66. n Like RL2 but: replace the LSTM with dilated temporal convolution (like wavenet) + attention Simple Neural Attentive Meta-Learner [Wavenet: van den Oord et al, 2016] [Attention-is-all-you-need: Vaswani et al, 2017] [Mishra*, Rohaninejad*, Chen, Abbeel, 2017] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 67. Bandits [Mishra*, Rohaninejad*, Chen, Abbeel, 2017] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 68. Vision-based Navigation in Unknown Maze [Mishra*, Rohaninejad*, Chen, Abbeel, 2017] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 69. Agent Dropped in New Maze [Mishra*, Rohaninejad*, Chen, Abbeel, 2017] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 70. Meta Learning for RL Task distribution: different environments n Schmidhuber. Evolutionary principles in self-referential learning. (1987) n Wiering, Schmidhuber. Solving POMDPs with Levin search and EIRA. (1996) n Schmidhuber, Zhao, Wiering. Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement. (MLJ 1997) n Schmidhuber, Zhao, Schraudolph. Reinforcement learning with self-modifying policies (1998) n Zhao, Schmidhuber. Solving a complex prisoner’s dilemma with self-modifying policies. (1998) n Schmidhuber. A general method for incremental self-improvement and multiagent learning. (1999) n Singh, Lewis, Barto. Where do rewards come from? (2009) n Singh, Lewis, Barto. Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective (2010) n Niekum, Spector, Barto. Evolution of reward functions for reinforcement learning (2011) n Duan et al., (2016) RL2: Fast Reinforcement Learning via Slow Reinforcement Learning n Wang et al., (2016) Learning to Reinforcement Learn n Finn et al., (2017) Model-Agnostic Meta-Learning (MAML) n Mishra, Rohaninejad et al., (2017) Simple Neural AttentIve meta-Learner n Frans et al., (2017) Meta-Learning Shared Hierarchies n Stadie et al (2018) Some considerations on learning to explore via meta-reinforcement learning n Gupta et al (2018) Meta-reinforcement learning of structured exploration strategies n Nagabandi*, Clavera*, et al (2018) Learning to adapt in dynamic, real-world environments through meta-reinforcement learning n Clavera et al (2018) Model-based reinforcement learning via meta-policy optimization n Botvinick et al (2019) Reinforcement learning: fast and slow n Rakelly et al (2019) Efficient off-policy meta-reinforcement learning via probabilistic context variables n … FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 71. Benchmarks / Environments Vizdoom FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin Deepmind Lab mujoco Probably need more / richer environments…
  • 72. OpenAI Retro Contest FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 73. Meta-World [Yu, Quillen, He, Julian, Hausman, Finn, Levine, 2019] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 74. n Few-Shot Learning n Reinforcement Learning n Imitation Learning n Domain Randomization n Architecture Search n Unsupervised Learning Many Exciting Directions in AI FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin n Lifelong Learning n Bias in ML (avoiding) n Long Horizon Reasoning n Safe Learning n Value Alignment n Planning + Learning n …
  • 75. Imitation Learning in Robotics [Abbeel et al. 2008] [Kolter et al. 2008] [Ziebart et al. 2008] [Schulman et al. 2013] [Finn et al. 2016] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 76. Imitation Learning FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 77. Imitation Learning FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 78. One-Shot Imitation Learning [Duan, Andrychowicz, Stadie, Ho, Schneider, Sutskever, Abbeel, Zaremba, 2017]
  • 79. One-Shot Imitation Learning [Duan, Andrychowicz, Stadie, Ho, Schneider, Sutskever, Abbeel, Zaremba, 2017] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 80. One-Shot Imitation Learning [Duan, Andrychowicz, Stadie, Ho, Schneider, Sutskever, Abbeel, Zaremba, 2017] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 81. Learning a One-Shot Imitator One-shot imitator [Figure credit: Bradly Stadie] Demo 1 of task i Demo 2 of task i FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 82. Proof-of-concept: Block Stacking n Each task is specified by a desired final layout n Example: abcd n “Place c on top of d, place b on top of c, place a on top of b.” [Duan, Andrychowicz, Stadie, Ho, Schneider, Sutskever, Abbeel, Zaremba, 2017] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 83. Evaluation [Duan, Andrychowicz, Stadie, Ho, Schneider, Sutskever, Abbeel, Zaremba, 2017] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 84. n Meta-learning loss: n Task loss = behavioral cloning loss: [Pomerleau’89,Sammut’92] Learning a One-Shot Imitator with MAML [Finn*, Yu*, Zhang, Abbeel, Levine, 2017] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 85. Object placing from pixels subset of training objects held-out test objects input demo resulting policy [real-time execution] (via teleoperation) Finn*, Yu*, Zhang, Abbeel, Levine CoRL ‘17
  • 86. n Recall, meta-learning loss: $\min_\theta \sum_{\text{tasks } i} \mathcal{L}^{(i)}_{\mathrm{val}}\big(\theta - \alpha \nabla_\theta \mathcal{L}^{(i)}_{\mathrm{train}}(\theta)\big)$ n Key idea: different loss for “val” and “train”: $\mathcal{L}_{\mathrm{val}}(\theta) = \sum_t \|\pi_\theta(o_t) - a^*_t\|^2$, $\mathcal{L}_{\mathrm{train}}(\theta) = \sum_t \|f_\theta(o_t) - o^*_{t+1}\|^2$ Can We Learn from Just Video? [Yu*, Finn*, Xie, Dasari, Zhang, Abbeel, Levine, RSS 2018] Pieter Abbeel – covariant.ai | UC Berkeley | gradescope.com “val” only needed during meta-training, and continues to assume access to action taken “train” doesn’t require access to action taken, so at meta-testing video suffices
  • 87. n Recall, meta-learning loss: Can We Learn from Just Video? Pieter Abbeel – covariant.ai | UC Berkeley | gradescope.com $\min_\theta \sum_{\text{tasks } i} \mathcal{L}^{(i)}_{\mathrm{val}}\big(\theta - \alpha \nabla_\theta \mathcal{L}^{(i)}_{\mathrm{train}}(\theta)\big)$ with $\mathcal{L}_{\mathrm{val}}(\theta) = \sum_t \|\pi_\theta(o_t) - a^*_t\|^2$ and $\mathcal{L}_{\mathrm{train}}(\theta) = \sum_t \|f_\theta(o_t) - o^*_{t+1}\|^2$ (diagram: the policy head $\pi_\theta(o_t)$ and the prediction head $f_\theta(o_t)$ applied along the observation sequence $o_t, \dots$) [Yu*, Finn*, Xie, Dasari, Zhang, Abbeel, Levine, RSS 2018]
  • 88. Can We Learn from Just Video? [Yu*, Finn*, Xie, Dasari, Zhang, Abbeel, Levine, RSS 2018] Pieter Abbeel – covariant.ai | UC Berkeley | gradescope.com $\mathcal{L}_{\mathrm{val}}(\theta) = \sum_t \|\pi_\theta(o_t) - a^*_t\|^2$, $\mathcal{L}_{\mathrm{train}}(\theta) = \sum_t \|f_\theta(o_t) - o^*_{t+1}\|^2$ Issue: direct pixel level prediction not a great loss function…
  • 89. Can We Learn from Just Video? [Yu*, Finn*, Xie, Dasari, Zhang, Abbeel, Levine, RSS 2018] Pieter Abbeel – covariant.ai | UC Berkeley | gradescope.com $\mathcal{L}_{\mathrm{val}}(\theta) = \sum_t \|\pi_\theta(o_t) - a^*_t\|^2$; replace the pixel-space train loss with a learned loss $\mathcal{L}_\psi\big(f_\theta(o_t), o^*_{t+1}\big)$; $\mathcal{L}_\psi$ can be thought of as a (learned) Discriminator in GANs
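A minimal sketch of the two-loss idea with toy linear stand-ins for the policy $\pi_\theta$ and the video-prediction head $f_\theta$ (all names and dimensions invented): the outer "val" loss needs expert actions, while the inner "train" loss is computed from observations alone, which is why a raw video suffices at meta-test time.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # observation/feature dimension

def policy(theta, o):       # pi_theta(o_t): toy linear policy head
    return theta["pi"] @ o

def predictor(theta, o):    # f_theta(o_t): toy next-observation head
    return theta["f"] @ o

def val_loss(theta, obs, actions):
    """Outer loss (needs actions): behavioral cloning, sum_t ||pi(o_t) - a*_t||^2."""
    return sum(np.sum((policy(theta, o) - a) ** 2) for o, a in zip(obs, actions))

def train_loss(theta, obs):
    """Inner loss (video only): sum_t ||f(o_t) - o*_{t+1}||^2. In the full
    method a learned loss L_psi(f(o_t), o_{t+1}), akin to a GAN
    discriminator, replaces this pixel-space distance."""
    return sum(np.sum((predictor(theta, o) - o_next) ** 2)
               for o, o_next in zip(obs[:-1], obs[1:]))

theta = {"pi": rng.normal(size=(2, D)), "f": rng.normal(size=(D, D))}
obs = [rng.normal(size=D) for _ in range(10)]
actions = [rng.normal(size=2) for _ in range(10)]
print("train (video-only):", train_loss(theta, obs))
print("val (needs actions):", val_loss(theta, obs, actions))
```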
  • 90. One-shot imitation from human video input human demo resulting policy [Yu*, Finn*, Xie, Dasari, Zhang, Abbeel, Levine ’18]
  • 91. n Few-Shot Learning n Reinforcement Learning n Imitation Learning n Domain Randomization n Architecture Search n Unsupervised Learning Many Exciting Directions in AI FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin n Lifelong Learning n Bias in ML (avoiding) n Long Horizon Reasoning n Safe Learning n Value Alignment n Planning + Learning n …
  • 92. Compared to the real world, simulated data collection is… n Less expensive n Faster / more scalable n Less dangerous n Easier to label Motivation for Simulation How can we learn useful real-world skills in the simulator? FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 93. Approach 1 – Use Realistic Simulated Data Simulation Real world Carefully match the simulation to the world [1,2,3,4] Augment simulated data with real data [5,6] GTA V Real world [5] Stephan R Richter, Vibhav Vineet, Stefan Roth, and Vladlen Koltun. Playing for data: Ground truth from computer games (2016) [6] Bousmalis et al. Using simulation and domain adaptation to improve efficiency of robotic grasping (2017) [1] Stephen James, Edward Johns. 3d simulation for robot arm control with deep q-learning (2016) [2] Johns, Leutenegger, Davison. Deep learning a grasp function for grasping under gripper pose uncertainty (2016) [3] Mahler et al, Dex-Net 3.0 (2017) [4] Koenemann et al. Whole-body model-predictive control applied to the HRP-2 humanoid. (2015) FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 94. Approach 2 – Domain Confusion / Adaptation Andrei A Rusu, Matej Vecerik, Thomas Rothörl, Nicolas Heess, Razvan Pascanu, and Raia Hadsell. Sim-to-real robot learning from pixels with progressive nets. arXiv preprint arXiv:1610.04286, 2016. Eric Tzeng, Judy Hoffman, Ning Zhang, Kate Saenko, Trevor Darrell. Deep Domain Confusion: Maximizing for Domain Invariance. arXiv preprint arXiv:1412.3474, 2014. FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 95. Approach 3 – Domain Randomization If the model sees enough simulated variation, the real world may look like just the next simulator FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
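In code, domain randomization is just a sampling loop over scene parameters; below is a sketch with a placeholder renderer. Every parameter name and range is invented for illustration; the papers that follow randomize textures, lighting, camera pose, and object placement.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sim_params():
    """Draw one randomized scene configuration. Every range here is a
    made-up stand-in for texture/lighting/camera randomization."""
    return {
        "texture_id": int(rng.integers(0, 500)),     # random surface textures
        "light_intensity": rng.uniform(0.2, 2.0),
        "camera_jitter_deg": rng.uniform(-5.0, 5.0),
        "object_xy": rng.uniform(-0.3, 0.3, size=2),
    }

def render(params):
    """Placeholder for the simulator's renderer: returns a fake image plus
    the ground-truth label (e.g. object pose), which simulation gives for free."""
    image = rng.random((64, 64, 3))
    label = params["object_xy"]
    return image, label

dataset = [render(sample_sim_params()) for _ in range(1000)]
print(len(dataset), "randomized training examples; train on these, deploy on real images")
```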
  • 96. Domain Randomization n Quadcopter collision avoidance n ~500 semi-realistic textures, 12 floor plans n ~40-50% of 1000m trajectories are collision-free CAD²RL: Real Single-Image Flight Without a Single Real Image. [3] Fereshteh Sadeghi and Sergey Levine. CAD²RL: Real single-image flight without a single real image. arXiv preprint arXiv:1611.04201, 2016. FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 97. Domain Randomization for Pose Estimation n Precise object pose localization n 100K images with simple randomly generated textures [Tobin, Fong, Ray, Schneider, Zaremba, Abbeel, 2017]
  • 98. How does it work? More Data = Better [Tobin, Fong, Ray, Schneider, Zaremba, Abbeel, 2017]
  • 99. More Textures = Better [Tobin, Fong, Ray, Schneider, Zaremba, Abbeel, 2017]
  • 100. Pre-Training is not Necessary [Tobin, Fong, Ray, Schneider, Zaremba, Abbeel, 2017] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 101. Domain Randomization for Grasping [Tobin, Zaremba, Abbeel, 2017] Hypothesis: Training on a diverse array of procedurally generated objects can produce comparable performance to training on realistic object meshes. [see also: Levine et al and Goldberg et al] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 102. Evaluation: How Good Are Random Shapes? [Tobin, Zaremba, Abbeel, 2017] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 103. Evaluation: Real Robot FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 104. How About a Full Hand? FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 105. OpenAI: Dactyl FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin Trained with domain randomization [OpenAI, 2018]
  • 106. FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin Trained with domain randomization [OpenAI, 2019]
  • 107. n Few-Shot Learning n Reinforcement Learning n Imitation Learning n Domain Randomization n Architecture Search n Unsupervised Learning Many Exciting Directions in AI FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin n Lifelong Learning n Bias in ML (avoiding) n Long Horizon Reasoning n Safe Learning n Value Alignment n Planning + Learning n …
  • 108. Current Practice: ML Solution = Data + Computation + ML Expertise [Slide from: Quoc Le]
  • 109. Current Practice: ML Solution = Data + Computation + ML Expertise But can we turn this into: Solution = Data + 100X Computation [Slide from: Quoc Le]
  • 110. Importance of architectures for Vision ● Designing neural network architectures is hard ● Lots of human effort goes into tuning them ● Can we try and learn good architectures automatically? Canziani et al, 2017 [Slide from: Quoc Le]
  • 111. Neural Architecture Search ● We can specify the structure and connectivity of a neural network by a configuration string ○ [“Filter Width: 5”, “Filter Height: 3”, “Num Filters: 24”] ● Use an RNN (“Controller”) to generate this string (the “Child Network”) ● Train the Child Network to see how well it performs on a validation set ● Use reinforcement learning to update the Controller based on the accuracy of the Child Network [Slide from: Quoc Le]
  • 112. Controller: proposes Child Networks Train & evaluate Child Networks 20K times Iterate to find the most accurate Child Network [Slide from: Quoc Le] Neural Architecture Search
  • 113. Neural Architecture Search for Convolutional Networks Controller RNNSoftmax classifier Embedding [Slide from: Quoc Le]
  • 114. Training with REINFORCE: $\nabla_{\theta_c} J(\theta_c) \approx \frac{1}{m} \sum_{k=1}^{m} \sum_{t=1}^{T} \nabla_{\theta_c} \log P(a_t \mid a_{1:t-1}; \theta_c)\, R_k$, where $R_k$ is the accuracy of architecture $k$ on the held-out dataset, $a_{1:T}$ is the architecture predicted by the controller RNN viewed as a sequence of actions, $\theta_c$ are the parameters of the Controller RNN, and $m$ is the number of models in a minibatch. [Slide from: Quoc Le]
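A toy sketch of that training loop: a per-step categorical "controller" (an RNN in the actual work) samples an architecture string, a stand-in for child-network training returns a pseudo-accuracy, and REINFORCE with a moving-average baseline updates the controller. All names and the reward signal below are invented; the real search trains on the order of 20K child networks.

```python
import numpy as np

rng = np.random.default_rng(0)
CHOICES = [3, 5, 7]                    # e.g. candidate filter widths per layer
T = 4                                  # architecture = string of T decisions
logits = np.zeros((T, len(CHOICES)))   # toy controller (an RNN in the paper)

def sample_arch():
    arch, grad = [], np.zeros_like(logits)
    for t in range(T):
        p = np.exp(logits[t] - logits[t].max())
        p /= p.sum()
        a = rng.choice(len(CHOICES), p=p)
        arch.append(CHOICES[a])
        grad[t] = np.eye(len(CHOICES))[a] - p   # grad of log P(a_t)
    return arch, grad

def child_accuracy(arch):
    """Stand-in for 'train the child network, measure validation accuracy':
    here, architectures with more 5s score higher, just to have a signal."""
    return np.mean([w == 5 for w in arch]) + 0.05 * rng.random()

baseline = 0.0
for step in range(500):
    arch, grad = sample_arch()
    R = child_accuracy(arch)
    baseline = 0.9 * baseline + 0.1 * R       # variance reduction
    logits += 0.1 * (R - baseline) * grad     # REINFORCE update

print("most likely architecture:", [CHOICES[int(i)] for i in logits.argmax(1)])
```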
  • 116. Neural Architecture Search for CIFAR-10 [Slide from: Quoc Le]
  • 117. Neural Architecture Search for Language Modeling [Slide from: Quoc Le]
  • 118. Performance of cell on ImageNet Updated June/2018: AmoebaNet (found by evolution in the same search space) is slightly better[Slide from: Quoc Le]
  • 119. Neural Architecture Search ● Key idea: Architecture is a string ● Get an RNN to generate the string, and train the RNN with the reward associated with the string ● This idea can be applied to other components of ML: ○ Equation for an optimizer is a string ○ Activation function is a string, e.g., z = 1/(1 + e^(-x)) [Slide from: Quoc Le]
  • 120. Data processing Machine Learning ModelData Focus of machine learning research [Slide from: Quoc Le]
  • 121. Data processing Machine Learning ModelData Focus of machine learning research Very important but manually tuned [Slide from: Quoc Le]
  • 123. Controller: proposes augmentation policy Train & evaluate models with the augmentation policy 20K times Iterate to find the most accurate policy Data Augmentation [Slide from: Quoc Le]
  • 124. AutoAugment: Example Policy (each operation in the policy carries a probability of applying and a magnitude) [Slide from: Quoc Le]
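Operationally, such a policy is a list of sub-policies, each a sequence of (operation, probability, magnitude) triples, with one sub-policy sampled per image. The sketch below uses invented operations and numbers (the transforms are placeholders), purely to show the mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each sub-policy: two (operation, probability, magnitude) triples.
# Values are illustrative; AutoAugment searches over these with a controller.
POLICY = [
    [("rotate", 0.7, 15.0), ("brightness", 0.3, 0.4)],
    [("translate_x", 0.5, 4.0), ("rotate", 0.9, 5.0)],
]

def rotate(img, deg):      return img                      # placeholder transform
def brightness(img, amt):  return img * (1 + amt)
def translate_x(img, px):  return np.roll(img, int(px), axis=1)
OPS = {"rotate": rotate, "brightness": brightness, "translate_x": translate_x}

def augment(img):
    sub = POLICY[rng.integers(len(POLICY))]   # pick one sub-policy per image
    for name, prob, magnitude in sub:
        if rng.random() < prob:               # apply each op with its probability
            img = OPS[name](img, magnitude)
    return img

img = rng.random((32, 32, 3))
print("augmented image shape:", augment(img).shape)
```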
  • 125. AutoAugment results (table): each model evaluated with no data augmentation, standard data augmentation, and AutoAugment [Slide from: Quoc Le]
  • 126. ImageNet Top-5 error rate: Human (Andrej Karpathy) 5.1; our best model 3.5 (table compares models with no data augmentation, standard data augmentation, and AutoAugment) [Slide from: Quoc Le]
  • 127. References ● Neural Architecture Search with Reinforcement Learning. Barret Zoph and Quoc V. Le. ICLR, 2017 ● Learning Transferable Architectures for Scalable Image Recognition. Barret Zoph, Vijay Vasudevan, Jonathon Shlens, Quoc V. Le. CVPR, 2018 ● Efficient Neural Architecture Search via Parameter Sharing. Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, Jeff Dean ● AutoAugment: Learning Augmentation Policies from Data. Ekin D. Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, Quoc V. Le. Arxiv, 2018 ● Searching for Activation Functions. Prajit Ramachandran, Barret Zoph, Quoc Le. ICLR Workshop, 2018 [Slide from: Quoc Le]
  • 128. Population-based Auto-Augment aka Auto-Augment with 1000x less compute [Daniel Ho, Eric Liang, Ion Stoica, Pieter Abbeel, Xi (Peter) Chen, ICML 2019]
  • 129. Population-based Auto-Augment aka Auto-Augment with 1000x less compute [Daniel Ho, Eric Liang, Ion Stoica, Pieter Abbeel, Xi (Peter) Chen, ICML 2019]
  • 130. n Few-Shot Learning n Reinforcement Learning n Imitation Learning n Domain Randomization n Architecture Search n Unsupervised Learning Many Exciting Directions in AI FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin n Lifelong Learning n Bias in ML (avoiding) n Long Horizon Reasoning n Safe Learning n Value Alignment n Planning + Learning n …
  • 131. n Supervised learning is limited by amount of labeled data n Can we use unlabeled data instead? n Main ideas: n Learn network that embeds the data n Later supervised learning on embedding n (possibly while also fine-tuning the embedding network) n Learning weights: n Pretraining of NN n Auxiliary loss Unsupervised Learning Rationale FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
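A sketch of the "learn an embedding, then do supervised learning on top" recipe: here a fixed random projection stands in for a network pretrained on unlabeled data, and a least-squares linear probe is fit on a small labeled set. All sizes and names are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(x, W):
    """Stand-in for a network pretrained on unlabeled data; a fixed random
    projection plus nonlinearity plays the role of the learned features."""
    return np.tanh(x @ W)

# Lots of unlabeled data would be used to obtain the embedding (skipped here);
# then only a small labeled set is needed for the classifier on top.
W = rng.normal(size=(20, 8))
X_labeled = rng.normal(size=(50, 20))
y = (X_labeled[:, 0] > 0).astype(float)

Z = embed(X_labeled, W)
Zb = np.hstack([Z, np.ones((len(Z), 1))])      # add a bias term
w, *_ = np.linalg.lstsq(Zb, y, rcond=None)     # linear probe (least squares)
acc = np.mean(((Zb @ w) > 0.5) == y)
print("linear-probe training accuracy:", acc)
```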
  • 132. n Variational AutoEncoders (VAEs) n Generative Adversarial Networks n Exact Likelihood Models (autoregressive [pixelcnn, charrnn], realnvp, NICE) n “puzzles” (gray2rgb, where-does-this-patch-go, rotation- prediction, contrastive-predictive-coding, etc…) Main Families of Models FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 133. Generative Models n “What I cannot create, I do not understand.” n Ability to generate data that look real entails some form of understanding [Radford, Metz & Chintala, ICLR 2016]
  • 134. Initial “Images” [Salimans, Goodfellow, Zaremba, Cheung, Radford & Chen, NIPS 2016] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 135. Learning to Generate Images [Salimans, Goodfellow, Zaremba, Cheung, Radford & Chen, NIPS 2016] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 136. InfoGAN: Concepts Learning [Chen, Duan, Houthooft, Schulman, Sutskever & Abbeel, NIPS 2016] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 139. Image Generation [thispersondoesnotexist.com] Pieter Abbeel -- UC Berkeley | Covariant.AI
  • 140. Unsupervised Image to Image Pieter Abbeel -- UC Berkeley | Covariant.AI[CycleGAN: Zhu, Park, Isola & Efros, 2017]
  • 141. Everybody Dance Now Pieter Abbeel -- UC Berkeley | Covariant.AI[Everybody Dance Now: Caroline Chan, Shiry Ginosar, Tinghui Zhou, Alyosha Efros (UC Berkeley) 2018]
  • 142. Neural “Photoshop” [Brock, Lim, Ritchie & Weston, 2016] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 143. Neural Compression 2 [Rippel & Bourdev, 2017] JPEG JPEG2000 WaveOne FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 144. Data-Efficient Image Recognition with Contrastive Predictive Coding FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin[Olivier Henaff, Ali Razavi, Carl Doersch, Ali Eslami, Aaron van den Oord, 2019]
  • 145. Data-Efficient Image Recognition with Contrastive Predictive Coding FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin[Olivier Henaff, Ali Razavi, Carl Doersch, Ali Eslami, Aaron van den Oord, 2019] Linear separability in feature space from unsupervised learning
  • 146. Data-Efficient Image Recognition with Contrastive Predictive Coding FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin[Olivier Henaff, Ali Razavi, Carl Doersch, Ali Eslami, Aaron van den Oord, 2019] Contrastive-Predictive Coding
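At the core of CPC is a contrastive (InfoNCE-style) objective: score a context embedding against the embedding of its true future and against negatives, and minimize the cross-entropy of identifying the true one. Below is a sketch with random stand-in embeddings; no encoder is trained here.

```python
import numpy as np

rng = np.random.default_rng(0)

def info_nce(context, positive, negatives, temperature=0.1):
    """Cross-entropy of picking the true future embedding among negatives;
    minimizing it makes the context predictive of the future (CPC's core idea)."""
    candidates = np.vstack([positive[None], negatives])
    logits = candidates @ context / temperature
    logits -= logits.max()
    p = np.exp(logits) / np.exp(logits).sum()
    return -np.log(p[0])                      # index 0 is the positive

D = 16
context = rng.normal(size=D)                  # stand-in: encoded past patches
positive = context + 0.1 * rng.normal(size=D) # "future" correlated with the past
negatives = rng.normal(size=(32, D))          # embeddings of unrelated patches
print("InfoNCE loss:", info_nce(context, positive, negatives))
```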
  • 147. Beyond Images: Amazon Review [Radford et al, 2017] This product does what it is supposed to. I always keep three of these in my kitchen just in case ever I need a replacement cord. Great little item. Hard to put on the crib without some kind of embellishment. My guess is just like the screw kind of attachment I had. Pieter Abbeel -- UC Berkeley | Covariant.AI
  • 148. Prompt: A train carriage containing controlled nuclear materials was stolen in Cincinnati today. Its whereabouts are unknown. GPT-2: The incident occurred on the downtown train line, which runs from Covington and Ashland stations. In an email to Ohio news outlets, the U.S. Department of Energy said it is working with the Federal Railroad Administration to find the thief. ‘The theft of this nuclear material will have significant negative consequences on public and environmental health, our workforce and the economy of our nation,’ said Tom Hicks, the U.S. Energy Secretary, in a statement. ‘Our top priority is to secure the theft and ensure it doesn’t happen again.’ The stolen material was taken from the University of Cincinnati’s Research Triangle Park nuclear research site, according to a news release from Department officials. The Nuclear Regulatory Commission did not immediately release any information. According to the release, the U.S. Department of Energy’s Office of Nuclear Material Safety and Security is leading that team’s investigation. ‘The safety of people, the environment and the nation’s nuclear stockpile is our highest priority,’ Hicks said. ‘We will get to the bottom of this and make no excuses.’ Text Generation (OpenAI’s GPT-2) Pieter Abbeel -- UC Berkeley | Covariant.AI
  • 149. Prompt: Recycling is good for the world, no, you could not be more wrong. GPT-2: Recycling is NOT good for the world. It is bad for the environment, it is bad for our health, and it is bad for our economy. I’m not kidding. Recycling is not good for the environment. It is destructive to the earth and it is a major contributor to global warming. Recycling is not good for our health. It contributes to obesity and diseases like heart disease and cancer. Recycling is bad for our economy. It increases the cost of a product, and in turn, the price of everything that is made with that product. Recycling is not good for our nation. We pay a tremendous price for the privilege of having the world’s most advanced and efficient recycling system. Recycling is a huge, colossal waste of time, energy, money, and resources. Text Generation (OpenAI’s GPT-2) Pieter Abbeel -- UC Berkeley | Covariant.AI
  • 150. Text Generation (OpenAI’s GPT-2) Pieter Abbeel -- UC Berkeley / OpenAI / Gradescope[Radford et al, 2019]
  • 151. Unsupervised Sentiment Neuron [Radford et al, 2017] Pieter Abbeel -- UC Berkeley | Covariant.AI
  • 152. Benchmarks Pieter Abbeel -- UC Berkeley | Covariant.AI
  • 153. Benchmarks Pieter Abbeel -- UC Berkeley | Covariant.AI
  • 154. Benchmarks Pieter Abbeel -- UC Berkeley | Covariant.AI
  • 155. Scaling Pieter Abbeel -- UC Berkeley | Covariant.AI
  • 156. n Few-Shot Learning n Reinforcement Learning n Imitation Learning n Domain Randomization n Architecture Search n Unsupervised Learning Many Exciting Directions in AI FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin n Lifelong Learning n Bias in ML (avoiding) n Long Horizon Reasoning n Safe Learning n Value Alignment n Planning + Learning n …
  • 157. n Few-Shot Learning n Reinforcement Learning n Imitation Learning n Domain Randomization n Architecture Search n Unsupervised Learning Many Exciting Directions in AI FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin n Lifelong Learning n Bias in ML (avoiding) n Long Horizon Reasoning n Safe Learning n Value Alignment n Planning + Learning n …
  • 158. n Sampling of research directions n Overall research theme n Research <> Real-World gap n How to keep up Outline FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 159. In Computer Vision... (diagram): x-axis, datasets by how large they are: Lena (10^0; single image), Caltech 101 (~10^4 images), Pascal VOC (~10^5 images), ImageNet (~10^6 images), Google/FB images on the web (~10^9+ images); y-axis, models by how well they work: Hard Coded (edge detection etc., no learning; 70s-90s), Image Features (SIFT etc., learning linear classifiers on top; 90s-2012), ConvNets (learn the features, structure hard-coded; 2013-2017), projection: CodeGen (learn the weights and the structure; Learning-to-Learn). A possibility frontier separates all of this from the zone of “not going to happen.” [Slide credit: Andrej Karpathy]
  • 160. In RL (diagram): x-axis, environments by how much they measure / incentivise general intelligence: BlocksWorld (SHRDLU etc.), Cartpole etc. (and bandits, gridworld, ...few toy tasks), MuJoCo/ATARI/Universe (~few dozen envs), Digital worlds (simple to complex multi-agent envs), Reality (more multi-agent / non-stationary / real-world-like); y-axis, agents by how impressive they are (more learning, more compute): Hard Coded (LISP programs, no learning; 70s-90s), Value Iteration etc. (~discrete MDPs, linear function approximators; 90s-2012), DQN, PG (deep nets, hard-coded various tricks; 2013), RL^2 (learn the RL algorithm, structure fixed; 2016), projection: CodeGen (learn structure and learning algorithm; Learning-to-Learn). A possibility frontier separates all of this from the zone of “not going to happen.” [Slide adapted from Andrej Karpathy]
  • 161. Compute Increasing Rapidly [20-April-2018] FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 162. Compute per Major Result FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh TobinSource: OpenAI
  • 163. Implications for research agenda? FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 164. Key Enablers for AI Progress? DATA, COMPUTE, HUMAN INGENUITY (diagram labels: Problem Territory 1, Problem Territory 2, e.g., Deep Learning to Learn) FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 165. n Each PhD student approximately uses: n 4-8 GPU machine (local development) n 2-5K / month cloud compute credits (big thanks to Google and Amazon!) n Trending: n Towards 10-20 GPUs per student (housed at central campus facility) n Hoping to get to 10-20K / month cloud compute credits (anyone?) How much compute do we use at Berkeley Robot Learning Lab? FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 166. n Sampling of research directions n Overall research theme n Research <> Real-World gap n How to keep up Outline FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 167. n Research: n 0 to 1 n If achieving 70% on a benchmark, move on to the next one Research <<<<GAP>>>> Real-World n Real-World: n Even 90% is not enough! FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 168. n Typical station in factory or warehouse: n 500 - 2,000 task completions per hour n Now, imagine 90% success of the robot n 500 per hour → 50 failures that need fixing up n 2000 per hour → 200 failures that need fixing up n Fixing up failures typically takes more time than doing task n → robot with 50 failures per hour causes more work than it saves n Realistically, value created at 1-2 human interventions per hour n 500 per hour, max 2 interventions → 99.6% success rate n 2,000 per hour, max 2 interventions → 99.9% success rate Real-World Reliability Requirements FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
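The arithmetic behind those thresholds, as a quick check (a sketch; `required_success_rate` is just the slide's calculation):

```python
def required_success_rate(tasks_per_hour, max_interventions_per_hour=2):
    """Success rate needed so failures stay within the intervention budget."""
    return 1 - max_interventions_per_hour / tasks_per_hour

for rate in (500, 2000):
    print(rate, "tasks/hour ->", f"{required_success_rate(rate):.1%}")
# 500 -> 99.6%, 2000 -> 99.9%
```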
  • 169. Ok, fine, we need beyond 90% Pieter Abbeel -- UC Berkeley | Covariant.AI So what, let’s just: - larger network - more data
  • 170. n can NOT ignore the long tail of real world scenarios n can NOT ignore that real world always changes n can NOT ignore it’s important to know when you don’t know Subtleties in going beyond 90% FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 171. n ImageNet n 1000 categories n 1000 example images for each category Real World: High Variability / Long Tail n Real world n Millions of SKUs n Transparency.. n … → significant AI R&D needed to meet real-world demands FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 172. n Deep RL Research n Simulation n Train ahead of time Real World: Always changing n Real world n Need to adapt on the fly… → significant AI R&D needed to meet real-world demands FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 173. Real World: Need to know when don’t know FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 174. How feasible today? FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 175. n Sampling of research directions n Overall research theme n Research <> Real-World gap n How to keep up Outline FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 176. n How to read a paper? n What papers to read? n Reading group How to Keep Up FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 177. How to Read a Paper? FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin • Read the title, abstract, section headers, and figures • Try and find slides or a video on the paper (these do not have to be by the authors). • Read the introduction (Jennifer Widom) • What is the problem? • Why is it interesting and important? • Why is it hard? (e.g., why do naive approaches fail?) • Why hasn't it been solved before? (Or, what's wrong with previous proposed solutions? How does this paper differ?) • What are the key components of this approach and results? Any limitations? • Skim the related work. • Is there any related work you are familiar with? • If so, how does this paper relate to those works? • Skim the technical section. • Where are the novelties? • What are the assumptions? • Read the experiments. • What questions are the experiments answering? • What questions are the experiments not answering? • What baselines do they compare against? • How strong are these baselines? • Is the experiment methodology sound? • Do the results support their claims? • Read the conclusion/discussion. • Read the technical section. • Read in “good faith” (i.e., assume the authors are correct) • Skip over confusing parts that don’t seem fundamental • If any important part is confusing, see if the material is in class slides or prior papers • Read the paper as you see fit. [source: Gregory Kahn]
  • 178. n How to read a paper? n What papers to read? n Reading group How to Keep Up FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 179. Just read all the papers? FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin[source: https://medium.com/@karpathy/a-peek-at-trends-in-machine-learning-ab8a1085a106]
  • 180. n Import AI Newsletter (by Jack Clark, communications director at OpenAI) n Arxiv sanity (by Karpathy) n Twitter n jackclarkSF, karpathy, goodfellow_ian, hardmaru, smerity, hillbig, pabbeel, … n AI/DL Facebook Group: active group where members post anything from news articles to blog posts to general ML questions n ML Subreddit What Papers to Read? FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 181. n How to read a paper? n What papers to read? n Reading group How to Keep Up FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 182. n Even just 1 other person can already save you ~50% of your time Reading Group FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin