Full Stack Deep Learning
Research Directions
Pieter Abbeel, Sergey Karayev, Josh Tobin
Number of arxiv papers submitted in AI categories
FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
[source: https://medium.com/@karpathy/a-peek-at-trends-in-machine-learning-ab8a1085a106]
n Sampling of research directions
n Overall research theme
n How to keep up
Outline
n Unsupervised Learning
n Reinforcement Learning
n Unsupervised RL
n Meta-Reinforcement Learning
n Few-Shot Imitation
n Domain Randomization
n DL for Science and Engineering
Many Exciting Directions in AI
n Mitigating Bias
n Multi-modal Learning
n Architecture Search
n Value Alignment
n Scaling Laws
n Human-in-the-Loop
n Explainability
n Works!
n BUT: requires large amounts of annotated data
Recall: Deep Supervised Learning
Yes!
n Deep semi-supervised learning
n Deep unsupervised learning
So: Can we learn with fewer labels?
n Assumption:
n Classification problem
n Each data-point belongs to one of the classes
n Toy Example: “two moons”
Semi-supervised Learning
Noisy-Student
[Xie et al, 2020]
Noisy-Student -- Results
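The Noisy-Student loop can be sketched in a few lines. This is a toy illustration only: a 1-D threshold "model" stands in for the EfficientNet teacher/student of Xie et al., and the data points and noise level are invented.

```python
import random

def train_threshold(points):
    """Toy 'model': fit a 1-D classifier by placing the boundary between class means."""
    mean0 = sum(x for x, y in points if y == 0) / max(1, sum(1 for _, y in points if y == 0))
    mean1 = sum(x for x, y in points if y == 1) / max(1, sum(1 for _, y in points if y == 1))
    return (mean0 + mean1) / 2  # decision boundary

def predict(threshold, x):
    return 1 if x > threshold else 0

def noisy_student(labeled, unlabeled, rounds=3, noise=0.1):
    """Sketch of the Noisy-Student loop (Xie et al., 2020): a teacher
    pseudo-labels the unlabeled data, a student retrains on the union
    under input noise, and the student becomes the next teacher."""
    teacher = train_threshold(labeled)
    for _ in range(rounds):
        pseudo = [(x, predict(teacher, x)) for x in unlabeled]
        noised = [(x + random.gauss(0, noise), y) for x, y in labeled + pseudo]
        teacher = train_threshold(noised)  # student becomes the new teacher
    return teacher

random.seed(0)
labeled = [(-1.0, 0), (1.0, 1)]          # tiny labeled set
unlabeled = [-2.0, -1.5, 1.5, 2.0]       # larger unlabeled pool
boundary = noisy_student(labeled, unlabeled)
```

The unlabeled points sharpen the class means, so the learned boundary is more stable than one fit on the two labeled points alone.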
Yes!
n Deep semi-supervised learning
n Deep unsupervised learning
So: Can we learn with fewer labels?
Transfer with Multi-headed Networks
[Figure: network with a shared trunk and separate output heads for Task 1 and Task 2]
n Key hypothesis:
n IF a neural network is smart enough to:
n Predict the next frame in a video
n Predict the next word in a sentence
n Generate realistic images
n “Translate” images
n …
n THEN the same neural network is ready to do deep supervised learning
from a very small data-set
Deep Unsupervised Learning
Task 1
Task 2
Transfer from Unsupervised Learning
[Figure: shared trunk with two heads; Task 1 = unsupervised, Task 2 = real task]
Example Setting
[Figure: text model with two heads; Task 1 = predict next word, Task 2 = predict sentiment]
Prompt: A train carriage containing controlled nuclear materials was stolen in
Cincinnati today. Its whereabouts are unknown.
GPT-2: The incident occurred on the downtown train line, which runs from Covington
and Ashland stations. In an email to Ohio news outlets, the U.S. Department of
Energy said it is working with the Federal Railroad Administration to find the
thief.
‘The theft of this nuclear material will have significant negative consequences on
public and environmental health, our workforce and the economy of our nation,’
said Tom Hicks, the U.S. Energy Secretary, in a statement. ‘Our top priority is to
secure the theft and ensure it doesn’t happen again.’
The stolen material was taken from the University of Cincinnati’s Research
Triangle Park nuclear research site, according to a news release from Department
officials.
The Nuclear Regulatory Commission did not immediately release any information.
According to the release, the U.S. Department of Energy’s Office of Nuclear
Material Safety and Security is leading that team’s investigation.
‘The safety of people, the environment and the nation’s nuclear stockpile is our
highest priority,’ Hicks said. ‘We will get to the bottom of this and make no
excuses.’
Next Word Prediction (OpenAI’s GPT-2)
Prompt: Recycling is good for the world, no, you could not be more
wrong.
GPT-2: Recycling is NOT good for the world. It is bad for the
environment, it is bad for our health, and it is bad for our
economy. I’m not kidding. Recycling is not good for the
environment. It is destructive to the earth and it is a major
contributor to global warming. Recycling is not good for our
health. It contributes to obesity and diseases like heart
disease and cancer. Recycling is bad for our economy. It
increases the cost of a product, and in turn, the price of
everything that is made with that product. Recycling is not good
for our nation. We pay a tremendous price for the privilege of
having the world’s most advanced and efficient recycling system.
Recycling is a huge, colossal waste of time, energy, money, and
resources.
Next Word Prediction (OpenAI’s GPT-2)
Text Generation (OpenAI’s GPT-2)
Pieter Abbeel -- UC Berkeley / OpenAI /
Gradescope
[Radford et al, 2019]
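The next-word-prediction objective itself is simple, even though GPT-2 implements it with a large Transformer over subword tokens. As a minimal stand-in, here is a toy bigram model trained on an invented sentence; it captures the same idea of sampling each word from a learned distribution over successors.

```python
import random
from collections import defaultdict

def train_bigram(text):
    """Count next-word frequencies: a toy stand-in for a learned
    next-token distribution (GPT-2 learns this with a Transformer)."""
    counts = defaultdict(lambda: defaultdict(int))
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, length=5, rng=None):
    """Sample a continuation word by word from the next-word distribution."""
    rng = rng or random
    out = [start]
    for _ in range(length):
        nxt = counts.get(out[-1])
        if not nxt:
            break
        words, weights = zip(*nxt.items())
        out.append(rng.choices(words, weights=weights)[0])
    return " ".join(out)

model = train_bigram("the train was stolen the train was found")
```

With `generate(model, "the", 2)` the first two steps are deterministic here ("train", then "was"); GPT-2 does the same thing with vastly richer context.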
Unsupervised Sentiment Neuron
[Radford et al, 2017]
Benchmarks
Scaling
n Main idea:
n GPT-2: predict next word / token
n BERT: predict a word / token that was removed
BERT
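The BERT pretext task can be sketched as follows. This is a simplification: real BERT works on subword tokens and uses an 80/10/10 mask/replace/keep scheme, both omitted here; the example sentence is invented.

```python
import random

def mask_tokens(tokens, mask_rate=0.15, rng=None):
    """Sketch of BERT's masked-LM pretext task: hide a fraction of tokens;
    the model must reconstruct them from both-sided context (unlike
    GPT-2, which only conditions on the left context)."""
    rng = rng or random.Random(0)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            targets[i] = tok          # what the model must predict
            masked.append("[MASK]")
        else:
            masked.append(tok)
    return masked, targets

tokens = "deep unsupervised learning pretext tasks teach representations".split()
masked, targets = mask_tokens(tokens, mask_rate=0.5)  # high rate just for the demo
```

Training then minimizes cross-entropy between the model's predictions at the masked positions and the `targets`.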
Downstream Tasks - NLP (BERT Revolution)
[https://gluebenchmark.com/leaderboard]
Unsupervised Learning in Vision
[Figure: image model with two heads; Task 1 = fill in a patch, Task 2 = predict cat vs. dog]
Predict Missing Patch
Solving Jigsaw Puzzles
Rotation Prediction
SimCLR and MoCo
SimCLR + linear classifier
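Methods like SimCLR and MoCo optimize a contrastive (InfoNCE) objective: make two augmented views of the same image more similar than views of different images. A minimal scalar version, with made-up similarity values:

```python
import math

def info_nce(sim_pos, sim_negs, temperature=0.1):
    """Contrastive (InfoNCE) loss for one anchor: cross-entropy that pushes
    the positive pair's similarity above every negative's. sim_pos is the
    anchor-positive similarity, sim_negs the anchor-negative similarities."""
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    log_z = math.log(sum(math.exp(l) for l in logits))
    return log_z - logits[0]  # -log softmax probability of the positive

loss_easy = info_nce(0.9, [0.1, 0.0])   # positive clearly closest: low loss
loss_hard = info_nce(0.1, [0.9, 0.8])   # a negative is closer: high loss
```

In the real methods the similarities come from encoder embeddings (cosine similarity in SimCLR, a dot product against a momentum-encoded queue in MoCo), and the loss is averaged over a batch.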
n Unsupervised Learning
n Reinforcement Learning
n Unsupervised RL
n Meta-Reinforcement Learning
n Few-Shot Imitation
n Domain Randomization
n DL for Science and Engineering
Many Exciting Directions in AI
n Mitigating Bias
n Multi-modal Learning
n Architecture Search
n Value Alignment
n Scaling Laws
n Human-in-the-Loop
n Explainability
Reinforcement Learning (RL)
Robot +
Environment
max_θ E[ ∑_{t=0}^{H} R(s_t) | π_θ ]
[Image credit: Sutton and Barto 1998]
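The objective max_θ E[ ∑_{t=0}^{H} R(s_t) | π_θ ] is typically estimated by running the current policy and averaging returns over the sampled rollouts; a minimal sketch with made-up per-step rewards:

```python
def episode_return(rewards):
    """Return of one trajectory: the inner sum over the horizon, sum_{t=0..H} R(s_t)."""
    return sum(rewards)

def estimate_objective(rollouts):
    """Monte-Carlo estimate of E[ sum_t R(s_t) | pi_theta ]: average the
    return over trajectories sampled by running the current policy."""
    return sum(episode_return(r) for r in rollouts) / len(rollouts)

rollouts = [[1, 0, 1], [0, 0, 0]]  # two sampled episodes (illustrative rewards)
value = estimate_objective(rollouts)
```

RL algorithms differ in how they use such estimates to improve θ (policy gradients, Q-learning, etc.), but they all score policies this way.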
Reinforcement Learning (RL)
Robot +
Environment
n Compared to supervised learning, additional challenges:
n Credit assignment
n Stability
n Exploration
[Image credit: Sutton and Barto 1998]
max_θ E[ ∑_{t=0}^{H} R(s_t) | π_θ ]
Deep RL: Atari
DQN Mnih et al, NIPS 2013 / Nature 2015
MCTS Guo et al, NIPS 2014; TRPO Schulman, Levine, Moritz, Jordan, Abbeel, ICML 2015; A3C Mnih et al,
ICML 2016; Dueling DQN Wang et al ICML 2016; Double DQN van Hasselt et al, AAAI 2016; Prioritized
Experience Replay Schaul et al, ICLR 2016; Bootstrapped DQN Osband et al, 2016; Q-Ensembles Chen et al,
2017; Rainbow Hessel et al, 2017; Accelerated Stooke and Abbeel, 2018; …
[Source: Mnih et al., Nature 2015 (DeepMind) ]
Deep Q-Network (DQN): From Pixels to Joystick Commands
32 8x8 filters with stride 4 + ReLU
64 4x4 filters with stride 2 + ReLU
64 3x3 filters with stride 1 + ReLU
fully connected 512 units + ReLU
fully connected output units, one per action
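A quick sanity check of the layer sizes above. Assuming the standard DQN preprocessing (4 stacked 84x84 grayscale frames, which the slide does not state), the "valid" convolution arithmetic gives the flattened size feeding the 512-unit layer:

```python
def conv_out(size, kernel, stride):
    """Spatial output size of a 'valid' convolution: floor((size - kernel) / stride) + 1."""
    return (size - kernel) // stride + 1

# DQN (Mnih et al., 2015) processes 4 stacked 84x84 grayscale frames.
s = conv_out(84, 8, 4)   # 32 filters, 8x8, stride 4
s = conv_out(s, 4, 2)    # 64 filters, 4x4, stride 2
s = conv_out(s, 3, 1)    # 64 filters, 3x3, stride 1
flat = 64 * s * s        # flattened input to the 512-unit fully connected layer
```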
Deep RL: Go
AlphaGo Silver et al, Nature 2015
AlphaGoZero Silver et al, Nature 2017
Tian et al, 2016; Maddison et al, 2014; Clark et al, 2015
AlphaZero Silver et al, 2017
Deep RL: Robot Locomotion
^ TRPO Schulman et al, 2015 + GAE Schulman et al, 2016
See also: DDPG Lillicrap et al 2015; SVG Heess et al, 2015; Q-Prop Gu et al, 2016; Scaling up ES Salimans et
al, 2017; PPO Schulman et al, 2017; Parkour Heess et al, 2017;
Deep RL: Robot Locomotion
[Peng, Abbeel, Levine, van de Panne, 2018]
Deep RL Success: Dynamic Animation
[Peng, Abbeel, Levine, van de Panne, 2018]
BRETT: Berkeley Robot for the Elimination of Tedious Tasks
[Levine*, Finn*, Darrell, Abbeel, JMLR 2016]
n Rigid rods connected by elastic cables
n Controlled by motors that extend / contract cables
n Properties:
n Lightweight
n Low cost
n Capable of withstanding significant impact
n NASA investigates them for space exploration
n Major challenge: control
Tensegrity Robotics: NASA SuperBall
[Geng*, Zhang*, Bruce*, Caluwaerts, Vespignani, Sunspiral, Abbeel, Levine, ICRA 2017]
Trained with domain randomization
[OpenAI, 2019]
Covariant
This is the entry point of AI Robotics into the real world
Autonomous Order Picking
29-January-2020
Knapp Pick-It-Easy powered by Covariant Brain
n Unsupervised Learning
n Reinforcement Learning
n Unsupervised RL
n Meta-Reinforcement Learning
n Few-Shot Imitation
n Domain Randomization
n DL for Science and Engineering
Many Exciting Directions in AI
n Mitigating Bias
n Multi-modal Learning
n Architecture Search
n Value Alignment
n Scaling Laws
n Human-in-the-Loop
n Explainability
Mastery? Yes
Deep RL (DQN) vs. Human
Scores: 18.9 (DQN) vs. 9.3 (human)
How fast is the learning itself?
Humans vs. DDQN
Black dots: human play
Blue curve: mean of human play
Blue dashed line: ‘expert’ human play
Red dashed lines:
DDQN after 10, 25, 200M frames
(~ 46, 115, 920 hours)
[Tsividis, Pouncy, Xu, Tenenbaum, Gershman, 2017]
Humans after 15 minutes tend to outperform DDQN after 115 hours
How to bridge this gap?
Can visual RL achieve the same data-efficiency as RL on state?
● State-based D4PG (blue) vs. pixel-based D4PG (green), 60M training steps
[Tassa et al., 2018] Tassa, Y., Doron, Y., Muldal, A., Erez, T., Li, Y., Casas, D.D.L., Budden, D., Abdolmaleki, A., Merel, J., Lefrancq, A. and Lillicrap, T. DeepMind Control Suite. arXiv:1801.00690, 2018.
Pixel-based needs > 50M more training steps than state-based to solve the same tasks
Pieter Abbeel
Contrastive learning: SOTA in computer vision
[Hénaff et al., 2019] Olivier J. Hénaff, Aravind Srinivas, Jeffrey De Fauw, Ali Razavi, Carl Doersch, S. M. Ali Eslami, Aaron van den Oord. Data-Efficient Image Recognition with Contrastive Predictive Coding. arXiv:1905.09272, 2019.
[Chen et al., 2020] Chen, T., Kornblith, S., Norouzi, M. and Hinton, G. A Simple Framework for Contrastive Learning of Visual Representations
arxiv:2002.05709, 2020.
CPCv2: top-5 ImageNet accuracy as a function of # of labels [Hénaff, Srinivas et al., 2019]
SimCLR: top-1 ImageNet accuracy as a function of # of parameters [Chen et al., 2020]
Contrastive + RL
[CURL: A Srinivas*, M Laskin*, P Abbeel, 2020]
Observations are stacked frames
Need to define:
1. query / key pairs
2. similarity measure
3. architecture
1. Query / key pairs: random crop
query
key
[CURL: A Srinivas*, M Laskin*, P Abbeel, 2020]
2. Bilinear inner product with learned weight matrix
[CURL: A Srinivas*, M Laskin*, P Abbeel, 2020]
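CURL scores a query-key pair with a bilinear form qᵀ W k, where W is a learned matrix. A minimal list-based sketch (real CURL applies this to encoder embeddings, and W is trained jointly):

```python
def bilinear_similarity(q, W, k):
    """CURL's similarity measure: q^T W k with a learned matrix W
    (plain Python lists here). With W = identity it reduces to a dot product."""
    Wk = [sum(W[i][j] * k[j] for j in range(len(k))) for i in range(len(W))]
    return sum(q[i] * Wk[i] for i in range(len(q)))

identity = [[1.0, 0.0], [0.0, 1.0]]
scaled = [[2.0, 0.0], [0.0, 2.0]]  # a (made-up) learned W that rescales similarity
```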
3. Keys encoded with momentum
[He et al., 2019] He, K., Fan, H., Wu, Y., Xie, S. and Girshick, R. Momentum Contrast for Unsupervised Visual Representation Learning arxiv:1911.05722,
2019.
Similar to MoCo
[He et al.]
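The momentum update for the key encoder, shared with MoCo, is a one-liner per parameter: the key encoder is an exponential moving average of the query encoder, which keeps the keys slowly evolving and consistent. In practice m ≈ 0.999; m = 0.5 below just makes the convergence visible.

```python
def momentum_update(key_params, query_params, m=0.999):
    """MoCo-style momentum update: key encoder = EMA of the query encoder.
    Parameters are plain lists of floats in this sketch."""
    return [m * k + (1 - m) * q for k, q in zip(key_params, query_params)]

keys = [0.0, 0.0]
for _ in range(3):
    keys = momentum_update(keys, [1.0, 1.0], m=0.5)  # keys drift toward the query params
```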
CURL from pixels matches state-based SAC
GRAY: SAC State
RED: CURL
[CURL: A Srinivas*, M Laskin*, P Abbeel, 2020]
CURL Comparison: DeepMind Control Suite
[CURL: A Srinivas*, M Laskin*, P Abbeel, 2020]
Atari performance benchmarked at 100K frames
[CURL: A Srinivas*, M Laskin*, P Abbeel, 2020]
CURL Comparison: Atari
Hard Environments
GRAY: SAC State; RED: CURL
Environment training steps: 1 = 1M (left plot), 1 = 100M (right plot)
Observation: pixel-based deep RL struggles similarly in the asymptotic regime
Predicting state from pixels
Higher prediction error correlates with more challenging environments
[CURL: A Srinivas*, M Laskin*, P Abbeel, 2020]
n Unsupervised Learning
n Reinforcement Learning
n Unsupervised RL
n Meta-Reinforcement Learning
n Few-Shot Imitation
n Domain Randomization
n DL for Science and Engineering
Many Exciting Directions in AI
n Mitigating Bias
n Multi-modal Learning
n Architecture Search
n Value Alignment
n Scaling Laws
n Human-in-the-Loop
n Explainability
Starting Observations
n TRPO, DQN, A3C, DDPG, PPO, Rainbow, … are fully general RL
algorithms
n i.e., for any environment that can be mathematically defined, these
algorithms are equally applicable
n Environments encountered in real world
= tiny, tiny subset of all environments that could be defined
(e.g. they all satisfy our universe’s physics)
Can we develop “fast” RL algorithms that take advantage of this?
Reinforcement Learning
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016]
Agent
Environment
a
s
r
Reinforcement Learning
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016]
RL Algorithm
Policy
Environment
a
o
r
RL Agent
Reinforcement Learning
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016]
RL Algorithm
Policy
Environment A
a
s
r
…
RL Agent
RL Algorithm
Policy
Environment B
a
s
r
RL Agent
Reinforcement Learning
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016]
…
RL Algorithm
Policy
Environment A
a
o
r
RL Algorithm
Policy
Environment B
a
o
r
RL Agent RL Agent
Traditional RL research:
• Human experts develop
the RL algorithm
• After many years, still
no RL algorithms nearly
as good as humans…
Alternative:
• Could we learn a better
RL algorithm?
• Or even learn a better
entire agent?
Meta-Reinforcement Learning
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016]
"Fast” RL
Agent
Environment A
Meta RL
Algorithm
Environment B
…
Meta-training environments
Meta-Reinforcement Learning
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016]
"Fast” RL
Agent
Environment A
a
r,o
Meta RL
Algorithm
Environment B
…
Environment F
Meta-training environments
Testing environments
Meta-Reinforcement Learning
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016]
"Fast” RL
Agent
Environment A
a
r,o
Meta RL
Algorithm
Environment B
…
Environment G
Meta-training environments
Testing environments
Meta-Reinforcement Learning
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016]
"Fast” RL
Agent
Environment A
a
r,o
Meta RL
Algorithm
Environment B
…
Environment H
Meta-training environments
Testing environments
Formalizing Learning to Reinforcement Learn
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016]
max_θ E_M E_{τ_M^{(k)}} [ ∑_{k=1}^{K} R(τ_M^{(k)}) | RLagent_θ ]
Formalizing Learning to Reinforcement Learn
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016]
M : sampled MDP
τ_M^{(k)} : k-th trajectory in MDP M
max_θ E_M E_{τ_M^{(k)}} [ ∑_{k=1}^{K} R(τ_M^{(k)}) | RLagent_θ ]
Meta-train:
max_θ ∑_{M ∈ M_train} E_{τ_M^{(k)}} [ ∑_{k=1}^{K} R(τ_M^{(k)}) | RLagent_θ ]
n RLagent = RNN = generic computation architecture
n different weights in the RNN means different RL algorithm and prior
n different activations in the RNN means different current policy
n meta-train objective can be optimized with an existing (slow) RL algorithm
Representing RLagent_θ
max_θ ∑_{M ∈ M_train} E_{τ_M^{(k)}} [ ∑_{k=1}^{K} R(τ_M^{(k)}) | RLagent_θ ]
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016]
Evaluation: Multi-Armed Bandits
n Multi-Armed Bandits setting
n Each bandit has its own distribution
over pay-outs
n Each episode = choose 1 bandit
n Good RL agent should explore
bandits sufficiently, yet also exploit
the good/best ones
n Provably (asymptotically) optimal
RL algorithms have been invented
by humans: Gittins index, UCB1,
Thompson sampling, …
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016]
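One of the human-designed baselines mentioned above, UCB1, fits in a few lines: pull the arm with the highest optimistic estimate mean + sqrt(2 ln t / n). The Bernoulli payout probabilities below are made up for the demo.

```python
import math
import random

def ucb1(counts, means, t):
    """UCB1 index: pick the arm maximizing mean + sqrt(2 ln t / n);
    untried arms get infinite priority so every arm is tried once."""
    scores = [float("inf") if n == 0 else mu + math.sqrt(2 * math.log(t) / n)
              for n, mu in zip(counts, means)]
    return scores.index(max(scores))

def run_bandit(payouts, steps=2000, rng=None):
    """Play Bernoulli arms with the given payout probabilities using UCB1."""
    rng = rng or random.Random(0)
    counts = [0] * len(payouts)
    means = [0.0] * len(payouts)
    total = 0
    for t in range(1, steps + 1):
        a = ucb1(counts, means, t)
        r = 1 if rng.random() < payouts[a] else 0
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]   # running mean update
        total += r
    return counts, total

counts, total = run_bandit([0.2, 0.8])  # arm 1 pays out far more often
```

A good meta-learned RL agent should rediscover behavior like this (explore every arm, then exploit the best) purely from meta-training on bandit instances.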
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016]
Evaluation: Multi-Armed Bandits
We consider the Bayesian evaluation setting. Some of these prior works also have adversarial guarantees, which we don't consider here.
Evaluation: Visual Navigation
Agent’s view Maze
Agent input: current image
Agent action: straight / 2 degrees left / 2 degrees right
Map just shown for our purposes, but not available to agent
Related work: Mirowski, et al, 2016; Jaderberg et al, 2016; Mnih et al, 2016; Wang et al, 2016
Evaluation: Visual Navigation
After learning-to-learn
Before learning-to-learn
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016]
Agent input: current image
Agent action: straight / 2 degrees left / 2 degrees right
Map just shown for our purposes, but not available to agent
[Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016]
Meta-Learning Curves
Meta Learning for RL
Task distribution: different environments
n Schmidhuber. Evolutionary principles in self-referential learning. (1987)
n Wiering, Schmidhuber. Solving POMDPs with Levin search and EIRA. (1996)
n Schmidhuber, Zhao, Wiering. Shifting inductive bias with success-story algorithm, adaptive Levin search,
and incremental self-improvement. (MLJ 1997)
n Schmidhuber, Zhao, Schraudolph. Reinforcement learning with self-modifying policies (1998)
n Zhao, Schmidhuber. Solving a complex prisoner’s dilemma with self-modifying policies. (1998)
n Schmidhuber. A general method for incremental self-improvement and multiagent learning. (1999)
n Singh, Lewis, Barto. Where do rewards come from? (2009)
n Singh, Lewis, Barto. Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective (2010)
n Niekum, Spector, Barto. Evolution of reward functions for reinforcement learning (2011)
n Duan et al., (2016) RL2: Fast Reinforcement Learning via Slow Reinforcement Learning
n Wang et al., (2016) Learning to Reinforcement Learn
n Finn et al., (2017) Model-Agnostic Meta-Learning (MAML)
n Mishra, Rohaninejad et al., (2017) A Simple Neural Attentive Meta-Learner
n Frans et al., (2017) Meta-Learning Shared Hierarchies
n Unsupervised Learning
n Reinforcement Learning
n Unsupervised RL
n Meta-Reinforcement Learning
n Few-Shot Imitation
n Domain Randomization
n DL for Science and Engineering
Many Exciting Directions in AI
n Mitigating Bias
n Multi-modal Learning
n Architecture Search
n Value Alignment
n Scaling Laws
n Human-in-the-Loop
n Explainability
Imitation Learning in Robotics
[Abbeel et al. 2008] [Kolter et al. 2008] [Ziebart et al. 2008]
[Schulman et al. 2013] [Finn et al. 2016]
Imitation Learning
One-Shot Imitation Learning
[Duan, Andrychowicz, Stadie, Ho, Schneider, Sutskever, Abbeel, Zaremba, 2017]
Learning a One-Shot Imitator
One-shot
imitator
[Figure credit: Bradly Stadie]
Demo 1 of task i Demo 2 of task i
Proof-of-concept: Block Stacking
n Each task is specified by a desired final layout
n Example: abcd
n “Place c on top of d, place b on top of c, place a on top of b.”
[Duan, Andrychowicz, Stadie, Ho, Schneider, Sutskever, Abbeel, Zaremba, 2017]
Evaluation
[Duan, Andrychowicz, Stadie, Ho, Schneider, Sutskever, Abbeel, Zaremba, 2017]
n Unsupervised Learning
n Reinforcement Learning
n Unsupervised RL
n Meta-Reinforcement Learning
n Few-Shot Imitation
n Domain Randomization
n DL for Science and Engineering
Many Exciting Directions in AI
n Mitigating Bias
n Multi-modal Learning
n Architecture Search
n Value Alignment
n Scaling Laws
n Human-in-the-Loop
n Explainability
Compared to the real world, simulated data collection is…
n Less expensive
n Faster / more scalable
n Less dangerous
n Easier to label
Motivation for Simulation
How can we learn useful real-world skills in the simulator?
Approach 1 – Use Realistic Simulated Data
[Figure: simulation vs. real world]
Carefully match the simulation to the world [1,2,3,4]
Augment simulated data with real data [5,6]
[Figure: GTA V vs. real world]
[5] Stephan R Richter, Vibhav Vineet, Stefan Roth, and Vladlen
Koltun. Playing for data: Ground truth from computer games
(2016)
[6] Bousmalis et al. Using simulation and domain adaptation to
improve efficiency of robotic grasping (2017)
[1] Stephen James, Edward Johns. 3d simulation for robot arm control
with deep q-learning (2016)
[2] Johns, Leutenegger, Davision. Deep learning a grasp function for
grasping under gripper pose uncertainty (2016)
[3] Mahler et al, Dex-Net 3.0 (2017)
[4] Koenemann et al. Whole-body model-predictive control applied to
the HRP-2 humanoid. (2015)
Approach 2 – Domain Confusion / Adaptation
Andrei A. Rusu, Matej Vecerik, Thomas Rothörl, Nicolas Heess, Razvan Pascanu, and Raia Hadsell. Sim-to-real robot learning from pixels with progressive nets. arXiv:1610.04286, 2016.
Eric Tzeng, Judy Hoffman, Ning Zhang, Kate Saenko, Trevor Darrell. Deep Domain Confusion: Maximizing for Domain Invariance. arXiv:1412.3474, 2014.
Approach 3 – Domain Randomization
If the model sees enough simulated variation, the real world may look like just the next simulator
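Concretely, domain randomization means resampling nuisance parameters of the simulator at every episode. A minimal sketch; the parameter names and ranges below are illustrative, not taken from any of the cited systems (though the ~500-texture count echoes the (CAD)²RL setup on the next slide):

```python
import random

def sample_scene(rng=None):
    """Sketch of domain randomization: each training episode resamples
    visual and physical nuisance parameters so that, at test time, the
    real world looks like just another sample from this distribution."""
    rng = rng or random.Random(0)
    return {
        "texture_id": rng.randrange(500),          # pick one of ~500 textures
        "light_intensity": rng.uniform(0.2, 2.0),  # arbitrary illustrative range
        "camera_jitter_deg": rng.uniform(-5.0, 5.0),
        "object_mass_kg": rng.uniform(0.1, 1.0),   # physics randomization
    }

scenes = [sample_scene(random.Random(i)) for i in range(3)]
```

The policy is then trained across all sampled scenes, so it cannot overfit to any one rendering of the environment.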
Domain Randomization
n Quadcopter collision avoidance
n ~500 semi-realistic textures, 12 floor plans
n ~40-50% of 1000m trajectories are collision-free
[Sadeghi & Levine, 2016] (CAD)²RL: Real Single-Image Flight Without a Single Real Image. arXiv:1611.04201, 2016.
Domain Randomization for Pose Estimation
n Precise object pose localization
n 100K images with simple randomly generated textures
[Tobin, Fong, Ray, Schneider, Zaremba, Abbeel, 2017]
How does it work? More Data = Better
[Tobin, Fong, Ray, Schneider, Zaremba, Abbeel, 2017]
More Textures = Better
[Tobin, Fong, Ray, Schneider, Zaremba, Abbeel, 2017]
Pre-Training is not Necessary
[Tobin, Fong, Ray, Schneider, Zaremba, Abbeel, 2017]
Domain Randomization for Grasping
[Tobin, Zaremba, Abbeel, 2017]
Hypothesis: Training on a diverse array of procedurally generated objects can produce comparable performance to training on realistic object meshes.
[see also: Levine et al and Goldberg et al]
Evaluation: Real Robot
Trained with domain randomization
[OpenAI, 2019]
n Unsupervised Learning
n Reinforcement Learning
n Unsupervised RL
n Meta-Reinforcement Learning
n Few-Shot Imitation
n Domain Randomization
n DL for Science and Engineering
Many Exciting Directions in AI
n Mitigating Bias
n Multi-modal Learning
n Architecture Search
n Value Alignment
n Scaling Laws
n Human-in-the-Loop
n Explainability
CASP 2020 Competition
n DL to predict properties → optimize molecule against desired properties
n Synthesis → DL to propose synthesis steps
Molecule Design
Exploring Machine Learning Applications to Enable Next-Generation Chemistry
Jennifer Wei (Google)
https://docs.google.com/presentation/d/1zGoxOMWmid25hgtSgVkQYu1-
GP9PT3EQ2_tzOlQZcF4/edit#slide=id.p1
A General Approach to Speed Up Design
BagNet: Berkeley Analog Generator with Layout Optimizer Boosted with Deep Neural Networks
K. Hakhamaneshi, N. Werblun, P. Abbeel, V. Stojanovic.
IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Westminster, Colorado, November 2019.
https://arxiv.org/abs/1907.10515
GANs in High-Energy Physics
GANs for HEP
Ben Nachman
https://drive.google.com/file/d/1op6Q6OuVZvJ4VbtLkemi3oJWtSBx5FCc/view
Symbolic Math: Integrals and ODEs
[Lample and Charton, ICLR 2020]
Symbolic Math: Integrals and ODEs
[Lample and Charton, ICLR 2020]
n https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology
n AlphaFold: Improved protein structure prediction using potentials from deep learning
Deepmind (Senior et al)
https://www.nature.com/articles/s41586-019-1923-7
n BagNet: Berkeley Analog Generator with Layout Optimizer Boosted with Deep Neural Networks
K. Hakhamaneshi, N. Werblun, P. Abbeel, V. Stojanovic.
IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Westminster, Colorado, November 2019.
https://arxiv.org/abs/1907.10515
n Evaluating Protein Transfer Learning with TAPE
R. Rao, N. Bhattacharya, N. Thomas, Y, Duan, X. Chen, J. Canny, P. Abbeel, Y. Song
https://www.biorxiv.org/content/10.1101/676825v1
n Opening the black box: the anatomy of a deep learning atomistic potential
Justin Smith
https://drive.google.com/file/d/1f1iiXKzxNbNz5lL5x2ob9xGYLHNHhxeU/view
n Exploring Machine Learning Applications to Enable Next-Generation Chemistry
Jennifer Wei (Google)
https://docs.google.com/presentation/d/1zGoxOMWmid25hgtSgVkQYu1-GP9PT3EQ2_tzOlQZcF4/edit#slide=id.p1
n GANs for HEP
Ben Nachman
https://drive.google.com/file/d/1op6Q6OuVZvJ4VbtLkemi3oJWtSBx5FCc/view
n Deep Learning for Symbolic Mathematics
G. Lample and F. Charton
https://openreview.net/pdf?id=S1eZYeHFDS
n A Survey of Deep Learning for Scientific Discovery
Maithra Raghu, Eric Schmidt
https://arxiv.org/abs/2003.11755
Learn more about Deep Learning for Science/Engineering?
n Sampling of research directions
n Overarching research theme
n How to keep up
Outline
Compute Increasing Rapidly
[20-April-2018]
Architecture    Num neurons     Num synapses
Fly             100K = 10^5     10M = 10^7
AlexNet         650K = 10^6     60M = 10^8
Mouse           100M = 10^8     100B = 10^11
Human           100B = 10^11    10^14 - 10^15
Let’s Calibrate Compute Scale
If each synapse is 1 FLOP (i.e., can fire / not fire once per second),
then the human brain requires 10^15 FLOPS = 1 petaflop.
NVIDIA’s DGX-2 = 2 petaflops in one server rack! ($400,000 price tag)
2 TPU-v3 slices = ~1 petaflop in Google Cloud at $16/hour ($5/hour if pre-emptible)
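The back-of-envelope above in code form, taking the upper end of the human synapse estimate and the slide's simplifying assumption that one synapse contributes ~1 FLOP per second:

```python
# Compute scale per "architecture" ~ synapse count, at 1 FLOP per synapse per second.
flops = {
    "fly": 1e7,
    "alexnet": 6e7,
    "mouse": 1e11,
    "human": 1e15,   # upper end of the 10^14-10^15 synapse estimate
}

human_petaflops = flops["human"] / 1e15   # 1 petaflop under this assumption
dgx2_petaflops = 2                        # NVIDIA DGX-2, one server rack
racks_per_brain = human_petaflops / dgx2_petaflops
```

Under these (very rough) assumptions, half a DGX-2 rack already matches the human brain's raw FLOP count, which is what makes the slide's calibration striking.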
Compute per Major Result
Source: OpenAI
Implications for research agenda?
DATA
COMPUTE
HUMAN INGENUITY
Key Enablers for AI Progress?
Problem Territory 1 Problem Territory 2
E.g., Deep Learning to Learn
n Sampling of research directions
n Overarching research theme
n How to keep up
Outline
n (mostly) keeping up with (mostly) not reading papers
n when you decide to read papers
How to Keep Up
Number of arxiv papers submitted in AI categories
[source: https://medium.com/@karpathy/a-peek-at-trends-in-machine-learning-ab8a1085a106]
n Tutorials at conferences (e.g., ICML, NeurIPS)
n Graduate courses and seminars
n Yannic Kilcher youtube channel – explains papers
n Two Minute Papers youtube channel
n The Batch – newsletter by Andrew Ng
n Import AI – newsletter by Jack Clark
Great Resources
n (mostly) keeping up with (mostly) not reading papers
n when you decide to read papers
How to Keep Up
Which papers to read? (1/2)
n Tutorials at conferences (e.g., ICML, NeurIPS)
n Graduate courses and seminars
n Yannic Kilcher youtube channel – explains papers
n Two Minute Papers youtube channel
n The Batch – newsletter by Andrew Ng
n Import AI – newsletter by Jack Clark
n Arxiv sanity (by Karpathy)
n Twitter
n Jack Clark, Karpathy, Ian Goodfellow, hardmaru, smerity, pabbeel, …
n AI/DL Facebook Group: Very active group where members post anything from news
articles to blog posts to general ML questions
n ML Subreddit
Which papers to read? (2/2)
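Resources like Arxiv sanity automate the first pass over new papers; the same idea can be sketched in a few lines against the public arXiv Atom API. The snippet below is a minimal illustration, not part of the lecture: the categories (`cs.LG`, `cs.AI`) and result count are arbitrary choices.

```python
# Skim the newest arXiv submissions in chosen categories via the public
# Atom API at http://export.arxiv.org/api/query (no API key required).
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace used by the feed


def build_query(categories, max_results=10):
    """Build an arXiv API URL for the newest papers in the given categories."""
    search = " OR ".join(f"cat:{c}" for c in categories)
    params = urlencode({
        "search_query": search,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    })
    return f"http://export.arxiv.org/api/query?{params}"


def parse_titles(atom_xml):
    """Extract (title, link) pairs from an arXiv Atom feed document."""
    root = ET.fromstring(atom_xml)
    results = []
    for entry in root.findall(f"{ATOM}entry"):
        # Titles in the feed may contain hard line breaks; normalize whitespace.
        title = " ".join(entry.findtext(f"{ATOM}title", "").split())
        link = entry.findtext(f"{ATOM}id", "")
        results.append((title, link))
    return results


if __name__ == "__main__":
    url = build_query(["cs.LG", "cs.AI"], max_results=5)
    print(url)  # fetch with urllib.request.urlopen(url), then parse_titles(...)
```

Running this daily (or wiring `parse_titles` into a notifier) gives a lightweight personal feed; tools like Arxiv sanity layer ranking and similarity search on top of the same data.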
How to Read a Paper?
• Read the title, abstract, section headers, and figures
• Try to find slides or a video on the paper (these do not have to be by the authors).
• Read the introduction (Jennifer Widom)
• What is the problem?
• Why is it interesting and important?
• Why is it hard? (e.g., why do naive approaches
fail?)
• Why hasn't it been solved before? (Or, what's
wrong with previous proposed solutions? How
does this paper differ?)
• What are the key components of this approach
and results? Any limitations?
• Skim the related work.
• Is there any related work you are familiar with?
• If so, how does this paper relate to those works?
• Skim the technical section.
• Where are the novelties?
• What are the assumptions?
• Read the experiments.
• What questions are the experiments answering?
• What questions are the experiments not
answering?
• What baselines do they compare against?
• How strong are these baselines?
• Is the experiment methodology sound?
• Do the results support their claims?
• Read the conclusion/discussion.
• Read the technical section.
• Read in “good faith” (i.e., assume the authors are
correct)
• Skip over confusing parts that don’t seem
fundamental
• If any important part is confusing, see if the
material is in class slides or prior papers
• Read the paper as you see fit.
[source: Gregory Kahn]
n Read papers together with friends:
n Mode 1: everyone reads, then discusses
n Mode 2: one (or two) people read, then give a short tutorial to the others
Reading Group
n Become one of the world’s experts in a topic you really care
about
n Technically deep and demanding
n Crudely: develop new tools/techniques rather than use existing tools/techniques
PS: Why do a PhD? (or not)
Thank you
Automating Google Workspace (GWS) & more with Apps Script
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 

Lecture 12: Research Directions (Full Stack Deep Learning - Spring 2021)

  • 1. Full Stack Deep Learning Research Directions Pieter Abbeel, Sergey Karayev, Josh Tobin
  • 2. Number of arxiv papers submitted in AI categories FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin [source: https://medium.com/@karpathy/a-peek-at-trends-in-machine-learning-ab8a1085a106]
  • 3. n Sampling of research directions n Overall research theme n How to keep up Outline FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 4. n Unsupervised Learning n Reinforcement Learning n Unsupervised RL n Meta-Reinforcement Learning n Few-Shot Imitation n Domain Randomization n DL for Science and Engineering Many Exciting Directions in AI FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin n Mitigating Bias n Multi-modal Learning n Architecture Search n Value Alignment n Scaling Laws n Human-in-the-Loop n Explainability
  • 5. n Unsupervised Learning n Reinforcement Learning n Unsupervised RL n Meta-Reinforcement Learning n Few-Shot Imitation n Domain Randomization n DL for Science and Engineering Many Exciting Directions in AI FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin n Mitigating Bias n Multi-modal Learning n Architecture Search n Value Alignment n Scaling Laws n Human-in-the-Loop n Explainability
  • 6. n Unsupervised Learning n Reinforcement Learning n Unsupervised RL n Meta-Reinforcement Learning n Few-Shot Imitation n Domain Randomization n DL for Science and Engineering Many Exciting Directions in AI FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin n Mitigating Bias n Multi-modal Learning n Architecture Search n Value Alignment n Scaling Laws n Human-in-the-Loop n Explainability
  • 7. n Works! n BUT: requires so much annotated data Recall: Deep Supervised Learning
  • 8. Yes! n Deep semi-supervised learning n Deep unsupervised learning So: Can we learn with less labels?
  • 9. n Assumption: n Classification problem n Each data-point belongs to one of the classes n Toy Example: “two moons” Semi-supervised Learning
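The pseudo-labeling idea behind semi-supervised methods like Noisy Student can be sketched without any neural network at all. Below is a toy self-training loop on 1-D points, with a nearest-centroid stand-in for the classifier; the function names, threshold, and data are purely illustrative, not from any paper.

```python
# Minimal self-training (pseudo-labeling) sketch: fit on the labeled points,
# pseudo-label the unlabeled points the model is confident about, refit.

def fit_centroids(points, labels):
    """Return the per-class mean of 1-D points."""
    cents = {}
    for c in set(labels):
        xs = [p for p, l in zip(points, labels) if l == c]
        cents[c] = sum(xs) / len(xs)
    return cents

def self_train(labeled, unlabeled, threshold=1.0, rounds=3):
    points = [p for p, _ in labeled]
    labels = [l for _, l in labeled]
    pool = list(unlabeled)
    for _ in range(rounds):
        cents = fit_centroids(points, labels)
        remaining = []
        for p in pool:
            d = {c: abs(p - m) for c, m in cents.items()}   # distance per class
            best = min(d, key=d.get)
            others = [v for c, v in d.items() if c != best]
            # confident if clearly closer to one centroid: adopt pseudo-label
            if min(others) - d[best] >= threshold:
                points.append(p)
                labels.append(best)
            else:
                remaining.append(p)
        pool = remaining
    return fit_centroids(points, labels)

cents = self_train(labeled=[(0.0, "a"), (10.0, "b")],
                   unlabeled=[1.0, 2.0, 8.0, 9.0])
```

With the toy data, every unlabeled point is confidently absorbed in the first round, so the final centroids reflect labeled and pseudo-labeled points together.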
  • 12. Yes! n Deep semi-supervised learning n Deep unsupervised learning So: Can we learn with less labels?
  • 13. Transfer with Multi-headed Networks … in Task 1 Task 2
  • 14. n Key hypothesis: n IF neural network smart enough to predict: n Next frame in video n Next word in sentence n Generate realistic images n "Translate" images n … n THEN same neural network is ready to do Deep Supervised Learning from very small data-set Deep Unsupervised Learning Task 1 Task 2
  • 15. Transfer from Unsupervised Learning … in Task 1 = unsupervised Task 2 = real task
  • 16. Example Setting … text Task 1 = predict next word Task 2 = predict sentiment
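Task 1 above, "predict the next word," can be made concrete with the smallest possible language model: a bigram counter. This toy sketch only illustrates the pretraining objective; it bears no resemblance to GPT-2's architecture.

```python
# Toy next-word prediction: count which word follows which in a corpus,
# then predict the most frequent continuation.
from collections import Counter, defaultdict

def train_bigram(corpus):
    """corpus: list of sentences (strings). Returns next-word counts."""
    nxt = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for w, w2 in zip(words, words[1:]):
            nxt[w][w2] += 1
    return nxt

def predict_next(nxt, word):
    """Most likely next word under the bigram counts."""
    return nxt[word.lower()].most_common(1)[0][0]

model = train_bigram([
    "the movie was great",
    "the movie was terrible",
    "the movie was great fun",
])
```

A real pretrained model learns representations along the way that transfer to task 2 (sentiment); the bigram table obviously does not, which is exactly the gap deep unsupervised learning fills.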
  • 17. Prompt: A train carriage containing controlled nuclear materials was stolen in Cincinnati today. Its whereabouts are unknown. GPT-2: The incident occurred on the downtown train line, which runs from Covington and Ashland stations. In an email to Ohio news outlets, the U.S. Department of Energy said it is working with the Federal Railroad Administration to find the thief. ‘The theft of this nuclear material will have significant negative consequences on public and environmental health, our workforce and the economy of our nation,’ said Tom Hicks, the U.S. Energy Secretary, in a statement. ‘Our top priority is to secure the theft and ensure it doesn’t happen again.’ The stolen material was taken from the University of Cincinnati’s Research Triangle Park nuclear research site, according to a news release from Department officials. The Nuclear Regulatory Commission did not immediately release any information. According to the release, the U.S. Department of Energy’s Office of Nuclear Material Safety and Security is leading that team’s investigation. ‘The safety of people, the environment and the nation’s nuclear stockpile is our highest priority,’ Hicks said. ‘We will get to the bottom of this and make no excuses.’ Next Word Prediction (OpenAI’s GPT-2)
  • 18. Prompt: Recycling is good for the world, no, you could not be more wrong. GPT-2: Recycling is NOT good for the world. It is bad for the environment, it is bad for our health, and it is bad for our economy. I’m not kidding. Recycling is not good for the environment. It is destructive to the earth and it is a major contributor to global warming. Recycling is not good for our health. It contributes to obesity and diseases like heart disease and cancer. Recycling is bad for our economy. It increases the cost of a product, and in turn, the price of everything that is made with that product. Recycling is not good for our nation. We pay a tremendous price for the privilege of having the world’s most advanced and efficient recycling system. Recycling is a huge, colossal waste of time, energy, money, and resources. Next Word Prediction (OpenAI’s GPT-2)
  • 19. Text Generation (OpenAI’s GPT-2) Pieter Abbeel -- UC Berkeley / OpenAI / Gradescope [Radford et al, 2019]
  • 25. n Main idea: n GPT-2: predict next word / token n BERT: predict a word / token that was removed BERT
  • 26. BERT
  • 27. BERT
  • 28. BERT
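The difference between the two objectives shows up in how a training pair is built: GPT-2 predicts the next token, BERT predicts tokens that were removed. A sketch of masked-pair construction follows; the ~15% masking rate matches BERT's recipe, but the helper itself is hypothetical.

```python
# Build a BERT-style masked-LM training pair: replace a random ~15% of
# tokens with [MASK] and keep the originals as prediction targets.
import random

def mask_tokens(tokens, mask_rate=0.15, seed=1):
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append("[MASK]")
            targets[i] = tok   # the model must recover this token
        else:
            masked.append(tok)
    return masked, targets

masked, targets = mask_tokens("the cat sat on the mat".split(), seed=1)
```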
  • 29. Downstream Tasks - NLP (BERT Revolution) [https://gluebenchmark.com/leaderboard]
  • 30. Unsupervised Learning in Vision … Image Task 1 = fill in a patch Task 2 = predict cat vs. dog
  • 35. SimCLR + linear classifier
  • 36. n Unsupervised Learning n Reinforcement Learning n Unsupervised RL n Meta-Reinforcement Learning n Few-Shot Imitation n Domain Randomization n DL for Science and Engineering Many Exciting Directions in AI FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin n Mitigating Bias n Multi-modal Learning n Architecture Search n Value Alignment n Scaling Laws n Human-in-the-Loop n Explainability
  • 37. Reinforcement Learning (RL) Robot + Environment max_θ E[ Σ_{t=0}^H R(s_t) | π_θ ] [Image credit: Sutton and Barto 1998] FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 38. Reinforcement Learning (RL) Robot + Environment n Compared to supervised learning, additional challenges: n Credit assignment n Stability n Exploration [Image credit: Sutton and Barto 1998] max_θ E[ Σ_{t=0}^H R(s_t) | π_θ ] FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 39. Deep RL: Atari DQN Mnih et al, NIPS 2013 / Nature 2015 MCTS Guo et al, NIPS 2014; TRPO Schulman, Levine, Moritz, Jordan, Abbeel, ICML 2015; A3C Mnih et al, ICML 2016; Dueling DQN Wang et al ICML 2016; Double DQN van Hasselt et al, AAAI 2016; Prioritized Experience Replay Schaul et al, ICLR 2016; Bootstrapped DQN Osband et al, 2016; Q-Ensembles Chen et al, 2017; Rainbow Hessel et al, 2017; Accelerated Stooke and Abbeel, 2018; … FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 40. [Source: Mnih et al., Nature 2015 (DeepMind) ] Deep Q-Network (DQN): From Pixels to Joystick Commands 32 8x8 filters with stride 4 + ReLU 64 4x4 filters with stride 2 + ReLU 64 3x3 filters with stride 1 + ReLU fully connected 512 units + ReLU fully connected output units, one per action
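The layer sizes above determine the input to the 512-unit fully connected layer. A quick sanity check of the arithmetic, assuming the standard 84x84 preprocessed Atari frames from Mnih et al.:

```python
# Spatial sizes through the DQN conv trunk (valid convolution, no padding),
# showing where the fully-connected layer's input width comes from.

def conv_out(size, kernel, stride):
    return (size - kernel) // stride + 1

size = 84                      # Atari frames, preprocessed to 84x84
size = conv_out(size, 8, 4)    # 32 filters, 8x8, stride 4 -> 20x20
size = conv_out(size, 4, 2)    # 64 filters, 4x4, stride 2 -> 9x9
size = conv_out(size, 3, 1)    # 64 filters, 3x3, stride 1 -> 7x7
flat = 64 * size * size        # flattened input to the 512-unit FC layer
```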
  • 41. Deep RL: Go AlphaGo Silver et al, Nature 2015 AlphaGoZero Silver et al, Nature 2017 Tian et al, 2016; Maddison et al, 2014; Clark et al, 2015 AlphaZero Silver et al, 2017 FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 42. Deep RL: Robot Locomotion TRPO Schulman et al, 2015 + GAE Schulman et al, 2016 See also: DDPG Lillicrap et al 2015; SVG Heess et al, 2015; Q-Prop Gu et al, 2016; Scaling up ES Salimans et al, 2017; PPO Schulman et al, 2017; Parkour Heess et al, 2017; FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 43. Deep RL: Robot Locomotion [Peng, Abbeel, Levine, van de Panne, 2018] FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 44. Deep RL Success: Dynamic Animation [Peng, Abbeel, Levine, van de Panne, 2018] FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 45. BRETT: Berkeley Robot for the Elimination of Tedious Tasks [Levine*, Finn*, Darrell, Abbeel, JMLR 2016]
  • 46. n Rigid rods connected by elastic cables n Controlled by motors that extend / contract cables n Properties: n Lightweight n Low cost n Capable of withstanding significant impact n NASA investigates them for space exploration n Major challenge: control Tensegrity Robotics: NASA SuperBall [Geng*, Zhang*, Bruce*, Caluwaerts, Vespignani, Sunspiral, Abbeel, Levine, ICRA 2017] FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 47. [Geng*, Zhang*, Bruce*, Caluwaerts, Vespignani, Sunspiral, Abbeel, Levine, ICRA 2017] FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 48. FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin Trained with domain randomization [OpenAI, 2019]
  • 50. This is the entry point of AI Robotics into the real world Autonomous Order Picking 29-January-2020 Knapp Pick-It-Easy powered by Covariant Brain
  • 51. n Unsupervised Learning n Reinforcement Learning n Unsupervised RL n Meta-Reinforcement Learning n Few-Shot Imitation n Domain Randomization n DL for Science and Engineering Many Exciting Directions in AI FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin n Mitigating Bias n Multi-modal Learning n Architecture Search n Value Alignment n Scaling Laws n Human-in-the-Loop n Explainability
  • 52. Mastery? Yes Deep RL (DQN) vs. Human Score: 18.9 Score: 9.3 FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 53. How fast is the learning itself? FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 54. Humans vs. DDQN Black dots: human play Blue curve: mean of human play Blue dashed line: ‘expert’ human play Red dashed lines: DDQN after 10, 25, 200M frames (~ 46, 115, 920 hours) [Tsividis, Pouncy, Xu, Tenenbaum, Gershman, 2017] Humans after 15 minutes tend to outperform DDQN after 115 hours FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 55. How to bridge this gap? FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 56. Can visual RL achieve same data-efficiency as RL on state? ● State-based D4PG (blue) vs pixel-based D4PG (green), 60M training steps [Tassa et al., 2018] Tassa, Y., Doron, Y., Muldal, A., Erez, T., Li, Y., Casas, D.D.L., Budden, D., Abdolmaleki, A., Merel, J., Lefrancq, A. and Lillicrap, T. DeepMind Control Suite, arxiv:1801.00690, 2018. Pixel-based needs > 50M more training steps than state-based to solve same tasks Pieter Abbeel
  • 57. Contrastive learning: SOTA in computer vision [Henaff et al., 2019] Olivier J. Hénaff, Aravind Srinivas, Jeffrey De Fauw, Ali Razavi, Carl Doersch, S. M. Ali Eslami, Aaron van den Oord Data-Efficient Image Recognition with Contrastive Coding arxiv:1905.09272, 2019. [Chen et al., 2020] Chen, T., Kornblith, S., Norouzi, M. and Hinton, G. A Simple Framework for Contrastive Learning of Visual Representations arxiv:2002.05709, 2020. CPCv2 top-5 ImageNet accuracy as function of labels SimCLR top-1 ImageNet accuracy as function of # of parameters [Henaff, Srinivas et al., 2019] [Chen et al., 2020] Pieter Abbeel
  • 58. Contrastive + RL [CURL: A Srinivas*, M Laskin*, P Abbeel, 2020] Observations are stacked frames Need to define: 1. query / key pairs 2. similarity measure 3. architecture Pieter Abbeel
  • 59. 1. Query / key pairs: random crop query key [CURL: A Srinivas*, M Laskin*, P Abbeel, 2020] Pieter Abbeel
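A minimal sketch of step 1, assuming observations are stacks of 2-D frames stored as nested lists (real implementations crop arrays, but the indexing is the same); the crop size and data here are made up:

```python
# CURL-style query/key construction: two independent random crops of the
# same stacked-frame observation form a positive pair.
import random

def random_crop(frames, crop, rng):
    """Crop every frame in the stack at the same random location."""
    h, w = len(frames[0]), len(frames[0][0])
    top = rng.randrange(h - crop + 1)
    left = rng.randrange(w - crop + 1)
    return [[row[left:left + crop] for row in f[top:top + crop]]
            for f in frames]

rng = random.Random(0)
obs = [[[r * 10 + c for c in range(6)] for r in range(6)]]  # 1-frame stack
query = random_crop(obs, 4, rng)   # anchor view
key = random_crop(obs, 4, rng)     # positive view of the same observation
```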
  • 60. 2. Bilinear inner product with learned weight matrix [CURL: A Srinivas*, M Laskin*, P Abbeel, 2020] Pieter Abbeel
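Step 2 in code: the similarity between query q and key k is the bilinear form q^T W k with a learned matrix W. A plain-Python sketch for a 2-D latent space; the vectors and W are made-up values (identity W reduces to a plain inner product):

```python
# Bilinear similarity q^T W k, written out for small lists.

def bilinear_sim(q, W, k):
    """q^T W k for vectors q, k and square matrix W."""
    Wk = [sum(W[i][j] * k[j] for j in range(len(k))) for i in range(len(W))]
    return sum(q[i] * Wk[i] for i in range(len(q)))

W = [[1.0, 0.0],
     [0.0, 1.0]]              # identity -> plain inner product
sim = bilinear_sim([1.0, 2.0], W, [3.0, 4.0])
```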
  • 61. 3. Keys encoded with momentum [He et al., 2019] He, K., Fan, H., Wu, Y., Xie, S. and Girshick, R. Momentum Contrast for Unsupervised Visual Representation Learning arxiv:1911.05722, 2019. Similar to MoCo [He et al.] Pieter Abbeel
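Step 3 is the MoCo-style exponential moving average: the key encoder's weights track the query encoder's weights instead of receiving gradients directly. A sketch treating the parameters as flat lists of floats (m = 0.9 here only to make the effect visible; MoCo uses values near 0.999):

```python
# Momentum update for the key encoder: theta_k <- m*theta_k + (1-m)*theta_q.

def momentum_update(key_params, query_params, m=0.999):
    return [m * tk + (1.0 - m) * tq
            for tk, tq in zip(key_params, query_params)]

key_params = [0.0, 1.0]
query_params = [1.0, 1.0]
key_params = momentum_update(key_params, query_params, m=0.9)
```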
  • 62. CURL from pixels matches state-based SAC GRAY: SAC State RED: CURL [CURL: A Srinivas*, M Laskin*, P Abbeel, 2020] Pieter Abbeel
  • 63. CURL Comparison: DeepMind Control Suite [CURL: A Srinivas*, M Laskin*, P Abbeel, 2020] Pieter Abbeel
  • 64. Atari performance benchmarked at 100K frames [CURL: A Srinivas*, M Laskin*, P Abbeel, 2020] Pieter Abbeel CURL Comparison: Atari
  • 65. Hard Environments GRAY: SAC State RED: CURL (x-axis: environment training steps; axis scales 1 = 1M and 1 = 100M) [CURL: A Srinivas*, M Laskin*, P Abbeel, 2020] Observation: similar struggle asymptotically for pixel-based Deep RL Pieter Abbeel
  • 66. Predicting state from pixels Higher prediction error correlates with more challenging environments [CURL: A Srinivas*, M Laskin*, P Abbeel, 2020] Pieter Abbeel
  • 67. n Unsupervised Learning n Reinforcement Learning n Unsupervised RL n Meta-Reinforcement Learning n Few-Shot Imitation n Domain Randomization n DL for Science and Engineering Many Exciting Directions in AI FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin n Mitigating Bias n Multi-modal Learning n Architecture Search n Value Alignment n Scaling Laws n Human-in-the-Loop n Explainability
  • 68. Starting Observations n TRPO, DQN, A3C, DDPG, PPO, Rainbow, … are fully general RL algorithms n i.e., for any environment that can be mathematically defined, these algorithms are equally applicable n Environments encountered in real world = tiny, tiny subset of all environments that could be defined (e.g. they all satisfy our universe’s physics) Can we develop “fast” RL algorithms that take advantage of this? FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 69. Reinforcement Learning [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016] Agent Environment a s r FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 70. Reinforcement Learning [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016] RL Algorithm Policy Environment a o r RL Agent FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 71. Reinforcement Learning [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016] RL Algorithm Policy Environment A a s r … RL Agent RL Algorithm Policy Environment B a s r RL Agent FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 72. Reinforcement Learning [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016] … RL Algorithm Policy Environment A a o r RL Algorithm Policy Environment B a o r RL Agent RL Agent Traditional RL research: • Human experts develop the RL algorithm • After many years, still no RL algorithms nearly as good as humans… Alternative: • Could we learn a better RL algorithm? • Or even learn a better entire agent? FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 73. Meta-Reinforcement Learning [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016] "Fast” RL Agent Environment A Meta RL Algorithm Environment B … Meta-training environments FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 74. Meta-Reinforcement Learning [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016] "Fast” RL Agent Environment A a r,o Meta RL Algorithm Environment B … Environment F Meta-training environments Testing environments FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 75. Meta-Reinforcement Learning [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016] "Fast” RL Agent Environment A a r,o Meta RL Algorithm Environment B … Environment G Meta-training environments Testing environments FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 76. Meta-Reinforcement Learning [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016] "Fast” RL Agent Environment A a r,o Meta RL Algorithm Environment B … Environment H Meta-training environments Testing environments FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 77. Formalizing Learning to Reinforcement Learn [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016] max_θ E_M E_{τ_M^(k)} [ Σ_{k=1}^K R(τ_M^(k)) | RLagent_θ ] FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 78. Formalizing Learning to Reinforcement Learn [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016] M: sampled MDP; τ_M^(k): k'th trajectory in MDP M. max_θ E_M E_{τ_M^(k)} [ Σ_{k=1}^K R(τ_M^(k)) | RLagent_θ ] Meta-train: max_θ Σ_{M ∈ M_train} E_{τ_M^(k)} [ Σ_{k=1}^K R(τ_M^(k)) | RLagent_θ ] FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 79. n RLagent = RNN = generic computation architecture n different weights in the RNN means different RL algorithm and prior n different activations in the RNN means different current policy n meta-train objective can be optimized with an existing (slow) RL algorithm Representing RLagent_θ max_θ Σ_{M ∈ M_train} E_{τ_M^(k)} [ Σ_{k=1}^K R(τ_M^(k)) | RLagent_θ ] [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] also: [Wang et al, 2016] FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
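The bullets above can be summarized as an interface sketch: the weights encode the learned RL algorithm, and the hidden state encodes everything learned so far in the current MDP, persisting across episodes and resetting only when a new MDP begins. Everything below the class-level idea is a hypothetical placeholder, not the actual RNN from the papers:

```python
# Interface sketch of an RL^2-style "fast" agent. The update rule and
# threshold here are toy placeholders standing in for an RNN step.

class FastRLAgent:
    def __init__(self):
        self.hidden = 0.0                      # stands in for the RNN state

    def reset_for_new_mdp(self):
        self.hidden = 0.0                      # forget the previous task

    def act(self, obs, prev_reward):
        # real agents feed (obs, prev action, prev reward, done) to an RNN;
        # this toy rule just accumulates reward signal into the state
        self.hidden = 0.9 * self.hidden + prev_reward
        return 0 if self.hidden < 1.5 else 1   # placeholder policy

agent = FastRLAgent()
a1 = agent.act(obs=None, prev_reward=1.0)      # behavior adapts within an MDP
a2 = agent.act(obs=None, prev_reward=1.0)
agent.reset_for_new_mdp()
```

The point of the sketch is only the statefulness: the same weights produce different behavior as reward accumulates, which is what "learning inside the activations" means.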
  • 80. Evaluation: Multi-Armed Bandits n Multi-Armed Bandits setting n Each bandit has its own distribution over pay-outs n Each episode = choose 1 bandit n Good RL agent should explore bandits sufficiently, yet also exploit the good/best ones n Provably (asymptotically) optimal RL algorithms have been invented by humans: Gittins index, UCB1, Thompson sampling, … [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
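One of the human-invented algorithms named above, Thompson sampling, fits in a few lines and is a concrete baseline the meta-learned agent must compete with. The arm probabilities and step counts below are illustrative:

```python
# Thompson sampling for Bernoulli bandits: keep a Beta posterior per arm,
# sample from each posterior, pull the arm with the highest sample.
import random

def thompson(arm_probs, steps, seed=0):
    rng = random.Random(seed)
    wins = [1] * len(arm_probs)     # Beta(1, 1) uniform priors
    losses = [1] * len(arm_probs)
    pulls = [0] * len(arm_probs)
    for _ in range(steps):
        samples = [rng.betavariate(w, l) for w, l in zip(wins, losses)]
        arm = samples.index(max(samples))
        reward = rng.random() < arm_probs[arm]   # Bernoulli payout
        pulls[arm] += 1
        wins[arm] += reward
        losses[arm] += 1 - reward
    return pulls

pulls = thompson([0.2, 0.8], steps=2000, seed=0)
```

This exhibits exactly the explore/exploit behavior the slide asks of a good RL agent: early pulls spread across arms, then concentrate on the best one.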
  • 81. [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] Evaluation: Multi-Armed Bandits We consider Bayesian evaluation setting. Some of these prior works also have adversarial guarantees, which we don’t consider here. FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 82. Evaluation: Visual Navigation Agent’s view Maze Agent input: current image Agent action: straight / 2 degrees left / 2 degrees right Map just shown for our purposes, but not available to agent Related work: Mirowski, et al, 2016; Jaderberg et al, 2016; Mnih et al, 2016; Wang et al, 2016 FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 83. Evaluation: Visual Navigation After learning-to-learn Before learning-to-learn [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] Agent input: current image Agent action: straight / 2 degrees left / 2 degrees right Map just shown for our purposes, but not available to agent FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 84. [Duan, Schulman, Chen, Bartlett, Sutskever, Abbeel, 2016] Meta-Learning Curves FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 85. Meta Learning for RL Task distribution: different environments n Schmidhuber. Evolutionary principles in self-referential learning. (1987) n Wiering, Schmidhuber. Solving POMDPs with Levin search and EIRA. (1996) n Schmidhuber, Zhao, Wiering. Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement. (MLJ 1997) n Schmidhuber, Zhao, Schraudolph. Reinforcement learning with self-modifying policies (1998) n Zhao, Schmidhuber. Solving a complex prisoner's dilemma with self-modifying policies. (1998) n Schmidhuber. A general method for incremental self-improvement and multiagent learning. (1999) n Singh, Lewis, Barto. Where do rewards come from? (2009) n Singh, Lewis, Barto. Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective (2010) n Niekum, Spector, Barto. Evolution of reward functions for reinforcement learning (2011) n Duan et al., (2016) RL2: Fast Reinforcement Learning via Slow Reinforcement Learning n Wang et al., (2016) Learning to Reinforcement Learn n Finn et al., (2017) Model-Agnostic Meta-Learning (MAML) n Mishra, Rohaninejad et al., (2017) A Simple Neural Attentive Meta-Learner n Frans et al., (2017) Meta-Learning Shared Hierarchies FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 86. Meta Learning for RL Task distribution: different environments n Schmidhuber. Evolutionary principles in self-referential learning. (1987) n Wiering, Schmidhuber. Solving POMDPs with Levin search and EIRA. (1996) n Schmidhuber, Zhao, Wiering. Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement. (MLJ 1997) n Schmidhuber, Zhao, Schraudolph. Reinforcement learning with self-modifying policies (1998) n Zhao, Schmidhuber. Solving a complex prisoner's dilemma with self-modifying policies. (1998) n Schmidhuber. A general method for incremental self-improvement and multiagent learning. (1999) n Singh, Lewis, Barto. Where do rewards come from? (2009) n Singh, Lewis, Barto. Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective (2010) n Niekum, Spector, Barto. Evolution of reward functions for reinforcement learning (2011) n Duan et al., (2016) RL2: Fast Reinforcement Learning via Slow Reinforcement Learning n Wang et al., (2016) Learning to Reinforcement Learn n Finn et al., (2017) Model-Agnostic Meta-Learning (MAML) n Mishra, Rohaninejad et al., (2017) A Simple Neural Attentive Meta-Learner n Frans et al., (2017) Meta-Learning Shared Hierarchies FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 87. n Unsupervised Learning n Reinforcement Learning n Unsupervised RL n Meta-Reinforcement Learning n Few-Shot Imitation n Domain Randomization n DL for Science and Engineering Many Exciting Directions in AI FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin n Mitigating Bias n Multi-modal Learning n Architecture Search n Value Alignment n Scaling Laws n Human-in-the-Loop n Explainability
  • 88. Imitation Learning in Robotics [Abbeel et al. 2008] [Kolter et al. 2008] [Ziebart et al. 2008] [Schulman et al. 2013] [Finn et al. 2016] FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 89. Imitation Learning FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 90. Imitation Learning FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 91. One-Shot Imitation Learning [Duan, Andrychowicz, Stadie, Ho, Schneider, Sutskever, Abbeel, Zaremba, 2017]
  • 92. One-Shot Imitation Learning [Duan, Andrychowicz, Stadie, Ho, Schneider, Sutskever, Abbeel, Zaremba, 2017] FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 93. One-Shot Imitation Learning [Duan, Andrychowicz, Stadie, Ho, Schneider, Sutskever, Abbeel, Zaremba, 2017] FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 94. Learning a One-Shot Imitator One-shot imitator [Figure credit: Bradly Stadie] Demo 1 of task i Demo 2 of task i FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 95. Proof-of-concept: Block Stacking n Each task is specified by a desired final layout n Example: abcd n “Place c on top of d, place b on top of c, place a on top of b.” [Duan, Andrychowicz, Stadie, Ho, Schneider, Sutskever, Abbeel, Zaremba, 2017] FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
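The slide's convention maps a layout string directly to stacking instructions, bottom block first. A small helper reproducing the "abcd" example (the function name is ours, not from the paper):

```python
# Turn a layout string (blocks listed top to bottom, e.g. "abcd") into
# the stacking instructions the slide describes.

def layout_to_instructions(layout):
    steps = []
    # build from the bottom up: place each block on the one below it
    for upper, lower in reversed(list(zip(layout, layout[1:]))):
        steps.append(f"place {upper} on top of {lower}")
    return steps

steps = layout_to_instructions("abcd")
```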
  • 96. Evaluation [Duan, Andrychowicz, Stadie, Ho, Schneider, Sutskever, Abbeel, Zaremba, 2017] FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 97. n Unsupervised Learning n Reinforcement Learning n Unsupervised RL n Meta-Reinforcement Learning n Few-Shot Imitation n Domain Randomization n DL for Science and Engineering Many Exciting Directions in AI FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin n Mitigating Bias n Multi-modal Learning n Architecture Search n Value Alignment n Scaling Laws n Human-in-the-Loop n Explainability
  • 98. Compared to the real world, simulated data collection is… n Less expensive n Faster / more scalable n Less dangerous n Easier to label Motivation for Simulation How can we learn useful real-world skills in the simulator? FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 99. Approach 1 – Use Realistic Simulated Data Simulation Real world Carefully match the simulation to the world [1,2,3,4] Augment simulated data with real data [5,6] GTA V Real world [5] Stephan R Richter, Vibhav Vineet, Stefan Roth, and Vladlen Koltun. Playing for data: Ground truth from computer games (2016) [6] Bousmalis et al. Using simulation and domain adaptation to improve efficiency of robotic grasping (2017) [1] Stephen James, Edward Johns. 3d simulation for robot arm control with deep q-learning (2016) [2] Johns, Leutenegger, Davision. Deep learning a grasp function for grasping under gripper pose uncertainty (2016) [3] Mahler et al, Dex-Net 3.0 (2017) [4] Koenemann et al. Whole-body model-predictive control applied to the HRP-2 humanoid. (2015) FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 100. Approach 2 – Domain Confusion / Adaptation Andrei A Rusu, Matej Vecerik, Thomas Rothörl, Nicolas Heess, Razvan Pascanu, and Raia Hadsell. Sim-to-real robot learning from pixels with progressive nets. arXiv preprint arXiv:1610.04286, 2016. Eric Tzeng, Judy Hoffman, Ning Zhang, Kate Saenko, Trevor Darrell. Deep Domain Confusion: Maximizing for Domain Invariance. arXiv preprint arXiv:1412.3474, 2014. FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 101. Approach 3 – Domain Randomization If the model sees enough simulated variation, the real world may look like just the next simulator FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
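In practice, domain randomization amounts to re-sampling scene parameters anew for every training episode, so no single simulated world is special. The parameter names and ranges below are purely illustrative, not taken from any of the cited papers:

```python
# Sample a fresh randomized scene configuration per episode.
import random

def sample_scene(rng):
    return {
        "texture_id": rng.randrange(500),           # which random texture
        "light_intensity": rng.uniform(0.2, 2.0),   # lighting conditions
        "camera_jitter_deg": rng.uniform(-5.0, 5.0),
        "object_hue": rng.random(),                 # color of each object
    }

rng = random.Random(0)
scenes = [sample_scene(rng) for _ in range(1000)]   # one scene per episode
```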
  • 102. Domain Randomization n Quadcopter collision avoidance n ~500 semi-realistic textures, 12 floor plans n ~40-50% of 1000m trajectories are collision-free CAD^2RL: Real Single-Image Flight Without a Single Real Image. [3] Fereshteh Sadeghi and Sergey Levine. CAD^2RL: Real single-image flight without a single real image. arXiv preprint arXiv:1611.04201, 2016. FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 103. Domain Randomization for Pose Estimation n Precise object pose localization n 100K images with simple randomly generated textures [Tobin, Fong, Ray, Schneider, Zaremba, Abbeel, 2017]
  • 104. How does it work? More Data = Better [Tobin, Fong, Ray, Schneider, Zaremba, Abbeel, 2017]
  • 105. More Textures = Better [Tobin, Fong, Ray, Schneider, Zaremba, Abbeel, 2017]
  • 106. Pre-Training is not Necessary [Tobin, Fong, Ray, Schneider, Zaremba, Abbeel, 2017] FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 107. Domain Randomization for Grasping [Tobin, Zaremba, Abbeel, 2017] Hypothesis: Training on a diverse array of procedurally generated objects can produce comparable performance to training on realistic object meshes. [see also: Levine et al and Goldberg et al] FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 108. Evaluation: Real Robot FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 109. FSDL Bootcamp -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin Trained with domain randomization [OpenAI, 2019]
  • 110. n Unsupervised Learning n Reinforcement Learning n Unsupervised RL n Meta-Reinforcement Learning n Few-Shot Imitation n Domain Randomization n DL for Science and Engineering Many Exciting Directions in AI FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin n Mitigating Bias n Multi-modal Learning n Architecture Search n Value Alignment n Scaling Laws n Human-in-the-Loop n Explainability
  • 115. n DL to predict properties → optimize molecule against desired properties n Synthesis → DL to propose synthesis steps Molecule Design Exploring Machine Learning Applications to Enable Next-Generation Chemistry Jennifer Wei (Google) https://docs.google.com/presentation/d/1zGoxOMWmid25hgtSgVkQYu1-GP9PT3EQ2_tzOlQZcF4/edit#slide=id.p1
  • 116. A General Approach to Speed Up Design BagNet: Berkeley Analog Generator with Layout Optimizer Boosted with Deep Neural Networks K. Hakhamaneshi, N. Werblun, P. Abbeel, V. Stojanovic. IEEE/ACM International Conference on Computer-Aided Design (ICAD), Westminster, Colorado, November 2019. https://arxiv.org/abs/1907.10515
  • 117. A General Approach to Speed Up Design BagNet: Berkeley Analog Generator with Layout Optimizer Boosted with Deep Neural Networks K. Hakhamaneshi, N. Werblun, P. Abbeel, V. Stojanovic. IEEE/ACM International Conference on Computer-Aided Design (ICAD), Westminster, Colorado, November 2019. https://arxiv.org/abs/1907.10515
  • 118. A General Approach to Speed Up Design BagNet: Berkeley Analog Generator with Layout Optimizer Boosted with Deep Neural Networks K. Hakhamaneshi, N. Werblun, P. Abbeel, V. Stojanovic. IEEE/ACM International Conference on Computer-Aided Design (ICAD), Westminster, Colorado, November 2019. https://arxiv.org/abs/1907.10515
  • 119. GANs in High-Energy Physics GANs for HEP Ben Nachman https://drive.google.com/file/d/1op6Q6OuVZvJ4VbtLkemi3oJWtSBx5FCc/view
  • 120. Symbolic Math: Integrals and ODEs [Lample and Charton, ICLR 2020]
  • 121. Symbolic Math: Integrals and ODEs [Lample and Charton, ICLR 2020]
  • 122. n https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology n AlphaFold: Improved protein structure prediction using potentials from deep learning Deepmind (Senior et al) https://www.nature.com/articles/s41586-019-1923-7 n BagNet: Berkeley Analog Generator with Layout Optimizer Boosted with Deep Neural Networks K. Hakhamaneshi, N. Werblun, P. Abbeel, V. Stojanovic. IEEE/ACM International Conference on Computer-Aided Design (ICAD), Westminster, Colorado, November 2019. https://arxiv.org/abs/1907.10515 n Evaluating Protein Transfer Learning with TAPE R. Rao, N. Bhattacharya, N. Thomas, Y, Duan, X. Chen, J. Canny, P. Abbeel, Y. Song https://www.biorxiv.org/content/10.1101/676825v1 n Opening the black box: the anatomy of a deep learning atomistic potential Justin Smith https://drive.google.com/file/d/1f1iiXKzxNbNz5lL5x2ob9xGYLHNHhxeU/view n Exploring Machine Learning Applications to Enable Next-Generation Chemistry Jennifer Wei (Google) https://docs.google.com/presentation/d/1zGoxOMWmid25hgtSgVkQYu1-GP9PT3EQ2_tzOlQZcF4/edit#slide=id.p1 n GANs for HEP Ben Nachman https://drive.google.com/file/d/1op6Q6OuVZvJ4VbtLkemi3oJWtSBx5FCc/view n Deep Learning for Symbolic Mathematics G. Lample and F. Charton https://openreview.net/pdf?id=S1eZYeHFDS n A Survey of Deep Learning for Scientific Discovery Maithra Raghu, Eric Schmidt https://arxiv.org/abs/2003.11755 Learn more about Deep Learning for Science/Engineering?
  • 123. n Sampling of research directions n Overarching research theme n How to keep up Outline FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 124. Compute Increasing Rapidly [20-April-2018] FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 125. Let's Calibrate Compute Scale
    Architecture   Num neurons      Num synapses
    Fly            100K = 10^5      10M = 10^7
    AlexNet        650K = 10^6      60M = 10^8
    Mouse          100M = 10^8      100B = 10^11
    Human          100B = 10^11     10^14 - 10^15
    If each synapse is 1 FLOP (i.e., can fire / not fire once per second), then the human brain requires 10^15 FLOP/s = 1 petaflop.
    NVIDIA's DGX-2 = 2 petaflops in one server rack! ($400,000 price tag)
    2 TPU-v3 slices = ~1 petaflop in Google Cloud at $16/hour ($5/hour if pre-emptible)
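The calibration above can be sketched numerically. This is a rough order-of-magnitude exercise under the slide's simplifying assumption (1 FLOP per synapse per second); the function name and firing-rate parameter are illustrative, not an established model:

```python
# Back-of-the-envelope compute calibration, assuming each synapse costs
# 1 FLOP per second of simulated activity (the slide's simplification).

PETAFLOP = 1e15  # FLOP/s

def required_flops(num_synapses, fires_per_second=1.0):
    """FLOP/s needed to simulate the given number of synapses."""
    return num_synapses * fires_per_second

# Human brain: 1e14-1e15 synapses; take the upper end of the range.
human = required_flops(1e15)
print(human / PETAFLOP)      # 1.0 petaflop/s

# One NVIDIA DGX-2 (~2 petaflop/s) covers this estimate twice over.
print(2 * PETAFLOP / human)  # 2.0
```

Changing the firing-rate assumption shifts the answer linearly, which is why such estimates should be read only to within an order of magnitude.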
  • 126. Compute per Major Result FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin Source: OpenAI
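The chart on this slide is from OpenAI's "AI and Compute" analysis, which reported that compute used in the largest training runs doubled roughly every 3.4 months over 2012-2018. A quick sketch of what that doubling time implies (the 3.4-month figure is OpenAI's empirical estimate for that period, not a law):

```python
# Growth implied by a fixed doubling time, using the ~3.4-month doubling
# reported in OpenAI's "AI and Compute" analysis.

def growth_factor(months, doubling_months=3.4):
    """Multiplier on training compute after `months` at the given doubling time."""
    return 2 ** (months / doubling_months)

print(growth_factor(12))  # roughly 11.5x in one year
print(growth_factor(60))  # five years at this pace: roughly 200,000x
```

For comparison, Moore's law (doubling every ~24 months) gives only about 1.4x per year, which is why the slide's trend cannot be explained by hardware improvements alone.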
  • 127. Implications for research agenda? FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 128. DATA COMPUTE HUMAN INGENUITY Key Enablers for AI Progress? FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin Problem Territory 1 Problem Territory 2 E.g., Deep Learning to Learn
  • 129. n Sampling of research directions n Overarching research theme n How to keep up Outline FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 130. n (mostly) keeping up with (mostly) not reading papers n when you decide to read papers How to Keep Up FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 131. n (mostly) keeping up with (mostly) not reading papers n when you decide to read papers How to Keep Up FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 132. Number of arxiv papers submitted in AI categories FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin [source: https://medium.com/@karpathy/a-peek-at-trends-in-machine-learning-ab8a1085a106]
  • 133. n Tutorials at conferences (e.g., ICML, NeurIPS) n Graduate courses and seminars n Yannic Kilcher YouTube channel – explains papers n Two Minute Papers YouTube channel n The Batch – newsletter by Andrew Ng n Import AI – newsletter by Jack Clark Great Resources FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 134. n (mostly) keeping up with (mostly) not reading papers n when you decide to read papers How to Keep Up FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 135. Which papers to read? (1/2) FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin n Tutorials at conferences (e.g., ICML, NeurIPS) n Graduate courses and seminars n Yannic Kilcher YouTube channel – explains papers n Two Minute Papers YouTube channel n The Batch – newsletter by Andrew Ng n Import AI – newsletter by Jack Clark
  • 136. n Arxiv sanity (by Karpathy) n Twitter n Jack Clark, Karpathy, Ian Goodfellow, hardmaru, smerity, pabbeel, … n AI/DL Facebook Group: Very active group where members post anything from news articles to blog posts to general ML questions n ML Subreddit Which papers to read? (2/2) FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 137. How to Read a Paper? FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
    • Read the title, abstract, section headers, and figures.
    • Try to find slides or a video on the paper (these do not have to be by the authors).
    • Read the introduction (questions via Jennifer Widom):
      • What is the problem?
      • Why is it interesting and important?
      • Why is it hard? (e.g., why do naive approaches fail?)
      • Why hasn't it been solved before? (Or, what's wrong with previous proposed solutions? How does this paper differ?)
      • What are the key components of this approach and results? Any limitations?
    • Skim the related work.
      • Is there any related work you are familiar with?
      • If so, how does this paper relate to those works?
    • Skim the technical section.
      • Where are the novelties?
      • What are the assumptions?
    • Read the experiments.
      • What questions are the experiments answering?
      • What questions are the experiments not answering?
      • What baselines do they compare against?
      • How strong are these baselines?
      • Is the experiment methodology sound?
      • Do the results support their claims?
    • Read the conclusion/discussion.
    • Read the technical section.
      • Read in "good faith" (i.e., assume the authors are correct).
      • Skip over confusing parts that don't seem fundamental.
      • If any important part is confusing, see if the material is in class slides or prior papers.
    • Read the paper as you see fit. [source: Gregory Kahn]
  • 138. n Read papers together with friends: n Mode 1: everyone reads, then discusses n Mode 2: 1 (or 2) people read, give small tutorial to others Reading Group FSDL -- Pieter Abbeel -- Sergey Karayev -- Josh Tobin
  • 139. n Become one of the world’s experts in a topic you really care about n Technically deep and demanding n Crudely: develop new tools/techniques rather than use existing tools/techniques PS: Why do a PhD? (or not)