Mastering the game of Go with deep neural networks and tree search
David Silver et al. from Google DeepMind
Article overview by Ilya Kuzovkin
Reinforcement Learning Seminar, University of Tartu, 2016
THE GAME OF GO
• BOARD
• STONES
• GROUPS
• LIBERTIES
• CAPTURE
• KO
• TWO EYES
• FINAL COUNT
• EXAMPLES
TRAINING
TRAINING THE BUILDING BLOCKS
• Supervised policy network pσ(a|s): supervised classification
• Reinforcement policy network pρ(a|s): reinforcement learning
• Rollout policy network pπ(a|s): supervised classification
• Value network vθ(s): supervised regression
• Tree policy network pτ(a|s): supervised classification
Supervised policy network pσ(a|s)
Architecture (sketched in code below):
• 19 x 19 x 48 input
• 1 convolutional layer 5x5 with k=192 filters, ReLU
• 11 convolutional layers 3x3 with k=192 filters, ReLU
• 1 convolutional layer 1x1, ReLU
• Softmax
Training:
• 29.4M positions from games between 6 and 9 dan players
• Augmented: 8 reflections/rotations
• Stochastic gradient ascent, learning rate α = 0.003, halved every 80M steps
• Batch size m = 16
• 3 weeks on 50 GPUs to make 340M steps
Results:
• Test set (1M) accuracy: 57.0%
• 3 ms to select an action
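As a concreteness check, the layer list above fits in a few lines of PyTorch. This is a sketch of the slide's description, not DeepMind's implementation; the padding choices (keeping the 19x19 board shape through the stack) are an assumption.

```python
import torch
import torch.nn as nn

class SLPolicyNet(nn.Module):
    """Sketch of the supervised policy network p_sigma(a|s) described above."""
    def __init__(self, planes=48, k=192):
        super().__init__()
        layers = [nn.Conv2d(planes, k, kernel_size=5, padding=2), nn.ReLU()]
        for _ in range(11):                      # 11 conv layers 3x3, k=192 filters
            layers += [nn.Conv2d(k, k, kernel_size=3, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(k, 1, kernel_size=1), nn.ReLU()]  # final 1x1 layer
        self.body = nn.Sequential(*layers)

    def forward(self, s):                        # s: (batch, 48, 19, 19)
        logits = self.body(s).flatten(1)         # (batch, 361), one score per point
        return torch.softmax(logits, dim=1)      # distribution over moves

net = SLPolicyNet()
probs = net(torch.zeros(1, 48, 19, 19))          # dummy forward pass -> (1, 361)
```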
19 X 19 X 48 INPUT
Rollout policy pπ(a|s)
• Supervised, trained on the same data as pσ(a|s)
• Just a linear model with softmax (sketched in code below)
• Less accurate: 24.2% (vs. 57.0%)
• Faster: 2 μs per action (1,500 times faster)

Tree policy pτ(a|s)
• "Similar to the rollout policy but with more features"
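"Just a linear model with softmax" is what buys the 2 μs move selection: one dot product per legal move. A minimal numpy sketch, under the assumption of a binary feature vector per candidate move (the actual pattern features are listed in the paper):

```python
import numpy as np

def rollout_policy(move_features, w):
    """p_pi(a|s): softmax over one linear score per legal move.

    move_features: (num_legal_moves, num_features) binary feature matrix
    w:             (num_features,) trained weight vector
    """
    scores = move_features @ w
    scores -= scores.max()                 # subtract max for numerical stability
    p = np.exp(scores)
    return p / p.sum()

rng = np.random.default_rng(0)
feats = rng.integers(0, 2, size=(30, 100)).astype(float)   # 30 legal moves
w = rng.normal(size=100)
move = rng.choice(30, p=rollout_policy(feats, w))           # sample a move
```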
Reinforcement policy network pρ(a|s)
• Same architecture as the supervised network; weights ρ are initialized with the SL weights σ
• Self-play: current network vs. randomized pool of previous versions
• Play a game until the end, get the reward z_t = ±r(s_T) = ±1; every step of game i is credited with the final outcome, z_t^i = z_t
• Replay the same game, this time updating the network parameters at each time step t (a code sketch follows below)
• Baseline v(s_t^i) = …
‣ 0 "on the first pass through the training pipeline"
‣ vθ(s_t^i) "on the second pass"
• Batch size n = 128 games
• 10,000 batches
• One day on 50 GPUs
Results:
• 80% wins against the supervised network
• 85% wins against Pachi (no search yet!)
• 3 ms to select an action
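The update above is REINFORCE with a baseline: every move of a finished game is credited with the outcome z_t, and the gradient of log pρ(a_t|s_t) is scaled by (z_t − v(s_t^i)). A hedged PyTorch sketch of one such step; `policy` is any network mapping positions to move probabilities (e.g. the SLPolicyNet sketch above), and collecting the game record is assumed to happen elsewhere.

```python
import torch

def reinforce_update(policy, optimizer, states, actions, z, baseline):
    """One policy-gradient step over a finished self-play game.

    states:   (T, 48, 19, 19) positions s_t visited during the game
    actions:  (T,) long tensor of moves a_t that were played
    z:        final reward for this player, +1.0 or -1.0
    baseline: (T,) tensor v(s_t): zeros on the first pass, v_theta(s_t) on the second
    """
    probs = policy(states)                                        # (T, 361)
    logp = torch.log(probs.gather(1, actions[:, None])).squeeze(1)
    # Gradient *ascent* on expected reward == descent on the negated objective.
    loss = -((z - baseline).detach() * logp).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```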
Value network vθ(s)
Architecture:
• 19 x 19 x 49 input
• 1 convolutional layer 5x5 with k=192 filters, ReLU
• 11 convolutional layers 3x3 with k=192 filters, ReLU
• 1 convolutional layer 1x1, ReLU
• Fully connected layer, 256 ReLU units
• Fully connected layer, 1 tanh unit
Training:
• Evaluate the value of the position s under policy p: v^p(s) = E[z_t | s_t = s, a_{t…T} ~ p]
• Double approximation: vθ(s) ≈ v^{pρ}(s) ≈ v*(s)
• Stochastic gradient descent to minimize the MSE between vθ(s) and the game outcome z
• Train on 30M state-outcome pairs (s, z), each from a unique game generated by self-play (see the sketch after this list):
‣ choose a random time step u
‣ sample moves t = 1…u-1 from the SL policy
‣ make a random move u
‣ sample t = u+1…T from the RL policy and get the game outcome z
‣ add the pair (s_u, z_u) to the training set
• One week on 50 GPUs to train on 50M batches of size m = 32
Results:
• MSE on the test set: 0.234
• Close to MC estimation from the RL policy, but 15,000 times faster
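To make the "one pair per game" procedure concrete, here is a sketch of the generation loop. Everything about the game itself is abstracted into a hypothetical `env` object (its methods initial, play, legal_moves, to_play, over, outcome are assumptions, not an API from the paper); the two policies are functions returning a move for a state.

```python
import random

def generate_value_example(sl_policy, rl_policy, env, max_u=200):
    """Produce one training pair (s_u, z_u) from a single self-play game.

    env is a hypothetical Go environment: initial() -> state, play(s, a) -> state,
    legal_moves(s) -> list, to_play(s) -> player, over(s) -> bool,
    outcome(s, player) -> +1/-1.
    """
    u = random.randint(1, max_u)                         # choose a random time step u
    s = env.initial()
    for _ in range(u - 1):                               # moves t = 1..u-1 from the SL policy
        s = env.play(s, sl_policy(s))
    s = env.play(s, random.choice(env.legal_moves(s)))   # one random move at step u
    s_u = s                                              # the position we will train on
    to_play = env.to_play(s_u)                           # perspective for the outcome label
    while not env.over(s):                               # t = u+1..T from the RL policy
        s = env.play(s, rl_policy(s))
    z_u = env.outcome(s, to_play)                        # game outcome z from s_u's side
    return s_u, z_u                                      # one of the 30M (s, z) pairs
```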
PLAYING
APV-MCTS: ASYNCHRONOUS POLICY AND VALUE MCTS
Each node s has edges (s, a) for all legal actions; each edge stores statistics (see the sketch after this list):
• P(s, a): prior
• Nv(s, a): number of evaluations
• Nr(s, a): number of rollouts
• Wv(s, a): MC value estimate
• Wr(s, a): rollout value estimate
• Q(s, a): combined mean action value
Selection: a simulation starts at the root and stops at time L, when a leaf (unexplored state) is found. The position s_L is added to the evaluation queue, and a bunch of nodes is selected for evaluation…
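In code, the per-edge statistics form a small record. A sketch using the paper's field names (P, Nv, Nr, Wv, Wr, with Q computed from them); the PUCT-style exploration bonus u(s, a) ∝ P / (1 + N) used in selection_score is an assumption stated here, not read off the slide.

```python
from dataclasses import dataclass

@dataclass
class Edge:
    """Statistics stored for one (s, a) edge of the APV-MCTS tree."""
    P: float           # prior probability from the policy network
    Nv: int = 0        # number of value-network evaluations
    Nr: int = 0        # number of rollouts
    Wv: float = 0.0    # accumulated value-network estimates (MC value estimate)
    Wr: float = 0.0    # accumulated rollout outcomes (rollout value estimate)

    def Q(self, lam=0.5):
        """Combined mean action value: mix of the two evaluation sources."""
        qv = self.Wv / self.Nv if self.Nv else 0.0
        qr = self.Wr / self.Nr if self.Nr else 0.0
        return (1.0 - lam) * qv + lam * qr

    def selection_score(self, c_puct, parent_visits):
        """Score used when descending the tree: Q(s, a) + u(s, a)."""
        return self.Q() + c_puct * self.P * (parent_visits ** 0.5) / (1 + self.Nr)
```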
Evaluation: node s is evaluated in two ways: using the value network to obtain vθ(s), and using rollout simulation with policy pπ(a|s) played till the end of each simulated game to get the final game score. Once each leaf is evaluated, we are ready to propagate updates.
Backup: statistics along the paths of each simulation are updated during the backward pass through t < L; visit counts are updated as well. Finally, the overall evaluation Q of each visited state-action edge is updated, and the current tree is up to date (a code sketch follows below).
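A sketch of that backward pass over the Edge records defined above; asynchrony details such as virtual loss are deliberately left out.

```python
def backup(path, v_theta, z_rollout):
    """Update statistics along one simulation path (list of Edge objects, root to leaf).

    v_theta:   value-network evaluation of the leaf position s_L
    z_rollout: final score of the rollout played from s_L
    """
    for edge in path:           # backward pass over the visited edges
        edge.Nv += 1
        edge.Wv += v_theta      # accumulate the MC value estimate
        edge.Nr += 1
        edge.Wr += z_rollout    # accumulate the rollout value estimate
        # edge.Q() now reflects the refreshed combined mean action value
```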
Expansion: once an edge (s, a) has been visited enough times (n_thr), the successor state s' is included into the tree. Its new edges are initialized using the tree policy pτ(a|s') and later updated with the SL policy pσ(a|s') (sketch below). The tree is expanded, fully updated, and ready for the next move!
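A matching expansion sketch, continuing the Edge structure above. `tree_policy_priors` and `sl_policy_priors` are hypothetical helpers returning a move-to-probability dict, and the fixed N_THR is only illustrative (the paper adapts the threshold dynamically).

```python
N_THR = 40   # illustrative expansion threshold n_thr (assumed constant here)

def maybe_expand(tree, edge, s_next):
    """Insert s' into the tree once the edge (s, a) leading to it is visited enough."""
    if edge.Nr >= N_THR and s_next not in tree:
        # Seed the new node's edges with the fast tree policy p_tau(a|s')...
        tree[s_next] = {a: Edge(P=p) for a, p in tree_policy_priors(s_next).items()}
        # ...then later overwrite the priors with the slower SL policy p_sigma(a|s').
        for a, p in sl_policy_priors(s_next).items():
            tree[s_next][a].P = p
```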
WINNING
https://www.youtube.com/watch?v=oRvlyEpOQ-8
