UNDERSTANDING ALPHA GO
How Deep Learning Made the Impossible Possible
ABOUT MYSELF
 M.Sc. in Computer Science, HUJI
 Research interests: Deep Learning in Computer Vision, NLP, Reinforcement Learning.
 Also DL theory and other ML topics.
 Working at a DL start-up (Imubit)
 Contact: mangate@gmail.com
CREDITS
 A lot of slides were taken from the following publicly
available slideshows:
 https://www.slideshare.net/ShaneSeungwhanMoon/how-
alphago-works
 https://www.slideshare.net/ckmarkohchang/alphago-in-depth
 https://www.slideshare.net/KarelHa1/alphago-mastering-the-
game-of-go-with-deep-neural-networks-and-tree-search
 Original AlphaGo article:
Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529.7587 (2016): 484-489.
Available here: http://web.iitd.ac.in/~sumeet/Silver16.pdf
DEEP LEARNING IS CHANGING OUR LIVES
 Search Engine (also for images and audio)
 Spam filters
 Recommender systems (Netflix, Youtube)
 Self-Driving Cars
 Cyber security (and physical security, via computer vision)
 Machine translation.
 Speech to text, audio recognition.
 Image recognition, smart shopping
 And more and more and more…
AI VERSUS HUMAN
 In 1997, a supercomputer called Deep Blue (IBM) defeated Garry Kasparov.
 This was the first defeat of a reigning world chess champion
by a computer under tournament conditions.
AI VERSUS HUMAN
 In 2011 Watson, another supercomputer by IBM, "crushed" the two best players on Jeopardy!, a popular question-answering TV show.
GO
 An ancient Chinese game
(2,500 years old!)
 Despite its relatively simple
rules, Go is very complex,
even more so than chess.
 Winning at Go requires a great deal of intuition, and it was therefore considered unachievable by computers for at least another 30 years.
AI VERSUS HUMAN
 In 2016, AlphaGo, a computer program by DeepMind (part of Google), played a five-game Go match against Lee Sedol.
 Lee Sedol:
 Professional 9-dan (the highest rank in Go), considered among the top 3 players in the world.
 2nd in international titles.
 Won 97 out of 100 games
against European Go
champion Fan Hui.
AI VERSUS HUMAN
 “I’m confident that I can win, at least this time” – Lee Sedol
 AlphaGo won 4-1
 “I kind of felt powerless… misjudged the capabilities of
AlphaGo” – Lee Sedol
 How is it possible? Deep Learning.
AI IN GAME PLAYING
 Almost every game can be "simulated" with a tree search.
 A move is chosen if it has the best chance of ending in victory.
AI IN GAMES
 More formally: an optimal value function V*(s)
determines the outcome of the game:
 From every board position (state=s)
 Under perfect play by all players.
 This is done by going over a tree containing roughly b^d possible move sequences, where:
 b is the game's breadth (number of legal moves in each position)
 d is the game's depth (game length in moves)
 Tic-Tac-Toe: b ≈ 4, d ≈ 4
 Chess: b ≈ 35, d ≈ 80
TREE SEARCH IN GO
 However, in Go: b ≈ 250, d ≈ 150, so b^d is far beyond 10^100 (a googol).
 This is more than the number of atoms in the entire universe!
 Go is more complex than chess!
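The size gap is easy to check directly. A quick sketch using the breadth and depth estimates above:

```python
# search-space estimates from the slides: b = breadth, d = depth
games = {"Tic-Tac-Toe": (4, 4), "Chess": (35, 80), "Go": (250, 150)}

for name, (b, d) in games.items():
    digits = len(str(b ** d)) - 1  # rough exponent of 10
    print(f"{name}: b^d is about 10^{digits}")
```

Go comes out around 10^359, dwarfing both the chess estimate and the roughly 10^80 atoms in the observable universe.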
KEY: REDUCE THE SEARCH SPACE
 Reducing b (possible actions space)
KEY: REDUCE THE SEARCH SPACE
 Reducing d – Position evaluation ahead of time
 Instead of simulating all the way to the end:
Both reductions are done with Deep Learning.
SOME CONCEPTS
 Supervised Learning (classification)
 Given data, predict a class (choose one option out of a known set of options)
SOME CONCEPTS
 Supervised Learning (regression)
 Given data, predict a real number
SOME CONCEPTS
 Reinforcement Learning
 Given a state (observation), perform actions that lead to a goal (e.g. winning a game)
SOME CONCEPTS
 CNNs are able to learn abstract features of a given image
REDUCING ACTION CANDIDATES
 Done by learning to “imitate” expert moves
 Data: online Go experts. 160K games, 30M moves.
 This is supervised classification (given a position, predict the expert's move out of all legal ones)
REDUCING ACTION CANDIDATES
 This deep CNN achieved 55% test accuracy on predicting
expert moves.
 Imitators without Deep Learning reached only 22% accuracy.
 Small improvements in accuracy led to big improvements in playing strength.
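The classification step can be sketched as follows (the random scores are a hypothetical stand-in for the CNN's final layer over the 361 points of a 19×19 board):

```python
import math
import random

def softmax(logits):
    # turn raw network scores into a probability distribution over moves
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# hypothetical raw scores, one per board point
logits = [random.gauss(0.0, 1.0) for _ in range(361)]
probs = softmax(logits)
predicted_move = max(range(361), key=lambda i: probs[i])
```

Training maximizes the probability assigned to the move the expert actually played.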
ROLLOUT NETWORK
 Train an additional, smaller network (pπ) for imitation.
 This network achieves only 24.2% accuracy.
 But it runs about 1,000 times faster (2 μs vs. 3 ms per move).
 It is used for rollouts (explained later).
IMPROVING THE NETWORK
 Improve the imitator network through self-play (Reinforcement Learning)
 An entire game is played and the parameters are updated according to the result.
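The update direction can be sketched as a REINFORCE-style rule (hypothetical names; the real update is a policy gradient over whole games, with z the game outcome):

```python
def reinforce_update(theta, grad_logp, z, lr=0.01):
    # z = +1 for a win, -1 for a loss: winning games make the moves
    # played more likely, losing games make them less likely
    return [t + lr * z * g for t, g in zip(theta, grad_logp)]

# after a won game (z = +1), parameters move along the gradient
theta = reinforce_update([0.0, 0.0], [1.0, -2.0], z=1, lr=0.1)
```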
IMPROVING THE NETWORK
 Keep generating better models by self-playing newer models against older ones.
 The final network also won 85% of games against the best Go software (the model without self-play won only 11%).
 However, this model was eventually not used directly during the match; it was used to generate the value function.
REDUCING SEARCH DEPTH - DATASET
 Self-play with the imitator model for some number of steps (0 to 450).
 Make one random move. This is the starting position s.
 Self-play until the end with the RL network (latest model).
 If black won, z = 1; otherwise z = 0.
 Save (s, z) to the dataset.
 Generated 30M (s, z) pairs from 30M games (one position per game, to avoid correlated samples).
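The steps above can be sketched on a toy "game" (random moves stand in for the imitator and RL policies; a real implementation would call a Go engine):

```python
import random

MOVES = ["A", "B", "C"]  # hypothetical move alphabet

def generate_pair(max_imitator_steps=450):
    # 1) play some moves with the imitator policy
    state = [random.choice(MOVES)
             for _ in range(random.randint(0, max_imitator_steps))]
    # 2) one random move defines the training position s
    state.append(random.choice(MOVES))
    s = tuple(state)
    # 3) finish the game with the RL policy; here the outcome is a coin flip
    z = 1 if random.random() < 0.5 else 0
    return s, z  # one (s, z) pair per simulated game

dataset = [generate_pair() for _ in range(1000)]
```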
REDUCING SEARCH DEPTH –
VALUE FUNCTION
 A regression task: for a given position s, output a number between 0 and 1.
 Now, for each possible position, we have an evaluation of how "good" it is for the black player.
REDUCING SEARCH SPACE
PUTTING IT ALL TOGETHER - MCTS
 During game time, a method called Monte Carlo Tree Search (MCTS) is applied.
 This method has four steps:
 Selection
 Expansion
 Evaluation
 Backup (update)
 For each move in the game, this process is repeated about 10K times.
MCTS - SELECTION
 At each step we have a starting position (the board at this point).
 An action is selected using a combination of the imitator network's prior P(s, a) and another value, Q, which starts at 0.
 The exploration bonus is u(s, a) ∝ P(s, a) / (1 + N(s, a)): we divide by the number of times the state/action pair was visited, to encourage diversity.
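A minimal sketch of this selection rule (the edge statistics dict is a hypothetical layout, not AlphaGo's actual data structure):

```python
def selection_bonus(prior_p, visit_count):
    # u(s, a) ∝ P(s, a) / (1 + N(s, a)): heavily visited actions lose
    # their bonus, pushing the search toward unexplored ones
    return prior_p / (1 + visit_count)

def select_action(edges):
    # edges: {action: (Q, P, N)}; pick the argmax of Q + u
    return max(edges,
               key=lambda a: edges[a][0] + selection_bonus(edges[a][1], edges[a][2]))

# an unvisited move with a modest prior can beat a well-visited favorite
edges = {"a": (0.0, 0.8, 3), "b": (0.0, 0.5, 0)}
chosen = select_action(edges)
```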
MCTS - EXPANSION
 When building the tree, a position can be expanded once (creating new leaves in the tree) with the imitator network.
 This gives the priors P, and hence u(s, a), for subsequent searches.
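Expansion can be sketched as creating new edges with priors from the policy network (the flat `(state, action)` map is a simplifying assumption for illustration):

```python
def expand(tree, state, policy_priors):
    # the imitator (policy) network supplies the priors P(s, a);
    # each new edge starts with zero visits and zero value
    for action, p in policy_priors.items():
        tree[(state, action)] = {"P": p, "N": 0, "Q": 0.0}

tree = {}
expand(tree, "root", {"a": 0.7, "b": 0.3})
```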
MCTS - EVALUATION
 After simulating 3-4 steps
with the imitating network
we evaluate the board
position.
 This is done in two ways:
 The value network's prediction.
 Using the smaller rollout network to self-play to the end (a rollout), saving the result (1 for a black win, 0 for a white win).
 Both evaluations are combined to give this board position a number between 0 and 1.
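The combination is a weighted mix; the Nature paper uses a mixing weight λ = 0.5:

```python
def leaf_evaluation(value_net_v, rollout_z, lam=0.5):
    # blend the value network's estimate with the rollout outcome;
    # lam = 0.5 weights them equally, as in the paper
    return (1 - lam) * value_net_v + lam * rollout_z

# value net says 0.6, rollout ended in a black win (z = 1)
v = leaf_evaluation(0.6, 1.0)
```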
MCTS – BACKUP (UPDATE)
 After the simulation we update the tree:
 Q (which started at 0) is updated with the value computed from the value network and the rollouts.
 N(s, a) is increased by one for each state/action pair visited.
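The backup can be sketched as an incremental mean, so Q tracks the average of all leaf evaluations seen through an edge (same hypothetical edge layout as above):

```python
def backup(edge, leaf_value):
    # edge: {"P": prior, "N": visit count, "Q": mean leaf value}
    edge["N"] += 1
    edge["Q"] += (leaf_value - edge["Q"]) / edge["N"]

edge = {"P": 0.7, "N": 0, "Q": 0.0}
backup(edge, 0.8)
backup(edge, 0.4)  # Q is now the mean of 0.8 and 0.4
```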
CHOOSING AN ACTION
 For each move during the game, MCTS is run about 10K times.
 In the end, the action that was visited the most times from the root position (the current board) is played.
 Notes:
 Since this process is slow, the smaller rollout network was needed to keep it feasible (otherwise each move would have taken several days to compute).
 The imitator network was better than the RL network at proposing candidate actions, probably because humans take more diverse actions.
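Picking the most-visited root action is a one-liner (visit counts here are made up for illustration):

```python
def choose_move(root_visits):
    # root_visits: {action: visit count} after ~10K MCTS simulations;
    # play the most-visited action, not the highest-Q one
    return max(root_visits, key=root_visits.get)

move = choose_move({"D4": 6200, "Q16": 2500, "C3": 1300})
```

Visit counts are a more robust signal than raw Q values, since rarely visited actions can have noisy value estimates.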
ALPHA GO WEAKNESSES
 In the 4th game, Lee Sedol steered the board into a position that was not in AlphaGo's search tree, causing the program to choose increasingly poor moves and eventually lose the game.
 Most assumptions made for AlphaGo do not hold in real-life RL problems. See:
https://medium.com/@karpathy/alphago-in-context-c47718cb95a5
RETIREMENT
 In May 2017, AlphaGo beat Ke Jie, the world's top-ranked player, 3-0.
 Google's DeepMind unit announced that this would be the last exhibition match the AI plays.
SUMMARY
 To this day, AlphaGo is considered one of the greatest AI
achievements in recent history.
 This achievement was made by combining Deep Learning with classical methods (like MCTS) to "simplify" the very complex game of Go.
 4 deep neural networks were used:
 3 almost identical convolutional neural networks:
 An imitator network for action-space reduction.
 An RL network created through self-play, for generating the dataset for the value network.
 A value network for search-depth reduction.
 1 small network for rollouts.
 Deep Learning keeps achieving amazing new goals every day, and is one of the fastest-growing fields in both academia and industry.
QUESTIONS?
Thank you!

More Related Content

What's hot

Adversarial Search
Adversarial SearchAdversarial Search
Adversarial SearchMegha Sharma
 
Introduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningIntroduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningNAVER Engineering
 
An introduction to reinforcement learning
An introduction to  reinforcement learningAn introduction to  reinforcement learning
An introduction to reinforcement learningJie-Han Chen
 
Chess Engine Programming
Chess Engine ProgrammingChess Engine Programming
Chess Engine ProgrammingArno Huetter
 
Adversarial search
Adversarial searchAdversarial search
Adversarial searchNilu Desai
 
Local beam search example
Local beam search exampleLocal beam search example
Local beam search exampleMegha Sharma
 
AlphaGo Zero: Mastering the Game of Go Without Human Knowledge
AlphaGo Zero: Mastering the Game of Go Without Human KnowledgeAlphaGo Zero: Mastering the Game of Go Without Human Knowledge
AlphaGo Zero: Mastering the Game of Go Without Human KnowledgeJoonhyung Lee
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement LearningCloudxLab
 
AlphaGo Zero Introduction
AlphaGo Zero IntroductionAlphaGo Zero Introduction
AlphaGo Zero Introduction友誠 張
 
Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement LearningUsman Qayyum
 
Reinforcement learning, Q-Learning
Reinforcement learning, Q-LearningReinforcement learning, Q-Learning
Reinforcement learning, Q-LearningKuppusamy P
 
LinkedIn talk at Netflix ML Platform meetup Sep 2019
LinkedIn talk at Netflix ML Platform meetup Sep 2019LinkedIn talk at Netflix ML Platform meetup Sep 2019
LinkedIn talk at Netflix ML Platform meetup Sep 2019Faisal Siddiqi
 
Adversarial search
Adversarial searchAdversarial search
Adversarial searchDheerendra k
 
Multi-Armed Bandit and Applications
Multi-Armed Bandit and ApplicationsMulti-Armed Bandit and Applications
Multi-Armed Bandit and ApplicationsSangwoo Mo
 
AlphaZero: A General Reinforcement Learning Algorithm that Masters Chess, Sho...
AlphaZero: A General Reinforcement Learning Algorithm that Masters Chess, Sho...AlphaZero: A General Reinforcement Learning Algorithm that Masters Chess, Sho...
AlphaZero: A General Reinforcement Learning Algorithm that Masters Chess, Sho...Joonhyung Lee
 
Adversarial search with Game Playing
Adversarial search with Game PlayingAdversarial search with Game Playing
Adversarial search with Game PlayingAman Patel
 
Restricted Boltzman Machine (RBM) presentation of fundamental theory
Restricted Boltzman Machine (RBM) presentation of fundamental theoryRestricted Boltzman Machine (RBM) presentation of fundamental theory
Restricted Boltzman Machine (RBM) presentation of fundamental theorySeongwon Hwang
 
Dempster Shafer Theory AI CSE 8th Sem
Dempster Shafer Theory AI CSE 8th SemDempster Shafer Theory AI CSE 8th Sem
Dempster Shafer Theory AI CSE 8th SemDigiGurukul
 
Introduction: Asynchronous Methods for Deep Reinforcement Learning
Introduction: Asynchronous Methods for  Deep Reinforcement LearningIntroduction: Asynchronous Methods for  Deep Reinforcement Learning
Introduction: Asynchronous Methods for Deep Reinforcement LearningTakashi Nagata
 

What's hot (20)

Adversarial Search
Adversarial SearchAdversarial Search
Adversarial Search
 
Introduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningIntroduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement Learning
 
An introduction to reinforcement learning
An introduction to  reinforcement learningAn introduction to  reinforcement learning
An introduction to reinforcement learning
 
Adversarial search
Adversarial searchAdversarial search
Adversarial search
 
Chess Engine Programming
Chess Engine ProgrammingChess Engine Programming
Chess Engine Programming
 
Adversarial search
Adversarial searchAdversarial search
Adversarial search
 
Local beam search example
Local beam search exampleLocal beam search example
Local beam search example
 
AlphaGo Zero: Mastering the Game of Go Without Human Knowledge
AlphaGo Zero: Mastering the Game of Go Without Human KnowledgeAlphaGo Zero: Mastering the Game of Go Without Human Knowledge
AlphaGo Zero: Mastering the Game of Go Without Human Knowledge
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
AlphaGo Zero Introduction
AlphaGo Zero IntroductionAlphaGo Zero Introduction
AlphaGo Zero Introduction
 
Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement Learning
 
Reinforcement learning, Q-Learning
Reinforcement learning, Q-LearningReinforcement learning, Q-Learning
Reinforcement learning, Q-Learning
 
LinkedIn talk at Netflix ML Platform meetup Sep 2019
LinkedIn talk at Netflix ML Platform meetup Sep 2019LinkedIn talk at Netflix ML Platform meetup Sep 2019
LinkedIn talk at Netflix ML Platform meetup Sep 2019
 
Adversarial search
Adversarial searchAdversarial search
Adversarial search
 
Multi-Armed Bandit and Applications
Multi-Armed Bandit and ApplicationsMulti-Armed Bandit and Applications
Multi-Armed Bandit and Applications
 
AlphaZero: A General Reinforcement Learning Algorithm that Masters Chess, Sho...
AlphaZero: A General Reinforcement Learning Algorithm that Masters Chess, Sho...AlphaZero: A General Reinforcement Learning Algorithm that Masters Chess, Sho...
AlphaZero: A General Reinforcement Learning Algorithm that Masters Chess, Sho...
 
Adversarial search with Game Playing
Adversarial search with Game PlayingAdversarial search with Game Playing
Adversarial search with Game Playing
 
Restricted Boltzman Machine (RBM) presentation of fundamental theory
Restricted Boltzman Machine (RBM) presentation of fundamental theoryRestricted Boltzman Machine (RBM) presentation of fundamental theory
Restricted Boltzman Machine (RBM) presentation of fundamental theory
 
Dempster Shafer Theory AI CSE 8th Sem
Dempster Shafer Theory AI CSE 8th SemDempster Shafer Theory AI CSE 8th Sem
Dempster Shafer Theory AI CSE 8th Sem
 
Introduction: Asynchronous Methods for Deep Reinforcement Learning
Introduction: Asynchronous Methods for  Deep Reinforcement LearningIntroduction: Asynchronous Methods for  Deep Reinforcement Learning
Introduction: Asynchronous Methods for Deep Reinforcement Learning
 

Similar to How Deep Learning Achieved the Impossible in Go

How DeepMind Mastered The Game Of Go
How DeepMind Mastered The Game Of GoHow DeepMind Mastered The Game Of Go
How DeepMind Mastered The Game Of GoTim Riser
 
J-Fall 2017 - AI Self-learning Game Playing
J-Fall 2017 - AI Self-learning Game PlayingJ-Fall 2017 - AI Self-learning Game Playing
J-Fall 2017 - AI Self-learning Game PlayingRichard Abbuhl
 
Alpha go 16110226_김영우
Alpha go 16110226_김영우Alpha go 16110226_김영우
Alpha go 16110226_김영우영우 김
 
Devoxx 2017 - AI Self-learning Game Playing
Devoxx 2017 - AI Self-learning Game PlayingDevoxx 2017 - AI Self-learning Game Playing
Devoxx 2017 - AI Self-learning Game PlayingRichard Abbuhl
 
AlphaZero and beyond: Polygames
AlphaZero and beyond: PolygamesAlphaZero and beyond: Polygames
AlphaZero and beyond: PolygamesOlivier Teytaud
 
Applying AI in Games (GDC2019)
Applying AI in Games (GDC2019)Applying AI in Games (GDC2019)
Applying AI in Games (GDC2019)Jun Okumura
 
Badiya haihn
Badiya haihnBadiya haihn
Badiya haihnamitp26
 
Mastering the game of Go with deep neural networks and tree search: Presentation
Mastering the game of Go with deep neural networks and tree search: PresentationMastering the game of Go with deep neural networks and tree search: Presentation
Mastering the game of Go with deep neural networks and tree search: PresentationKarel Ha
 
What did AlphaGo do to beat the strongest human Go player?
What did AlphaGo do to beat the strongest human Go player?What did AlphaGo do to beat the strongest human Go player?
What did AlphaGo do to beat the strongest human Go player?Tobias Pfeiffer
 
What did AlphaGo do to beat the strongest human Go player?
What did AlphaGo do to beat the strongest human Go player?What did AlphaGo do to beat the strongest human Go player?
What did AlphaGo do to beat the strongest human Go player?Tobias Pfeiffer
 
Chakrabarti alpha go analysis
Chakrabarti alpha go analysisChakrabarti alpha go analysis
Chakrabarti alpha go analysisDave Selinger
 
Machine Learning: A gentle Introduction
Machine Learning: A gentle IntroductionMachine Learning: A gentle Introduction
Machine Learning: A gentle IntroductionMatthias Zimmermann
 
AI Supremacy in Games: Deep Blue, Watson, Cepheus, AlphaGo, DeepStack and Ten...
AI Supremacy in Games: Deep Blue, Watson, Cepheus, AlphaGo, DeepStack and Ten...AI Supremacy in Games: Deep Blue, Watson, Cepheus, AlphaGo, DeepStack and Ten...
AI Supremacy in Games: Deep Blue, Watson, Cepheus, AlphaGo, DeepStack and Ten...Karel Ha
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...Data Con LA
 
From Alpha Go to Alpha Zero - Vaas Madrid 2018
From Alpha Go to Alpha Zero -  Vaas Madrid 2018From Alpha Go to Alpha Zero -  Vaas Madrid 2018
From Alpha Go to Alpha Zero - Vaas Madrid 2018Juantomás García Molina
 
What did AlphaGo do to beat the strongest human Go player? (Strange Group Ver...
What did AlphaGo do to beat the strongest human Go player? (Strange Group Ver...What did AlphaGo do to beat the strongest human Go player? (Strange Group Ver...
What did AlphaGo do to beat the strongest human Go player? (Strange Group Ver...Tobias Pfeiffer
 

Similar to How Deep Learning Achieved the Impossible in Go (20)

How DeepMind Mastered The Game Of Go
How DeepMind Mastered The Game Of GoHow DeepMind Mastered The Game Of Go
How DeepMind Mastered The Game Of Go
 
J-Fall 2017 - AI Self-learning Game Playing
J-Fall 2017 - AI Self-learning Game PlayingJ-Fall 2017 - AI Self-learning Game Playing
J-Fall 2017 - AI Self-learning Game Playing
 
Alpha go 16110226_김영우
Alpha go 16110226_김영우Alpha go 16110226_김영우
Alpha go 16110226_김영우
 
Devoxx 2017 - AI Self-learning Game Playing
Devoxx 2017 - AI Self-learning Game PlayingDevoxx 2017 - AI Self-learning Game Playing
Devoxx 2017 - AI Self-learning Game Playing
 
AlphaZero and beyond: Polygames
AlphaZero and beyond: PolygamesAlphaZero and beyond: Polygames
AlphaZero and beyond: Polygames
 
Applying AI in Games (GDC2019)
Applying AI in Games (GDC2019)Applying AI in Games (GDC2019)
Applying AI in Games (GDC2019)
 
Badiya haihn
Badiya haihnBadiya haihn
Badiya haihn
 
Mastering the game of Go with deep neural networks and tree search: Presentation
Mastering the game of Go with deep neural networks and tree search: PresentationMastering the game of Go with deep neural networks and tree search: Presentation
Mastering the game of Go with deep neural networks and tree search: Presentation
 
What did AlphaGo do to beat the strongest human Go player?
What did AlphaGo do to beat the strongest human Go player?What did AlphaGo do to beat the strongest human Go player?
What did AlphaGo do to beat the strongest human Go player?
 
What did AlphaGo do to beat the strongest human Go player?
What did AlphaGo do to beat the strongest human Go player?What did AlphaGo do to beat the strongest human Go player?
What did AlphaGo do to beat the strongest human Go player?
 
Chakrabarti alpha go analysis
Chakrabarti alpha go analysisChakrabarti alpha go analysis
Chakrabarti alpha go analysis
 
Ai in games
Ai in gamesAi in games
Ai in games
 
Machine Learning: A gentle Introduction
Machine Learning: A gentle IntroductionMachine Learning: A gentle Introduction
Machine Learning: A gentle Introduction
 
(Alpha) Zero to Elo (with demo)
(Alpha) Zero to Elo (with demo)(Alpha) Zero to Elo (with demo)
(Alpha) Zero to Elo (with demo)
 
Scaling Deep Learning
Scaling Deep LearningScaling Deep Learning
Scaling Deep Learning
 
AI Supremacy in Games: Deep Blue, Watson, Cepheus, AlphaGo, DeepStack and Ten...
AI Supremacy in Games: Deep Blue, Watson, Cepheus, AlphaGo, DeepStack and Ten...AI Supremacy in Games: Deep Blue, Watson, Cepheus, AlphaGo, DeepStack and Ten...
AI Supremacy in Games: Deep Blue, Watson, Cepheus, AlphaGo, DeepStack and Ten...
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...
 
From Alpha Go to Alpha Zero - Vaas Madrid 2018
From Alpha Go to Alpha Zero -  Vaas Madrid 2018From Alpha Go to Alpha Zero -  Vaas Madrid 2018
From Alpha Go to Alpha Zero - Vaas Madrid 2018
 
Google Deepmind Mastering Go Research Paper
Google Deepmind Mastering Go Research PaperGoogle Deepmind Mastering Go Research Paper
Google Deepmind Mastering Go Research Paper
 
What did AlphaGo do to beat the strongest human Go player? (Strange Group Ver...
What did AlphaGo do to beat the strongest human Go player? (Strange Group Ver...What did AlphaGo do to beat the strongest human Go player? (Strange Group Ver...
What did AlphaGo do to beat the strongest human Go player? (Strange Group Ver...
 

Recently uploaded

Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxjana861314
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 

Recently uploaded (20)

Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 

How Deep Learning Achieved the Impossible in Go

  • 1. UNDERSTANDING ALPHA GO How Deep Learning Made the Impossible Possible
  • 2. ABOUT MYSELF  Ms.c. In computer Science, HUJI  Research interest: Deep Learning in Computer Vision, NLP, Reinforcement learning.  Also, DL Theory and other ML stuff.  Works in a DL start-up (Imubit)  Contact: mangate@gmail.com
  • 3. CREDITS  A lot of slides were taken from the following publicly available slideshows:  https://www.slideshare.net/ShaneSeungwhanMoon/how- alphago-works  https://www.slideshare.net/ckmarkohchang/alphago-in-depth  https://www.slideshare.net/KarelHa1/alphago-mastering-the- game-of-go-with-deep-neural-networks-and-tree-search  Original AlphaGo article: Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search.“Nature 529.7587 (2016): 484-489. Available here: http://web.iitd.ac.in/~sumeet/Silver16.pdf
  • 4. DEEP LEARNING IS CHANGING OUR LIVES  Search Engine (also for images and audio)  Spam filters  Recommender systems (Netflix, Youtube)  Self-Driving Cars  Cyber security (and regular one via computer vision)  Machine translation.  Speech to text, audio recognition.  Image recognition, smart shopping  And more and more and more…
  • 5. AI VERSUS HUMAN  In 1997, a supercomputer called Deep Blue (IBM) defeated Garry Kasparov.  This was the first defeat of a reigning world chess champion by a computer under tournament conditions.
  • 6. AI VERSUS HUMAN  In 2011 Watson, another IBM supercomputer, "crushed" the two best players in Jeopardy!, a popular question-answering TV show.
  • 7. GO  An ancient Chinese game (2,500 years old!)  Despite its relatively simple rules, Go is very complex, even more so than chess.  Winning at Go requires a great deal of intuition, and so it was considered unachievable by computers for at least another 30 years.
  • 8. AI VERSUS HUMAN  In 2016 AlphaGo, a computer program by DeepMind (part of Google), played a five-game Go match against Lee Sedol.  Lee Sedol:  Professional 9-dan (the highest ranking in Go), considered among the top 3 players in the world.  2nd in international titles.  Won 97 out of 100 games against European Go champion Fan Hui.
  • 9. AI VERSUS HUMAN  "I'm confident that I can win, at least this time" – Lee Sedol  AlphaGo won 4-1.  "I kind of felt powerless… misjudged the capabilities of AlphaGo" – Lee Sedol  How was it possible? Deep Learning.
  • 10. AI IN GAME PLAYING  Almost every game can be "simulated" with a tree search.  A move is chosen if it has the best chance of ending in a victory.
  • 11. AI IN GAMES  More formally: an optimal value function V*(s) determines the outcome of the game:  From every board position (state s)  Under perfect play by all players.  Computing it means traversing the tree of possible move sequences, roughly b^d positions, where:  b is the game's breadth (number of legal moves in each position)  d is the game's depth (game length in moves)  Tic-Tac-Toe: b ≈ 4, d ≈ 4  Chess: b ≈ 35, d ≈ 80
  • 12. TREE SEARCH IN GO  However, in Go: b ≈ 250, d ≈ 150, so b^d ≈ 250^150 ≈ 10^360, far beyond a googol (10^100).  This is more than the number of atoms in the entire universe!  Go is more complex than chess!
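The slide's numbers are easy to verify with a few lines of Python; naive exhaustive search examines roughly b^d positions:

```python
import math

def search_space(b, d):
    """Approximate number of move sequences a naive tree search must examine."""
    return b ** d

chess = search_space(35, 80)
go = search_space(250, 150)

# math.log10 handles arbitrarily large Python ints
print(int(math.log10(chess)))  # 123 -> chess has ~10^123 sequences
print(int(math.log10(go)))     # 359 -> Go has ~10^360, dwarfing a googol (10^100)
```

This is why brute-force search, which was enough for checkers and (with heavy pruning) chess, is hopeless for Go.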
  • 13. KEY: REDUCE THE SEARCH SPACE  Reducing b (the space of possible actions)
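One way to picture breadth reduction in code: keep only the moves a policy rates as plausible. Note this is only an illustrative sketch (move names and probabilities are invented); in AlphaGo itself the policy's priors steer the tree search rather than hard-pruning moves.

```python
def prune_actions(policy_probs, mass=0.95):
    """Keep the highest-probability moves until `mass` of the policy's
    probability is covered; search only those, shrinking b."""
    ranked = sorted(policy_probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, covered = [], 0.0
    for move, p in ranked:
        kept.append(move)
        covered += p
        if covered >= mass:
            break
    return kept

probs = {"A1": 0.50, "B2": 0.30, "C3": 0.15, "D4": 0.05}
print(prune_actions(probs))  # ['A1', 'B2', 'C3'] -- breadth shrinks from 4 to 3
```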
  • 14. KEY: REDUCE THE SEARCH SPACE  Reducing d – Position evaluation ahead of time  Instead of simulating all the way to the end: Both reductions are done with Deep Learning.
  • 15. SOME CONCEPTS  Supervised Learning (classification)  Given data, predict a class (or choose 1 option out of some known number of options)
  • 16. SOME CONCEPTS  Supervised Learning (regression)  Given data, predict some real number
  • 17. SOME CONCEPTS  Reinforcement Learning  Given a state (observation), perform an action that leads toward the goal (i.e. winning a game)
  • 18. SOME CONCEPTS  CNN’s are able to learn abstract features of a given image
  • 19. REDUCING ACTION CANDIDATES  Done by learning to "imitate" expert moves.  Data: games of online Go experts: 160K games, ~30M moves.  This is supervised classification (given a position, predict the expert's move out of all possible ones)
  • 20. REDUCING ACTION CANDIDATES  This deep CNN achieved 55% test accuracy at predicting expert moves.  Imitators without deep learning reached only 22% accuracy.  A small improvement in accuracy leads to a big improvement in playing ability.
  • 21. ROLLOUT NETWORK  Train an additional, smaller network (p_π) for imitating expert moves.  This network achieves only 24.2% accuracy.  But it runs ~1,000 times faster (2 μs per move versus 3 ms).  This network is used for rollouts (explained later).
  • 22. IMPROVING THE NETWORK  Improve the imitator network through self-play (reinforcement learning).  An entire game is played, and the parameters are updated according to the result.
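The self-play update can be caricatured as REINFORCE: every move played in a won game becomes more likely, every move in a lost game less likely. The sketch below is a deliberate simplification (the real update scales gradients of the network's log-probabilities; here a plain per-move preference table stands in for the CNN's weights):

```python
def reinforce_update(prefs, moves_played, z, lr=0.1):
    """z = +1 for a win, -1 for a loss; returns updated move preferences.
    Toy stand-in for a policy-gradient step: shift each played move's
    preference by lr * z instead of a true gradient."""
    prefs = dict(prefs)  # leave the input untouched
    for move in moves_played:
        prefs[move] = prefs.get(move, 0.0) + lr * z
    return prefs

after_win = reinforce_update({}, ["D4", "Q16", "D4"], z=+1)
print(after_win)  # D4 reinforced twice, Q16 once
```

Crucially, no human labels are needed at this stage: the game's outcome z is the only training signal.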
  • 23. IMPROVING THE NETWORK  Keep generating better models by self-playing newer models against older ones.  The final network also won 85% of games against the best Go software (the model without self-play won only 11%).  However, this model was ultimately not used during the matches; it was used to generate the value function.
  • 24. REDUCING SEARCH DEPTH - DATASET  Self-play with the imitator model for some number of moves (0 to 450).  Make one random move; this is the starting position s.  Self-play to the end of the game with the RL network (latest model).  If black won, z = 1; otherwise z = 0.  Save (s, z) to the dataset.  Generated 30M (s, z) pairs from 30M games.
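The recipe above can be sketched as a loop; the Go engine itself is faked here with random stand-ins, since only the (s, z) bookkeeping is the point (all names are illustrative):

```python
import random

def make_value_example(rng):
    """One (s, z) training pair, mimicking the slide's recipe:
    SL-policy moves, one random move, then an RL-policy playout.
    The actual Go play is faked with randomness."""
    u = rng.randint(0, 450)                  # moves played by the imitator policy
    s = ("position-after-random-move", u)    # stand-in for starting position s
    z = 1 if rng.random() < 0.5 else 0       # playout result: 1 = black won
    return s, z

rng = random.Random(42)
dataset = [make_value_example(rng) for _ in range(1000)]
print(len(dataset))  # 1000 pairs (the real dataset had 30M)
```

One pair per game is kept deliberately: successive positions within a game are strongly correlated, and sampling a single position per game avoids overfitting the value network.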
  • 25. REDUCING SEARCH DEPTH – VALUE FUNCTION  A regression task: for a given position s, output a number between 0 and 1.  Now, for every possible position we have an evaluation of how "good" it is for the black player.
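The value network is a regressor whose output is squashed into (0, 1). A logistic-regression stand-in shows the shape of the computation; the features and weights here are invented purely for illustration (the real model is a deep CNN over raw board planes):

```python
import math

def value(features, weights, bias=0.0):
    """P(black wins | position): a logistic toy stand-in for the value CNN."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))

v = value([1.0, -0.5], [2.0, 1.0])
print(0.0 < v < 1.0)  # True: always a valid win probability
```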
  • 27. PUTTING IT ALL TOGETHER - MCTS  During game time, a method called Monte Carlo Tree Search (MCTS) is applied.  This method has 4 steps:  Selection  Expansion  Evaluation  Backup (update)  For each move in the game this process is repeated about 10K times.
  • 28. MCTS - SELECTION  At each step we have a starting position (the board at this point).  An action is selected using a combination of the imitator network's prior P(s, a) and an action value Q(s, a), which starts at 0.  The prior bonus is divided by the number of times a state/action pair was visited, to encourage diversity: u(s, a) ∝ P(s, a) / (1 + N(s, a))
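In code, the selection rule looks like the following (the edge statistics are invented for the example):

```python
def select_action(edges, c=1.0):
    """Pick argmax_a [ Q(s,a) + u(s,a) ] with u = c * P / (1 + N):
    a high prior P attracts visits, but every visit shrinks the bonus,
    steering the search toward less-explored moves."""
    def score(item):
        _, (P, N, Q) = item
        return Q + c * P / (1 + N)
    return max(edges.items(), key=score)[0]

edges = {
    "A": (0.6, 50, 0.10),  # strong prior, but already heavily visited
    "B": (0.3, 0, 0.00),   # weaker prior, never visited
}
print(select_action(edges))  # 'B': the diversity term wins
```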
  • 29. MCTS - EXPANSION  While building the tree, a position can be expanded once (creating new leaves in the tree) with the imitator network.  This gives us the priors P, and hence u, for subsequent searches.
  • 30. MCTS - EVALUATION  After simulating 3-4 steps with the imitator network, we evaluate the board position.  This is done in two ways:  The value network's prediction.  Using the smaller imitator network to self-play to the end of the game (a rollout) and recording the result (1 for a black win, 0 for a white win).  Both evaluations are combined to give this board position a number between 0 and 1.
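The two evaluations are mixed with a weight λ; the AlphaGo paper reports λ = 0.5, i.e. an even split between the value network and the rollout:

```python
def evaluate_leaf(v_value_net, z_rollout, lam=0.5):
    """Combine the value network's prediction with the fast-rollout
    outcome (1 = black won, 0 = white won) into one leaf score."""
    return (1 - lam) * v_value_net + lam * z_rollout

print(evaluate_leaf(0.7, 1))  # 0.85: a winning rollout pulls the estimate up
```

Interestingly, the paper found that neither source alone played as strongly as the 50/50 mix: the value network and the rollouts make complementary errors.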
  • 31. MCTS – BACKUP (UPDATE)  After each simulation, the tree is updated:  Update Q (which started at 0) with the value computed from the value network and the rollouts.  Update N(s, a): increase it by one for each state/action pair visited.
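The backup step walks the simulated path back to the root, bumping each edge's visit count and folding the new leaf evaluation into its running mean. A minimal sketch (the dict-based tree is an assumption; the edge keys are illustrative):

```python
def backup(path, leaf_value, N, Q):
    """Update visit counts N and mean action values Q along the path.
    Q becomes the running mean of all leaf evaluations seen through
    each (state, action) edge."""
    for edge in path:
        N[edge] = N.get(edge, 0) + 1
        Q[edge] = Q.get(edge, 0.0) + (leaf_value - Q.get(edge, 0.0)) / N[edge]

N, Q = {}, {}
backup([("root", "A")], 1.0, N, Q)  # first simulation: a win
backup([("root", "A")], 0.0, N, Q)  # second simulation: a loss
print(N[("root", "A")], Q[("root", "A")])  # 2 0.5
```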
  • 32. CHOOSING AN ACTION  For each move during the game, MCTS is run about 10K times.  In the end, the action visited the most times from the root position (the current board) is played.  Notes:  Since this process is slow, the smaller network had to be used for the rollouts to keep it feasible (otherwise each move would have taken the computer several days to compute).  The imitator network was better at choosing the first actions than the RL network, probably because humans take more diverse actions.
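The final choice is a one-liner; note the move is picked by visit count, not by Q, since visit counts are the more robust statistic after thousands of simulations (the counts below are invented):

```python
def choose_move(root_visit_counts):
    """After ~10K simulations, play the most-visited move at the root."""
    return max(root_visit_counts, key=root_visit_counts.get)

print(choose_move({"D4": 4200, "Q16": 3100, "K10": 2700}))  # D4
```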
  • 33. ALPHA GO WEAKNESSES  In the 4th game, Lee Sedol steered the board to a position that was not in AlphaGo's search tree, causing the program to choose weaker moves and eventually lose the game.  Most assumptions made for AlphaGo do not hold in real-life RL problems. See: https://medium.com/@karpathy/alphago-in-context-c47718cb95a5
  • 34. RETIREMENT  In May 2017, AlphaGo defeated Ke Jie, the world's number 1 ranked player, 3-0.  Google's DeepMind unit announced that this would be the AI's last match event.
  • 35. SUMMARY  To this day, AlphaGo is considered one of the greatest AI achievements in recent history.  This achievement was made by combining Deep Learning with standard methods (like MCTS) to "simplify" the very complex game of Go.  4 deep neural networks were used:  3 almost identical convolutional neural networks:  An imitator network for action-space reduction.  An RL network, created through self-play, for generating the value network's dataset.  A value network for search-depth reduction.  1 small network for rollouts.  Deep Learning keeps achieving amazing new goals every day and is one of the fastest-growing fields in both academia and industry.