1) AlphaZero was an AI developed by DeepMind that achieved superhuman play in chess, shogi, and Go without relying on human data or prior knowledge.
2) It achieved this through a new form of deep reinforcement learning that let it learn to play solely from games of self-play, starting from random play.
3) AlphaZero demonstrated superhuman performance in chess, shogi, and Go by defeating the previous champion programs in these games, despite being given no domain knowledge except the game rules.
2. Juantomás García
• Data Solutions Manager @ OpenSistemas
• GDE (Google Developer Expert) for Cloud
Others
• Co-author of the first Spanish free-software book, “La Pastilla Roja”
• President of Hispalinux (Spanish Linux User Group)
• Organizer of the Machine Learning Spain and GDG Cloud Madrid groups.
Who I am
3. • People interested in Machine Learning
• Who want to know more about what AlphaGo is
• With a good technical background.
Who the Audience Is
4. • I love Machine Learning.
• There are a lot of takeaways from this project.
• I want to share it with others.
Why I did this presentation
5. • AlphaGo: the epic project
• AlphaGo Zero: the re-evolution version
• AlphaZero: looking for general solutions
• DIY: AlphaZero Connect 4
• Takeaways
Outline
6. A brief introduction
• Deep Blue was about brute force
• Its creators were emulating how humans play chess
7. A brief introduction
• A huge search space:
Chess -> 20 possible opening moves
Go -> 361 possible opening moves
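The steps above can be made concrete with a back-of-the-envelope calculation (a hypothetical illustration, not anything from the talk): with the opening branching factors on the slide, the number of possible move sequences explodes after only a few half-moves.

```python
# Rough number of move sequences after `plies` half-moves, assuming a
# constant branching factor (an oversimplification, for illustration only).
def tree_size(branching: int, plies: int) -> int:
    """Approximate count of move sequences after `plies` half-moves."""
    return branching ** plies

print(tree_size(20, 4))   # chess, 4 plies in: 160,000 sequences
print(tree_size(361, 4))  # Go, 4 plies in: ~1.7e10 sequences
```

Four half-moves into a Go game there are already roughly 17 billion sequences, which is why exhaustive search is hopeless.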
8. AlphaGo Main Concepts
• Policy Neural Network
“To decide which are the most sensible moves in
a particular board position”.
9. AlphaGo Main Concepts
• Value Neural Network
“How good a particular board arrangement is”.
“How likely you are to win the game from this
position”.
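The two networks above can be sketched as two heads on a shared feature vector. This is a toy stand-in (random weights, made-up sizes), not DeepMind's architecture: the policy head emits a probability over moves, the value head a win estimate in [-1, 1].

```python
import numpy as np

# Toy sketch: one shared feature vector feeding a policy head
# (move probabilities) and a value head (win estimate). Weights are
# random placeholders, not a trained network.
rng = np.random.default_rng(0)

N_MOVES = 361  # all Go points, as on the slide

def policy_value(features, w_policy, w_value):
    """Return (move probabilities, position value in [-1, 1])."""
    logits = features @ w_policy          # policy head
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                  # softmax over moves
    value = float(np.tanh(features @ w_value))  # value head
    return probs, value

features = rng.normal(size=16)
probs, value = policy_value(features,
                            rng.normal(size=(16, N_MOVES)),
                            rng.normal(size=16))
print(probs.sum(), value)  # probabilities sum to 1; value in [-1, 1]
```

The policy answers "which moves are sensible here?"; the value answers "how likely am I to win from here?".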
11. AlphaGo First Approach: SL
• Just train both networks on human games.
• Plain, ordinary supervised learning.
• With this alone, AlphaGo plays like a weak
human.
• It is like the approach of Deep Blue: just emulating
human chess players.
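The SL idea on this slide can be sketched in a few lines. This is a toy with made-up data (a synthetic "expert" rule and a linear softmax policy), nothing like AlphaGo's real pipeline: minimise cross-entropy between the policy's prediction and the expert's move.

```python
import numpy as np

# Toy supervised learning: fit a linear softmax policy to imitate a
# synthetic "expert" label. Purely illustrative, not AlphaGo's training.
rng = np.random.default_rng(1)
n_features, n_moves = 8, 5
W = np.zeros((n_features, n_moves))
losses = []

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(500):
    x = rng.normal(size=n_features)           # stand-in board features
    expert = int(np.argmax(x[:n_moves]))      # stand-in 'human expert' move
    p = softmax(x @ W)
    losses.append(-np.log(p[expert]))         # cross-entropy on expert move
    grad = np.outer(x, p)                     # dL/dW for softmax ...
    grad[:, expert] -= x                      # ... minus the one-hot target
    W -= 0.1 * grad

print(np.mean(losses[:100]), np.mean(losses[-100:]))  # imitation improves
```

The ceiling of this approach is exactly the slide's point: the policy can only get as good as the humans it imitates.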
13. AlphaGo Second Approach: RL
• Improve the SL version by having it play against itself.
• With reinforcement learning it is able to play well
against state-of-the-art Go programs.
• Those programs use MCTS (Monte Carlo Tree Search).
15. AlphaGo Second Approach: RL
• It is not two neural networks versus Monte Carlo Tree Search.
• It is a better MCTS thanks to the NNs.
16. Alpha Go Second Approach: RL
• Optimal value function V*(s):
“Determines the outcome of the game from every
board position (s is the state)”.
A brute-force solution is impossible:
Chess: ~35^80 positions
Go: ~250^150 positions
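The slide's numbers can be checked with a couple of lines of arithmetic (just the figures from the slide, typical branching factor to the power of typical game length):

```python
from math import log10

# The slide's rough game-tree sizes: branching factor ** game length.
chess_positions = 35 ** 80    # chess: ~35 moves per position, ~80 plies
go_positions = 250 ** 150     # Go: ~250 moves per position, ~150 plies

print(int(80 * log10(35)))    # → 123  (chess tree: ~10^123 leaves)
print(int(150 * log10(250)))  # → 359  (Go tree: ~10^359 leaves)
```

Both dwarf the number of atoms in the observable universe (~10^80), so computing V*(s) exactly is out of the question.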
17. Alpha Go Second Approach: RL
• Two ways to reduce the effective search
space:
Truncate the search tree: use V(s) as an approximation of V*(s).
Reduce the breadth of the search with the
policy P(a|s).
MCTS rolls out the moves chosen by the policy
network and evaluates them with the value
network.
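The two reductions can be sketched on a toy game. Everything here is made up for illustration (a fake integer "game", a fake policy and value function): the search only expands the top-k moves suggested by the policy (breadth cut), and scores leaf positions with V(s) instead of playing out to the end (depth cut).

```python
import heapq, math

# Toy game for illustration: state is an integer, a move adds its amount,
# and the fake "value network" favours large totals. Not AlphaGo's code.
def priors(state):
    """Fake policy P(a|s): softmax-like preference over moves 1, 2, 3."""
    z = sum(math.exp(k) for k in (1, 2, 3))
    return {m: math.exp(m) / z for m in (1, 2, 3)}

def value(state):
    """Fake value network V(s) in [-1, 1], approximating V*(s)."""
    return math.tanh(state / 10)

def search(state, depth, top_k=2):
    """Negamax over only the top_k policy moves; V(s) at the horizon."""
    if depth == 0:
        return value(state)                      # truncate: V(s) ~ V*(s)
    p = priors(state)
    best = heapq.nlargest(top_k, p, key=p.get)   # breadth cut via P(a|s)
    return max(-search(state + m, depth - 1, top_k) for m in best)

result = search(0, depth=4)
```

With top_k = 2 the tree has 2^depth leaves instead of 3^depth, and the horizon cut replaces full rollouts with a single evaluation.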
19. AlphaGo Zero: Re-Evolution version
• Trained purely with reinforcement learning.
• An exploration bonus u(s,a) steers the search toward
less-visited moves.
• Just one neural network for both policy and value.
• After each batch of searches, the neural network is
retrained on their results.
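The exploration bonus u(s,a) mentioned above can be sketched with the PUCT-style rule from the AlphaGo Zero paper (the numbers and c_puct value below are made up for illustration): pick the action maximising Q(s,a) + u(s,a), where u(s,a) = c · P(a|s) · √(Σ_b N(s,b)) / (1 + N(s,a)).

```python
import math

# PUCT-style selection sketch: the bonus u(s,a) is large for moves with
# a high prior P and few visits N, so the search keeps exploring them.
def select(actions, c_puct=1.5):
    """actions: dicts with prior P, visit count N, mean value Q."""
    total_n = sum(a["N"] for a in actions)
    def score(a):
        u = c_puct * a["P"] * math.sqrt(total_n) / (1 + a["N"])
        return a["Q"] + u
    return max(actions, key=score)

# A rarely visited move can outrank a well-explored one with higher Q.
actions = [{"P": 0.6, "N": 50, "Q": 0.30},
           {"P": 0.3, "N": 2,  "Q": 0.25}]
best = select(actions)  # the N=2 move wins on its exploration bonus
```

As visit counts grow, u(s,a) shrinks and the choice is increasingly driven by the measured value Q(s,a).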
20. AlphaGo Zero: Re-Evolution version
• Human games were noisy and not reliable.
• Doesn’t use rollouts to predict who will win.
23. AlphaZero: New Challenges
AlphaGo Zero vs AlphaZero:
• Binary outcome (win / loss) vs expected outcome
(including draws or potentially other outcomes)
• Board positions transformed before being passed to the neural
network (by a randomly selected rotation or reflection) vs no
data augmentation
• Games generated by the best player from previous iterations
(55 % win margin) vs continual updates using the latest
parameters (without the evaluation and selection steps)
• Hyper-parameters tuned by Bayesian optimisation vs the same
hyper-parameters reused without game-specific tuning