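• Self-introduction
• Introduction to AnyLogic
• Introduction to reinforcement learning
• Benefits of AnyLogic + reinforcement learning
• Examples and track record
| OUTLINE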
Currently VP of Engineering @ Skymind
• Leading RL Applications
• Previously:
• Assistant Manager @ JBS
• Intern Researcher @ Panasonic
Eduardo Gonzalez
| WHO AM I
@wm_eddie
https://qiita.com/wmeddie
https://wm-eddie.info
● Builds AI infrastructure for operating models in
production
● Allows model access from cloud, server,
desktop, and mobile
● Provides tooling for models, such as revision
history and accuracy monitoring over time
● Created the widely used open-source AI
framework Deeplearning4j, powering AI for
large enterprises globally, from banking to
telecom
PRODUCTS
SKIL:
ML and DL
Model Server
| ABOUT SKYMIND
Skymind’s team has contributed millions of lines of code to Open Source
| OPEN SOURCE CONTRIBUTORS
Deep Learning, A Practitioner’s Approach
● Written by Adam Gibson (CTO) and Josh Patterson (Contributor)
● Published in 2017
● Good fundamentals for deep learning and the DL4J framework
● Many graphics in this deck come from the book
| BOOK
Deep Learning and the Game of Go
● Written by Max Pumperla, Deep Learning Engineer @ Skymind
● Published in 2019
● Shows how to go from 0 to an entire AlphaZero style Go bot
● Introduces Deep Learning and Reinforcement Learning from
scratch.
| BOOK
Introduction to AnyLogic
AnyLogic is multimethod simulation modeling
software that supports system dynamics,
agent-based, and discrete-event simulation.
It is a de facto standard in the industry and is
used by almost all of the Fortune 500.
| ANYLOGIC
AnyLogic models can be exported into a Java
application and deployed to customers.
AnyLogic models are extended with Java so you can create custom agents or experiments.
Exported applications are Java libraries that can be integrated into enterprise
applications and can leverage data from them and from Excel.
| ANYLOGIC DETAILS
DL4J includes RL4J, a reinforcement learning library for Java. It can be used
inside AnyLogic without friction.
Reinforcement Learning was a main theme of the AnyLogic ’19
Conference. Skymind collaborated closely with AnyLogic for workshops
and panel discussions.
| WHY ANYLOGIC + SKYMIND
Introduction to Reinforcement Learning
| WHAT IS AI?
| 4 TYPES OF LEARNING
| REINFORCEMENT LEARNING IN DETAIL
| REINFORCEMENT LEARNING ALGORITHMS (VALUE)
Q-learning is a method for training a reinforcement
learning agent to anticipate how much reward it can
expect in the future. The Q comes from the
standard mathematical notation Q(s, a), the expected
future reward of taking action a in state s.
© Intel
Illustration from Deep Learning and the Game of Go © Manning
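The Q(s, a) idea above can be sketched with a lookup table. This is a minimal illustration, not the deck's implementation: the deck trains a neural network, while this sketch uses a plain dictionary, and the function name, state labels, and the alpha and gamma values are all assumptions chosen for the example.

```python
from collections import defaultdict

def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update to q in place and return the new value."""
    # Best value the agent currently expects from the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Target: immediate reward plus discounted future estimate.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

q = defaultdict(float)   # Q(s, a) table; unseen entries start at 0.0
actions = [0, 1]         # hypothetical action ids
new_value = q_learning_update(q, state="s0", action=1, reward=1.0,
                              next_state="s1", actions=actions)
```

Repeating this update over many simulated transitions is what lets the table (or, in deep Q-learning, the network) approximate expected future reward.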
| REINFORCEMENT LEARNING ALGORITHMS (POLICY)
Actor-critic algorithms take the current
state as input and output a set of moves to
play (the policy, from the actor) and an estimate of
which player is ahead (the value, from the critic).
© Intel
Illustration from Deep Learning and the Game of Go © Manning
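The two outputs described above can be sketched as two heads that read the same state. This is a simplified illustration under stated assumptions: the heads are linear rather than deep networks, and the weights, state vector, and function names are hypothetical.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def actor_critic_forward(state, actor_weights, critic_weights):
    """Return (policy, value) for one state using linear heads."""
    # Actor head: one score per action, normalized into a policy.
    logits = [sum(w * s for w, s in zip(row, state)) for row in actor_weights]
    policy = softmax(logits)
    # Critic head: a single number estimating how good the state is.
    value = sum(w * s for w, s in zip(critic_weights, state))
    return policy, value

state = [1.0, 0.5]                          # hypothetical state features
actor_weights = [[0.2, -0.1], [0.0, 0.6]]   # one row per action
critic_weights = [0.5, -0.2]
policy, value = actor_critic_forward(state, actor_weights, critic_weights)
```

During training, the critic's value estimate is used to judge whether the actor's chosen move turned out better or worse than expected.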
Benefits of AnyLogic + Reinforcement Learning
• Many NP-hard problems arise in simulation
• Current optimization techniques often cannot solve them in practice
• A good-enough solution is better than no solution
• And better than hand-written heuristics
| WHY REINFORCEMENT LEARNING
© The AnyLogic Company |
www.anylogic.com
Learning and decision making from a simulation model
FINAL MODEL
LEARN
Simulation model is an
extension of someone’s
mental model
Simulation as the reinforcement learning environment
SIMULATED WORLD
(Simulation Model)
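One common way to use a simulation as the environment is to wrap it behind a reset/step interface that the learning agent calls. The sketch below is a toy stand-in, not an AnyLogic model: the class name, the single-queue dynamics, and the reward are assumptions made only to show the shape of the interaction loop.

```python
import random

class ToySimulationEnv:
    """Hypothetical stand-in for a simulation model wrapped as an RL environment."""

    def __init__(self, horizon=20, seed=0):
        self.horizon = horizon
        self.rng = random.Random(seed)

    def reset(self):
        """Restart the simulated world and return the first observation."""
        self.t = 0
        self.queue = 0
        return [self.queue]

    def step(self, action):
        """Advance the simulated world by one decision interval."""
        self.queue += self.rng.randint(0, 2)      # random arrivals
        if action == 1:                           # action 1 serves the queue
            self.queue = max(0, self.queue - 2)
        self.t += 1
        reward = -self.queue                      # fewer waiting items is better
        done = self.t >= self.horizon
        return [self.queue], reward, done

def run_episode(env, policy):
    """Run one episode and return the total reward collected."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total

always_serve = run_episode(ToySimulationEnv(seed=0), lambda obs: 1)
never_serve = run_episode(ToySimulationEnv(seed=0), lambda obs: 0)
```

A learning algorithm would replace the fixed lambdas with a policy that it improves from the observed rewards.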
Examples and Track Record
Traffic Light Example
Eduardo Gonzalez
VP Engineering
Skymind
Samuel Audet
Deep Learning Engineer
Skymind
Tyler Wolfe-Adam
Technical Support Specialist
The AnyLogic Company
[Figure: arrival rates (per hour) over time (seconds)]
Cars enter the intersection from 4 directions and
move towards the opposing side.
The objective of the training experiment is to
learn a policy that optimally controls the traffic light
based on the current state of the traffic.
Implementation Architecture
AnyLogic Model
Imported RL4J
library
Custom Experiment
What is inside the Custom experiment?
Hyperparameters
Network configuration
Training
Network configuration: input layer (10 units), hidden layer 1 (300), hidden layer 2 (300), output layer (2)
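The layer sizes on this slide (10, 300, 300, 2) can be sketched as a small fully connected network. This is a plain-Python illustration rather than the RL4J configuration from the deck: the ReLU activation, the weight initialization, and the all-zero placeholder observation are assumptions, since the slide text gives only the layer sizes.

```python
import random

LAYER_SIZES = [10, 300, 300, 2]   # input, hidden 1, hidden 2, output (from the slide)

def init_network(sizes, seed=0):
    """Create (weights, biases) for each pair of consecutive layers."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = (2.0 / n_in) ** 0.5   # assumed initialization scale
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def forward(layers, x):
    """Compute the network output for one input vector."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if i < len(layers) - 1:       # ReLU on hidden layers only (assumed activation)
            x = [max(0.0, v) for v in x]
    return x

network = init_network(LAYER_SIZES)
observation = [0.0] * 10              # placeholder for the 10-element input
q_values = forward(network, observation)
# Pick the action whose output is largest (ties go to the lower index).
action = max(range(len(q_values)), key=lambda a: q_values[a])
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in network)
```

The two outputs line up with the two actions the agent can take, so the network can be read as scoring each action for the current input.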
Array with 10 elements [figure: elements numbered on the intersection diagram]
Action == 0: do nothing
Action == 1: change the traffic
light phase if not yellow
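The two actions above can be sketched as a small function. The phase names and their order are hypothetical, and the sketch assumes that a yellow phase ends on its own timer inside the simulation model, so a change request during yellow is simply ignored.

```python
# Hypothetical phase cycle; the deck does not list the phase names.
PHASES = ["NS_GREEN", "NS_YELLOW", "EW_GREEN", "EW_YELLOW"]

def apply_action(phase_index, action):
    """Return the next phase index for action 0 (do nothing) or 1 (request a change)."""
    if action not in (0, 1):
        raise ValueError("action must be 0 or 1")
    is_yellow = PHASES[phase_index].endswith("YELLOW")
    if action == 1 and not is_yellow:
        # Advance to the next phase in the cycle.
        return (phase_index + 1) % len(PHASES)
    # Action 0, or a change requested during yellow: keep the current phase.
    return phase_index

after_change = apply_action(0, 1)          # green, change requested
during_yellow = apply_action(1, 1)         # yellow, change request ignored
after_nothing = apply_action(2, 0)         # do nothing
```

Keeping the action space this small (two choices) is what allows the network's output layer to have only two units.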
Comparison of results (Optimized vs. Policy)
Comparison of results (Base vs. Optimized vs. Policy)
Real systems: Dynamic + Stochastic (exogenous inputs / system internals)
Optimization: Optimal fixed input parameters
Policy: Optimal (or near-optimal) decisions over time
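The difference between optimized fixed parameters and a policy can be shown on a toy intersection. Everything here is an assumption made for illustration: the two-queue dynamics, the demand shift at step 7, and the hand-written adaptive rule, which only stands in for a learned policy. It does not reproduce the results in the deck.

```python
def simulate(controller, steps=20, switch_at=7):
    """Return total cars left waiting, summed over all steps (lower is better)."""
    queues = [0, 0]                      # [north-south, east-west]
    total_waiting = 0
    for t in range(steps):
        # Demand shifts from north-south to east-west partway through the run.
        direction = 0 if t < switch_at else 1
        queues[direction] += 2
        green = controller(t, queues)    # index of the direction given green
        queues[green] = max(0, queues[green] - 2)
        total_waiting += sum(queues)
    return total_waiting

def fixed_cycle(period):
    """Controller with one fixed parameter: switch direction every `period` steps."""
    return lambda t, queues: (t // period) % 2

def adaptive(t, queues):
    """State-dependent rule: give green to the longer queue."""
    return 0 if queues[0] >= queues[1] else 1

# "Optimization": search for the best fixed parameter.
best_fixed_cost = min(simulate(fixed_cycle(p)) for p in range(1, 21))
# "Policy": decide at every step from the current state.
adaptive_cost = simulate(adaptive)
```

In this toy setup no single fixed period matches the shifting demand, while the state-dependent rule reacts at each step, which is the distinction the slide draws between optimization and policy.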
Reinforcement learning decision points
Hyperparameters
Observation space
Action space
Reward
How are learned policies used?
Trained policies can be deployed on
all types of devices and equipment
to complete tasks adaptively and
autonomously.
Edge devices could be used as
controllers that run the learned
policies.
Machine Learning powered by Skymind
http://www.skymind.ai/anylogic
What should simulation modelers know about this new application?
• The great news for simulation modelers is that
their skills now have a new and exciting application!
• To implement reinforcement learning (or deep
reinforcement learning, DRL), a team of DRL expert(s)
and simulation modeler(s) can collaborate. In theory,
neither group needs in-depth knowledge of the
other group’s tasks.
• In developing simulation models that are going
to be used as training environments, the stakes
are higher because the human buffer is no
longer there.
Can simulation modelers’ jobs be replaced with AI too?
At least in the near future, there is NO way to automate the process of abstracting
reality into a simulation model, because it has two aspects that [current] machines
are not good at:
- The process of abstracting reality is an art
- Simulation models are fundamentally based on uncovering causality and how something works
Thank you!

Tech on#06 Next-Generation Simulation Optimization Using Reinforcement Learning, Eduardo Gonzalez @ Skymind
