Tech on #06: Next-Generation Simulation Optimization Using Reinforcement Learning, Eduardo Gonzalez @ Skymind

• Self-introduction
• Introduction to AnyLogic
• Introduction to reinforcement learning
• Benefits of AnyLogic + reinforcement learning
• Examples and results
| OUTLINE

Currently VP of Engineering @ Skymind
• Leading RL Applications
• Previously:
• Assistant Manager @ JBS
• Intern Researcher @ Panasonic
Eduardo Gonzalez
| WHO AM I
@wm_eddie
https://qiita.com/wmeddie
https://wm-eddie.info

● Builds AI infrastructure for operating models in production
● Allows model access from cloud, server, desktop, and mobile
● Provides tooling for models, such as revision history and accuracy monitoring over time
● Created the widely used open-source AI framework Deeplearning4j, which powers AI for large enterprises globally, from banking to telecom
PRODUCTS
SKIL: ML and DL Model Server
| ABOUT SKYMIND

Skymind’s team has contributed millions of lines of code to Open Source
| OPEN SOURCE CONTRIBUTORS

Deep Learning, A Practitioner’s Approach
● Written by Adam Gibson (CTO) and Josh Patterson (Contributor)
● Published in 2017
● Good fundamentals for deep learning and the DL4J framework
● Many graphics in this talk come from the book
| BOOK

Deep Learning and the Game of Go
● Written by Max Pumperla, Deep Learning Engineer @ Skymind
● Published in 2019
● Shows how to go from zero to a complete AlphaZero-style Go bot
● Introduces deep learning and reinforcement learning from scratch
| BOOK

AnyLogic is multimethod simulation modeling software that supports system dynamics, agent-based, and discrete-event simulation.
It is a de facto standard in the industry and is used by almost all of the Fortune 500.
AnyLogic models can be exported as Java applications and deployed to customers.
| ANYLOGIC

AnyLogic models are extended with Java, so you can create custom agents or experiments.
Exported applications are Java libraries that can be integrated into enterprise applications and can draw on data from those applications and from Excel.
| ANYLOGIC DETAILS

DL4J includes RL4J, a reinforcement learning library for Java. It can be used inside AnyLogic without friction.
Reinforcement learning was a main theme of the AnyLogic Conference 2019. Skymind collaborated closely with AnyLogic on workshops and panel discussions.
| WHY ANYLOGIC + SKYMIND

| REINFORCEMENT LEARNING IN DETAIL

| REINFORCEMENT LEARNING ALGORITHMS (VALUE)
Q-learning is a method for training a reinforcement learning agent to anticipate how much reward it can expect in the future. The Q comes from the standard mathematical notation Q(s, a), which is a function of the state and a possible action.
© Intel
Illustration from Deep Learning and the Game of Go © Manning
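
The Q(s, a) idea above can be sketched as a tabular update. This is a minimal illustration, assuming a small discrete state and action space. The names `q_update`, `alpha` (learning rate), `gamma` (discount factor) and the example states are hypothetical and do not come from the talk. Python is used for brevity, whereas the deck's own tooling (RL4J) is Java.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step and return the new Q(state, action).

    The estimate is moved toward: reward + gamma * max over a' of Q(next_state, a').
    """
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Q-table that defaults every unseen (state, action) pair to 0.0.
q = defaultdict(float)
actions = [0, 1]

# One observed transition: in "s0", taking action 1 gave reward 1.0 and led to "s1".
first = q_update(q, "s0", 1, reward=1.0, next_state="s1", actions=actions)
# Seeing the same transition again moves the estimate further toward the target.
second = q_update(q, "s0", 1, reward=1.0, next_state="s1", actions=actions)
```

Repeating the update over many transitions lets Q(s, a) converge toward the expected future reward. Deep variants replace the table with a neural network.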

| REINFORCEMENT LEARNING ALGORITHMS (POLICY)
Actor-critic algorithms take the current state as input and output both a set of moves to play (the policy, from the actor) and an estimate of which player is ahead (the value, from the critic).
© Intel
Illustration from Deep Learning and the Game of Go © Manning
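
The two outputs described above can be sketched as a model with a policy head and a value head. This is a minimal forward-pass illustration only, with no training. The class name `ActorCritic`, the single linear layer per head, the random initial weights, and the `tanh` squashing of the value to [-1, 1] are all assumptions made for this sketch, not details from the talk.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Compute weights @ x + bias for a list-of-rows weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class ActorCritic:
    """Hypothetical two-headed model: the actor outputs a policy, the critic a value."""

    def __init__(self, n_inputs, n_actions, seed=0):
        rng = random.Random(seed)
        self.actor_w = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
                        for _ in range(n_actions)]
        self.actor_b = [0.0] * n_actions
        self.critic_w = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]]
        self.critic_b = [0.0]

    def forward(self, state):
        # Actor head: probability of choosing each action in this state.
        policy = softmax(linear(self.actor_w, self.actor_b, state))
        # Critic head: scalar estimate of how good this state is (assumed range -1..1).
        value = math.tanh(linear(self.critic_w, self.critic_b, state)[0])
        return policy, value


model = ActorCritic(n_inputs=10, n_actions=2)
policy, value = model.forward([0.5] * 10)
```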

Benefits of AnyLogic + Reinforcement Learning

• Many NP-hard problems arise in simulation
• Current optimization techniques often cannot handle them at all
• A good-enough solution is better than no solution
• And better than hand-written heuristics
| WHY REINFORCEMENT LEARNING

© The AnyLogic Company | www.anylogic.com
Learning and decision making from a simulation model
[Diagram labels: FINAL MODEL, LEARN]
A simulation model is an extension of someone’s mental model.

Simulation as the reinforcement learning environment
SIMULATED WORLD
(Simulation Model)
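
The idea of a simulation acting as the environment can be sketched as a reset/step interface that an agent interacts with. This is a toy illustration: `ToyQueueSimulation`, its single-queue dynamics, the reward (negative queue length), and `run_episode` are hypothetical stand-ins and are not AnyLogic or RL4J APIs.

```python
import random


class ToyQueueSimulation:
    """Hypothetical stand-in for a simulation model: one queue with random arrivals."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.queue = 0

    def reset(self):
        """Restart the simulated world and return the first observation."""
        self.queue = 0
        return self.queue

    def step(self, action):
        """Advance the simulation by one tick under the chosen action."""
        self.queue += self.rng.randint(0, 1)  # zero or one arrival per tick
        if action == 1 and self.queue > 0:    # action 1 serves one waiting item
            self.queue -= 1
        reward = -self.queue                  # shorter queues earn higher reward
        return self.queue, reward


def run_episode(env, policy, steps=50):
    """Let a policy (observation -> action) control the simulation; return total reward."""
    obs = env.reset()
    total = 0.0
    for _ in range(steps):
        action = policy(obs)
        obs, reward = env.step(action)
        total += reward
    return total


# Same random arrivals, two fixed policies: always serve vs. never serve.
served = run_episode(ToyQueueSimulation(seed=0), lambda obs: 1)
idle = run_episode(ToyQueueSimulation(seed=0), lambda obs: 0)
```

A learning algorithm would replace the fixed lambdas with a policy that it improves from the observed rewards.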

Examples and Results

Traffic Light Example
Eduardo Gonzalez, VP Engineering, Skymind
Samuel Audet, Deep Learning Engineer, Skymind
Tyler Wolfe-Adam, Technical Support Specialist, The AnyLogic Company

Traffic Light Example
Cars enter the intersection from four directions (N, S, W, E) and move toward the opposite side.
The objective of the training experiment is to learn a policy that optimally controls the traffic light based on the current state of the traffic.
[Chart: arrival rates (per hour) vs. time (seconds)]

Implementation Architecture
AnyLogic Model
Imported RL4J library
Custom Experiment

What is inside the Custom experiment?
Hyperparameters
Network configuration
Training

Network configuration: Input (10) → Hidden 1 (300) → Hidden 2 (300) → Output (2)
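
The layer sizes on this slide (10, 300, 300, 2) can be sketched as a plain feed-forward pass. This is a minimal illustration in pure Python. The ReLU activation on the hidden layers, the uniform weight initialization, and the helper names are assumptions, since the slide gives only the layer widths. In the talk's setup the network would be configured through DL4J/RL4J in Java.

```python
import random

# Layer widths taken from the slide: input, hidden 1, hidden 2, output.
LAYER_SIZES = [10, 300, 300, 2]


def init_layers(sizes, seed=0):
    """Create (weights, biases) for each pair of consecutive layer sizes."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = (1.0 / n_in) ** 0.5
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Run one input vector through the network and return the output vector."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]  # ReLU on hidden layers (assumed)
    return x


def count_parameters(layers):
    """Total number of trainable weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


layers = init_layers(LAYER_SIZES)
outputs = forward(layers, [1.0] * 10)
n_params = count_parameters(layers)
```

With these widths the network has 94,202 trainable parameters (3,300 + 90,300 + 602). The two outputs plausibly correspond to the two actions listed later in the deck, though the slide does not say so explicitly.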

Training

Array with 10 elements
[Figure: numbered labels marking the array elements; image not preserved]

Action == 0: do nothing
Action == 1: change the traffic light phase if not yellow
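
The two actions above can be sketched as a small phase-transition function. The phase names, their cyclic order, and the function `apply_action` are hypothetical. The slide states only that action 1 changes the phase unless the light is yellow, so how a yellow phase ends (for example, on a timer) is deliberately left out.

```python
# Hypothetical phase cycle for a two-way intersection.
PHASES = ["NS_GREEN", "NS_YELLOW", "EW_GREEN", "EW_YELLOW"]


def apply_action(phase, action):
    """Return the phase that results from applying an action to the current phase.

    action 0: do nothing.
    action 1: advance to the next phase, unless the light is currently yellow.
    """
    if action not in (0, 1):
        raise ValueError(f"unknown action: {action}")
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase}")
    if action == 1 and not phase.endswith("YELLOW"):
        return PHASES[(PHASES.index(phase) + 1) % len(PHASES)]
    return phase


after_change = apply_action("NS_GREEN", 1)   # green advances to the next phase
after_ignored = apply_action("NS_YELLOW", 1)  # request is ignored during yellow
after_noop = apply_action("EW_GREEN", 0)      # action 0 leaves the phase as is
```

Ignoring the change request during yellow keeps the action space at two values while preventing the agent from skipping the safety interval.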

Comparison of results (Optimized vs. Policy)

Comparison of results (Base vs. Optimized vs. Policy)
Real systems: Dynamic + Stochastic (exogenous inputs / system internals)
Optimization: Optimal fixed input parameters
Policy: Optimal (or near-optimal) decisions over time

Reinforcement learning decision points
• Hyperparameters
• Observation space
• Action space
• Reward

How are learned policies used?
Trained policies can be deployed on all types of devices and equipment to complete tasks adaptively and autonomously.
Edge devices can serve as controllers that run the learned policies.

Machine Learning powered by Skymind
http://www.skymind.ai/anylogic

What should simulation modelers know about this new application?
• The great news for simulation modelers is that their skills now have a new and exciting application!
• To implement a reinforcement learning (or deep reinforcement learning, DRL) solution, a team of DRL experts and simulation modelers can collaborate. In theory, neither group needs in-depth knowledge of the other group’s tasks.
• When developing simulation models that will be used as training environments, the stakes are higher because the human buffer is no longer there.

Can simulation modelers’ jobs be replaced with AI too?
At least in the near future, there is no way to automate the process of abstracting reality into a simulation model, because it has two aspects that [current] machines are not good at:
• The process of abstracting reality is an art
• Simulation models are fundamentally based on uncovering causality and how something works

Thank you!
