Q-Reinforcement Learning in Tensorflow
Ben Ball & David Samuel
www.prediction-machines.com
Take inspiration from DeepMind – learning to play Atari video games
How does a child learn to ride a bike? Lots of this, leading to this, rather than this . . .
Machine Learning vs Reinforcement Learning
• No supervisor
• Trial and error paradigm
• Feedback delayed
• Time sequenced
• Agent influences the environment
(Figure: the agent-environment loop: from state St the agent takes action at, and the environment returns reward rt and next state St+1.)
Good textbook on this: Sutton and Barto, Reinforcement Learning: An Introduction.
The interaction unfolds as the sequence st, at, rt, st+1, at+1, rt+1, st+2, at+2, rt+2, …
Mathematical formulation of RL
Markov Decision Process (MDP)
Example: a state-transition diagram with S = the set of states, A = the set of actions, T = the transition probabilities (e.g. T: S4(t) → S1(t+1) = 0.5 under action a1), and r(t) = the reward (e.g. r(t) = 1).
Markov Decision Process (MDP)
Example: Grid World with 20 states and 4 actions.
The game involves moving from a starting state (box) to one occupied by a yellow star in as few steps as possible.
Markov Decision Process (MDP)
Example: Grid World – optimal policy and value function.
(Figure: the optimal policy and the value function over the grid, with state values ranging from -1 down to -4.)
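As an illustration of what the value function on the slide means, here is a minimal value-iteration sketch in Python; the 4x5 layout, the goal position and the -1-per-step reward are assumptions, not the slide's exact setup.

import numpy as np

ROWS, COLS = 4, 5                                  # 20 states (assumed layout)
GOAL = (0, 4)                                      # assumed position of the yellow star
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]       # up, down, left, right

V = np.zeros((ROWS, COLS))                         # value function, initialised to 0
for _ in range(100):                               # sweep until the values stop changing
    for r in range(ROWS):
        for c in range(COLS):
            if (r, c) == GOAL:
                continue                           # terminal state keeps value 0
            candidates = []
            for dr, dc in ACTIONS:
                nr = min(max(r + dr, 0), ROWS - 1) # moves off the grid bounce back
                nc = min(max(c + dc, 0), COLS - 1)
                candidates.append(-1 + V[nr, nc])  # each step costs -1
            V[r, c] = max(candidates)              # act greedily (the optimal policy)

print(V)                                           # each value is minus the number of steps to the goal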
An aside – neural networks
What is important to remember when creating one?
A neural network is really just a way to represent a nonlinear many-to-many function mapping: it takes m inputs and gives n outputs, where the outputs are a nonlinear transformation of the inputs.
For the network to learn a given nonlinear function, it must be able to represent the superset of all requisite functions needed to calculate that mapping.
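For example, a minimal fully connected network in Tensorflow (1.x API) that maps m inputs to n outputs; the hidden-layer size and the values of m and n here are arbitrary assumptions.

import tensorflow as tf

m, n = 4, 3                                          # m inputs, n outputs (arbitrary)
x = tf.placeholder('float32', [None, m], name='x')   # a batch of input vectors
h = tf.layers.dense(x, 32, activation=tf.nn.relu)    # nonlinear hidden transformation
y = tf.layers.dense(h, n, name='y')                  # n outputs

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y, {x: [[1., 2., 3., 4.]]}))      # one forward pass through the mapping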
Markov Decision Process (MDP)
Markov property
The conditional probability distribution of future states depends only on the present state, not on the sequence of events that preceded it.
(Figure: the last three time steps, t-2, t-1, t.)
Q: At a single time step (as a state), are you able to see the velocity and acceleration?
Markov Decision Process (MDP)
Markov property
The conditional probability distribution of future states depends only on the present state, not on the sequence of events that preceded it.
(Figure: time steps t-2, t-1, t, with the last two combined into a single "super state".)
A: No, you are not. For us to learn a function that utilizes velocity and acceleration, AND for this to be Markov, we must create a synthetic super state as the single state of the system, composed of the last two time steps.
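A minimal sketch (not from the slides) of building such a super state by stacking the last two observations; the stacking depth and the array handling are assumptions.

import numpy as np
from collections import deque

HISTORY = 2                                   # number of past time steps kept in the super state

frames = deque(maxlen=HISTORY)                # rolling window of the most recent observations

def super_state(obs):
    """Stack the last HISTORY observations into one synthetic Markov 'super state'."""
    frames.append(np.asarray(obs, dtype=np.float32))
    while len(frames) < HISTORY:              # pad at the start of an episode
        frames.append(frames[-1])
    return np.concatenate(frames)             # e.g. positions at t-1 and t, so velocity is visible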
Putting it all together
• Q(s, a) is the Q function that gives the value of taking a specific action from a specific state
• These states, actions and next states must be Markov
• Deep nets are good at learning complex nonlinear functions, such as the Q function
• (Optimizing the Q function is done through the Bellman equation, which turns the reward into a learning target – a minimal sketch follows below)
• Once we learn the Q function with our deep net, we apply it as the optimal policy
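A minimal sketch of the Bellman target used in Q-learning; the discount factor and the exact update form are assumptions rather than the deck's code.

import numpy as np

GAMMA = 0.99                                   # discount factor (assumed value)

def q_learning_target(reward, next_q_values, done):
    """Bellman target: r + gamma * max_a' Q(s', a'); just r at the end of an episode."""
    if done:
        return reward
    return reward + GAMMA * np.max(next_q_values)

# The deep net's estimate Q(s)[a] is then regressed toward this target, e.g.
#   loss = (q_learning_target(r, q_next, done) - q_values[a]) ** 2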
Application to Transactional Markets
State transitions of lattice simulation of mean reversion:
(Figure: the spread price is mapped onto a lattice index i = -2, -1, 0, 1, 2; the position is Short, Flat or Long; the actions are sell and buy.)
These map into (State, Action, Reward) triplets used in the QRL algorithm.
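An illustrative sketch (not the actual simulator) of how one lattice transition could be turned into such a (State, Action, Reward) triplet; the position encoding and the P&L-style reward are assumptions.

def make_triplet(lattice_i, position, action, next_lattice_i):
    """state = (lattice index, position); reward = the spread move captured by the position held."""
    state = (lattice_i, position)                    # position is 'short', 'flat' or 'long'
    held = {'long': 1, 'flat': 0, 'short': -1}[position]
    price_move = next_lattice_i - lattice_i          # spread move, in lattice steps
    reward = held * price_move                       # long gains when the spread rises, short when it falls
    return state, action, reward

print(make_triplet(-1, 'long', 'sell', 0))           # ((-1, 'long'), 'sell', 1)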
http://www.prediction-machines.com/blog/ - for demonstration
As per the Atari games example, our QRL/DQN plays the trading game … over and over
(*demo from https://github.com/Prediction-Machines/Trading-Gym)
Double Dueling DQN (vanilla DQN does not converge well, but this method works much better)
(Figure: a training network and a target network with the same architecture: Input → FC ReLU → FC ReLU, plus a functional pass-through, combined ('+') into the Output.
Input: lattice position and the (long, short, flat) state. Output: value of Buy, value of Sell, value of Do Nothing.)
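A minimal Tensorflow (1.x) sketch of such a dueling head with separate training and target networks; the layer sizes and the state encoding are assumptions, not the deck's actual model.

import tensorflow as tf

N_ACTIONS = 3                                            # Buy, Sell, Do Nothing
STATE_DIM = 4                                            # lattice position + (long, short, flat), assumed encoding

def dueling_q_net(scope):
    """Two FC ReLU layers, then a state-value stream and an advantage stream combined into Q values."""
    with tf.variable_scope(scope):
        s = tf.placeholder('float32', [None, STATE_DIM], name='state')
        h = tf.layers.dense(s, 64, activation=tf.nn.relu)
        h = tf.layers.dense(h, 64, activation=tf.nn.relu)
        value = tf.layers.dense(h, 1)                    # V(s)
        advantage = tf.layers.dense(h, N_ACTIONS)        # A(s, a)
        # Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a))
        q = value + (advantage - tf.reduce_mean(advantage, axis=1, keep_dims=True))
        return s, q

s_train, q_train = dueling_q_net('training')             # network that is trained every step
s_target, q_target = dueling_q_net('target')             # periodically synced copy used for targets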
DDDQN and Tensorflow
Overview
1. DQN - DeepMind, Feb 2015 (the "DeepMind Nature" paper)
   http://www.davidqiu.com:8888/research/nature14236.pdf
   a. Experience Replay
   b. Separate Target Network
2. DDQN - Double Q-learning. DeepMind, Dec 2015
   https://arxiv.org/pdf/1509.06461.pdf
3. Prioritized Experience Replay - DeepMind, Feb 2016
   https://arxiv.org/pdf/1511.05952.pdf
4. DDDQN - Dueling Double Q-learning. DeepMind, Apr 2016
   https://arxiv.org/pdf/1511.06581.pdf
Enhancements
Experience Replay
  Removes correlation in sequences; smooths over changes in the data distribution (a minimal replay-buffer sketch follows below)
Prioritized Experience Replay
  Speeds up learning by choosing experiences from a weighted distribution
Separate target network from Q network
  Removes correlation with the target and improves stability
Double Q-learning
  Removes a lot of the non-uniform overestimation by separating the selection of an action from its evaluation
Dueling Q-learning
  Improves learning when many actions have similar values. Separates the Q value into two parts: a state value and a state-dependent action advantage
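A minimal sketch of a uniform experience replay buffer (illustrative only; prioritized replay would instead sample with weights based on TD error).

import random
from collections import deque

class ReplayMemory:
    """Minimal uniform experience replay buffer."""
    def __init__(self, capacity=100000):
        self.buffer = deque(maxlen=capacity)           # oldest experiences fall off the end

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # uniform random sampling breaks the correlation between consecutive steps
        return random.sample(self.buffer, batch_size)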
Install Tensorflow
My installation was on CentOS in Docker with a GPU*, but I also installed locally on Ubuntu 16 for this demo. *Built from source for maximum speed.
CentOS instructions were adapted from:
https://blog.abysm.org/2016/06/building-tensorflow-centos-6/
Ubuntu install was from:
https://www.tensorflow.org/install/install_sources
Tensorflow - what is it?
A computational graph solver
Tensorflow key API
Namespaces for organizing the graph and showing it in tensorboard
  with tf.variable_scope('prediction'):
Sessions
  with tf.Session() as sess:
Create variables and placeholders
  var = tf.placeholder('int32', [None, 2, 3], name='varname')
  self.global_step = tf.Variable(0, trainable=False)
Session.run or variable.eval to run parts of the graph and retrieve values
  pred_action = self.q_action.eval({self.s_t['p']: s_t_plus_1})
  q_t, loss = self.sess.run([q['p'], loss], {target_q_t: target_q_t, action: action})
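Putting those pieces together, a minimal self-contained example (illustrative, not from the deck) of building a small graph and running part of it:

import tensorflow as tf

with tf.variable_scope('demo'):
    x = tf.placeholder('float32', [None, 3], name='x')   # fed in at run time
    w = tf.Variable(tf.ones([3, 1]), name='w')           # graph variable
    y = tf.matmul(x, w, name='y')                         # node we want to evaluate

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y, {x: [[1., 2., 3.]]}))               # runs only what y depends on -> [[6.]]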
Tensorflow tips and tricks
Injecting data into tensorboard

agent.inject_summary({'average.reward': avg_reward, 'average.loss': avg_loss, 'average.q': avg_q}, step)

def inject_summary(self, tag_dict, step):
    summary_str_lists = self.sess.run([self.summary_ops[tag] for tag in tag_dict.keys()], {
        self.summary_placeholders[tag]: value for tag, value in tag_dict.items()
    })
    for summary_str in summary_str_lists:
        self.writer.add_summary(summary_str, step)

with tf.variable_scope('summary'):
    scalar_summary_tags = ['average.reward', 'average.loss', 'average.q']
    self.summary_placeholders = {}
    self.summary_ops = {}
    for tag in scalar_summary_tags:
        self.summary_placeholders[tag] = tf.placeholder('float32', None, name=tag.replace(' ', '_'))
        self.summary_ops[tag] = tf.summary.scalar("%s-%s" % (self.env_name, tag), self.summary_placeholders[tag])
Tensorflow tips and tricks
Clean design
Tensorflow tips and tricks
Follow Common Patterns
http://www.tensorflowpatterns.org/patterns/
• Cloud ML export
• Evaluate function
• Feed dict as positional arg
• Init functions
• Loss operation
• PEP-8 style for python
• Prepare, train, evaluate
• Save model function
• Summaries operation
• Train function
• Use default graph
Tensorflow tips and tricks
Etc
• Collect all params in one place
  • Allows you to easily reconfigure
  • Can easily do grid search optimization
• Avoid all magic numbers
  • Can compare with other papers and results easily
• Keep as much as you can in C++ code
• Use numpy and pandas dataframes for matrix computation
  • Via py_func (a small sketch follows below)
• Use tensorflow functions when possible
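An illustrative sketch of wrapping a plain numpy computation as a graph op with tf.py_func; the rolling-mean function here is an assumption, not an example from the deck.

import numpy as np
import tensorflow as tf

def rolling_mean(x):
    # plain numpy (could equally be pandas) computation, run outside the graph
    return np.convolve(x, np.ones(3, dtype=np.float32) / 3.0, mode='valid').astype(np.float32)

prices = tf.placeholder('float32', [None], name='prices')
smoothed = tf.py_func(rolling_mean, [prices], tf.float32)      # wrap the python function as an op

with tf.Session() as sess:
    print(sess.run(smoothed, {prices: [1., 2., 3., 4., 5.]}))  # -> [2. 3. 4.]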
Trading-Gym + Trading-Brain
Architecture
• Runner: warmup(), train(), run()
• Agent (abstract class, with children classes DQN, Double DQN, A3C): act(), observe(), end()
• Memory: add(), sample()
• Brain: train(), predict()
• Data Generator: Random Walks, Deterministic Signals, CSV Replay, Market Data Streamer (Single Asset, Multi Asset, Market Making)
• Environment: render(), step(), reset(), next(), rewind()
(A minimal code sketch of these interfaces follows below.)
Trading-Gym – OpenSourced; Trading-Brain – On Github
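A minimal sketch of how these components could fit together; the method names follow the diagram, but the bodies are illustrative assumptions and not the actual Trading-Brain code.

class Memory:
    """Stores experiences for replay."""
    def __init__(self):
        self.experiences = []
    def add(self, experience):
        self.experiences.append(experience)
    def sample(self, n):
        return self.experiences[-n:]            # illustrative; a real Memory samples at random

class Brain:
    """Owns the Q network."""
    def train(self, batch):                     # fit the network on a batch of experiences
        pass
    def predict(self, state):                   # return one value per action
        return [0.0, 0.0, 0.0]

class Agent:                                    # abstract class; DQN, Double DQN, A3C subclass it
    def __init__(self, brain, memory):
        self.brain, self.memory = brain, memory
    def act(self, state):
        q = self.brain.predict(state)
        return q.index(max(q))                  # greedy action (epsilon-greedy in practice)
    def observe(self, experience):
        self.memory.add(experience)
    def end(self):
        self.brain.train(self.memory.sample(32))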
Trading-Gym
https://github.com/Prediction-Machines/Trading-Gym
Open sourced
Modelled after OpenAI Gym and compatible with it
Contains an example of DQN with Keras
Contains a pair-trading example simulator and visualizer
Prediction Machines' release of the Trading-Gym environment into open source
- - demo - -
Trading-Brain
https://github.com/Prediction-Machines/Trading-Brain
Two rich examples:
Contains the Trading-Gym Keras example with suggested structuring
  examples/keras_example.py
Contains an example of Dueling Double DQN for a single-stock trading game
  examples/tf_example.py
References
Much of the Brain and config code in this example is adapted from the devsisters GitHub repo:
https://github.com/devsisters/DQN-tensorflow
Our github:
https://github.com/Prediction-Machines
Tensorflow patterns:
http://www.tensorflowpatterns.org
Our blog:
http://prediction-machines.com/blog/
Our job openings:
http://prediction-machines.com/jobopenings/