Presented at https://www.meetup.com/TensorFlow-and-Deep-Learning-Singapore/events/241183195/ . Tips and Tricks for using Tensorflow with Deep Reinforcement Learning.
See our blog for more information at http://prediction-machines.com/blog/
3. How does a child learn to ride a bike?
(Images: lots of this, leading to this, rather than this . . .)
4. Machine Learning vs Reinforcement Learning
No supervisor
Trial and error paradigm
Feedback delayed
Time sequenced
Agent influences the environment
(Diagram: the Agent takes action a_t in state S_t; the Environment returns reward r_t and next state S_t+1.)
Good textbook on this by Sutton and Barto (Reinforcement Learning: An Introduction).
The agent's experience is the sequence s_t, a_t, r_t, s_t+1, a_t+1, r_t+1, s_t+2, a_t+2, r_t+2, …
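For illustration, a minimal sketch of collecting such a sequence from a generic OpenAI-Gym-style environment; the FrozenLake environment and the random policy are stand-in assumptions, not part of the demo:

    import gym

    env = gym.make('FrozenLake-v0')    # any discrete-state environment works for illustration
    s_t = env.reset()
    trajectory = []                    # will hold (s_t, a_t, r_t, s_t+1) tuples
    done = False
    while not done:
        a_t = env.action_space.sample()           # stand-in for a learned policy
        s_t_plus_1, r_t, done, _ = env.step(a_t)  # environment returns reward and next state
        trajectory.append((s_t, a_t, r_t, s_t_plus_1))
        s_t = s_t_plus_1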
6. Markov Decision Process (MDP)
Example:
(Diagram: a set of states S, a set of actions A, a transition probability such as
T: S4(t) → S1(t+1) = 0.5 under action a1, and a reward such as r(t) = 1.)
7. Markov Decision Process (MDP)
Example: Grid World with 20 states and 4 actions
The game involves moving from a starting state (box) to one occupied by a
yellow star in as few steps as possible.
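A minimal sketch of such a grid world as an MDP; the 4 x 5 layout, the goal position, and the -1 step reward are illustrative assumptions, not taken from the slide:

    # States are cells of a 4 x 5 grid (20 states); actions move up/down/left/right.
    GOAL = (0, 4)          # cell occupied by the yellow star (assumed position)
    MOVES = {'up': (-1, 0), 'down': (1, 0), 'left': (0, -1), 'right': (0, 1)}

    def step(state, action):
        """One MDP transition: returns (next_state, reward, done)."""
        r, c = state
        dr, dc = MOVES[action]
        next_state = (min(max(r + dr, 0), 3), min(max(c + dc, 0), 4))  # stay inside the grid
        done = next_state == GOAL
        reward = 0.0 if done else -1.0   # -1 per step encourages "as few steps as possible"
        return next_state, reward, done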
9. An aside – neural networks
What is important to remember when creating one?
It is really just a way to represent a nonlinear many-to-many function
mapping: it takes m inputs and gives n outputs, and the outputs are a nonlinear
transformation of the inputs.
For this network to learn that nonlinear function, the set of functions it can
represent must include the mapping it needs to compute.
10. Markov Decision Process (MDP)
Markov property
The conditional probability distribution of future states depends only on the
present state, not on the sequence of events that preceded it.
(Figure: successive time steps t-2, t-1, t.)
Q: At a single time step (as a state), are you able to see the velocity and acceleration?
11. Markov Decision Process (MDP)
Markov property
The conditional probability distribution of future states depends only on the
present state, not on the sequence of events that preceded it.
(Figure: successive time steps t-2, t-1, t combined into a "super state".)
A: No, you are not. For us to learn a function that utilizes velocity and
acceleration, AND for this to be Markov, we must create a synthetic super state
as the single state of the system, composed of the last two time steps.
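A minimal sketch of building such a super state by stacking the most recent observations; the stack length, helper name, and padding rule are illustrative assumptions:

    import numpy as np
    from collections import deque

    STACK = 2                       # number of past time steps kept in the super state
    history = deque(maxlen=STACK)   # rolling window of raw observations (clear it at episode start)

    def super_state(obs):
        """Append the newest observation and return the stacked (Markov) super state."""
        history.append(np.atleast_1d(np.asarray(obs, dtype=np.float32)))
        while len(history) < STACK:              # pad with copies at the start of an episode
            history.append(history[-1])
        return np.concatenate(list(history))     # e.g. positions at t-1 and t, so velocity is recoverable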
12. Putting it all together
• Q(s,a) is the Q function that gives the value of taking a specific action
from a specific state
• These states, actions and next-states must be Markov
• Deepnets are good at learning complex nonlinear functions, such as the
Q function
• (The Q function is optimized through the Bellman equation, which is used to
calculate the target value from the reward; see the sketch below)
• Once we learn the Q function with our deepnet, we apply it as the optimal
policy
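For reference, a small sketch of the Bellman target used to fit the Q function; the discount factor, the use of numpy, and the terminal-state handling are illustrative assumptions, not the demo's actual code:

    import numpy as np

    GAMMA = 0.99   # discount factor (assumed value)

    def q_learning_target(reward, q_next, terminal):
        """Bellman target: r + gamma * max_a' Q(s', a'), with no bootstrap on terminal states."""
        return reward + (1.0 - terminal) * GAMMA * np.max(q_next, axis=-1)

    # Example: reward 1, next-state action values [0.2, 0.5, 0.1], non-terminal
    target = q_learning_target(1.0, np.array([0.2, 0.5, 0.1]), 0.0)   # -> 1.495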
14. State transitions of lattice simulation of mean reversion:
(Diagram: the spread price is mapped onto a lattice index i = -2, -1, 0, 1, 2;
the position is Short, Flat or Long; the actions are sell and buy.)
These map into (State, Action, Reward) triplets used in the QRL algorithm.
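A heavily hedged sketch of one way such transitions could be encoded as (State, Action, Reward) triplets; the state encoding, action names, and reward rule below are illustrative assumptions, not the actual lattice simulation:

    POSITIONS = ('short', 'flat', 'long')
    ACTIONS = ('sell', 'do_nothing', 'buy')

    def make_triplet(lattice_i, position, action, next_lattice_i):
        """Encode one transition as a (state, action, reward) triplet."""
        state = (lattice_i, position)                 # lattice index plus current position
        if position == 'long':
            reward = next_lattice_i - lattice_i       # long profits when the spread rises
        elif position == 'short':
            reward = lattice_i - next_lattice_i       # short profits when the spread falls
        else:
            reward = 0
        return state, action, reward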
17. Double Dueling DQN (vanilla DQN does not converge well, but this method works much better)
(Diagram: a training network and a target network, each with the same structure:
Input → FC ReLU → FC ReLU → functional pass-through → Output.
Input: lattice position and the (long, short, flat) state.
Output: value of Buy, value of Sell, value of Do Nothing.)
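A hedged sketch of what a dueling Q-network head can look like in the TensorFlow 1.x API used elsewhere in this deck; the layer sizes, input shape, and variable names are assumptions, not the demo's actual network:

    import tensorflow as tf

    N_ACTIONS = 3   # Buy, Sell, Do Nothing

    with tf.variable_scope('prediction'):
        # State input: lattice index plus one-hot (long, short, flat) position (shape assumed).
        s_t = tf.placeholder('float32', [None, 4], name='s_t')
        h1 = tf.layers.dense(s_t, 64, activation=tf.nn.relu, name='fc1')
        h2 = tf.layers.dense(h1, 64, activation=tf.nn.relu, name='fc2')

        # Dueling decomposition: state value V(s) plus state-dependent action advantage A(s, a).
        value = tf.layers.dense(h2, 1, name='value')
        advantage = tf.layers.dense(h2, N_ACTIONS, name='advantage')

        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        q = value + advantage - tf.reduce_mean(advantage, axis=1, keep_dims=True)
        q_action = tf.argmax(q, 1)   # greedy action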
19. Overview
1. DQN - DeepMind, Feb 2015 (the "DeepMind Nature" paper)
http://www.davidqiu.com:8888/research/nature14236.pdf
a. Experience Replay
b. Separate Target Network
2. DDQN - Double Q-learning. DeepMind, Dec 2015
https://arxiv.org/pdf/1509.06461.pdf
3. Prioritized Experience Replay - DeepMind, Feb 2016
https://arxiv.org/pdf/1511.05952.pdf
4. DDDQN - Dueling Double Q-learning. DeepMind, Apr 2016
https://arxiv.org/pdf/1511.06581.pdf
20. Enhancements
Experience Replay
Removes correlation in sequences; smooths over changes in the data distribution
(a minimal replay-buffer sketch follows this slide)
Prioritized Experience Replay
Speeds up learning by choosing experiences from a weighted distribution
Separate target network from Q network
Removes correlation with the target - improves stability
Double Q-learning
Removes a lot of the non-uniform overestimations by separating the selection of an action from its evaluation
Dueling Q-learning
Improves learning when there are many similar action values. Separates the Q value into two parts:
a state value and a state-dependent action advantage
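A minimal replay-buffer sketch; the capacity and batch size are assumed values, and prioritized replay would sample with weights rather than uniformly:

    import random
    from collections import deque

    class ReplayMemory:
        def __init__(self, capacity=100000):
            self.buffer = deque(maxlen=capacity)   # old experiences are dropped automatically

        def add(self, s, a, r, s_next, terminal):
            self.buffer.append((s, a, r, s_next, terminal))

        def sample(self, batch_size=32):
            # Uniform sampling breaks the correlation between consecutive steps.
            return random.sample(self.buffer, batch_size)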
21. Install Tensorflow
My installation was on CentOS in Docker with a GPU*, but I also installed locally on
Ubuntu 16 for this demo. *Built from source for maximum speed.
CentOS instructions were adapted from:
https://blog.abysm.org/2016/06/building-tensorflow-centos-6/
Ubuntu install was from:
https://www.tensorflow.org/install/install_sources
23. Tensorflow key API
Namespaces for organizing the graph and showing in tensorboard
with tf.variable_scope('prediction'):
Sessions
with tf.Session() as sess:
Create variables and placeholders
var = tf.placeholder('int32', [None, 2, 3], name='varname')
self.global_step = tf.Variable(0, trainable=False)
Session.run or variable.eval to run parts of the graph and retrieve values
pred_action = self.q_action.eval({self.s_t['p']: s_t_plus_1})
q_t, loss = self.sess.run([q['p'], self.loss], {self.target_q_t: target_q_t, self.action: action})
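Putting those API pieces together, a small self-contained sketch in the same TensorFlow 1.x style; the tensor names and shapes here are illustrative, not from the demo code:

    import tensorflow as tf
    import numpy as np

    with tf.variable_scope('prediction'):                # namespace shows up in TensorBoard
        x = tf.placeholder('float32', [None, 3], name='x')
        global_step = tf.Variable(0, trainable=False)
        y = tf.reduce_sum(x, axis=1, name='y')

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # Session.run and Tensor.eval are two ways to pull values out of the graph.
        print(sess.run(y, {x: np.ones((2, 3))}))         # -> [3. 3.]
        print(y.eval({x: np.zeros((1, 3))}))             # -> [0.]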
24. Tensorflow tips and tricks
Injecting data into tensorboard
agent.inject_summary({'average.reward': avg_reward, 'average.loss': avg_loss, 'average.q': avg_q}, step)
def inject_summary(self, tag_dict, step):
    summary_str_lists = self.sess.run([self.summary_ops[tag] for tag in tag_dict.keys()], {
        self.summary_placeholders[tag]: value for tag, value in tag_dict.items()
    })
    for summary_str in summary_str_lists:
        self.writer.add_summary(summary_str, step)

with tf.variable_scope('summary'):
    scalar_summary_tags = ['average.reward', 'average.loss', 'average.q']
    self.summary_placeholders = {}
    self.summary_ops = {}
    for tag in scalar_summary_tags:
        self.summary_placeholders[tag] = tf.placeholder('float32', None, name=tag.replace(' ', '_'))
        self.summary_ops[tag] = tf.summary.scalar("%s-%s" % (self.env_name, tag), self.summary_placeholders[tag])
26. Tensorflow tips and tricks
Follow Common Patterns
http://www.tensorflowpatterns.org/patterns/
• Cloud ML export
• Evaluate function
• Feed dict as positional arg
• Init functions
• Loss operation
• PEP-8 style for python
• Prepare, train, evaluate
• Save model function
• Summaries operation
• Train function
• Use default graph
27. Tensorflow tips and tricks
Etc
• Collect all params in one place
• Allows you to easily reconfigure
• Can easily do grid search optimization
• Avoid all magic numbers
• Can compare with other papers and results easily
• Keep as much as you can in C++ code
• Use numpy and pandas dataframes for matrix computation
• Via py_func (see the sketch after this list)
• Use tensorflow functions when possible
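A small sketch of calling numpy/pandas from the graph via tf.py_func; the rolling-mean computation and window size are illustrative assumptions:

    import tensorflow as tf
    import numpy as np
    import pandas as pd

    def rolling_mean(x):
        # Arbitrary pandas computation executed in Python, outside the TF graph.
        return pd.Series(x).rolling(window=3, min_periods=1).mean().values.astype(np.float32)

    prices = tf.placeholder('float32', [None], name='prices')
    smoothed = tf.py_func(rolling_mean, [prices], tf.float32, name='rolling_mean')

    with tf.Session() as sess:
        print(sess.run(smoothed, {prices: np.array([1., 2., 3., 4.], np.float32)}))
        # -> [1.  1.5  2.  3.]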
28. Trading-Gym + Trading-Brain Architecture
(Diagram of components:)
Runner: warmup(), train(), run()
Agent (abstract class): act(), observe(), end()
Children classes: DQN, Double DQN, A3C
Memory: add(), sample()
Brain: train(), predict()
Data Generator: Random Walks, Deterministic Signals, CSV Replay, Market Data Streamer
Single Asset, Multi Asset, Market Making
Environment: render(), step(), reset(), next(), rewind()
Trading-Gym - open-sourced; Trading-Brain - on GitHub
32. References
Much of the Brain and config code in this example is adapted from devsisters github:
https://github.com/devsisters/DQN-tensorflow
Our github:
https://github.com/Prediction-Machines
Tensorflow patterns:
http://www.tensorflowpatterns.org
Our blog:
http://prediction-machines.com/blog/
Our job openings:
http://prediction-machines.com/jobopenings/