I recently gave a talk to some MSc Machine Learning students at De Montfort University about the project I did for my MSc. The work included looking at feature extraction from game screens using the Arcade Learning Environment and Convolutional Neural Networks (CNN).
The work investigated whether the costly nature of Q-Learning could be offset by using a system trained on 'expert' data. The system uses the same technology as Deepmind used in their 2013 paper.
Atari Game State Representation using Convolutional Neural Networks
1. Training a Multi Layer Perceptron with Expert Data and Game State Representation using Convolutional Neural Networks
JOHN STAMFORD
MSC INTELLIGENT SYSTEMS AND ROBOTICS
2. Contents
Background and Initial Brief
Previous Work
Motivation
Technical Frameworks
State Representation
Testing
Results
Conclusion
Future work
3. Background / Brief
Based on a project by Google/Deepmind
Build an App to capture gameplay data
◦Users play Atari games on a mobile device
◦We capture the data (somehow)
Use the data in machine learning
◦Reduce the costly nature of Reinforcement Learning
4. Deepmind
Bought by Google for £400 million
“Playing Atari with Deep Reinforcement Learning” (2013)
General Agent
◦ No prior knowledge of the environment
◦ Inputs (States) and Outputs (Actions)
◦ Learn Policies
◦ Mapping States and Actions
Deep Reinforcement Learning
Deep Q Networks (DQN)
2015 paper release (with Lua source code)
5. Motivation
Started from the Q-Learning sample code
◦ Deep Reinforcement Learning (Q-Learning)
◦ Links to Deepmind (Mnih et al. 2013)
Costly nature of Reinforcement Learning
◦ Trial and Error Approach
◦ Issues with long term goals
◦ Makes lots of mistakes
◦ Celiberto et al. (2010) states...
“this technique is not efficient enough to be used in applications
with real world demands due to the time that the agent needs
to learn”
6. Background
Q-Learning (RL)
◦ Learn the optimal policy, which action to take at each state
◦ Represented as...
Q(s, a)
Functioning: Watkins and Dayan (1992) state that the system...
◦ observes its current state xn
◦ selects and performs an action an
◦ observes the subsequent state yn and receives the reward rn
◦ updates the Qn-1(s, a) values using
◦ a learning rate αn
◦ a discount factor γ
Qn(s, a) = (1 - αn)Qn-1(s, a) + αn[rn + γ max(Qn-1(yn, a))]
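The update rule above can be sketched as a minimal tabular Q-learner in Python. The states, actions, reward, and hyperparameter values here are illustrative assumptions, not the project's actual configuration:

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate (alpha)
GAMMA = 0.9   # discount factor (gamma)
ACTIONS = ["up", "down", "left", "right"]

# Tabular Q values: Q[(state, action)], defaulting to 0.0
Q = defaultdict(float)

def update_q(state, action, reward, next_state):
    """One Watkins update:
    Qn(s, a) = (1 - a)Qn-1(s, a) + a[r + g * max Qn-1(y, a')]"""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] = ((1 - ALPHA) * Q[(state, action)]
                          + ALPHA * (reward + GAMMA * best_next))

# One illustrative update: reward of 1 for moving right from state 0 to 1
update_q(0, "right", 1.0, 1)
print(Q[(0, "right")])  # (1 - 0.1)*0 + 0.1*(1 + 0.9*0) = 0.1
```

Repeated over many episodes, these updates converge on the optimal policy, which is exactly the trial-and-error cost the project set out to reduce.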
9. Other Methods
Imitation Learning (IL)
◦ Applied to robotics e.g. Nemec et al. (2010), Schmidts et al. (2011) and
Kunze et al. (2013)
Could this be applied to the games agent?
◦ Potentially by mapping the states and the actions from observed game play
◦ Manually updating the policies
Hamahata et al. (2008) states that “imitation learning consisting of a
simple observation cannot give us the sophisticated skill”
10. Other Methods
Combining RL and IL
◦ Kulkarni (2012, p. 4) refers to this as ‘semi-supervised learning’
◦ Barto and Rosenstein (2004) suggest the use of a model which acts as a supervisor and an actor.
Diagram: Supervisor Information and State Representation (Barto and Rosenstein, 2004)
11. The Plan (at this point)
Reduce the costly impact of RL
◦ Use some form of critic or early reward system
◦ If no Q Value exists for that state, then check with an expert
Capture Expert Data
◦ States
◦ Actions
◦ Rewards
Build a model
Use the model to inform the Q Learning System
12. Data Capture Plan
Capture input data using a Stella VCS based Android solution
Record user actions (Up, Down, Left, Right, ...)
Account for the SEED variant: setSeed(12345679)
Replay in the lab
Extract score & states using ALE
13. The Big Problem
We couldn’t account for the randomisation
◦ALE is based on Stella
◦ Version problems
◦Tested various approaches
◦Replayed games over Skype
We could save the state..!
◦But had some problems
Other problems
14. Technical Implementation
Arcade Learning Environment (ALE) (Bellemare et al 2013)
◦ General Agent Testing Environment using Atari Games
◦ Supporting 50+ Games
◦ Based on the Stella VCS Atari Emulator
◦ Supports Agents in C++, Java and more...
Python 2.7 (Anaconda Distribution)
Theano (ML Framework written in Python)
◦ Mnih et al. (2013)
◦ Q-Learning Sample Code
◦ Korjus (2014)
Linux, then Windows 8 with CUDA support
15. Computational Requirements
Test System
◦ Simple CNN / MLP
◦ 16,000 greyscale 28x28 images
Results
◦ Significant difference with CUDA support
◦ The CNN process is very computationally costly
Charts: MLP and CNN speed test results
16. States and Actions
States - Screen Data
◦Raw Screen Data
◦SDL (SDL_Surface)
◦ BMP File
Actions – Controller Inputs
Resulted in….
◦Lots of Images matched to entries in a CSV File
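The capture output described above can be sketched as a frame-by-frame log pairing each saved screen image with the controller input on that frame. The filenames, action labels, and column names here are hypothetical:

```python
import csv

# Hypothetical rows: one per frame, pairing the saved BMP screen
# grab with the controller input recorded on that frame.
rows = [
    ("frame_000001.bmp", "LEFT"),
    ("frame_000002.bmp", "LEFT"),
    ("frame_000003.bmp", "FIRE"),
]

with open("actions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["state_image", "action"])  # header row
    writer.writerows(rows)
```

Training then reduces to a supervised problem: the image is the input and the logged action is the target label.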
17. Rewards
ALE Reward Data
void BreakoutSettings::step(const System& system) {
    // update the reward
    int x = readRam(&system, 77);
    int y = readRam(&system, 76);
    reward_t score = 1 * (x & 0x000F) + 10 * ((x & 0x00F0) >> 4)
                   + 100 * (y & 0x000F);
    m_reward = score - m_score;
    m_score = score;
    // update terminal status
    int byte_val = readRam(&system, 57);
    if (!m_started && byte_val == 5) m_started = true;
    m_terminal = m_started && byte_val == 0;
}
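The two score bytes read from RAM hold binary-coded decimal digits (one digit per nibble). A small Python sketch of the same decoding, with illustrative byte values:

```python
def decode_breakout_score(x, y):
    """Decode Breakout's BCD score from two RAM bytes, mirroring
    the arithmetic in ALE's BreakoutSettings::step: the low and
    high nibbles of x hold the ones and tens digits, and the low
    nibble of y holds the hundreds digit."""
    return (1 * (x & 0x0F)
            + 10 * ((x & 0xF0) >> 4)
            + 100 * (y & 0x0F))

# Illustrative values: x = 0x42 (digits 4 and 2), y = 0x03
print(decode_breakout_score(0x42, 0x03))  # 342
```

ALE then reports the per-step reward as the difference between consecutive decoded scores.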
18. State Representation
Screen Pixels – 160 x 210 RGB
If we used them as inputs...
◦ RGB: 100,800
◦ Greyscale: 33,600
Mnih et al. (2013) use cropped 84 x 84 images
◦ Good – High Resolutions, Lots of Features Present
◦ Bad – When handling lots of training data
MNIST Example Set use 28 x 28
◦ Good – Computationally Acceptable
◦ Bad – Limited Detail
The problem
◦ Unable to process large amounts of hi-res images
◦ Low-res images gave poor results
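The input counts above are straightforward pixel arithmetic; a quick sketch of the sizes being compared:

```python
def input_count(width, height, channels=1):
    """Number of raw network inputs if every pixel
    (per channel) is fed in directly."""
    return width * height * channels

print(input_count(160, 210, channels=3))  # full RGB Atari screen: 100800
print(input_count(160, 210))              # greyscale screen: 33600
print(input_count(84, 84))                # Mnih et al. (2013) crop: 7056
print(input_count(28, 28))                # MNIST-sized image: 784
```

The gap between 7,056 and 784 inputs per frame is the tension described above: detail versus tractable training cost.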
19. Original System - Image Processing
Image Resize Methods
Temporal Data (Frame Merging)
20. Original System - Training Results
◦ 28x28 images: 7 minutes for 16,000 images
◦ 64x64 images: 18 minutes for 4,000 images
◦ 84x84 images: memory error at 4,100 images
22. CNN Framework
Mnih et al. (2013) make use of Convolutional Neural Networks
Feature extraction
◦ Can be used to reduce Dimensionality of the Domain Space
◦ Examples include
◦ Hand Writing Classification Yuan et al. (2012), Bottou et al. (1994)
◦ Face Detection Garcia and Delakis (2004) and Chen et al. (2006)
A CNN provides the inputs for a fully connected MLP (Bergstra et al. 2010).
23. Convolutional Neural Networks
Feature Extraction
Developed as a result of the work of LeCun et al. (1998)
Take inspiration from the visual processes of cats and monkeys (Hubel and Wiesel, 1962, 1968)
Can accommodate changes in scale, rotation, stroke width, etc.
Can handle noise
See: http://yann.lecun.com/exdb/lenet/index.html
24. Convolution of an Image
0 0 0
0 1 0
0 0 0
Example Kernel
Source: https://developer.apple.com/library/ios/documentation/Performance/Conceptual/vImage/ConvolutionOperations/ConvolutionOperations.html
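The effect of a kernel can be seen in a minimal pure-Python convolution sketch ('valid' padding, no kernel flipping, as is usual in CNN implementations; the 3x3 input image is illustrative):

```python
def convolve2d(image, kernel):
    """Naive 2D convolution with 'valid' padding: slide the
    kernel over the image and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

identity = [[0, 0, 0],
            [0, 1, 0],
            [0, 0, 0]]
img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
print(convolve2d(img, identity))  # identity kernel keeps the centre pixel: [[5]]
```

In a CNN the kernel weights are not hand-picked like the identity kernel above; they are learned, so each filter ends up responding to a feature such as an edge or a blob.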
26. CNN Feature Extraction
Single Convolutional Layer
◦ From Full Resolution Images (160 x 210 RGB)
1,939 Inputs
130 Inputs
27. CNN Feature Extraction
Binary Conversion
◦ Accurate State Representation
Lower Computational Costs
◦ Single Convolution Layer (15 seconds for 2,391 images / 11.7 seconds for 1,790)
◦ Reduced number of inputs for the MLP
◦ More Manageable
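The binary conversion step can be sketched as a simple threshold over greyscale pixel values (the threshold of 128 and the tiny example frame are assumptions, not the values used in the project):

```python
def to_binary(gray, threshold=128):
    """Threshold a greyscale image (values 0-255) to 0/1 features,
    discarding intensity detail to shrink the MLP's input."""
    return [[1 if p >= threshold else 0 for p in row] for row in gray]

frame = [[0, 64, 200],
         [255, 12, 130]]
print(to_binary(frame))  # [[0, 0, 1], [1, 0, 1]]
```

The trade-off is visible even in this toy example: all intensity information above or below the threshold collapses to the same value.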
28. Problems & Limitations
Binary conversion was too severe (Breakout)
Features were removed by the binary conversion, as shown above
In Seaquest, the system could not differentiate between the enemies and the goals
29. New System Training Results
Test Configuration
Results
Lowest Error Rate: 32.50%
32. Conclusion
Large amounts of data
CNN as a Preprocessor...
◦ Reduced Computational Costs
◦ Allowed for good state representation
◦ Reduced dimensionality for the MLP
Old System
◦ No evidence of learning
New System
◦ Evidence of the system learning
◦ Needs to be implemented as an agent to test real-world effectiveness
33. What would I do differently?
Better Evaluation Methodology
◦ What was the frequency/distribution of controls?
◦ Was the system better at different games or controls?
Went too far with the image conversion...
35. Future Work
3. State Representation
Step 1
Identify areas of interest
Step 2
Process and Classify Area
Step 3
Update State Representation
36. Future Work
4. Explore the effects of multiple Convolutional Layers
5. Build a working agent...!
37. Useful Links
ALE (Visual Studio Version)
https://github.com/mvacha/A.L.E.-0.4.4.-Visual-Studio
Replicating the Paper “Playing Atari with Deep Reinforcement Learning” - Kristjan Korjus et al
https://courses.cs.ut.ee/MTAT.03.291/2014_spring/uploads/Main/Replicating%20DeepMind.pdf
Github for the above project
https://github.com/kristjankorjus/Replicating-DeepMind/tree/master/src
ALE : http://www.arcadelearningenvironment.org/
ALE Old Site: http://yavar.naddaf.name/ale/
38. Bibliography
Barto, A. G. and Rosenstein, M. T. (2004), `Supervised actor-critic reinforcement learning', Handbook of Learning and Approximate Dynamic Programming 2, 359.
Bellemare, M. G., Naddaf, Y., Veness, J. and Bowling, M. (2013), `The arcade learning environment: An evaluation platform for general agents', Journal of Artificial Intelligence Research 47, 253-279.
Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D. and Bengio, Y. (2010), Theano: a CPU and GPU math expression compiler, in `Proceedings of the Python for Scientific Computing Conference (SciPy)'. Oral Presentation.
Celiberto, L., Matsuura, J., Lopez de Mantaras, R. and Bianchi, R. (2010), Using transfer learning to speed-up reinforcement learning: A case-based approach, in `Robotics Symposium and Intelligent Robotic Meeting (LARS), 2010 Latin American', pp. 55-60.
Korjus, K., Kuzovkin, I., Tampuu, A. and Pungas, T. (2014), Replicating the paper "Playing Atari with Deep Reinforcement Learning", Technical
report, University of Tartu.
Kulkarni, P. (2012), Reinforcement and systemic machine learning for decision making, John Wiley & Sons, Hoboken.
Kunze, L., Haidu, A. and Beetz, M. (2013), Acquiring task models for imitation learning through games with a purpose, in `Intelligent Robots and
Systems (IROS), 2013 IEEE/RSJ International Conference on', pp. 102-107.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D. and Riedmiller, M. (2013), Playing Atari with deep reinforcement
learning, in `NIPS Deep Learning Workshop'.
Nemec, B., Zorko, M. and Zlajpah, L. (2010), Learning of a ball-in-a-cup playing robot, in `Robotics in Alpe-Adria-Danube Region (RAAD), 2010 IEEE
19th International Workshop on', pp. 297-301.
Schmidts, A. M., Lee, D. and Peer, A. (2011), Imitation learning of human grasping skills from motion and force data, in `Intelligent Robots and
Systems (IROS), 2011 IEEE/RSJ International Conference on', pp. 1002-1007.
Watkins, C. J. C. H. and Dayan, P. (1992), `Technical note q-learning', Machine Learning 8, 279-292.