Over the last few years machine learning has become a hot topic. This is no surprise, as the state of the art has reached levels that attract a lot of attention in the media: self-driving cars, winning the game of Go against top professional players, and social media platforms that recognize your face in most online pictures.
This presentation provides an overall picture of machine learning and does not require any previous knowledge in the field. Sit back, relax and learn more about key concepts and recent advances of the field.
The demo part looks at two specific examples:
* Pattern recognition with handwritten character recognition
* Natural language processing with neural word embeddings
At the end, some less well-known examples and topics illustrate the current state of the art and should provide some food for thought regarding the future directions of the field.
The slides have been used for a session with the Java User Group Switzerland: https://www.jug.ch/html/events/2017/machine_learning.html
6. Markov Decision Process
Environment (Atari Breakout)
Agent performing Actions (Left, Right, Release Ball)
State (Bricks, location / direction of ball, …)
Rewards (A Brick is hit)
Deep Reinforcement Learning
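To make these pieces concrete, here is a minimal Java sketch of the structure behind an MDP; all type and method names (Environment, Agent, step, …) are illustrative stand-ins, not from any particular RL library.

// Minimal sketch of the pieces of a Markov Decision Process (MDP).
// All names are illustrative, not from any particular RL library.
enum Action { LEFT, RIGHT, RELEASE_BALL }

record State(double[] features) {}                 // e.g. bricks, ball location/direction
record StepResult(State next, double reward) {}    // e.g. reward +1 when a brick is hit

interface Environment {                            // e.g. Atari Breakout
    State reset();                                 // start a new episode
    StepResult step(Action action);                // apply an action, observe the result
    boolean isTerminated();
}

interface Agent {
    Action chooseAction(State state);              // the policy
    void observe(State s, Action a, double reward, State next);  // learn from feedback
}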
7. Q-Learning (simplified)
Markov Decision Process
Q(s, a): the highest sum of future rewards obtainable by taking action a in state s
initialize Q randomly
set initial state s_0
repeat
    execute the action a that maximizes Q(s_i, a)
    observe reward r and new state s_{i+1}
    set Q = update(Q, r, a, s_{i+1})
    set s_i = s_{i+1}
until terminated
Deep Reinforcement Learning
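A runnable Java sketch of this loop on a toy environment (a five-state chain standing in for Breakout). The environment and the epsilon-greedy exploration are assumptions made for brevity, and update() is the standard tabular rule Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

import java.util.Random;

/** Tabular Q-learning sketch on a toy 5-state chain (a stand-in for Breakout). */
public class QLearningSketch {
    static final int STATES = 5, ACTIONS = 2;               // toy sizes, action 1 = "right"
    static final double ALPHA = 0.1, GAMMA = 0.99, EPSILON = 0.1;

    public static void main(String[] args) {
        Random rnd = new Random(42);
        double[][] q = new double[STATES][ACTIONS];
        for (double[] row : q)                               // initialize Q randomly
            for (int a = 0; a < ACTIONS; a++) row[a] = rnd.nextDouble() * 0.01;

        for (int episode = 0; episode < 1000; episode++) {
            int s = 0;                                       // set initial state s_0
            while (s != STATES - 1) {                        // until terminated
                // mostly execute the a that maximizes Q(s_i, a); explore with prob. EPSILON
                int a = rnd.nextDouble() < EPSILON ? rnd.nextInt(ACTIONS) : argmax(q[s]);
                int next = (a == 1) ? s + 1 : Math.max(0, s - 1);  // toy dynamics
                double r = (next == STATES - 1) ? 1.0 : 0.0;       // observe reward r
                // update(Q, r, a, s_{i+1}): Q(s,a) += alpha*(r + gamma*max_a' Q(s',a') - Q(s,a))
                q[s][a] += ALPHA * (r + GAMMA * q[next][argmax(q[next])] - q[s][a]);
                s = next;                                    // set s_i = s_{i+1}
            }
        }
        System.out.println("Q(s_0, right) = " + q[0][1]);    // converges to roughly GAMMA^3
    }

    static int argmax(double[] v) {
        int best = 0;
        for (int i = 1; i < v.length; i++) if (v[i] > v[best]) best = i;
        return best;
    }
}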
8. Deep Q Learning (DQN)
Q Learning
Q(s, a) = Deep Neural Network (DNN)
Retrain the DNN regularly (using its own experience)
Deep Reinforcement Learning
[Diagram: State s → DNN Q(s, a) → Action a (Left, Right, Release)]
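A sketch of the «Q(s, a) = DNN» idea with Deeplearning4j (the library used in the demos later): the network maps a state feature vector to one Q value per action, so the Q table is replaced by a function approximator. Layer sizes, the updater, and the state encoding are assumptions, and the replay/retraining loop is only hinted at in the comments.

import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class DqnSketch {
    public static void main(String[] args) {
        int stateSize = 4;    // e.g. ball x/y, ball direction, paddle x (illustrative)
        int numActions = 3;   // Left, Right, Release

        // The Q table becomes a network: input = state s, output = Q(s, a) for every a.
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .updater(new Adam(1e-3))
                .list()
                .layer(new DenseLayer.Builder().nIn(stateSize).nOut(64)
                        .activation(Activation.RELU).build())
                .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                        .nIn(64).nOut(numActions).activation(Activation.IDENTITY).build())
                .build();
        MultiLayerNetwork qNetwork = new MultiLayerNetwork(conf);
        qNetwork.init();

        // "Retrain the DNN regularly on its own experience": store (s, a, r, s') transitions
        // in a replay memory and periodically fit the network on targets
        // r + gamma * max_a' Q(s', a') computed from those samples.
    }
}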
11. Challenges
Getting the RIGHT data for the task
And LOTS of it
There is never enough data …
Real World Lessons
Data is crucial for successful ML projects
Most boring and time-consuming task
Most underestimated task
Getting the Data
12. Rosemary, Rosmarinus officinalis
Sentiment Analysis
1245 NEGATIVE: «shallow, noisy and pretentious.»
14575 POSITIVE: «one of the most splendid entertainments to emerge from the french film industry in years»
Iris or Flower set, or example for outlier detection?
86211,B,12.18,17.84,77.79, …
862261,B,9.787,19.94,62.11, …
862485,B,11.6,12.84,74.34, …
862548,M,14.42,19.77,94.48, …
862009,B,13.45,18.3,86.6, …
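For reference, each row above follows the WDBC layout: sample ID, diagnosis (B = benign, M = malignant), then 30 real-valued features. A minimal Java parsing sketch; the local file name wdbc.data is an assumption based on the UCI distribution.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class WdbcSketch {
    // One WDBC row: id, diagnosis (B/M), then 30 real-valued features.
    record Sample(long id, boolean malignant, double[] features) {}

    public static void main(String[] args) throws IOException {
        List<Sample> samples = Files.readAllLines(Path.of("wdbc.data")).stream()
                .map(line -> line.split(","))
                .map(f -> {
                    double[] x = new double[f.length - 2];
                    for (int i = 0; i < x.length; i++) x[i] = Double.parseDouble(f[i + 2]);
                    return new Sample(Long.parseLong(f[0]), f[1].equals("M"), x);
                })
                .toList();
        System.out.println(samples.size() + " samples, "
                + samples.stream().filter(Sample::malignant).count() + " malignant");
    }
}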
16. Model Complexity
[Chart: error rate over training iterations, for training data vs. test data. «Underfitting»: model too simple / more training needed. «Overfitting»: model too complex / too much training.]
18. Supervised Learning
• Learning from Examples
• Right Answers are known
Unsupervised Learning
• Discover Structure in Data
• Dimensionality Reduction
Reinforcement Learning
• Interaction with Dynamic Environment
21. Data
Which digit is this?
Collect our own data
Model
Deep Neural Network (LeNet-5)
Deeplearning4j
Deep Learning Library
Open Source (Apache)
Java
Pattern Recognition
Handwritten Digits
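As a sketch of how this demo fits together in Deeplearning4j, following the library's LeNet-style MNIST examples; hyperparameters such as batch size, layer widths, and a single training epoch are illustrative choices, not the exact demo code.

import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.inputs.InputType;
import org.deeplearning4j.nn.conf.layers.ConvolutionLayer;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.conf.layers.SubsamplingLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class MnistLenetSketch {
    public static void main(String[] args) throws Exception {
        DataSetIterator train = new MnistDataSetIterator(64, true, 12345);  // batch, train, seed
        DataSetIterator test = new MnistDataSetIterator(64, false, 12345);

        // LeNet-style CNN: two conv + max-pooling blocks, a dense layer, softmax over 10 digits.
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(12345)
                .list()
                .layer(new ConvolutionLayer.Builder(5, 5).nIn(1).nOut(20)
                        .activation(Activation.RELU).build())
                .layer(new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                        .kernelSize(2, 2).stride(2, 2).build())
                .layer(new ConvolutionLayer.Builder(5, 5).nOut(50)
                        .activation(Activation.RELU).build())
                .layer(new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                        .kernelSize(2, 2).stride(2, 2).build())
                .layer(new DenseLayer.Builder().nOut(500).activation(Activation.RELU).build())
                .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nOut(10).activation(Activation.SOFTMAX).build())
                .setInputType(InputType.convolutionalFlat(28, 28, 1))   // 28x28 grayscale input
                .build();

        MultiLayerNetwork net = new MultiLayerNetwork(conf);
        net.init();
        net.fit(train);                                    // one epoch over the training set
        System.out.println(net.evaluate(test).stats());    // accuracy, precision, recall, F1
    }
}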
25. Unsupervised Learning
Natural Language Processing
Data
Google News text training dataset
Texts with a total of 3’000’000’000 words
Lexicon: 3’000’000 words/phrases
Model
Word2Vec Skip-gram
Mapping: word → 300-dimensional number space
Many useful properties (word clustering, syntax, semantics)
Deeplearning4j
(Train or) load and use the Google News word2vec model
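A small Deeplearning4j sketch of the «load and use» path (training a model from scratch is also possible via the library's Word2Vec builder). The readWord2VecModel loader and the neighbour/similarity calls are the DL4J API as found in recent versions; the model file name is the one published by Google.

import java.io.File;
import java.util.Arrays;
import java.util.Collection;

import org.deeplearning4j.models.embeddings.loader.WordVectorSerializer;
import org.deeplearning4j.models.word2vec.Word2Vec;

public class Word2VecSketch {
    public static void main(String[] args) {
        // Load the pre-trained Google News model from the published file.
        Word2Vec vec = WordVectorSerializer.readWord2VecModel(
                new File("GoogleNews-vectors-negative300.bin.gz"));

        // Word clustering: nearest neighbours in the 300-dimensional space share meaning.
        Collection<String> nearest = vec.wordsNearest("day", 10);

        // Semantic arithmetic: king - man + woman is close to queen.
        Collection<String> analogy = vec.wordsNearest(
                Arrays.asList("king", "woman"), Arrays.asList("man"), 5);

        double sim = vec.similarity("day", "night");   // cosine similarity of the two vectors
        System.out.println(nearest + " / " + analogy + " / " + sim);
    }
}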
29. ML performance >= Human Levels (2017)
Games: Backgammon 1979, chess 1997, Jeopardy! 2011, Atari games 2014, Go 2016, Poker (Texas Hold’em) 2017
Visual: CAPTCHAs 2005, face recognition 2007, traffic sign reading 2011, ImageNet 2015, lip-reading 2016
Other: Age estimation from pictures 2013, personality judgement from Facebook «likes» 2014, conversational speech recognition 2016
https://finnaarupnielsen.wordpress.com/2015/03/15/status-on-human-vs-machines/
You have probably followed the story of the Korean world champion in Go losing against the AlphaGo system built by the DeepMind team.
So, how is playing Go different from playing chess, from a systems perspective?
This is nicely explained in the following video clip.
The next clip has Demis Hassabis from DeepMind talking about the difference between playing Go and chess from a human perspective.
Video 1: What is Go and how does it compare to chess?
Video 2: As the complexity of Go is so much higher than that of chess, intuition becomes even more important.
Video 3: The last snippet is a high-level description of the training of AlphaGo.
But of course, as in the case of the chess-playing system, there is also a hardware story behind AlphaGo's win.
---
AlphaGo was powered by TPUs in the matches against Go world champion, Lee Sedol, enabling it to "think" much faster and look farther ahead between moves.
https://cloudplatform.googleblog.com/2016/05/Google-supercharges-machine-learning-tasks-with-custom-chip.html
The Tensor Processing Units (TPU) are custom-built ASIC boards that speed up machine learning applications
by parallelizing large amounts of low-precision matrix computations.
This amounts to roughly three generations of Moore's law, making processing power available today that we would otherwise expect only six years in the future …
Deep reinforcement learning was used for AlphaGo.
A milestone article about Atari games (the DQN paper) was published in Nature just two years ago:
https://deepmind.com/research/dqn/
http://www.nature.com/nature/journal/v518/n7540/abs/nature14236.html
Deep reinforcement learning
https://www.nervanasys.com/demystifying-deep-reinforcement-learning/
http://rll.berkeley.edu/deeprlcourse/
TensorFlow:
https://www.tensorflow.org/
In the following we will
Get some stories about where all this came from
Talk about the major concepts
Do some small demos
And have a look at current work that provides a glimpse at the things to come
The most boring and most time-consuming task of most machine learning projects: often the majority of the time of a new machine learning project is spent getting the right data, and enough of it.
Most underestimated task.
But good data sets remain valuable over many decades.
Movie reviews: sentiment polarity
http://www.cs.cornell.edu/People/pabo/movie-review-data/
Handwritten digits: MNIST
http://yann.lecun.com/exdb/mnist/
Breast cancer: WDBC
https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
ImageNet competition
http://image-net.org/challenges/LSVRC/2012/
http://image-net.org/challenges/LSVRC/2012/browse-synsets
http://image-net.org/synset?wnid=n12864160 (rosemary)
many more data sets
http://deeplearning.net/datasets/
Massive revival with deep learning
Classical neural networks
Many layers
Some tricks
Rebranding from neural nets to deep learning
http://cs229.stanford.edu/schedule.html
Convolutional NNs explained
http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/?utm_content=bufferfb698&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
http://yann.lecun.com/exdb/mnist/
https://deeplearning4j.org/
Online demo
https://transcranial.github.io/keras-js/#/mnist-cnn
Word2vec model
https://github.com/mmihaltz/word2vec-GoogleNews-vectors
3 billion words, 3-million-word vocabulary (incl. phrases such as «Boston Globe»)
http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/
100 billion words from a Google News dataset
Original paper (Mikolov et al., 2013a): «Efficient Estimation of Word Representations in Vector Space»
vector dimensionality 300 and context size 5
https://arxiv.org/pdf/1301.3781.pdf [6 billion tokens, 1 million words in vocabulary]
Second paper (Mikolov et al., 2013b)
https://arxiv.org/pdf/1310.4546.pdf
Also in 2014, researchers started to combine pattern recognition and natural language processing, both based on deep learning
Stanford (2014)
http://cs.stanford.edu/people/karpathy/deepimagesent/devisagen.pdf
Google + Stanford (2014)
https://gigaom.com/2014/11/18/google-stanford-build-hybrid-neural-networks-that-can-explain-photos/
https://arxiv.org/pdf/1605.05396.pdf
https://arxiv.org/abs/1703.07511
https://research.googleblog.com/2016/09/a-neural-network-for-machine.html
Paper: https://arxiv.org/abs/1609.08144
Newer version (multi-lingual)
https://research.googleblog.com/2016/11/zero-shot-translation-with-googles.html
Face2Face
http://www.graphics.stanford.edu/~niessner/thies2016face.html
Video: https://www.youtube.com/watch?v=ttGUiwfTYvg
Paper: http://www.graphics.stanford.edu/~niessner/papers/2016/1facetoface/thies2016face.pdf
https://www.stlouisfed.org/on-the-economy/2016/january/jobs-involving-routine-tasks-arent-growing
1. The ’90s: computers getting used everywhere, first industrial robots (? Need to verify!)
2. 2008 Financial crisis
3. 2017+ Future …