Deep Learning Tutorial in 100 Mins

Terry Taewoong Um (terry.t.um@gmail.com)
University of Waterloo
Department of Electrical & Computer Engineering

WHO AM I
2
2008 – 2010 M.S. at Seoul National University, Korea
c.f. “Tangent Space RRT with Lazy Projection: An Efficient Planning
Algorithm for Constrained Motions”, T. T. Um et al., ARK2010.
2010 – 2014 Robotics researcher at LIG Nex1 / KIST, Korea
c.f. “Independent Joint Learning: A Novel Task-to-Task Transfer Learning
Scheme for Robot Models”, T. T. Um et al., ICRA2014.
• I am a robotics researcher

WHO AM I
3
2014 – Present PhD candidate at U.Waterloo, Canada
c.f. “Exercise Motion Classification from Large-Scale Wearable Sensor
Data Using Convolutional Neural Networks”, T. T. Um et al., IROS2017.
• I am a deep learning researcher
PUSH project: https://www.trainwithpush.com/
Parkinson’s disease (PD) project: http://hookedoneverything.com/parkinsons/

WHO AM I
4
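• But I am more known for ...
- Facebook communities: 로봇공학을 위한 열린 모임 (Open Group for Robotics), Tensorflow Korea, etc.
- Blog / Youtube: 테리의 딥러닝 토크 (Terry’s Deep Learning Talk), T-Robotics, 대학원생 때 알았더라면 좋았을 이야기들 (Things I Wish I Had Known as a Graduate Student)
- Etc.: Most-cited DL papers (github)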

CONTENTS
5
1. Introduction to ML & DL 50min
2. DL methods: CNN, RNN, VAE, GAN 35min
3. Can we believe DNNs? 15min
4. Q & A 15min
Break 10min

7
STUDY MATERIALS
• Andrew Ng, Deeplearning.ai / Coursera
• Stanford Univ., CS231n (CNNs) / CS224d (RNNs)
• Various tutorials presented at NIPS, ICML, etc.
• Deep learning summer school: http://videolectures.net/deeplearning2017_montreal/
https://github.com/sjchoi86/dl_tutorials_10weeks
https://github.com/terryum/awesome-deep-learning-papers

8
AI, ML, DL, NN
https://medium.com/zeroth-ai/understanding-artificial-intelligence-b9b58f9b25c2

9
RECOGNITION - IMAGE
Google photos
Object recognition (image retrieval)

10
YOLO v2, https://www.youtube.com/watch?v=VOC3huqHrss
RECOGNITION - IMAGE
Object detection

11
RECOGNITION - NATURAL LANGUAGE
Sentiment classification
SAD Joyful

12
Speech recognition
RECOGNITION - SPEECH

PUSH Inc., https://youtu.be/JpzuVPesFLY
13
RECOGNITION - WEARABLES
Exercise recognition Parkinson’s disease assessment

14
SUPERVISED LEARNING
Train : X (image, text, speech, wearable data, etc.) → Y (labels)
Test : X → ? (real practice)
* Never use the test dataset during the development of a model (training)

16
OVERFITTING
• Model complexity vs. Error
[Figure: training error and test error plotted against model complexity]
Overfitting: good performance for training data, bad performance for test data

18
VALIDATION SET
Train : X (wearable data, etc.) → Y (labels)
Validation : X → ? (real-practice indicator)
Test : X → ? (real practice)

19
PREVENTING OVERFITTING
• Training / Validation / Test datasets
- training set: for training (parameter optimization)
- validation set: for early stopping (avoid overfitting)
- test set: for evaluation (measure the performance)
[Figure: training error and test error vs. training time, with the point marked “we should stop here”]
Keep watching the validation error
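The three-way split and the early-stopping rule above could be sketched roughly as follows. This is a minimal plain-Python illustration: the function names, the 60/20/20 ratio, the `patience` parameter, and the validation errors are all assumptions made for the example, not values from the slides.

```python
import random

def split_dataset(samples, train_ratio=0.6, val_ratio=0.2, seed=0):
    """Shuffle and split samples into training / validation / test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_ratio)
    n_val = int(len(shuffled) * val_ratio)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

def early_stopping_epoch(val_errors, patience=2):
    """Return the index of the epoch whose weights should be kept.

    Training stops once the validation error has failed to improve
    for `patience` consecutive epochs.
    """
    best_epoch, best_error, waited = 0, float("inf"), 0
    for epoch, error in enumerate(val_errors):
        if error < best_error:
            best_epoch, best_error, waited = epoch, error, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

train, val, test = split_dataset(range(100))
# Hypothetical validation errors per epoch: they fall, then rise (overfitting).
val_errors = [0.9, 0.6, 0.4, 0.35, 0.37, 0.41, 0.5]
stop_at = early_stopping_epoch(val_errors)
```

Only the validation errors drive the stopping decision here; the test list is never consulted until the final evaluation.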

20
PREVENTING OVERFITTING
training validation test
• N-fold cross validation
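One possible way to generate the N-fold splits is sketched below. The helper name `n_fold_splits` is hypothetical, and the sketch assumes the test set has already been set aside, so only the remaining samples rotate between the training and validation roles.

```python
def n_fold_splits(n_samples, n_folds):
    """Yield (train_indices, validation_indices) for each of the N folds."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `remainder` folds get one extra sample.
        stop = start + fold_size + (1 if fold < remainder else 0)
        val_idx = indices[start:stop]
        train_idx = indices[:start] + indices[stop:]
        yield train_idx, val_idx
        start = stop

folds = list(n_fold_splits(n_samples=10, n_folds=5))
```

Each sample serves as validation data exactly once, and the N validation scores are typically averaged.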

21
BOLTS & NUTS OF BUILDING DL
http://www.computervisionblog.com/2016/12/nuts-and-bolts-of-building-deep.html
Andrew Ng at NIPS2016

22
REGULARIZATION
minimize $\sum \|y - f(x)\|^2$, where $f(x) = w_0 + w_1 x_1 + w_2 x_2 + \cdots = WX$
minimize $\sum \|y - f(x)\|^2 + \|W\|_m$
(minimize error & prefer a simpler model)
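A small sketch of the regularized objective above, under some assumptions the slide does not state: the norm is taken to be the squared L2 norm, a weighting factor `lam` is placed in front of the penalty, and the bias term w0 is left unpenalized.

```python
def predict(weights, x):
    """Linear model f(x) = w0 + w1*x1 + w2*x2 + ..."""
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))

def regularized_loss(weights, xs, ys, lam=0.1):
    """Sum of squared errors plus an L2 penalty on the weights."""
    error = sum((y - predict(weights, x)) ** 2 for x, y in zip(xs, ys))
    penalty = lam * sum(w ** 2 for w in weights[1:])  # bias w0 is not penalized
    return error + penalty

xs = [[1.0], [2.0], [3.0]]
ys = [2.0, 4.0, 6.0]
loss_fit = regularized_loss([0.0, 2.0], xs, ys, lam=0.1)   # perfect fit, pays the penalty
loss_zero = regularized_loss([0.0, 0.0], xs, ys, lam=0.1)  # no penalty, large error
```

The penalty makes larger weights more expensive, which is one way of expressing "prefer a simpler model".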

23
GENERAL PROCEDURE OF ML
[Diagram: IMAGE / SPEECH → Feature extraction (feature engineering) → Representation (Features) → Machine learning → Task]

24
WHAT ARE THE GOOD FEATURES?
http://twistedsifter.com/2016/03/puppy-or-bagel-meme-gallery/

25
GENERAL PROCEDURE OF DL
[Diagram, conventional ML: Feature extraction → Representation (Features) → Machine learning → Task]
[Diagram, deep learning (end-to-end): Deep learning → Task]
* Feature extraction included

26
DEEP LEARNING
• What is Deep Learning (DL) ?
- Learning methods that have a deep (not shallow) architecture
- It usually allows end-to-end learning
- It automatically learns intermediate representations. Thus,
it can be regarded as representation learning
- It often contains stacked “neural networks”. Thus,
deep learning usually refers to “deep neural networks”
“Deep Gaussian Process” (2013)
https://youtu.be/NwoGqYsQifg
http://goo.gl/fxmmPE
http://goo.gl/5Ry08S

27
BIOLOGICAL EVIDENCE
Yann LeCun, https://goo.gl/VVQXJG
• The ventral pathway in the visual cortex has multiple stages
• There exist a lot of intermediate representations

28
IMAGENET CHALLENGE (ILSVRC)
http://image-net.org/challenges/talks/2016/ILSVRC2016_10_09_clsloc.pdf
• 1000 classes, 1.4 million images
• The first “large-scale” ML challenge
• Labeled by Amazon Mechanical Turk
(Fei-Fei Li, Stanford Univ.)
• Need large-scale data → ImageNet
• Need a scalable method → DL
• Need computation power → GPU
• Convolutional Neural Networks (CNNs)
AlexNet (2012), VGG (2014), GoogLeNet
(2015), ResNet (2016), DenseNet (2017)...

29
NEURAL NETWORKS
(H. Larochelle, DLSS2017)
• A large parametric model
(like high-order polynomials)
• Learn the parameters using
gradient descent (GD) method
• Local minima problem? → Stochastic GD (SGD)
• Overfitting problem? → Large-scale data

30
NEURAL NETWORKS
• Neural networks = Composition of functions
(H. Larochelle, DLSS2017)
Forward pass:
- Linear combination: Wx + b
- Activation: σ(Wx + b) (ReLU or tanh)
- (…repeat…)
- Linear combination: W σ(W(…) + b) + b
- Output activation: σ_out(…) (Softmax or Linear)
Backward pass:
- Calculate the loss: Loss(y_true, y_pred) (cross-entropy or MSE)
- Gradient of the loss
- Gradient of the activation
- Gradient of the weights
- (…repeat…)
- Update the weights (Optimization: SGD, RMSProp, or Adam)
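To make the forward and backward passes concrete, here is a minimal sketch of one training step for a tiny network with a single tanh hidden unit and a linear output. The scalar setup, the starting parameters, the learning rate, and the training sample are all illustrative assumptions; real networks use matrices and a framework's automatic differentiation.

```python
import math

def train_step(params, x, y_true, lr=0.1):
    """One forward pass, backward pass, and SGD update on a single sample."""
    w1, b1, w2, b2 = params
    # Forward pass: linear -> tanh activation -> linear output.
    h = math.tanh(w1 * x + b1)
    y_pred = w2 * h + b2
    loss = (y_true - y_pred) ** 2          # squared error (MSE for one sample)
    # Backward pass: chain rule from the loss back to each weight.
    d_y = 2.0 * (y_pred - y_true)          # gradient of the loss
    d_w2, d_b2 = d_y * h, d_y              # gradient of the output weights
    d_z = d_y * w2 * (1.0 - h ** 2)        # gradient through the tanh activation
    d_w1, d_b1 = d_z * x, d_z              # gradient of the hidden weights
    # Update the weights by gradient descent.
    new_params = (w1 - lr * d_w1, b1 - lr * d_b1,
                  w2 - lr * d_w2, b2 - lr * d_b2)
    return new_params, loss

params = (0.5, 0.0, 0.5, 0.0)
losses = []
for _ in range(50):
    params, loss = train_step(params, x=1.0, y_true=0.8)
    losses.append(loss)
```

The recorded `losses` should shrink over the iterations, which is the whole point of repeating the forward/backward/update cycle.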

31
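NN IN KERAS
from keras.models import Sequential
from keras.layers import Dense
from keras.datasets import mnist
from keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Flatten the 28x28 images to 784-dim vectors in [0, 1]; one-hot encode the labels
x_train = x_train.reshape(-1, 784).astype('float32') / 255
y_train = to_categorical(y_train, 10)

model = Sequential()
model.add(Dense(units=64, activation='relu', input_dim=784))
model.add(Dense(units=10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd')
model.fit(x_train, y_train, epochs=5)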

SUMMARY – PART1
• Recognition & Supervised learning
• Model selection & Overfitting
• Training set split & Cross validation
• Regularization
• Deep learning : End-to-end learning
• Neural Network Basics

34
POPULAR DL METHODS
Supervised learning: Convolutional Neural Network (CNN), Recurrent Neural Network (RNN)
Unsupervised learning: Variational Autoencoder (VAE), Generative Adversarial Network (GAN)
Reinforcement learning: Deep Q-Network (DQN), Actor-Critic, Policy gradient
Yuxi Li, “Deep reinforcement learning: Overview”, https://arxiv.org/abs/1701.07274

35
POPULAR DL METHODS
Labels (O), (mostly) discriminative model: Convolutional Neural Network (CNN), Recurrent Neural Network (RNN*)
Labels (X), (mostly) generative model: Variational Autoencoder (VAE), Generative Adversarial Network (GAN)
* RNN can also be used in an unsupervised manner

36
POPULAR DL METHODS
Convolutional Neural Network (CNN), Recurrent Neural Network (RNN*)
Explicit density: Variational Autoencoder (VAE)
Implicit density (try to generate realistic samples): Generative Adversarial Network (GAN)

37
POPULAR DL METHODS
Static data (e.g. image): Convolutional Neural Network (CNN)
Sequence data (e.g. natural language): Recurrent Neural Network (RNN*)
Explicit density: Variational Autoencoder (VAE)
Implicit density (try to generate realistic samples): Generative Adversarial Network (GAN)
The area that I am most familiar with

CONVOLUTIONAL NN (CNN)
Fully-connected layers Convolutional layers
[Figure: fully-connected vs. convolutional layers; image of width w and height h, n nodes, p × q kernel]
e.g.) (1k*1k) image * 1k nodes = 1 billion parameters [Fully-connected]
(3*3) kernel size * 64 kernels = 576 parameters [Convolutional]
https://github.com/vdumoulin/conv_arithmetic
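The parameter counts in the example above can be reproduced with a few lines of arithmetic. The helper names are hypothetical, biases are ignored as in the slide's estimate, and a single input channel is assumed for the convolutional case.

```python
def fully_connected_params(n_inputs, n_nodes):
    """Weights of a dense layer (biases ignored, as in the slide's estimate)."""
    return n_inputs * n_nodes

def conv_params(kernel_h, kernel_w, n_kernels, in_channels=1):
    """Weights of a conv layer; the same kernels are shared across all positions."""
    return kernel_h * kernel_w * in_channels * n_kernels

fc = fully_connected_params(n_inputs=1000 * 1000, n_nodes=1000)
conv = conv_params(kernel_h=3, kernel_w=3, n_kernels=64)
print(fc, conv)  # 1000000000 576
```

The convolutional count does not depend on the image size at all, because each kernel is reused at every image location.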

40
CONVOLUTIONAL NN (CNN)
• How can we deal with real images, which are
much bigger than MNIST digit images?
- Use locally-connected, not fully-connected, NNs
- Use convolutions to get various feature maps
- Abstract the results into higher layers by using pooling
- Fine-tune with fully-connected NN
https://goo.gl/G7kBjI
https://goo.gl/Xswsbd
http://goo.gl/5OR5oH
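As a rough illustration of the convolution and pooling steps listed above, the sketch below applies one hand-written 3x3 kernel to a toy 5x5 image and then max-pools the resulting feature map. The image, the vertical-edge kernel, and the function names are assumptions for the example; in a real CNN the kernel values are learned.

```python
def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation of a single-channel image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; leftover rows/columns are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [[max(feature_map[i * size + a][j * size + b]
                 for a in range(size) for b in range(size))
             for j in range(out_w)]
            for i in range(out_h)]

image = [[0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1]]
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]
feature_map = conv2d(image, vertical_edge)   # 3x3 map; each value sees only a 3x3 patch
pooled = max_pool2d(feature_map)             # 1x1 map after 2x2 pooling
```

The feature map responds strongly where the dark-to-bright edge sits and is zero in the uniform region, and pooling keeps only the strongest response.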

CNN FEATURES
http://yosinski.com/deepvis

42
CNN ARCHITECTURES
AlexNet (2012)
VGG (2014)
GoogLeNet (2014), ResNet (2015)

43
APPLICATIONS
https://goo.gl/1SjmTp
A. Karpathy @ Bay area DL school 2016
https://docs.google.com/presentation/d/1Q1CmVVnjVJM_9CDk3B8Y6MWCavZOtiKmOLQ0XB7s9Vg/edit

RECURRENT NN (RNN)
• Recurrent Neural Network (RNN)
[Figure: RNN (folded) and RNN (unfolded), with input x and hidden state h]
• Vanishing / exploding gradient problem
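The unfolded RNN and its gradient problem can be illustrated with a one-unit recurrent network. This is a simplified sketch under assumed conditions: scalar weights, a tanh activation, and all-zero inputs, so that the hidden state stays at 0 and each time step multiplies the gradient by exactly `w_h`.

```python
import math

def rnn_forward(xs, w_x, w_h, b=0.0, h0=0.0):
    """Unfold a one-unit RNN over time: h_t = tanh(w_x*x_t + w_h*h_{t-1} + b)."""
    hs, h = [], h0
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
        hs.append(h)
    return hs

def grad_last_wrt_h0(xs, w_x, w_h, b=0.0, h0=0.0):
    """d h_T / d h_0: a product of one factor w_h * (1 - h_t^2) per time step."""
    grad = 1.0
    for h in rnn_forward(xs, w_x, w_h, b, h0):
        grad *= w_h * (1.0 - h ** 2)
    return grad

xs = [0.0] * 30                                       # 30 time steps of zero input
vanishing = grad_last_wrt_h0(xs, w_x=1.0, w_h=0.5)    # 0.5 ** 30, practically zero
exploding = grad_last_wrt_h0(xs, w_x=1.0, w_h=1.5)    # 1.5 ** 30, very large
```

Because the same recurrent weight is applied at every step, the gradient either shrinks or grows geometrically with sequence length, which is the difficulty gated architectures such as LSTM were designed to ease.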

LONG SHORT-TERM MEMORY (LSTM)
• Long short-term memory (LSTM)
[S. Hochreiter & J. Schmidhuber 1997]

RNN APPLICATIONS
(Andrej Karpathy, http://karpathy.github.io/2015/05/21/rnn-effectiveness/)

49
RNN APPLICATIONS
• Sequence generation
• Classification
Speech recognition, Sentence/document classification,
Video classification, Activity recognition, …
𝑥
ℎ

RNN APPLICATIONS
• Machine translation with attention mechanism
https://research.googleblog.com/2016/09/a-neural-network-for-machine.html

53
TASK-SPECIFIC FEATURES
Task 1: person identification
Task 2: emotion estimation

54
WHY UNSUPERVISED LEARNING?
- Labeled data are difficult to collect
- Is this the right way to obtain a good representation?
(Lack of generalizability / transferability)
[Diagram: Deep learning (end-to-end) → Task; * Feature extraction included]

55
GOOD REPRESENTATION?

GOOD & BAD REPRESENTATION
Good representation
Bad representation

57
UNSUPERVISED LEARNING
• Attempt to learn a good representation without labels
• Unsupervised learning is far more difficult than supervised learning
• Turn unsupervised learning into supervised learning!

58
AUTOENCODER
• Objective : Minimize reconstruction error
“오토엔코더의 모든것” (“All About Autoencoders”), https://www.slideshare.net/NaverEngineering/ss-96581209

59
VARIATIONAL AUTOENCODER (VAE)
“All about VAE”, H. Lee, https://www.slideshare.net/NaverEngineering/ss-96581209
• Objective : Minimize reconstruction error + regularization loss
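The two-term VAE objective above might be written out as follows. The sketch assumes the usual choices, which the slide does not spell out: squared error for the reconstruction term, and for the regularization term the closed-form KL divergence between a diagonal Gaussian encoder output and a standard normal prior. All function and variable names are illustrative.

```python
import math

def reconstruction_error(x, x_hat):
    """Squared error between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, exp(log_var)) || N(0, 1) ), summed over latent dimensions."""
    return 0.5 * sum(m ** 2 + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))

def vae_loss(x, x_hat, mu, log_var):
    """Reconstruction error plus the KL regularization term."""
    return reconstruction_error(x, x_hat) + kl_to_standard_normal(mu, log_var)

x, x_hat = [1.0, 0.0], [0.9, 0.2]
no_penalty = vae_loss(x, x_hat, mu=[0.0, 0.0], log_var=[0.0, 0.0])
with_penalty = vae_loss(x, x_hat, mu=[1.0, -1.0], log_var=[0.0, 0.0])
```

When the encoder output already matches the prior the regularization term is zero, and it grows as the latent code drifts away from it.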

60
OVERFITTING & REGULARIZATION
• Objective : Minimize reconstruction error + regularization loss

61
VARIATIONAL AUTOENCODER (VAE)
http://blog.fastforwardlabs.com/2016/08/12/introducing-variational-autoencoders-in-prose-and.html

62
GENERATED IMAGES BY VAE
https://github.com/davidsandberg/facenet/wiki/Variational-autoencoder

CONDITIONAL VAE
[X. Yan et al. 2016]

66
MUSIC VAE

68
GENERATIVE MODELS
H. Lee (이활석), “그림 그리는 AI” (“AI That Draws Pictures”),
https://www.slideshare.net/NaverEngineering/ai-83896428

69
NOT OPTIMIZATION, BUT GAME
H. Lee (이활석), “그림 그리는 AI” (“AI That Draws Pictures”),
https://www.slideshare.net/NaverEngineering/ai-83896428
http://bzit.donga.com/List/3/all/50/1202090/1

GAN
DCGAN, EBGAN, LSGAN, WGAN, BEGAN, DRAGAN

CYCLE-GAN
72

GAN VARIANTS
GAN zoo, https://deephunt.in/the-gan-zoo-79597dc8c347
Most of them have been developed within the last year

VOICE GENERATION (AUTOREGRESSIVE)
75
Taehoon Kim (김태훈, OpenAI), Naver Deview2017, “책읽는 딥러닝” (“Deep Learning That Reads Books”)
https://www.youtube.com/watch?v=klnfWhPGPRs&t=1992s

76
Google Duplex
https://www.youtube.com/watch?v=D5VN56jQMWM&t=2m47s
RECOGNITION + GENERATION

77
POPULAR METHODS
Static data (e.g. image): Convolutional Neural Network (CNN)
Sequence data (e.g. natural language): Recurrent Neural Network (RNN*)
Explicit density: Variational Autoencoder (VAE)
Implicit density (try to generate realistic samples): Generative Adversarial Network (GAN)

BELIEVE OR NOT
79
green?
enemy?
1. Adversarial attacks
2. Uncertainty
3. Interpretability

BELIEVE OR NOT
80
1. Adversarial attacks: [Keyword] = NOISE (perturbation)
2. Uncertainty
3. Interpretability

81
ERRORS IN INPUT SPACE
[Wang & Bovik, 2002]

ADVERSARIAL ATTACKS
82
Gradient ascent method:
Perturb the input in the direction that increases the loss, i.e. along the gradient of the loss w.r.t. the input
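A minimal sketch of a gradient-ascent attack on the input, using a hand-written logistic-regression classifier so that the input gradient can be computed in closed form. Taking only the sign of the gradient (in the style of the fast gradient sign method) is an assumption here, as are the weights, the sample, and the step size `eps`; the slide itself only names gradient ascent.

```python
import math

def predict_proba(w, b, x):
    """Logistic-regression probability of class 1."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Binary cross-entropy for one sample."""
    p = predict_proba(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def adversarial_example(w, b, x, y, eps=0.1):
    """Take one gradient-ascent step on the input (sign of the gradient)."""
    p = predict_proba(w, b, x)
    grad_x = [(p - y) * wi for wi in w]      # d loss / d x for this model
    sign = [(g > 0) - (g < 0) for g in grad_x]
    return [xi + eps * s for xi, s in zip(x, sign)]

w, b = [2.0, -1.0], 0.0
x, y = [1.0, 0.5], 1
x_adv = adversarial_example(w, b, x, y, eps=0.5)
```

The model weights are left untouched; only the input moves, and the loss on the perturbed input is higher than on the original one.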

83
ADVERSARIAL ATTACKS
• Adversarial examples in the physical world (Kurakin et al. 2016)

ADVERSARIAL ATTACKS
84
• Adversarial patch (Brown et al. 2017)

ADVERSARIAL TRAINING
85
• Virtual adversarial training (Miyato et al. 2016)
https://www.spsc.tugraz.at/research/roM/virtual-adversarial-training-applied-neural-higher-order-factors-phone-classification
https://youtu.be/kvPmArtVoFE

BAYESIAN APPROACHES
87
• Posterior ∝ Prior * Likelihood
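The proportionality above becomes an equality once the product is normalized. A toy illustration with two hypotheses about a coin follows; the hypotheses and all numbers are made up for the example.

```python
def posterior(prior, likelihood):
    """Multiply prior by likelihood per hypothesis, then normalize to sum to 1."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())
    return {h: value / evidence for h, value in unnormalized.items()}

# Hypothetical example: is a coin fair or biased (80% heads), given one observed head?
prior = {"fair": 0.5, "biased": 0.5}
likelihood_of_heads = {"fair": 0.5, "biased": 0.8}
post = posterior(prior, likelihood_of_heads)
```

After seeing one head, belief shifts toward the biased coin, and the posterior can serve as the prior for the next observation.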

GAUSSIAN PROCESS
88
Beautiful, but not scalable!

DROPOUT AS BAYESIAN
89
• Dropout: Randomly drop nodes
→ regularization
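One common reading of "dropout as Bayesian" is Monte Carlo dropout: keep dropout switched on at prediction time, run several stochastic forward passes, and treat the spread of the outputs as an uncertainty estimate. Whether the slide means exactly this is an assumption; the sketch below uses a single linear unit and hypothetical names to show the mechanics.

```python
import random
import statistics

def dropout(values, p, rng):
    """Inverted dropout: zero each node with probability p, rescale the rest."""
    if p == 0.0:
        return list(values)
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in values]

def forward(x, weights, p, rng):
    """A single linear unit applied to inputs with dropout on them."""
    return sum(w * v for w, v in zip(weights, dropout(x, p, rng)))

def mc_dropout_predict(x, weights, p=0.5, n_samples=200, seed=0):
    """Keep dropout on at test time; return mean and std of the predictions."""
    rng = random.Random(seed)
    preds = [forward(x, weights, p, rng) for _ in range(n_samples)]
    return statistics.mean(preds), statistics.pstdev(preds)

x, weights = [1.0, 2.0, 3.0], [0.5, -0.2, 0.1]
mean, spread = mc_dropout_predict(x, weights, p=0.5)
_, no_spread = mc_dropout_predict(x, weights, p=0.0)
```

With dropout disabled every pass gives the same output and the spread is zero; with dropout enabled the spread is positive and can be read as a rough confidence signal.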

OCCLUSION TEST

CLASS ACTIVATION MAP (CAM)
• Detect the most discriminative
parts from the label (without
the need of bounding boxes)
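In the original CAM formulation, the map for a class is a weighted sum of the last convolutional layer's feature maps, using the weights that connect each globally averaged feature map to that class. A toy sketch with made-up 2x2 feature maps and weights:

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of the last conv layer's feature maps for one class."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(w * fmap[i][j] for w, fmap in zip(class_weights, feature_maps))
             for j in range(width)]
            for i in range(height)]

# Two hypothetical 2x2 feature maps and the class weights attached to them.
feature_maps = [[[1.0, 0.0],
                 [0.0, 0.0]],
                [[0.0, 0.0],
                 [0.0, 2.0]]]
class_weights = [0.5, 1.5]
cam = class_activation_map(feature_maps, class_weights)
```

The largest value in `cam` marks the location that contributes most to the class score, which is why only image-level labels are needed and no bounding boxes.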

CAM
93

AI FOR ETHICS?
94
green?
enemy?
2. Uncertainty
3. Interpretability

CONTENTS
95
1. Introduction & ML basics 35min
2. Supervised Learning: CNN & RNN 20min
3. Unsupervised Learning: VAE & GAN 20min
5. Q & A 15min
Break 10min
