[course site]
Javier Ruiz Hidalgo
javier.ruiz@upc.edu
Associate Professor
Universitat Politecnica de Catalunya
Technical University of Catalonia
Loss functions
Day 3 Lecture 1
#DLUPC
Me
● Javier Ruiz Hidalgo
○ Email: j.ruiz@upc.edu
○ Office: UPC, Campus Nord, D5-008
● Teaching experience
○ Basic signal processing
○ Project based
○ Image processing & computer vision
● Research experience
○ Master on hierarchical image representations by UEA (UK)
○ PhD on video coding by UPC (Spain)
○ Interests in image & video coding, 3D analysis and super-resolution
Acknowledgements
Kevin McGuinness
kevin.mcguinness@dcu.ie
Research Fellow
Insight Centre for Data Analytics
Dublin City University
Xavier Giró
xavier.giro@upc.edu
Associate Professor
ETSETB TelecomBCN
Universitat Politècnica de Catalunya
Motivation
Model
● 3 layer neural network (2 hidden layers)
● Tanh units (activation function)
● 512-512-10
● Softmax on top layer
● Cross entropy loss
Input: 28 x 28 image
Layer   #Weights    #Biases   Total
1       784 x 512   512       401,920
2       512 x 512   512       262,656
3       512 x 10    10        5,130
Total                         669,706
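The parameter counts in the table can be checked with a few lines of Python. This is a minimal sketch that assumes fully connected layers with one bias per output unit; the names `layer_sizes` and `count_parameters` are illustrative and not part of the lecture.

```python
# Layer sizes from the slide: 28 x 28 input, two hidden layers of 512 units, 10 outputs.
layer_sizes = [28 * 28, 512, 512, 10]

def count_parameters(sizes):
    """Return a (weights, biases, total) tuple for each fully connected layer."""
    rows = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = n_in * n_out   # one weight per input-output connection
        biases = n_out           # one bias per output unit
        rows.append((weights, biases, weights + biases))
    return rows

rows = count_parameters(layer_sizes)
total = sum(row[2] for row in rows)

for index, (weights, biases, subtotal) in enumerate(rows, start=1):
    print(f"layer {index}: weights={weights} biases={biases} total={subtotal}")
print("all layers:", total)  # all layers: 669706
```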
Outline
● Introduction
○ Definition, properties, training process
● Common types of loss functions
○ Regression
○ Classification
● Regularization
● Example
Nomenclature
loss function = cost function = objective function = error function
Definition
In a supervised deep learning context the loss function
measures the quality of a particular set of parameters
based on how well the output of the network agrees with
the ground truth labels in the training data.
Loss function (1)
How well does our network do with the training data?
[Diagram: input → Deep Network (parameters: weights, biases) → output; the output is compared with the labels (ground truth) to obtain the error]
Loss function (2)
● The loss function is not meant to measure the overall
performance of the network on a validation/test
dataset.
Loss function (3)
● The loss function is used to
guide the training process
in order to find a set of
parameters that reduce the
value of the loss function.
Training process
Stochastic gradient descent
● Find a set of parameters which make the loss as small
as possible.
● Change parameters at a rate determined by the partial
derivatives of the loss function: $\theta \leftarrow \theta - \eta \, \partial L / \partial \theta$, where $\eta$ is the learning rate.
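As an illustration of the training process, the sketch below runs stochastic gradient descent on a one-parameter model f(x) = w * x with a squared error loss. The toy data, the learning rate and the function names are assumptions chosen for the example; they are not taken from the lecture.

```python
import random

# Toy data (hypothetical): the targets follow y = 3 * x exactly.
data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]

def loss_and_grad(w, x, y):
    """Squared error of f(x) = w * x on one example, and its derivative dL/dw."""
    error = w * x - y
    return error ** 2, 2.0 * error * x

def sgd(w=0.0, learning_rate=0.05, steps=200, seed=0):
    rng = random.Random(seed)
    for _ in range(steps):
        x, y = rng.choice(data)          # "stochastic": one random example per step
        _, grad = loss_and_grad(w, x, y)
        w -= learning_rate * grad        # move the parameter against the gradient
    return w

w_final = sgd()
print(w_final)  # close to 3.0, the value that makes the loss zero
```

Each step only needs the partial derivative of the loss for the sampled example, which is why the loss should vary smoothly with the parameters.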
Properties (1)
● Minimum (0 value) when the
output of the network is equal
to the ground truth data.
● Increases in value as the output
differs from the ground truth.
[Figure: loss vs. epoch; the loss is high when the output is different from GT and low when it is similar to GT]
Properties (2)
● Ideally → convex function
● In reality → so many parameters (on the order of
millions) that it is not convex
Properties (3)
● Varies smoothly with changes in the output
○ Better gradients for gradient descent
○ Easy to compute small changes in the parameters to get an
improvement in the loss
Assumptions
● For backpropagation to work:
○ Loss function can be written as an average over loss
functions for individual training examples (empirical risk): $L = \frac{1}{n} \sum_{i=1}^{n} L_i$
○ Loss functions can be written as a function of the
output activations from the neural network.
Source: Neural Networks and Deep Learning
Common types of loss functions (1)
● Loss functions depend on the type of task:
○ Regression: the network predicts continuous,
numeric variables
■ Example: length of fish in images, price of a
house
■ Absolute value, square error
Loss functions for regression
● L1 distance / absolute value: penalizes $|y_i - f_\theta(x_i)|$
○ Lighter computation (only |·|)
○ Less penalty to errors > 1, more penalty to errors < 1
○ Less sensitive to outliers, i.e. samples with a very large $|y_i - f_\theta(x_i)|$
● L2 distance / Euclidean distance / square value: penalizes $(y_i - f_\theta(x_i))^2$
○ Heavier computation (square)
○ More penalty to errors > 1, less penalty to errors < 1
○ More sensitive to outliers, i.e. samples with a very large $|y_i - f_\theta(x_i)|$
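The different behaviour of the two regression losses can be seen on a small example. This is a sketch that assumes the per-sample penalties are summed over the samples; the data values and function names are made up for illustration.

```python
def l1_loss(targets, predictions):
    """Sum of absolute errors |y_i - f(x_i)|."""
    return sum(abs(y - f) for y, f in zip(targets, predictions))

def l2_loss(targets, predictions):
    """Sum of squared errors (y_i - f(x_i))^2."""
    return sum((y - f) ** 2 for y, f in zip(targets, predictions))

targets = [1.0, 2.0, 3.0, 4.0]
small_errors = [1.5, 2.5, 3.5, 4.5]   # every error is 0.5 (< 1)
one_outlier = [1.0, 2.0, 3.0, 14.0]   # a single error of 10 (> 1)

print(l1_loss(targets, small_errors), l2_loss(targets, small_errors))  # 2.0 1.0
print(l1_loss(targets, one_outlier), l2_loss(targets, one_outlier))    # 10.0 100.0
```

With errors below 1 the squared loss is the smaller of the two, while a single outlier dominates the squared loss far more than the absolute loss.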
Common types of loss functions (2)
● Loss functions depend on the type of task:
○ Classification: the network predicts categorical
variables (fixed number of classes)
■ Example: classify email as spam, classify images
of numbers.
■ Hinge loss, cross-entropy loss
Classification (1)
We want the network to classify the input into a
fixed number of classes
[Figure: input classified into class “1”, class “2” or class “3”]
Classification (2)
● Each input can have only one label
○ One prediction per output class
■ The network will have “k” outputs (number of classes)
[Diagram: input → Network → output scores / logits: class “1” = 0.1, class “2” = 2, class “3” = 1]
Classification (3)
● How can we create a loss function to improve the scores?
○ Somehow write the labels (ground truth of the data) into a vector →
One-hot encoding
○ Non-probabilistic interpretation → hinge loss
○ Probabilistic interpretation: need to transform the scores into a probability
function → Softmax
One-hot encoding
● Transform each label into a vector (with only 1 and 0)
○ Length equal to the total number of classes “k”
○ Value of 1 for the correct class and 0 elsewhere
class “1” → [1, 0, 0]
class “2” → [0, 1, 0]
class “3” → [0, 0, 1]
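A possible implementation of one-hot encoding is sketched below. The function name `one_hot` and the choice of 1-based labels (to match class “1”, “2”, “3” above) are assumptions made for this example.

```python
def one_hot(label, num_classes):
    """Encode a 1-based class label as a vector with a single 1."""
    if not 1 <= label <= num_classes:
        raise ValueError("label must be between 1 and num_classes")
    vector = [0] * num_classes
    vector[label - 1] = 1   # 1 for the correct class, 0 elsewhere
    return vector

print(one_hot(2, 3))  # [0, 1, 0]
```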
Softmax (1)
● Convert scores into confidences/probabilities
○ From 0.0 to 1.0
○ Probability for all classes adds to 1.0
[Diagram: input → Network → output scores / logits [0.1, 2, 1] → probability [0.1, 0.7, 0.2]]
Softmax (2)
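● Softmax function: $S(l_k) = \frac{e^{l_k}}{\sum_j e^{l_j}}$, where $l_k$ are the scores (logits)
[Diagram: input → Network → output scores / logits [0.1, 2, 1] → Softmax → probability [0.1, 0.7, 0.2]]
Source: Neural Networks and Deep Learning (softmax)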
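The softmax can be sketched in plain Python as follows. Subtracting the maximum logit before exponentiating is a common numerical precaution that does not change the result; it is an addition of this example, not something stated on the slide.

```python
import math

def softmax(logits):
    """Convert scores (logits) into probabilities that add up to 1."""
    shift = max(logits)                         # avoids overflow in exp for large scores
    exps = [math.exp(l - shift) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([0.1, 2.0, 1.0])
print([round(p, 1) for p in probs])  # [0.1, 0.7, 0.2]
```

The rounded values match the probabilities used in the diagrams of this lecture.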
Cross-entropy loss (1)
[Diagram: input → Network → output scores / logits $l_k$ = [0.1, 2, 1] → probability $S(l_k)$ = [0.1, 0.7, 0.2]; labels $y_k$ = [0, 1, 0]]
● Loss contribution from input $x_i$ in a training batch: $L_i = -\sum_k y_k \log S(l_k)$
Cross-entropy loss (2)
● In a one-hot vector, only one $y_k = 1$ (the rest are 0).
● Only the confidence for the GT class is considered: $L_i = -\log S(l_c)$, where $c$ is the GT class.
Cross-entropy loss (3)
● -log penalizes small confidence scores for the GT class
● For a set of n inputs: $L = -\sum_{i=1}^{n} \sum_k y_{i,k} \log S(l_{i,k})$, with one-hot labels $y_{i,k}$ and Softmax $S$
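A minimal sketch of the cross-entropy loss computed directly from the logits is given below. It assumes that the contributions of the inputs in a batch are summed, and it uses a log-softmax formulation for numerical stability; both are choices made for this example, and the function names are illustrative.

```python
import math

def log_softmax(logits):
    """Logarithm of the softmax probabilities, computed in a stable way."""
    shift = max(logits)
    log_total = math.log(sum(math.exp(l - shift) for l in logits))
    return [l - shift - log_total for l in logits]

def cross_entropy(logits, one_hot_label):
    """Loss contribution of one input: -sum_k y_k * log S(l_k)."""
    return -sum(y * lp for y, lp in zip(one_hot_label, log_softmax(logits)))

def batch_cross_entropy(batch_logits, batch_labels):
    """Sum of the contributions of all inputs in the batch."""
    return sum(cross_entropy(l, y) for l, y in zip(batch_logits, batch_labels))

loss = cross_entropy([0.1, 2.0, 1.0], [0, 1, 0])
print(loss)  # equals -log of the softmax probability of the GT class (about 0.66)
```

Because the label is one-hot, only the term of the GT class survives in the sum.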
Cross-entropy loss (4)
● In general, cross-entropy loss works better than
square error loss:
○ Square error loss usually gives too much emphasis to
incorrect outputs.
○ In square error loss, as the output gets closer to either 0.0
or 1.0 the gradients get smaller, the change in weights gets
smaller and training is slower.
Weighted cross-entropy
● Dealing with class imbalance
○ Different number of samples for different classes
○ Introduce a weighting factor (related to inverse class frequency or treated as
a hyperparameter)
[Figure: example classes “ant”, “panda”, “flamingo”]
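One way to obtain the weighting factor is sketched below. The formula n_samples / (n_classes * class_count) is only one common inverse-frequency heuristic and is an assumption of this example, as are the sample counts and the function names.

```python
import math
from collections import Counter

def inverse_frequency_weights(labels):
    """One weight per class; rarer classes receive larger weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in counts}

def weighted_cross_entropy(probs, true_class, class_weights):
    """Cross-entropy of one input, scaled by the weight of its GT class."""
    return -class_weights[true_class] * math.log(probs[true_class])

# Hypothetical imbalanced training set: 6 ants, 3 pandas, 1 flamingo.
labels = ["ant"] * 6 + ["panda"] * 3 + ["flamingo"] * 1
weights = inverse_frequency_weights(labels)

probs = {"ant": 0.5, "panda": 0.3, "flamingo": 0.2}   # predicted probabilities for one input
loss_flamingo = weighted_cross_entropy(probs, "flamingo", weights)
print(weights)
print(loss_flamingo)
```

Errors on the rare class “flamingo” then contribute more to the total loss than errors on the frequent class “ant”.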
Multi-label classification (1)
● Outputs can be matched to more than one label
○ “car”, “automobile”, “motor vehicle” can be applied to the same image of a car.
Source: Bucak, Serhat Selcuk, Rong Jin, and Anil K. Jain. "Multi-label learning with incomplete class assignments." CVPR 2011.
Multi-label classification (2)
● Directly use a sigmoid at each output independently instead
of softmax
Multi-label classification (3)
● Cross-entropy loss for multi-label classification: $L_i = -\sum_k \left[ y_k \log \sigma(l_k) + (1 - y_k) \log(1 - \sigma(l_k)) \right]$, where $\sigma$ is the sigmoid
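The multi-label case can be sketched as an independent binary cross-entropy per output. The logits, the label names in the comment and the function names are hypothetical values chosen for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def multilabel_cross_entropy(logits, targets):
    """Sum over outputs of the binary cross-entropy between sigmoid(l_k) and y_k."""
    loss = 0.0
    for l, y in zip(logits, targets):
        p = sigmoid(l)   # each output is treated independently (no softmax)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss

logits = [2.0, 1.5, -3.0]   # hypothetical scores for "car", "automobile", "boat"
targets = [1, 1, 0]         # more than one label can be active at the same time
loss = multilabel_cross_entropy(logits, targets)
print(loss)
```

Unlike softmax, the sigmoid outputs do not have to add up to 1, so several labels can receive a high confidence for the same input.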
Regularization
● Control the capacity of the network to prevent
overfitting
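○ L2-regularization (weight decay): add the sum of squared weights to the loss
○ L1-regularization: add the sum of absolute weights to the loss
○ The penalty is scaled by a regularization parameter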
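A sketch of how a regularization term is added to the data loss follows. The plain form data_loss + lam * penalty is assumed here; other conventions scale the penalty differently (for example by 1/2 or by the number of samples), and the names `lam` and `regularized_loss` are illustrative.

```python
def l2_penalty(weights, lam):
    """Regularization parameter times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

def l1_penalty(weights, lam):
    """Regularization parameter times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def regularized_loss(data_loss, weights, lam, kind="l2"):
    penalty = l2_penalty(weights, lam) if kind == "l2" else l1_penalty(weights, lam)
    return data_loss + penalty

weights = [0.5, -2.0, 0.0, 1.0]   # hypothetical network weights
loss_l2 = regularized_loss(1.0, weights, lam=0.1, kind="l2")
loss_l1 = regularized_loss(1.0, weights, lam=0.1, kind="l1")
print(loss_l2, loss_l1)
```

A larger regularization parameter pushes the optimizer towards smaller weights, which reduces the capacity of the network.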
Example
Source: Why you should use cross-entropy instead of classification error
Two sets of confidences for the same three inputs, both giving the same predictions:

Input   Ground Truth   Predictions   Confidences S(l_k), set 1   Confidences S(l_k), set 2
1       [1, 0, 0]      [0, 0, 1]     [0.1, 0.2, 0.7]             [0.3, 0.3, 0.4]
2       [0, 1, 0]      [0, 1, 0]     [0.3, 0.4, 0.3]             [0.1, 0.7, 0.2]
3       [0, 0, 1]      [0, 0, 1]     [0.3, 0.3, 0.4]             [0.1, 0.2, 0.7]

Set 1: classification accuracy = 2/3 ≈ 0.67, cross-entropy loss = 4.14
Set 2: classification accuracy = 2/3 ≈ 0.67, cross-entropy loss = 1.92
Example
Source: Why you should use cross-entropy instead of classification error
a) Compute the accuracy of the predictions.
The class with the maximum confidence is predicted.

Input   Confidences S(l_k)   Predictions   Ground Truth
1       [0.1, 0.2, 0.7]      [0, 0, 1]     [1, 0, 0]
2       [0.3, 0.4, 0.3]      [0, 1, 0]     [0, 1, 0]
3       [0.3, 0.3, 0.4]      [0, 0, 1]     [0, 0, 1]

Class predictions and ground truth can be compared with a confusion matrix.

             Prediction
GT           Class 1   Class 2   Class 3
Class 1      0         0         1
Class 2      0         1         0
Class 3      0         0         1

Accuracy = 2/3 ≈ 0.67
Example
Source: Why you should use cross-entropy instead of classification error
a) Compute the accuracy of the predictions
b) Compute the cross-entropy loss of the prediction

Input   Confidences S(l_k)   Ground Truth y   -log of GT-class confidence
1       [0.1, 0.2, 0.7]      [1, 0, 0]        2.3
2       [0.3, 0.4, 0.3]      [0, 1, 0]        0.92
3       [0.3, 0.3, 0.4]      [0, 0, 1]        0.92

Classification accuracy = 2/3
Cross-entropy loss: L = 2.3 + 0.92 + 0.92 = 4.14
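The two steps of the example can be reproduced with the short script below. It is a sketch that uses the natural logarithm and sums the loss over the three inputs, which reproduces the value 4.14; the function names are illustrative.

```python
import math

confidences = [[0.1, 0.2, 0.7], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]]
ground_truth = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

def argmax(values):
    """Index of the largest value (the predicted class)."""
    return max(range(len(values)), key=values.__getitem__)

def accuracy(confs, labels):
    hits = sum(argmax(c) == argmax(y) for c, y in zip(confs, labels))
    return hits / len(confs)

def cross_entropy(confs, labels):
    """Sum over inputs and classes of -y * log(confidence)."""
    return -sum(y * math.log(p)
                for c, t in zip(confs, labels)
                for p, y in zip(c, t))

acc = accuracy(confidences, ground_truth)
loss = cross_entropy(confidences, ground_truth)
print(round(acc, 2))   # 0.67
print(round(loss, 2))  # 4.14
```

Accuracy only checks which class has the maximum confidence, while the cross-entropy also reflects how confident the network is about the GT class.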
Assignment
Source: Why you should use cross-entropy instead of classification error
a) Compute the accuracy of the predictions
b) Compute the cross-entropy loss of the prediction
c) Compare the result with the one obtained with the example and discuss how accuracy and
cross-entropy have captured the quality of the predictions.

Input   Confidences S(l_k)   Ground Truth
1       [0.3, 0.3, 0.4]      [1, 0, 0]
2       [0.1, 0.7, 0.2]      [0, 1, 0]
3       [0.1, 0.2, 0.7]      [0, 0, 1]
References
● About loss functions
● Neural networks and deep learning
● Are loss functions all the same?
● Convolutional neural networks for Visual Recognition
● Deep learning book, MIT Press, 2016
● On Loss Functions for Deep Neural Networks in Classification
Thanks! Questions?
https://imatge.upc.edu/web/people/javier-ruiz-hidalgo

Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 

Recently uploaded (20)

Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 

Loss Functions for Deep Learning - Javier Ruiz Hidalgo - UPC Barcelona 2018

  • 1. [course site] Javier Ruiz Hidalgo javier.ruiz@upc.edu Associate Professor Universitat Politecnica de Catalunya Technical University of Catalonia Loss functions Day 3 Lecture 1 #DLUPC
  • 2. Me ● Javier Ruiz Hidalgo ○ Email: j.ruiz@upc.edu ○ Office: UPC, Campus Nord, D5-008 ● Teaching experience ○ Basic signal processing ○ Project based ○ Image processing & computer vision ● Research experience ○ Master on hierarchical image representations by UEA (UK) ○ PhD on video coding by UPC (Spain) ○ Interests in image & video coding, 3D analysis and super-resolution 2
  • 3. Acknowledgements 3 Kevin McGuinness kevin.mcguinness@dcu.ie Research Fellow Insight Centre for Data Analytics Dublin City University Xavier Giró xavier.giro@upc.edu Associate Professor ETSETB TelecomBCN Universitat Politècnica de Catalunya
  • 4. Motivation 4 Model ● 3 layer neural network (2 hidden layers) ● Tanh units (activation function) ● 512-512-10 ● Softmax on top layer ● Cross entropy loss 28 28 Layer #Weights #Biases Total 1 784 x 512 512 401,920 2 512 x 512 512 262,656 3 512 x 10 10 5,130 669,706 ??
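The totals in the table on slide 4 can be checked with a few lines of Python. This is a minimal sketch: `layer_sizes` and `count_parameters` are illustrative names that do not come from the lecture, and the network is assumed to be fully connected with one bias per unit.

```python
# Layer widths of the 784-512-512-10 network (28 x 28 = 784 input pixels).
layer_sizes = [28 * 28, 512, 512, 10]

def count_parameters(sizes):
    """Return the (weights + biases) count of each fully connected layer."""
    return [n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:])]

per_layer = count_parameters(layer_sizes)
total = sum(per_layer)
print(per_layer, total)
```

The three per-layer counts and their sum should match the "Total" column of the table.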
  • 5. Outline ● Introduction ○ Definition, properties, training process ● Common types of loss functions ○ Regression ○ Classification ● Regularization ● Example 5
  • 6. Nomenclature loss function cost function objective function error function 6 = = =
  • 7. Definition In a supervised deep learning context the loss function measures the quality of a particular set of parameters based on how well the output of the network agrees with the ground truth labels in the training data. 7
  • 8. Loss function (1) 8 How well does our network perform on the training data? labels (ground truth) input parameters (weights, biases) error Deep Network input output
  • 9. Loss function (2) ● The loss function is not meant to measure the overall performance of the network against a validation/test dataset. 9
  • 10. Loss function (3) ● The loss function is used to guide the training process in order to find a set of parameters that reduce the value of the loss function. 10
  • 11. Training process Stochastic gradient descent ● Find a set of parameters which make the loss as small as possible. ● Change parameters at a rate determined by the partial derivatives of the loss function: 11
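The update rule on slide 11 did not survive in this transcript. A minimal sketch, assuming the usual form "parameter minus learning rate times partial derivative" and a toy one-parameter model f(x) = w * x with a squared-error loss (the model, the data and the names `sgd_step` and `learning_rate` are all illustrative assumptions):

```python
import random

def sgd_step(w, x, y, learning_rate):
    """One stochastic gradient descent update for the toy model f(x) = w * x."""
    grad = 2.0 * (w * x - y) * x      # d/dw of the squared error (w*x - y)^2
    return w - learning_rate * grad   # step against the gradient

random.seed(0)
data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]  # ground-truth slope is 3
w = 0.0
for _ in range(200):
    x, y = random.choice(data)        # "stochastic": one random sample per step
    w = sgd_step(w, x, y, learning_rate=0.05)
print(w)
```

With this learning rate every step shrinks the distance between `w` and the true slope, so `w` ends up close to 3.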
  • 12. Properties (1) ● Minimum (0 value) when the output of the network is equal to the ground truth data. ● The value increases as the output differs from the ground truth. 12 epoch Loss Similar to GT Different from GT
  • 13. Properties (2) 13 ● Ideally → convex function
  • 14. Properties (2) 14 ● In reality → so many parameters (on the order of millions) that it is not convex
  • 15. Properties (3) 15 ● Varies smoothly with changes in the output ○ Better gradients for gradient descent ○ Easy to compute small changes in the parameters to get an improvement in the loss
  • 16. Assumptions ● For backpropagation to work: ○ Loss function can be written as an average over loss functions for individual training examples (empirical risk): ○ Loss functions can be written as a function of the output activations from the neural network. 16 Neural Networks and Deep Learning
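The averaging assumption on slide 16 is commonly written as below. The formula itself is missing from this transcript, so the notation is an assumption: n is the number of training examples, and y_i and f_θ(x_i) are the label and the network output for example i, as on the regression slides.

```latex
\mathcal{L}(\theta) = \frac{1}{n} \sum_{i=1}^{n} \mathcal{L}_i\bigl(y_i, f_\theta(x_i)\bigr)
```

Because the total loss is an average of per-example terms, its gradient is the average of per-example gradients, which is what lets stochastic gradient descent estimate it from a batch.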
  • 17. Outline ● Introduction ○ Definition, properties, training process ● Common types of loss functions ○ Regression ○ Classification ● Regularization ● Example 17
  • 18. Common types of loss functions (1) ● Loss functions depend on the type of task: ○ Regression: the network predicts continuous, numeric variables ■ Example: length of fish in images, price of a house ■ Absolute value, square error 18
  • 19. 19 Loss functions for regression ● L1 Distance / Absolute Value: lighter computation (only |·|) ● L2 Distance / Euclidean Distance / Square Value: heavier computation (square)
  • 20. 20 Loss functions for regression ● L1 Distance / Absolute Value: less penalty to errors > 1, more penalty to errors < 1 ● L2 Distance / Euclidean Distance / Square Value: more penalty to errors > 1, less penalty to errors < 1
  • 21. 21 Loss functions for regression ● L1 Distance / Absolute Value: less sensitive to outliers, which have a very large y_i - f_θ(x_i) ● L2 Distance / Euclidean Distance / Square Value: more sensitive to outliers, which have a very large y_i - f_θ(x_i)
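The contrast between the two regression losses can be seen on a small numeric example. This is a sketch under assumptions: both losses are averaged over the examples (the slide formulas are not in the transcript), and the data and function names are made up for illustration.

```python
def l1_loss(targets, predictions):
    """Mean absolute error between targets and predictions."""
    return sum(abs(y - p) for y, p in zip(targets, predictions)) / len(targets)

def l2_loss(targets, predictions):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(targets, predictions)) / len(targets)

targets = [1.0, 2.0, 3.0, 4.0]
small_errors = [1.5, 2.5, 3.5, 4.5]    # every error is 0.5 (< 1)
with_outlier = [1.5, 2.5, 3.5, 14.0]   # last error is 10 (an outlier)

print(l1_loss(targets, small_errors), l2_loss(targets, small_errors))
print(l1_loss(targets, with_outlier), l2_loss(targets, with_outlier))
```

For errors below 1 the squared loss is the smaller of the two, while the single outlier raises the squared loss far more than the absolute loss.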
  • 22. Outline ● Introduction ○ Definition, properties, training process ● Common types of loss functions ○ Regression ○ Classification ● Regularization ● Example 22
  • 23. Common types of loss functions (2) ● Loss functions depend on the type of task: ○ Classification: the network predicts categorical variables (fixed number of classes) ■ Example: classify email as spam, classify images of numbers. ■ hinge loss, Cross-entropy loss 23
  • 24. Classification (1) We want the network to classify the input into a fixed number of classes 24 class “1” class “2” class “3”
  • 25. Classification (2) ● Each input can have only one label ○ One prediction per output class ■ The network will have “k” outputs (number of classes) 25 0.1 2 1 class “1” class “2” class “3” input Network output scores / logits
  • 26. Classification (3) ● How can we create a loss function to improve the scores? ○ Somehow write the labels (ground truth of the data) into a vector → One-hot encoding ○ Non-probabilistic interpretation → hinge loss ○ Probabilistic interpretation: need to transform the scores into a probability function → Softmax 26
  • 27. One-hot encoding ● Transform each label into a vector (with only 1 and 0) ○ Length equal to the total number of classes “k” ○ Value of 1 for the correct class and 0 elsewhere 27 0 1 0 class “2” 1 0 0 class “1” 0 0 1 class “3”
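One-hot encoding as described on slide 27 can be sketched in a few lines. The function name `one_hot` and the 1-based class labels (matching class “1”, “2”, “3” on the slide) are assumptions made for this illustration.

```python
def one_hot(label, num_classes):
    """Encode a 1-based class label as a list with a single 1 and 0 elsewhere."""
    if not 1 <= label <= num_classes:
        raise ValueError("label must be between 1 and num_classes")
    return [1 if k == label else 0 for k in range(1, num_classes + 1)]

for label in (1, 2, 3):
    print(label, one_hot(label, num_classes=3))
```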
  • 28. Softmax (1) ● Convert scores into confidences/probabilities ○ From 0.0 to 1.0 ○ Probability for all classes adds to 1.0 28 0.1 2 1 input Network output scores / logits 0.1 0.7 0.2 probability
  • 29. Softmax (2) ● Softmax function 29 0.1 2 1 input Network output scores / logits 0.1 0.7 0.2 probability scores (logits) Neural Networks and Deep Learning (softmax)
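The softmax formula is not reproduced in this transcript. A minimal sketch, assuming the standard definition S(l_k) = exp(l_k) / Σ_j exp(l_j) and applying it to the scores 0.1, 2, 1 shown on the slide (subtracting the maximum score is a common numerical-stability trick, not something the slides mention):

```python
import math

def softmax(logits):
    """Map raw scores to probabilities that sum to 1."""
    shift = max(logits)                            # avoids overflow in exp
    exps = [math.exp(score - shift) for score in logits]
    total = sum(exps)
    return [e / total for e in exps]

probabilities = softmax([0.1, 2.0, 1.0])
print([round(p, 1) for p in probabilities])
```

Rounded to one decimal, the result agrees with the 0.1, 0.7, 0.2 probabilities on the slide.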
  • 30. 30
  • 31. Cross-entropy loss (1) 31 0.1 2 1 input Network output scores / logits lk 0.1 0.7 0.2 probability S(lk ) 0 1 0 labels yk loss function Loss contribution from input xi in a training batch
  • 32. Cross-entropy loss (2) 32 0.1 2 1 input Network output scores / logits lk 0.1 0.7 0.2 probability S(lk ) 0 1 0 labels yk loss function In a one-hot vector, only one yk =1 (rest are 0) !!
  • 33. Cross-entropy loss (2) 33 0.1 2 1 input Network output scores / logits lk 0.1 0.7 0.2 probability S(lk ) 0 1 0 labels yk loss function Only the confidence for the GT class is considered.
  • 34. Cross-entropy loss (3) 34 -log penalizes small confidence scores for GT class
  • 35. Cross-entropy loss (3) ● For a set of n inputs 35 labels (one-hot) Softmax
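The cross-entropy formulas on slides 31 to 35 are missing from the transcript. The sketch below assumes the usual per-input form, minus the sum over classes of y_k * log S(l_k), and sums the per-input losses over a batch (dividing by the batch size instead gives the mean; which one the slides use is not recoverable here). Function names are illustrative.

```python
import math

def cross_entropy(one_hot_label, probabilities):
    """Loss for one input: only the ground-truth class (y_k = 1) contributes."""
    return -sum(math.log(p) for y, p in zip(one_hot_label, probabilities) if y)

def batch_cross_entropy(labels, batch_probabilities):
    """Sum of the per-input losses over a set of n inputs."""
    return sum(cross_entropy(y, p) for y, p in zip(labels, batch_probabilities))

single = cross_entropy([0, 1, 0], [0.1, 0.7, 0.2])   # equals -log(0.7)
print(round(single, 3))
```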
  • 36. Cross-entropy loss (4) ● In general, cross-entropy loss works better than square error loss: ○ Square error loss usually gives too much emphasis to incorrect outputs. ○ In square error loss, as the output gets closer to either 0.0 or 1.0 the gradients get smaller, the change in weights gets smaller and training is slower. 36
  • 37. Weighted cross-entropy 37 ● Dealing with class imbalance ○ Different number of samples for different classes ○ Introduce a weighting factor (related to inverse class frequency or treated as a hyperparameter) class “ant” class “panda” class “flamingo”
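One way to build the weighting factor is sketched below. The specific formula, n_samples / (n_classes * class_count), is a common inverse-frequency heuristic chosen here as an assumption; the slide only says the weight is related to inverse class frequency. The class counts are invented for illustration.

```python
import math
from collections import Counter

def inverse_frequency_weights(class_labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(class_labels)
    n, k = len(class_labels), len(counts)
    return {c: n / (k * count) for c, count in counts.items()}

def weighted_cross_entropy(true_class, probabilities, weights):
    """-w_c * log(p_c), where c is the ground-truth class."""
    return -weights[true_class] * math.log(probabilities[true_class])

train_labels = ["ant"] * 8 + ["panda"] + ["flamingo"]   # imbalanced toy set
weights = inverse_frequency_weights(train_labels)
prediction = {"ant": 0.6, "panda": 0.3, "flamingo": 0.1}
print(weights)
print(weighted_cross_entropy("panda", prediction, weights))
```

Rare classes receive larger weights, so a mistake on a “panda” sample costs more than the same mistake on an “ant” sample.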
  • 38. Multi-label classification (1) ● Outputs can be matched to more than one label ○ “car”, “automobile”, “motor vehicle” can be applied to the same image of a car. 38 Bucak, Serhat Selcuk, Rong Jin, and Anil K. Jain. "Multi-label learning with incomplete class assignments." CVPR 2011.
  • 39. Multi-label classification (2) ● Outputs can be matched to more than one label ○ “car”, “automobile”, “motor vehicle” can be applied to the same image of a car. ● Use a sigmoid at each output independently instead of softmax 39
  • 40. Multi-label classification (3) ● Cross-entropy loss for multi-label classification: 40
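The multi-label formula is not in the transcript. A minimal sketch, assuming the standard choice of one independent sigmoid per output followed by a binary cross-entropy term per label (function names and the example values are illustrative):

```python
import math

def sigmoid(x):
    """Squash a score into (0, 1) independently of the other outputs."""
    return 1.0 / (1.0 + math.exp(-x))

def multilabel_cross_entropy(labels, logits):
    """Sum of binary cross-entropies, one per output unit."""
    loss = 0.0
    for y, score in zip(labels, logits):
        p = sigmoid(score)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss

# Two labels are active at once, which a one-hot/softmax setup cannot express.
print(multilabel_cross_entropy([1, 1, 0], [2.0, 1.5, -3.0]))
```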
  • 41. Outline ● Introduction ○ Definition, properties, training process ● Common types of loss functions ○ Regression ○ Classification ● Regularization ● Example 41
  • 42. Regularization ● Control the capacity of the network to prevent overfitting ○ L2-regularization (weight decay): ○ L1-regularization: 42 regularization parameter
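The two penalty terms can be sketched as follows. The exact scaling on the slide is missing from the transcript, so the forms used here, lam * Σ w² for L2 and lam * Σ |w| for L1, are assumptions (some texts include an extra factor of 1/2 in the L2 term); `lam` stands for the regularization parameter.

```python
def l2_penalty(weights, lam):
    """Weight decay term: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

def l1_penalty(weights, lam):
    """L1 term: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def regularized_loss(data_loss, weights, lam, penalty=l2_penalty):
    """Total objective = data loss + regularization penalty."""
    return data_loss + penalty(weights, lam)

weights = [0.5, -1.0, 2.0]
print(regularized_loss(1.0, weights, lam=0.01))
print(regularized_loss(1.0, weights, lam=0.01, penalty=l1_penalty))
```

Larger values of `lam` push the optimizer toward smaller weights, which limits the capacity of the network.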
  • 43. Outline ● Introduction ○ Definition, properties, training process ● Common types of loss functions ○ Regression ○ Classification ● Regularization ● Example 43
  • 44. cross-entropy loss = 1.63 cross-entropy loss = 4.14 Example 44 Why you should use cross-entropy instead of classification error Classification accuracy = 2/3 = 0.66 Classification accuracy = 2/3 = 0.66 0.1 0.2 0.7 Confidences S(lk ) 0.3 0.4 0.3 0.3 0.3 0.4 0.3 0.3 0.4 Confidences S(lk ) 0.1 0.7 0.2 0.1 0.2 0.7 Ground Truth 1 0 0 0 1 0 0 0 1 Predictions 0 0 1 0 1 0 0 0 1
  • 45. Example 45 Why you should use cross-entropy instead of classification error a) Compute the accuracy of the predictions. 0.1 0.2 0.7 Confidences S(lk ) Predictions 0.3 0.4 0.3 0.3 0.3 0.4 0 0 1 0 1 0 0 0 1 The class with the maximum confidence is predicted. Ground Truth 1 0 0 0 1 0 0 0 1
  • 46. Example 46 a) Compute the accuracy of the predictions. 0.1 0.2 0.7 Confidences S(lk ) Predictions 0.3 0.4 0.3 0.3 0.3 0.4 0 0 1 0 1 0 0 0 1 Class predictions and ground truth can be compared with a confusion matrix. Ground Truth 1 0 0 0 1 0 0 0 1 Prediction Class 1 Class 2 Class 3 GT Class 1 0 0 1 Class 2 0 1 0 Class 3 0 0 1 Accuracy = 2/3 = 0.66
  • 47. Example 47 Why you should use cross-entropy instead of classification error Classification accuracy = 2/3 cross-entropy loss = 4.14 a) Compute the accuracy of the predictions b) Compute the cross-entropy loss of the prediction 0.1 0.2 0.7 Confidences S(lk ) 0.3 0.4 0.3 0.3 0.3 0.4 Ground Truth y 1 0 0 0 1 0 0 0 1
  • 48. Example 48 Why you should use cross-entropy instead of classification error a) Compute the accuracy of the predictions b) Compute the cross-entropy loss of the prediction 0.1 0.2 0.7 Confidences S(lk ) 0.3 0.4 0.3 0.3 0.3 0.4 Ground Truth y 1 0 0 0 1 0 0 0 1 -log 2.3 0.92 0.92
  • 49. Example 49 Why you should use cross-entropy instead of classification error a) Compute the accuracy of the predictions b) Compute the cross-entropy loss of the prediction 0.1 0.2 0.7 Confidences S(lk ) 0.3 0.4 0.3 0.3 0.3 0.4 Ground Truth y 1 0 0 0 1 0 0 0 1 L=4.14 -log 2.3 0.92 0.92
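Steps a) and b) of the worked example can be reproduced with the confidences and ground truth from slides 45 to 49. This is a sketch: the variable names are illustrative, the natural logarithm is assumed, and the per-input losses are summed, which is what the slide values 2.3 + 0.92 + 0.92 = 4.14 imply.

```python
import math

confidences = [[0.1, 0.2, 0.7], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]]
ground_truth = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

def argmax(values):
    """Index of the largest entry (the predicted or labelled class)."""
    return max(range(len(values)), key=values.__getitem__)

# a) Accuracy: the class with the maximum confidence is the prediction.
correct = sum(argmax(p) == argmax(y) for p, y in zip(confidences, ground_truth))
accuracy = correct / len(ground_truth)

# b) Cross-entropy: -log of the confidence assigned to the ground-truth class.
per_input = [-math.log(p[argmax(y)]) for p, y in zip(confidences, ground_truth)]
total_loss = sum(per_input)

print(correct, len(ground_truth))
print([round(v, 2) for v in per_input], round(total_loss, 2))
```

Accuracy only records that two of three predictions are right, while the loss also reflects how little confidence was assigned to the correct class in each case.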
  • 50. Assignment 50 a) Compute the accuracy of the predictions b) Compute the cross-entropy loss of the prediction c) Compare the result with the one obtained in the example and discuss how accuracy and cross-entropy have captured the quality of the predictions. 0.3 0.3 0.4 Ground Truth 0.1 0.7 0.2 0.1 0.2 0.7 1 0 0 0 1 0 0 0 1 Confidences S(lk ) Why you should use cross-entropy instead of classification error
  • 51. References ● About loss functions ● Neural networks and deep learning ● Are loss functions all the same? ● Convolutional neural networks for Visual Recognition ● Deep learning book, MIT Press, 2016 ● On Loss Functions for Deep Neural Networks in Classification 51