Deep learning from a novice perspective and recent innovations from KGPians

Anirban Santara
Doctoral Research Fellow
Department of CSE, IIT Kharagpur
bit.do/AnirbanSantara
Deep Learning: just a kind of Machine Learning.

3 main tasks: Classification, Regression, Clustering
CLASSIFICATION: assign each input to one of a fixed set of classes (e.g., panda, cat, dog). Rather than a hard label, we usually ask for P(class | input).
REGRESSION: predict a dependent variable (the target attribute) from an independent variable (the feature).
CLUSTERING: group examples by similarity in attribute space (e.g., along Attribute 1 and Attribute 2).
The methodology:
1. Design a hypothesis function h(y|x, θ), where y is the target attribute, x the input, and θ the parameters of the learning machine.
2. Keep improving the hypothesis until the predictions become really good.
Well, how bad is your hypothesis?
In the case of regression, a very common measure is the mean squared error:

E = Σ (over all training examples) |y_desired − y_as per hypothesis|²
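As a quick sanity check, the error above can be computed in a few lines (a minimal sketch; the target and predicted values here are made up):

```python
import numpy as np

# Hypothetical regression data: targets vs. the hypothesis's predictions.
y_desired = np.array([1.0, 2.0, 3.0])
y_hypothesis = np.array([1.1, 1.9, 3.2])

# Mean of the squared errors over all training examples.
mse = np.mean((y_desired - y_hypothesis) ** 2)
```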
In classification problems, targets are one-hot vectors such as [1, 0] and [0, 1]. In one-hot classification frameworks, we often use the mean squared error as well.
However, often we ask for the probabilities of occurrence of the different classes for a given input, Pr(class|X). In that case we use the K-L divergence between the observed (p(output classes)) and predicted (q(output classes)) distributions as the measure of error. This is sometimes referred to as the cross-entropy error criterion.

KL(P ‖ Q) = Σ (over all training examples and classes i) p_i log(p_i / q_i)
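A minimal sketch of this criterion on a single made-up example; the one-hot target p and predicted distribution q are illustrative, and eps guards against log(0):

```python
import numpy as np

p = np.array([0.0, 1.0, 0.0])   # observed (one-hot) class distribution
q = np.array([0.1, 0.7, 0.2])   # predicted class probabilities

eps = 1e-12
kl = np.sum(p * np.log((p + eps) / (q + eps)))   # KL(P || Q)
cross_entropy = -np.sum(p * np.log(q + eps))     # = KL(P || Q) + H(P)
# For a one-hot target, H(P) = 0, so the two measures coincide.
```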
Clustering uses a plethora of criteria, like:
• Entropy of a cluster
• Maximum distance between two neighbors in a cluster
• …and a lot more
Now it's time to rectify the machine and improve: Learning.
We perform “gradient descent” along the “error-plane” in
the “parameter space”:
Δparameter = −learning_rate × ∇_parameter(error_function)
parameter ← parameter + Δparameter
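These two update rules can be sketched on a toy one-parameter problem; the quadratic error function, starting point, and learning rate below are all illustrative:

```python
# Toy error function E(w) = (w - 3)^2, whose gradient is 2*(w - 3).
def grad_E(w):
    return 2.0 * (w - 3.0)

w = 0.0               # initial parameter value
learning_rate = 0.1
for _ in range(100):
    delta = -learning_rate * grad_E(w)   # Δparameter = -lr * ∇E
    w = w + delta                        # parameter ← parameter + Δparameter
# w converges towards the minimizer w = 3.
```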
Let's now look into a practical learning system: the Artificial Neural Network (for example, a network that classifies images as cat, dog, or panda).

The neuron: a very small unit of computation.
So the parameters of an ANN are:
1. Incoming weights of every neuron
2. Bias of every neuron
These are the ones that need to be tuned
during learning
We perform gradient descent on these
parameters
The backpropagation algorithm is a popular method of computing ∇_(weights and biases)(error function).
Backpropagation algorithm

Given an input pattern vector and weight matrices W21 and W32:
1. Forward propagate the input through the network.
2. Calculate the error at the output.
3. Backward propagate the error: the error term at a layer k takes one form if k is the output layer and another if k is a hidden layer.
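The forward/error/backward steps above can be sketched for a tiny one-hidden-layer network with sigmoid units and squared error; all shapes, data, and the learning rate below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))        # 4 toy examples, 3 features
y = rng.uniform(size=(4, 2))       # 4 toy targets in (0, 1)

W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)   # input -> hidden
W2, b2 = rng.normal(size=(5, 2)), np.zeros(2)   # hidden -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    h = sigmoid(x @ W1 + b1)       # hidden activations
    out = sigmoid(h @ W2 + b2)     # output activations
    return h, out

lr = 0.1
before = np.mean((y - forward(x)[1]) ** 2)
for _ in range(200):
    h, out = forward(x)
    d_out = (out - y) * out * (1 - out)        # error term at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)         # error term at the hidden layer
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * x.T @ d_h;    b1 -= lr * d_h.sum(axis=0)
after = np.mean((y - forward(x)[1]) ** 2)
# The squared error on the toy data decreases over the iterations.
```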
Well after all, life is tough…
• The parameters of a neural network are generally initialized to random values.
• Starting from these random values (with useless information)
it is very difficult (well not impossible, in fact time consuming)
for backpropagation to arrive at the correct values of these
parameters.
• Exponential activation functions like the sigmoid and the hyperbolic tangent are traditionally used in artificial neurons. These functions have gradients that are prone to becoming zero in the course of backpropagation.
• If the gradients in a layer get close to zero, they cause the gradients in the previous layers to vanish too. As a result, the weights and biases in the lower layers remain immature.
• This phenomenon is called the "vanishing gradient" problem in the literature.
These problems crop up very frequently in neural networks that contain a large number of hidden layers and way too many parameters (the so-called Deep Neural Networks).
How do we get around this? Answer: make an "informed" initialization.
• A signal is nothing but a set of random variables.
• These random variables jointly take values from a probability distribution that depends on the nature of the source of the signal.
E.g., a blank 28×28 pixel array can house numerous kinds of images. The set of 784 random variables assumes values from a different joint probability distribution for every class of objects/scenes:

(x1, x2, …, x784) ~ P_digit(x1, x2, …, x784)
(x1, x2, …, x784) ~ P_human face(x1, x2, …, x784)
Let's try to model the probability distribution of interest. Our target distribution: P_human face(x1, x2, …, x784). We try to capture this distribution in a model that looks quite similar to a single-layer neural network.

The Restricted Boltzmann Machine: a probabilistic graphical model (a special kind of Markov Random Field) that is capable of modelling a wide variety of probability distributions. Its hidden units capture the dependencies among the "visible" variables.
The working of the RBM

Parameters of the RBM:
1. Weights on the edges, w_i,j
2. Biases on each node, the b_i's and the c_j's

Using these we define a joint probability distribution over the "visible" variables (the v_j's) and the "hidden" variables (the h_i's):

P_RBM(v, h) = (1/Z) e^(−E(v, h))

where the energy function E(v, h) is linear in the visible units, the hidden units, and their pairwise products, with coefficients given by the biases and the weights, and Z is a normalization term called the "partition function". Summing out the hidden variables gives the model's distribution over the visible variables, which we want to match to P_human face(v1, v2, …, v784):

P_RBM(v1, v2, …, v784) = Σ_h P_RBM(v, h)
We measure the mismatch with the K-L divergence between the two distributions:

KL(P_human face ‖ P_RBM) = Σ (over v1, …, v784) P_human face(v1, …, v784) ln [ P_human face(v1, …, v784) / P_RBM(v1, …, v784) ]

= −H(P_human face) − Σ (over v1, …, v784) P_human face(v1, …, v784) ln P_RBM(v1, …, v784)

The entropy term H(P_human face) is not under our control. The second term is the empirical average of the log-likelihood of the data under the model distribution, so minimizing the K-L divergence means we MAXIMIZE that log-likelihood.
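The slides do not spell out how this maximization is carried out in practice; the standard approach is contrastive divergence. Below is a rough sketch of a single CD-1 update for a tiny binary RBM; all sizes, data, and the learning rate are illustrative, and a real implementation would run many such updates over mini-batches:

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3
W = 0.01 * rng.normal(size=(n_visible, n_hidden))
b = np.zeros(n_visible)   # visible biases
c = np.zeros(n_hidden)    # hidden biases

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

v0 = rng.integers(0, 2, size=(8, n_visible)).astype(float)  # a toy "data" batch

# Positive phase: hidden probabilities and samples given the data.
ph0 = sigmoid(v0 @ W + c)
h0 = (rng.random(ph0.shape) < ph0).astype(float)
# Negative phase: reconstruct the visibles, then recompute hidden probabilities.
pv1 = sigmoid(h0 @ W.T + b)
v1 = (rng.random(pv1.shape) < pv1).astype(float)
ph1 = sigmoid(v1 @ W + c)

# CD-1 parameter update (approximate gradient ascent on the log-likelihood).
lr = 0.1
W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
b += lr * (v0 - v1).mean(axis=0)
c += lr * (ph0 - ph1).mean(axis=0)
```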
Layer-wise pre-training using RBMs
• Every hidden layer is pre-trained as the hidden layer of an RBM. As the RBM models the statistics of its input, the weights and biases carry meaningful information about the input. Using these as initial values of the parameters of a deep neural network has shown phenomenal improvement over random initialization, both in terms of time complexity and performance.
• This is followed by fine-tuning of the entire network via backpropagation.
The Autoencoder
• An autoencoder is a neural network operating in unsupervised learning mode.
• The output and the input are set equal to each other.
• It learns an identity mapping from the input to the output.
• Applications:
  • Dimensionality reduction (efficient, non-linear)
  • Representation learning (discovering interesting structures)
  • An alternative to the RBM for layer-wise pre-training of DNNs.
Stacking several such layers gives a deep stacked autoencoder.
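As a rough illustration (not the exact architecture on the slide), here is a tiny linear autoencoder with tied weights, trained by gradient descent to reproduce its input; all dimensions, data, and the learning rate are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))        # toy input data: 50 examples, 8 features
W = 0.01 * rng.normal(size=(8, 3))  # encode 8 -> 3; decode 3 -> 8 with W.T

def reconstruct(X):
    code = X @ W        # encoder: low-dimensional representation
    return code @ W.T   # decoder: back to input space

before = np.mean((X - reconstruct(X)) ** 2)
lr = 0.01
for _ in range(100):
    err = reconstruct(X) - X
    # Gradient of the squared reconstruction error w.r.t. the tied weights W.
    grad = (X.T @ (err @ W) + (err.T @ X) @ W) / len(X)
    W -= lr * grad
after = np.mean((X - reconstruct(X)) ** 2)
# The reconstruction error decreases as W learns a compressed representation.
```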
So deep learning ≈ training "deep" neural networks with many hidden layers:
Step 1: Unsupervised layer-wise pre-training
Step 2: Supervised fine-tuning
This is pretty much how deep learning works. However, there is a class of deep networks, called convolutional neural networks, that often do not need pre-training, because these networks use extensive parameter sharing and rectified linear activation functions.

Well, deep learning when viewed from a different perspective looks really amazing!
Traditional machine learning vs. deep learning

Traditional machine learning: Data → hand-engineered feature extractors → representations of the data → inference engine (classification, regression, clustering, efficient coding).

Deep learning: Data → data-driven, target-oriented representation learning → inference engine.
What's so special about it?

Traditional machine learning:
• Designing feature detectors requires careful engineering and considerable domain expertise.
• Representations must be selective to the aspects of the data that are important for our task and invariant to the irrelevant aspects (the selectivity-invariance dilemma).

Deep learning:
• Abstractions of hierarchically increasing complexity are learnt in a data-driven way using general-purpose learning procedures.
• A composition of simple non-linear modules can learn very complex functions.
• Cost functions specific to the problem amplify aspects of the input that are important for the task and suppress irrelevant variations.
Pretty much how we humans go about analyzing…
Some deep architectures:
• Deep stacked autoencoder: used for efficient non-linear dimensionality reduction and discovering salient underlying structures in data.
• Deep convolutional neural network: exploits the stationarity of natural data and uses the concept of parameter sharing to study large images and long spoken/written strings and make inferences from them.
• Recurrent neural network: custom-made for modelling dynamic systems; finds use in natural language (speech and text) processing, machine translation, etc.
Classical automatic speech recognition system

Signal acquisition → feature extraction → acoustic modelling → Viterbi beam search / A* decoding → N-best sentences or word lattice → rescoring → FINAL UTTERANCE

The acoustic modelling and decoding stages draw on phonetic utterance models (from acoustic model generation) and a sentence model (from sentence model preparation).
Some of our works:

2015: A deep neural network and Random Forest hybrid architecture for learning to detect retinal vessels in fundus images (accepted at EMBC 2015, Milan, Italy). Average accuracy of detection: 93.27%.

2014-15: Faster learning of deep stacked autoencoders on multi-core systems through synchronized layer-wise pre-training (accepted at the PDCKDD Workshop, part of ECML-PKDD 2015, Porto, Portugal). Compared with conventional serial pre-training, the proposed algorithm achieves a 26% speedup for compression of MNIST handwritten digits.
Take-home messages
• Deep learning is a set of algorithms that have been designed to:
1. Train neural networks with a large number of hidden layers.
2. Learn features of hierarchically increasing complexity in the data, in a data- and objective-driven manner.
• Deep neural networks are breaking world records in AI; it can be proved that they have the capacity to model highly non-linear functions of the data with fewer parameters than shallow networks.
• Deep learning is extremely interesting and a breeze to implement once the underlying philosophies are understood. It has great potential for use in a lot of ongoing projects at KGP.
If you are interested in going deep into deep learning:
• Take Andrew Ng's Machine Learning course on Coursera
• Visit ufldl.stanford.edu and read the entire tutorial
• Read LeCun's latest deep learning review published in Nature
Thank you so much!
Please give me some feedback on this talk by visiting bit.do/RateAnirban, or just scan the QR code.
1 de 25

Recomendados

Introduction to Deep Learning por
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep LearningAdam Rogers
809 visualizações31 slides
mohsin dalvi artificial neural networks presentation por
mohsin dalvi   artificial neural networks presentationmohsin dalvi   artificial neural networks presentation
mohsin dalvi artificial neural networks presentationAkash Maurya
827 visualizações46 slides
Deep neural networks por
Deep neural networksDeep neural networks
Deep neural networksSi Haem
162.4K visualizações36 slides
Intro to Neural Networks por
Intro to Neural NetworksIntro to Neural Networks
Intro to Neural NetworksDean Wyatte
1.6K visualizações25 slides
Introduction to Neural networks (under graduate course) Lecture 3 of 9 por
Introduction to Neural networks (under graduate course) Lecture 3 of 9Introduction to Neural networks (under graduate course) Lecture 3 of 9
Introduction to Neural networks (under graduate course) Lecture 3 of 9Randa Elanwar
1.8K visualizações21 slides
Neural Networks por
Neural NetworksNeural Networks
Neural NetworksIsmail El Gayar
4.9K visualizações39 slides

Mais conteúdo relacionado

Mais procurados

Project presentation por
Project presentationProject presentation
Project presentationMadhv Kushawah
1.7K visualizações14 slides
mohsin dalvi artificial neural networks questions por
mohsin dalvi   artificial neural networks questionsmohsin dalvi   artificial neural networks questions
mohsin dalvi artificial neural networks questionsAkash Maurya
928 visualizações1 slide
Neural network & its applications por
Neural network & its applications Neural network & its applications
Neural network & its applications Ahmed_hashmi
195.3K visualizações50 slides
Deep Learning With Python | Deep Learning And Neural Networks | Deep Learning... por
Deep Learning With Python | Deep Learning And Neural Networks | Deep Learning...Deep Learning With Python | Deep Learning And Neural Networks | Deep Learning...
Deep Learning With Python | Deep Learning And Neural Networks | Deep Learning...Simplilearn
592 visualizações93 slides
Artificial neural networks and its applications por
Artificial neural networks and its applications Artificial neural networks and its applications
Artificial neural networks and its applications PoojaKoshti2
4.8K visualizações20 slides
Synthetic dialogue generation with Deep Learning por
Synthetic dialogue generation with Deep LearningSynthetic dialogue generation with Deep Learning
Synthetic dialogue generation with Deep LearningS N
778 visualizações44 slides

Mais procurados(20)

Project presentation por Madhv Kushawah
Project presentationProject presentation
Project presentation
Madhv Kushawah1.7K visualizações
mohsin dalvi artificial neural networks questions por Akash Maurya
mohsin dalvi   artificial neural networks questionsmohsin dalvi   artificial neural networks questions
mohsin dalvi artificial neural networks questions
Akash Maurya928 visualizações
Neural network & its applications por Ahmed_hashmi
Neural network & its applications Neural network & its applications
Neural network & its applications
Ahmed_hashmi195.3K visualizações
Deep Learning With Python | Deep Learning And Neural Networks | Deep Learning... por Simplilearn
Deep Learning With Python | Deep Learning And Neural Networks | Deep Learning...Deep Learning With Python | Deep Learning And Neural Networks | Deep Learning...
Deep Learning With Python | Deep Learning And Neural Networks | Deep Learning...
Simplilearn592 visualizações
Artificial neural networks and its applications por PoojaKoshti2
Artificial neural networks and its applications Artificial neural networks and its applications
Artificial neural networks and its applications
PoojaKoshti24.8K visualizações
Synthetic dialogue generation with Deep Learning por S N
Synthetic dialogue generation with Deep LearningSynthetic dialogue generation with Deep Learning
Synthetic dialogue generation with Deep Learning
S N778 visualizações
Introduction to Neural Networks por Databricks
Introduction to Neural NetworksIntroduction to Neural Networks
Introduction to Neural Networks
Databricks21.6K visualizações
Artifical Neural Network por mahalakshmimalini
Artifical Neural NetworkArtifical Neural Network
Artifical Neural Network
mahalakshmimalini744 visualizações
Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ... por Simplilearn
Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...
Deep Learning Tutorial | Deep Learning Tutorial For Beginners | What Is Deep ...
Simplilearn873 visualizações
what is neural network....??? por Adii Shah
what is neural network....???what is neural network....???
what is neural network....???
Adii Shah2.3K visualizações
Artificial neural networks por ernj
Artificial neural networksArtificial neural networks
Artificial neural networks
ernj1.6K visualizações
Unit+i por Chetan Dhembre
Unit+iUnit+i
Unit+i
Chetan Dhembre1.3K visualizações
Artificial Neural Networks: Pointers por Fariz Darari
Artificial Neural Networks: PointersArtificial Neural Networks: Pointers
Artificial Neural Networks: Pointers
Fariz Darari1.2K visualizações
Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural... por Simplilearn
Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...
Deep Learning Tutorial | Deep Learning TensorFlow | Deep Learning With Neural...
Simplilearn1.8K visualizações
Neural network por Facebook
Neural networkNeural network
Neural network
Facebook194 visualizações
Perceptron & Neural Networks por NAGUR SHAREEF SHAIK
Perceptron & Neural NetworksPerceptron & Neural Networks
Perceptron & Neural Networks
NAGUR SHAREEF SHAIK3.6K visualizações
Artificial Neural Network por Knoldus Inc.
Artificial Neural NetworkArtificial Neural Network
Artificial Neural Network
Knoldus Inc.5.4K visualizações
Neural network in matlab por Fahim Khan
Neural network in matlab Neural network in matlab
Neural network in matlab
Fahim Khan7.3K visualizações
Artificial Neural Network.pptx por ASHUTOSHMISHRA720383
Artificial Neural Network.pptxArtificial Neural Network.pptx
Artificial Neural Network.pptx
ASHUTOSHMISHRA720383282 visualizações

Similar a Deep learning from a novice perspective

Introduction to deep learning por
Introduction to deep learningIntroduction to deep learning
Introduction to deep learningJunaid Bhat
1.4K visualizações61 slides
Deep Learning Sample Class (Jon Lederman) por
Deep Learning Sample Class (Jon Lederman)Deep Learning Sample Class (Jon Lederman)
Deep Learning Sample Class (Jon Lederman)Jon Lederman
214 visualizações57 slides
Fundamental of deep learning por
Fundamental of deep learningFundamental of deep learning
Fundamental of deep learningStanley Wang
1.3K visualizações62 slides
ML Module 3 Non Linear Learning.pptx por
ML Module 3 Non Linear Learning.pptxML Module 3 Non Linear Learning.pptx
ML Module 3 Non Linear Learning.pptxDebabrataPain1
4 visualizações147 slides
Deep Learning: Application & Opportunity por
Deep Learning: Application & OpportunityDeep Learning: Application & Opportunity
Deep Learning: Application & OpportunityiTrain
1.6K visualizações93 slides
Recurrent Neural Networks, LSTM and GRU por
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRUananth
29.8K visualizações38 slides

Similar a Deep learning from a novice perspective(20)

Introduction to deep learning por Junaid Bhat
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
Junaid Bhat1.4K visualizações
Deep Learning Sample Class (Jon Lederman) por Jon Lederman
Deep Learning Sample Class (Jon Lederman)Deep Learning Sample Class (Jon Lederman)
Deep Learning Sample Class (Jon Lederman)
Jon Lederman214 visualizações
Fundamental of deep learning por Stanley Wang
Fundamental of deep learningFundamental of deep learning
Fundamental of deep learning
Stanley Wang1.3K visualizações
ML Module 3 Non Linear Learning.pptx por DebabrataPain1
ML Module 3 Non Linear Learning.pptxML Module 3 Non Linear Learning.pptx
ML Module 3 Non Linear Learning.pptx
DebabrataPain14 visualizações
Deep Learning: Application & Opportunity por iTrain
Deep Learning: Application & OpportunityDeep Learning: Application & Opportunity
Deep Learning: Application & Opportunity
iTrain1.6K visualizações
Recurrent Neural Networks, LSTM and GRU por ananth
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRU
ananth29.8K visualizações
DEF CON 24 - Clarence Chio - machine duping 101 por Felipe Prado
DEF CON 24 - Clarence Chio - machine duping 101DEF CON 24 - Clarence Chio - machine duping 101
DEF CON 24 - Clarence Chio - machine duping 101
Felipe Prado70 visualizações
AI Class Topic 6: Easy Way to Learn Deep Learning AI Technologies por Value Amplify Consulting
AI Class Topic 6: Easy Way to Learn Deep Learning AI TechnologiesAI Class Topic 6: Easy Way to Learn Deep Learning AI Technologies
AI Class Topic 6: Easy Way to Learn Deep Learning AI Technologies
Value Amplify Consulting107 visualizações
deeplearning por huda2018
deeplearningdeeplearning
deeplearning
huda2018109 visualizações
Deep Learning por Pierre de Lacaze
Deep LearningDeep Learning
Deep Learning
Pierre de Lacaze1.3K visualizações
Visualization of Deep Learning por YaminiAlapati1
Visualization of Deep LearningVisualization of Deep Learning
Visualization of Deep Learning
YaminiAlapati1158 visualizações
Deep learning por Ratnakar Pandey
Deep learningDeep learning
Deep learning
Ratnakar Pandey11.4K visualizações
Deep Learning por MoctardOLOULADE
Deep LearningDeep Learning
Deep Learning
MoctardOLOULADE155 visualizações
Deep learning - a primer por Uwe Friedrichsen
Deep learning - a primerDeep learning - a primer
Deep learning - a primer
Uwe Friedrichsen2.8K visualizações
Deep learning - a primer por Shirin Elsinghorst
Deep learning - a primerDeep learning - a primer
Deep learning - a primer
Shirin Elsinghorst4.7K visualizações
Neural Networks in Data Mining - “An Overview” por Dr.(Mrs).Gethsiyal Augasta
Neural Networks  in Data Mining -   “An Overview”Neural Networks  in Data Mining -   “An Overview”
Neural Networks in Data Mining - “An Overview”
Dr.(Mrs).Gethsiyal Augasta282 visualizações
Machine Duping 101: Pwning Deep Learning Systems por Clarence Chio
Machine Duping 101: Pwning Deep Learning SystemsMachine Duping 101: Pwning Deep Learning Systems
Machine Duping 101: Pwning Deep Learning Systems
Clarence Chio1.9K visualizações
Automatic Attendace using convolutional neural network Face Recognition por vatsal199567
Automatic Attendace using convolutional neural network Face RecognitionAutomatic Attendace using convolutional neural network Face Recognition
Automatic Attendace using convolutional neural network Face Recognition
vatsal199567532 visualizações
Convolutional Neural Networks (CNN) por Gaurav Mittal
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
Gaurav Mittal58.5K visualizações
Introduction to Neural networks (under graduate course) Lecture 9 of 9 por Randa Elanwar
Introduction to Neural networks (under graduate course) Lecture 9 of 9Introduction to Neural networks (under graduate course) Lecture 9 of 9
Introduction to Neural networks (under graduate course) Lecture 9 of 9
Randa Elanwar1.6K visualizações

Último

TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensors por
TouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective SensorsTouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective Sensors
TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensorssugiuralab
15 visualizações15 slides
DALI Basics Course 2023 por
DALI Basics Course  2023DALI Basics Course  2023
DALI Basics Course 2023Ivory Egg
14 visualizações12 slides
PharoJS - Zürich Smalltalk Group Meetup November 2023 por
PharoJS - Zürich Smalltalk Group Meetup November 2023PharoJS - Zürich Smalltalk Group Meetup November 2023
PharoJS - Zürich Smalltalk Group Meetup November 2023Noury Bouraqadi
120 visualizações17 slides
RADIUS-Omnichannel Interaction System por
RADIUS-Omnichannel Interaction SystemRADIUS-Omnichannel Interaction System
RADIUS-Omnichannel Interaction SystemRADIUS
15 visualizações21 slides
[2023] Putting the R! in R&D.pdf por
[2023] Putting the R! in R&D.pdf[2023] Putting the R! in R&D.pdf
[2023] Putting the R! in R&D.pdfEleanor McHugh
38 visualizações127 slides
Roadmap to Become Experts.pptx por
Roadmap to Become Experts.pptxRoadmap to Become Experts.pptx
Roadmap to Become Experts.pptxdscwidyatamanew
11 visualizações45 slides

Último(20)

TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensors por sugiuralab
TouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective SensorsTouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective Sensors
TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensors
sugiuralab15 visualizações
DALI Basics Course 2023 por Ivory Egg
DALI Basics Course  2023DALI Basics Course  2023
DALI Basics Course 2023
Ivory Egg14 visualizações
PharoJS - Zürich Smalltalk Group Meetup November 2023 por Noury Bouraqadi
PharoJS - Zürich Smalltalk Group Meetup November 2023PharoJS - Zürich Smalltalk Group Meetup November 2023
PharoJS - Zürich Smalltalk Group Meetup November 2023
Noury Bouraqadi120 visualizações
RADIUS-Omnichannel Interaction System por RADIUS
RADIUS-Omnichannel Interaction SystemRADIUS-Omnichannel Interaction System
RADIUS-Omnichannel Interaction System
RADIUS15 visualizações
[2023] Putting the R! in R&D.pdf por Eleanor McHugh
[2023] Putting the R! in R&D.pdf[2023] Putting the R! in R&D.pdf
[2023] Putting the R! in R&D.pdf
Eleanor McHugh38 visualizações
Roadmap to Become Experts.pptx por dscwidyatamanew
Roadmap to Become Experts.pptxRoadmap to Become Experts.pptx
Roadmap to Become Experts.pptx
dscwidyatamanew11 visualizações
Report 2030 Digital Decade por Massimo Talia
Report 2030 Digital DecadeReport 2030 Digital Decade
Report 2030 Digital Decade
Massimo Talia14 visualizações
The details of description: Techniques, tips, and tangents on alternative tex... por BookNet Canada
The details of description: Techniques, tips, and tangents on alternative tex...The details of description: Techniques, tips, and tangents on alternative tex...
The details of description: Techniques, tips, and tangents on alternative tex...
BookNet Canada121 visualizações
Top 10 Strategic Technologies in 2024: AI and Automation por AutomationEdge Technologies
Top 10 Strategic Technologies in 2024: AI and AutomationTop 10 Strategic Technologies in 2024: AI and Automation
Top 10 Strategic Technologies in 2024: AI and Automation
AutomationEdge Technologies14 visualizações
Special_edition_innovator_2023.pdf por WillDavies22
Special_edition_innovator_2023.pdfSpecial_edition_innovator_2023.pdf
Special_edition_innovator_2023.pdf
WillDavies2216 visualizações
Five Things You SHOULD Know About Postman por Postman
Five Things You SHOULD Know About PostmanFive Things You SHOULD Know About Postman
Five Things You SHOULD Know About Postman
Postman27 visualizações
Future of Learning - Khoong Chan Meng por NUS-ISS
Future of Learning - Khoong Chan MengFuture of Learning - Khoong Chan Meng
Future of Learning - Khoong Chan Meng
NUS-ISS33 visualizações
Voice Logger - Telephony Integration Solution at Aegis por Nirmal Sharma
Voice Logger - Telephony Integration Solution at AegisVoice Logger - Telephony Integration Solution at Aegis
Voice Logger - Telephony Integration Solution at Aegis
Nirmal Sharma17 visualizações
Business Analyst Series 2023 - Week 3 Session 5 por DianaGray10
Business Analyst Series 2023 -  Week 3 Session 5Business Analyst Series 2023 -  Week 3 Session 5
Business Analyst Series 2023 - Week 3 Session 5
DianaGray10209 visualizações
The Importance of Cybersecurity for Digital Transformation por NUS-ISS
The Importance of Cybersecurity for Digital TransformationThe Importance of Cybersecurity for Digital Transformation
The Importance of Cybersecurity for Digital Transformation
NUS-ISS27 visualizações
SAP Automation Using Bar Code and FIORI.pdf por Virendra Rai, PMP
SAP Automation Using Bar Code and FIORI.pdfSAP Automation Using Bar Code and FIORI.pdf
SAP Automation Using Bar Code and FIORI.pdf
Virendra Rai, PMP19 visualizações
AMAZON PRODUCT RESEARCH.pdf por JerikkLaureta
AMAZON PRODUCT RESEARCH.pdfAMAZON PRODUCT RESEARCH.pdf
AMAZON PRODUCT RESEARCH.pdf
JerikkLaureta15 visualizações
Perth MeetUp November 2023 por Michael Price
Perth MeetUp November 2023 Perth MeetUp November 2023
Perth MeetUp November 2023
Michael Price15 visualizações
How the World's Leading Independent Automotive Distributor is Reinventing Its... por NUS-ISS
How the World's Leading Independent Automotive Distributor is Reinventing Its...How the World's Leading Independent Automotive Distributor is Reinventing Its...
How the World's Leading Independent Automotive Distributor is Reinventing Its...
NUS-ISS15 visualizações
Uni Systems for Power Platform.pptx por Uni Systems S.M.S.A.
Uni Systems for Power Platform.pptxUni Systems for Power Platform.pptx
Uni Systems for Power Platform.pptx
Uni Systems S.M.S.A.50 visualizações

Deep learning from a novice perspective

  • 1. Deep learning from a novice perspective and recent innovations from KGPians Anirban Santara Doctoral Research Fellow Department of CSE, IIT Kharagpur bit.do/AnirbanSantara
  • 2. Deep Learning Just a kind of Machine Learning Classification Regression Clustering 3 main tasks:
  • 6. The methodology: 1. Design a hypothesis function: h(y|x,θ) Target attribute Input Parameters of the learning machine 2. Keep improving the hypothesis until the prediction happens really good
  • 7. Well, how bad is your hypothesis? In case of regressions: A very common measure is mean squared error: 𝐸 = 𝑎𝑙𝑙 𝑡𝑟𝑎𝑖𝑛𝑖𝑛𝑔 𝑒𝑥𝑎𝑚𝑝𝑙𝑒𝑠 |𝑦 𝑑𝑒𝑠𝑖𝑟𝑒𝑑 − 𝑦 𝑎𝑠 𝑝𝑒𝑟 ℎ𝑦𝑝𝑜𝑡ℎ𝑒𝑠𝑖𝑠| 2 In classification problems: 1 0 0 1 In one-hot classification frameworks, we often use mean square error However, often we ask for the probabilities of occurrence of the different classes for a given input ( Pr(class|X) ). In that case we use K-L divergence between the observed (p(output classes)) and predicted (q(output classes)) distributions as the measure of error. This is sometimes referred to as the cross entropy error criterion. 𝐾𝐿(𝑃| 𝑄 = 𝑎𝑙𝑙 𝑡𝑟𝑎𝑖𝑛𝑖𝑛𝑔 𝑒𝑥𝑎𝑚𝑝𝑙𝑒𝑠,𝑖 𝑝𝑖 𝑙𝑜𝑔 𝑝𝑖 𝑞𝑖 Clustering uses a plethora of criteria like: • Entropy of a cluster • Maximum distance between 2 neighbors in a cluster --and a lot more
  • 8. Now its time to rectify the machine and improve $100,000 $50,000 Learning We perform “gradient descent” along the “error-plane” in the “parameter space”: ∆𝑝𝑎𝑟𝑎𝑚𝑒𝑡𝑒𝑟 = −learning_rate ∗ 𝛻𝑝𝑎𝑟𝑎𝑚𝑒𝑡𝑒𝑟 𝑒𝑟𝑟𝑜𝑟_𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝑝𝑎𝑟𝑎𝑚𝑒𝑡𝑒𝑟 ← 𝑝𝑎𝑟𝑎𝑚𝑒𝑡𝑒𝑟 + ∆𝑝𝑎𝑟𝑎𝑚𝑒𝑡𝑒𝑟
  • 9. Lets now look into a practical learning system: Artificial Neural Network Cat Dog Panda - A very small unit of computation So the parameters of an ANN are: 1. Incoming weights of every neuron 2. Bias of every neuron These are the ones that need to be tuned during learning We perform gradient descent on these parameters Backpropagation algorithm is a popular method of computing 𝛁 𝒘𝒆𝒊𝒈𝒉𝒕𝒔 𝒂𝒏𝒅 𝒃𝒊𝒂𝒔𝒆𝒔 𝑬𝒓𝒓𝒐𝒓 𝒇𝒖𝒏𝒄𝒕𝒊𝒐𝒏
  • 10. Backpropagation algorithm Input pattern vector W21 W32 Forward propagate: Error calculation: Backward propagation: If k  output layer If k  hidden layer
  • 11. Well after all, life is tough… • The parameters of a neural network are generally initialized to random values. • Starting from these random values (with useless information) it is very difficult (well not impossible, in fact time consuming) for backpropagation to arrive at the correct values of these parameters. • Exponential activation functions like sigmoid and hyperbolic- tangent are traditionally used in artificial neurons. These functions have gradients that are prone to become zero in course of backpropagation. • If the gradients in a layer get close to zero, they induce the gradients in the previous layers to vanish too. As a result the weights and biases in the lower layers remain immature. • This phenomenon is called “vanishing gradient” problem in the literature. These problems crop up very frequently in neural networks that contain a large number of hidden layers and way too many parameters (the so called Deep Neural Networks).
  • 12. How to get around? Ans: Make “informed” initialization • A signal is nothing but a set of random variables. • These random variables jointly take values from a probability distribution that is dependent on the nature of the source of the signal. E.g.: A blank 28x28 pixel array like can house numerous kinds of images. The set of 784 random variables assume values from a different joint probability distribution for every class of objects/scenes. ~𝑃𝑑𝑖𝑔𝑖𝑡(𝑥1, 𝑥2, … , 𝑥784) ~𝑃ℎ𝑢𝑚𝑎𝑛 𝑓𝑎𝑐𝑒(𝑥1, 𝑥2, … , 𝑥784)
  • 13. Lets try and model the probability distribution of interest Our target distribution: 𝑃ℎ𝑢𝑚𝑎𝑛 𝑓𝑎𝑐𝑒(𝑥1, 𝑥2, … , 𝑥784) We try to capture this distribution in a model that looks quite similar to a single layer neural network The Restricted Boltzmann Machine: It’s a probabilistic graphical model (a special kind of Markov Random Field) that is capable of modelling a wide variety of probability distributions. Capture the dependencies among the “visible” variables
  • 14. The working of RBM Parameters of the RBM: 1. Weights on the edges 𝑤𝑖,𝑗 2. Biases on each node 𝑏𝑖 ′ s and 𝑐𝑗 ′ 𝑠 Using these we define a joint probability distribution over the “visible” variables 𝑣𝑗 ′ 𝑠 and the “hidden” variables ℎ𝑖 ′ 𝑠 as: Where the energy function is defined as: And Z is a normalization term called the “Partition function” 𝑃𝑅𝐵𝑀 𝒗, 𝒉 = 1 𝑍 𝑒 𝐸(𝒗,𝒉) 𝑃ℎ𝑢𝑚𝑎𝑛 𝑓𝑎𝑐𝑒(𝑣1, 𝑣2, … , 𝑣784) 𝒉 𝑃𝑅𝐵𝑀 𝒗, 𝒉 𝑃𝑅𝐵𝑀 𝑣1, 𝑣2, … , 𝑣784 𝐾𝐿(𝑃ℎ𝑢𝑚𝑎𝑛 𝑓𝑎𝑐𝑒| 𝑃𝑅𝐵𝑀 = 𝑣1,𝑣2,…,𝑣784 𝑃ℎ𝑢𝑚𝑎𝑛 𝑓𝑎𝑐𝑒 𝑣1, … , 𝑣784 𝑙𝑛 𝑃ℎ𝑢𝑚𝑎𝑛 𝑓𝑎𝑐𝑒(𝑣1, … , 𝑣784) 𝑃𝑅𝐵𝑀 𝑣1, … , 𝑣784 = −𝐻 𝑃ℎ𝑢𝑚𝑎𝑛 𝑓𝑎𝑐𝑒 − 𝑣1,𝑣2,…,𝑣784 𝑃ℎ𝑢𝑚𝑎𝑛 𝑓𝑎𝑐𝑒 𝑣1, … , 𝑣784 𝑙𝑛𝑃𝑅𝐵𝑀 𝑣1, … , 𝑣784 Empirical average of the log-likelihood of data under the model distribution Not under our control MAXIMIZE
• 15. Layer-wise pre-training using RBMs • Every hidden layer is pre-trained as the hidden layer of an RBM. Since the RBM models the statistics of the input, its weights and biases carry meaningful information about the input. Using these as the initial values of the parameters of a deep neural network has shown phenomenal improvement over random initialization, both in training time and in performance. • This is followed by fine-tuning of the entire network via back-propagation.
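The greedy layer-wise procedure can be sketched as follows: train an RBM on the data, feed its hidden activations to the next RBM, and so on, collecting each layer's parameters as initial values for the deep network. This is a self-contained toy sketch with a tiny CD-1 trainer; all names and sizes are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.1, epochs=50):
    """Tiny CD-1 trainer; returns the weights and hidden biases."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b = np.zeros(n_visible)   # visible biases
    c = np.zeros(n_hidden)    # hidden biases
    for _ in range(epochs):
        ph0 = sigmoid(data @ W + c)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        pv1 = sigmoid(h0 @ W.T + b)
        ph1 = sigmoid(pv1 @ W + c)
        W += lr * (data.T @ ph0 - pv1.T @ ph1) / len(data)
        b += lr * (data - pv1).mean(axis=0)
        c += lr * (ph0 - ph1).mean(axis=0)
    return W, c

# Greedy layer-wise pre-training: each layer's RBM is trained on the
# hidden activations of the layer below.
layer_sizes = [16, 8, 4]                  # hidden sizes; input is 32-dim
X = rng.integers(0, 2, size=(64, 32)).astype(float)
pretrained = []
inputs = X
for n_hidden in layer_sizes:
    W, c = train_rbm(inputs, n_hidden)
    pretrained.append((W, c))
    inputs = sigmoid(inputs @ W + c)      # feed activations upward
# `pretrained` now holds the informed initial parameters for the deep
# network, which is subsequently fine-tuned with back-propagation.
```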
• 16. The Autoencoder • An autoencoder is a neural network operating in unsupervised learning mode. • The output is set equal to the input. • The network thus learns an (approximate) identity mapping from input to output, forcing the hidden layer to find a useful representation of the data. • Applications: • Dimensionality reduction (efficient, non-linear) • Representation learning (discovering interesting structures) • An alternative to the RBM for layer-wise pre-training of DNNs. Stacking several such layers gives a deep stacked autoencoder.
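A minimal numpy sketch of a one-hidden-layer autoencoder with tied weights (the decoder reuses the encoder's weight matrix transposed), trained by gradient descent on the squared reconstruction error. The toy data and all names are our own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy data with a 2-dimensional underlying structure, embedded in 8 dims,
# so a 3-unit hidden layer can compress it well.
n_in, n_hidden = 8, 3
Z = rng.standard_normal((64, 2))
X = sigmoid(Z @ rng.standard_normal((2, n_in)))

# Tied-weight autoencoder: encode with W, decode with W.T
W = 0.1 * rng.standard_normal((n_in, n_hidden))
b_h, b_o = np.zeros(n_hidden), np.zeros(n_in)

def forward(X):
    H = sigmoid(X @ W + b_h)          # encoder: compressed code
    return H, sigmoid(H @ W.T + b_o)  # decoder: reconstruction

initial_mse = np.mean((forward(X)[1] - X) ** 2)

lr = 0.5
for _ in range(2000):
    H, X_hat = forward(X)
    delta_o = (X_hat - X) * X_hat * (1 - X_hat)   # output error signal
    delta_h = (delta_o @ W) * H * (1 - H)         # backpropagated signal
    # Tied weights collect gradients from both the decoder and encoder paths
    W -= lr * (delta_o.T @ H + X.T @ delta_h) / len(X)
    b_o -= lr * delta_o.mean(axis=0)
    b_h -= lr * delta_h.mean(axis=0)

final_mse = np.mean((forward(X)[1] - X) ** 2)
print(initial_mse, final_mse)  # reconstruction error drops after training
```

The hidden activations H are the learned low-dimensional representation; in a stacked autoencoder they become the input to the next layer.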
• 17. So deep learning ≈ training "deep" neural networks with many hidden layers. Step 1: unsupervised layer-wise pre-training. Step 2: supervised fine-tuning. This is pretty much how deep learning works. However, there is a class of deep networks, the convolutional neural networks, that often do not need pre-training, because these networks use extensive parameter sharing and rectified linear activation functions. And deep learning, when viewed from a different perspective, looks really amazing!
• 18. Traditional machine learning vs. deep learning
Traditional: data → hand-engineering of feature extractors → data representations produced by the feature extractors → inference engine.
Deep learning: data → data-driven, target-oriented representation learning → inference engine.
In both cases the inference engine performs classification, regression, clustering, or efficient coding.
• 19. What's so special about it?
Traditional machine learning: • Designing feature detectors requires careful engineering and considerable domain expertise. • Representations must be selective to the aspects of the data that are important for our task and invariant to the irrelevant aspects (the selectivity-invariance dilemma).
Deep learning: • Abstractions of hierarchically increasing complexity are learnt by a data-driven approach using general-purpose learning procedures. • A composition of simple non-linear modules can learn very complex functions. • Cost functions specific to the problem amplify aspects of the input that are important for the task and suppress irrelevant variations.
  • 20. Pretty much how we humans go about analyzing…
• 21. Some deep architectures:
• Deep stacked autoencoder: used for efficient non-linear dimensionality reduction and for discovering salient underlying structures in data.
• Deep convolutional neural network: exploits the stationarity of natural data and uses the concept of parameter sharing to study large images and long spoken/written strings and make inferences from them.
• Recurrent neural network: custom-made for modelling dynamic systems; finds use in natural language (speech and text) processing, machine translation, etc.
• 22. Classical automatic speech recognition system
Pipeline: signal acquisition → feature extraction → acoustic modelling → Viterbi beam search / A* decoding → N-best sentences or word lattice → rescoring → FINAL UTTERANCE.
The decoder is supported by acoustic model generation (phonetic utterance models) and sentence model preparation (the sentence model).
• 23. Some of our works:
2015: A deep neural network and Random Forest hybrid architecture for learning to detect retinal vessels in fundus images (accepted at EMBC-2015, Milan, Italy). Average accuracy of detection: 93.27%.
2014-15: Faster learning of deep stacked autoencoders on multi-core systems through synchronized layer-wise pre-training (accepted at the PDCKDD Workshop, part of ECML-PKDD 2015, Porto, Portugal). The proposed algorithm achieves a 26% speedup over conventional serial pre-training for compression of MNIST handwritten digits.
• 24. Take-home messages • Deep learning is a set of algorithms designed to 1. train neural networks with a large number of hidden layers, and 2. learn features of hierarchically increasing complexity in a data- and objective-driven way. • Deep neural networks are breaking records across AI because, provably, they can model highly non-linear functions of the data with fewer parameters than shallow networks require. • Deep learning is extremely interesting and a breeze to implement once the underlying philosophies are understood. It has great potential for use in many ongoing projects at KGP. If you are interested in going deeper into deep learning: take Andrew Ng's Machine Learning course on Coursera; visit ufldl.Stanford.edu and read the entire tutorial; read LeCun's latest deep learning review published in Nature.
  • 25. Thank you so much  Please give me some feedback for this talk by visiting: bit.do/RateAnirban Or just scan the QR code 