AlexNet

Bertil Hatt
Senior Data Scientist, Independent
AlexNet:
Context, Summary & Impact
Discussion by B. Hatt on
ImageNet Classification with Deep
Convolutional Neural Networks, NIPS 2012
by A. Krizhevsky, I. Sutskever & G. Hinton
Outline
• Importance of AlexNet
• Scientific Context
• Neural nets in 2012
• Convolutional nets
• KSH ’12 findings
• Limits
• Critics & costs
• Further works
• Industrial impact
• This presentation should last
about 50 min. without questions
• Feel free to interrupt at any time
for questions or misunderstandings
Deep convolutional
neural networks
The most influential industrial discovery in a decade
The most influential paper on data science
• 20,000 citations: more than any paper it cites or any paper citing it
• Taught to all aspiring data scientists, at university & on-line
• Fastest growing academic requirement for new positions
• Applications are rather narrow: image to tags
• Recognising faces for photos in social context
• Flagging content for further attention, search
• Ideas could be applied to other computationally intensive tasks
• Solutions are too complex to be explained to a human
Dramatic performance gain
• Well below 25% error
• No team has done worse than 25% since
• Proved that deep convolutional networks were the way to go for that problem class
• As long as you train your model on multiple GPUs
Important papers cited by or citing KSH ’12
Key papers cited by KSH ’12, or influential prior work
• Back-propagation applied to handwritten zip code recognition. Neural Computation 1989, LeCun et al.
• The MNIST database of handwritten digits, 1998, LeCun et al.
• Gradient-based learning applied to document recognition. Proc. IEEE 1998, LeCun et al.
• Learning to parse images. NIPS 2000, Hinton, Ghahramani & Teh
• Learning methods for generic object recognition with invariance to pose and lighting. CVPR 2004, LeCun et al.
• ImageNet: A Large-Scale Hierarchical Image Database. CVPR 2009, Deng et al.
Major papers citing KSH ’12, directly or not
Deeper
• Going Deeper with Convolutions, CVPR 2015, Szegedy et al.
[Google & Magic Leap]
• Very Deep Convolutional Networks for Large-Scale
Image Recognition, ICLR 2015, Simonyan & Zisserman
[Oxford]
Other architectures
• Generative Adversarial Networks, NIPS 2014,
Goodfellow et al.
• Dynamic Routing Between Capsules, NIPS 2017,
Sabour, Frosst & Hinton
A more engineering than academic problem
• Reproduction is difficult without
unpublished code, computing
and engineering resources
• Few large datasets of tagged
images to test and train, bias in
use cases
• Conference presentations more
publicized than the papers’ claims
• Heavily sponsored conferences
• Targeted at applicant hiring
• Scalable computing framework
and resources
• Theano, TensorFlow, Keras, etc.
• Amazon Web Services,
Google Cloud Platform
What this paper is: image to classification
• Parallel architecture to scale the model
• Set of implementation tips
• Imperfect solution to scale, reflection,
colour & illumination,
rotation (2D), point of view (3D)
What it is not about: other image processing
• Leverage hierarchy in the training set
• Locate the interesting parts
• Boolean: is the object present at all?
• Edge detection, separating shapes
• Imagine what is hidden behind
• 2D to 3D representation
• Duplicate elements, counting
• Video processing, still selection
Neural networks in 2012
The promises of back-propagation
Hubel & Wiesel 1962:
Mammalian visual cortex
A network of layers, activity flowing forward
• Very large input set (images)
• into a small outcome (classifier)
• Each neuron has an activation value
• Neuron in each layer relies on
lower layers for its activation
• Algebraic sum of the previous layer
• Filtered by an activation function
a_0^(1) = σ( w_0,0^(1)·a_0^(0) + w_1,0^(1)·a_1^(0) + … )
• Iteratively, until the last layer
• Historically σ maps into [0, 1] (sigmoid);
now σ is commonly max(0, x) (ReLU)
[Figure: a small network diagram, labelling activations a_i^(l) and weights w_i,j^(l)]
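A minimal numpy sketch of the feed-forward rule above (a toy fully-connected network; the layer sizes and initialisation are made-up assumptions, not AlexNet's):

```python
import numpy as np

def sigma(x):
    return np.maximum(0.0, x)              # ReLU; historically a sigmoid squashing into [0, 1]

rng = np.random.default_rng(0)
sizes = [784, 128, 64, 10]                 # toy dimensions, not AlexNet's
W = [rng.normal(0, 0.01, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
b = [np.zeros(m) for m in sizes[1:]]

def forward(a):
    """Feed-forward: a^(l) = sigma(W^(l) a^(l-1) + b^(l)), iterated until the last layer."""
    for Wl, bl in zip(W, b):
        a = sigma(Wl @ a + bl)
    return a

out = forward(rng.random(784))             # activations of the final layer (the classifier)
```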
Back propagation 1: how to feed a perceptron
a_0^(1) = σ( w_0,0^(1)·a_0^(0) + w_1,0^(1)·a_1^(0) + … + w_n,0^(1)·a_n^(0) + b_0^(0) )
a_1^(1) = σ( w_0,1^(1)·a_0^(0) + w_1,1^(1)·a_1^(0) + … + w_n,1^(1)·a_n^(0) + b_1^(0) )
a_0^(2) = σ( w_0,0^(2)·a_0^(1) + w_1,0^(2)·a_1^(1) + … + w_n,0^(2)·a_n^(1) + b_0^(1) )
        = σ( w_0,0^(2)·σ( w_0,0^(1)·a_0^(0) + … + b_0^(0) )
           + w_1,0^(2)·σ( w_0,1^(1)·a_0^(0) + … + b_1^(0) )
           + … + b_0^(1) )
…
a_n^(L) = σ( w_0,0^(L)·σ( w_0,0^(L-1)·σ( w_0,0^(L-2)·σ(…) ) ) + … + b_n^(L) )
        = ( σ ∘ W· ) composed L times and applied to the input:
          every output is a deeply nested composition of weighted sums and activations
Back propagation 2: how to learn step by step
• Initialisation
• Training loop
• First feed-forward
• Algebra input + Activation
• Calculate Errors
• Back-propagate
• Deeply combined derivatives
• Learn new weight & bias
• Next loop
• Until cost slows to a stop
Cost := | Y − SoftMax(a_1^(L), …, a_M^(L)) |
∂Cost / ∂w_i,j^(k) = Σ σ′(…) · W · σ ∘ … (L−1 times) ∘ σ(…)
This chain rule defines a gradient
along Σ_layers ( N_l·N_(l−1) + N_l ) dimensions
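A minimal numpy sketch of one such training step (feed-forward, errors, back-propagation through the chain rule, then a weight and bias update); the two-layer sizes and the softmax cross-entropy cost are assumptions made for a short example, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
# toy two-layer network (784 -> 64 -> 10), one training example
W1, b1 = rng.normal(0, 0.01, (64, 784)), np.zeros(64)
W2, b2 = rng.normal(0, 0.01, (10, 64)), np.zeros(10)
x, y = rng.random(784), np.eye(10)[3]           # input and one-hot target

# 1. feed-forward: weighted sum + activation
z1 = W1 @ x + b1
a1 = np.maximum(0.0, z1)                        # ReLU
z2 = W2 @ a1 + b2
p = np.exp(z2 - z2.max()); p /= p.sum()         # softmax output

# 2. calculate errors and back-propagate (chain rule, layer by layer)
d2 = p - y                                      # dCost/dz2 for softmax + cross-entropy
dW2, db2 = np.outer(d2, a1), d2
d1 = (W2.T @ d2) * (z1 > 0)                     # push the error through W2, then through ReLU'
dW1, db1 = np.outer(d1, x), d1

# 3. learn new weights & biases, then loop until the cost stops improving
lr = 0.01
W2 -= lr * dW2; b2 -= lr * db2
W1 -= lr * dW1; b1 -= lr * db1
```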
The perceptron: Universal approximator
• Theorem: If σ is continuous & non-linear,
any function can be approximated as closely as you like
by a perceptron, given enough layers & neurons
• Exploding & vanishing gradient:
• with weights just below or above 1, their cumulated impact respectively
vanishes or explodes on deep networks
• Exploding complexity of the gradient:
• ∂( σ ∘ … L times ∘ σ )(w) / ∂w = Σ σ′ · ( σ ∘ … L−1 times ∘ σ )(w)
is unwieldy for bounded, saturating activations: sigmoid, tanh, logit
• ReLU: x ↦ max(0, x) is easier to differentiate iteratively
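A small numpy illustration of the vanishing-gradient point, ignoring the weight terms of the chain rule: a sigmoid's derivative is at most 0.25, so the product of many such factors collapses, while ReLU contributes factors of exactly 0 or 1:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
depth = 50
pre_acts = rng.normal(0.0, 1.0, depth)          # made-up pre-activations along one path

sig_factors = sigmoid(pre_acts) * (1 - sigmoid(pre_acts))   # sigma' for a sigmoid, <= 0.25
relu_factors = (pre_acts > 0).astype(float)                  # sigma' for ReLU, exactly 0 or 1

print("product of sigmoid' over 50 layers:", np.prod(sig_factors))   # effectively zero: vanishing gradient
print("product of ReLU'    over 50 layers:", np.prod(relu_factors))  # 0 or 1, never shrinks gradually
```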
Convolutional networks
How complex structures can process images
Hubel & Wiesel 1962:
Mammalian visual cortex
Architecture
• Image re-sizing, color to grey-scale
• Three specific processes to reduce dimensions
• Convolution: recognize small patterns
• Rectification/Activation: ignore irrelevant
• Pooling: feature, not exact position
• Final layers
• Flatten (re-shape)
• Fully interconnected (much smaller dimension than image definition)
• Normalisation, typically SoftMax
Matrix filters aka Convolutions
• Local concerns
• Hierarchical structure
• Each cell of the filter is a parameter
• One layer typically shares a set of filters
• Kernel = Filter = Weight = Feature matrix =
Feature map = Activation map =
Parameters to be trained
Span
1 → 9 weights (3 × 3 filter)
2 → 25 weights (5 × 5 filter)
Padding
Padding ≤ Span
Whole image processed locally,
with some possible overlap
Step
N^(0) × N^(0) → N^(1) × N^(1) = N^(0)/step × N^(0)/step
Pooling, typically using MaxPool
Stride
N^(1) × N^(1) → N^(2) × N^(2) = N^(1)/stride × N^(1)/stride
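A minimal numpy sketch of the three operations just described (convolution, rectification, pooling) for a single filter on a single-channel image; the sizes, the padding and the filter itself are illustrative assumptions:

```python
import numpy as np

def conv2d(img, kernel, step=1, padding=0):
    """Naive 2-D convolution: slide the filter over the (padded) image.
    Output side: (N + 2*padding - span) // step + 1."""
    img = np.pad(img, padding)
    span = kernel.shape[0]
    out_n = (img.shape[0] - span) // step + 1
    out = np.zeros((out_n, out_n))
    for i in range(out_n):
        for j in range(out_n):
            patch = img[i*step:i*step+span, j*step:j*step+span]
            out[i, j] = np.sum(patch * kernel)
    return out

def max_pool(fmap, stride=2, size=2):
    """Max pooling: keep the strongest response in each window (feature, not exact position)."""
    out_n = (fmap.shape[0] - size) // stride + 1
    return np.array([[fmap[i*stride:i*stride+size, j*stride:j*stride+size].max()
                      for j in range(out_n)] for i in range(out_n)])

img = np.random.default_rng(0).random((28, 28))
edge = np.array([[-1., 0., 1.]] * 3)                            # a 3x3 filter: 9 trainable weights
fmap = np.maximum(0.0, conv2d(img, edge, step=1, padding=1))    # convolution + ReLU rectification
pooled = max_pool(fmap, stride=2, size=2)                       # 28x28 -> 14x14
```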
Architecture
• Image re-sizing, color to grey-scale
• Three specific processes to reduce dimensions
• Convolution: recognize small patterns
• Rectification/Activation: ignore irrelevant
• Pooling: feature, not exact position
• Final layers
• Flatten (re-shape)
• Fully interconnected (much smaller dimension than image definition)
• Normalisation, typically SoftMax
Convolution: span, step & padding. Activation: sigmoid, tanh or ReLU. Pooling: stride, pooling function
The challenge of increasing layers
• We can go from one large, colourful image to far fewer dimensions
• But that requires many layers, and
the network becomes complex to train
• Larger networks have better results
• The largest networks hit computation limits, with only passable results until 2012
Progress so far
• Importance of AlexNet >>
• Scientific Context
• Neural nets in 2012 >>
• Convolutional nets >>
• KSH ’12 findings >>
• Limits
• Critics & costs >>
• Further works >>
• Industrial impact >>
A. Krizhevsky, I. Sutskever
& G. Hinton at NIPS 2012
The AlexNet paper itself: findings, insights
Abstract
• We trained a large, deep convolutional neural network to classify the 1.2
million high-resolution images in the ImageNet LSVRC-2010 contest into
the 1000 different classes.
• On the test data, we achieved top-1 and top-5 error rates of 37.5% and
17.0% which is considerably better than the previous state-of-the-art.
• The neural network, which has 60 million parameters and 650,000
neurons, consists of five convolutional layers, some of which are followed
by max-pooling layers, and three fully-connected layers with a final 1000-
way softmax.
• To make training faster, we used non-saturating neurons and a very
efficient GPU implementation of the convolution operation.
• To reduce overfitting in the fully-connected layers we employed a
recently-developed regularization method called “dropout” that proved
to be very effective.
• We also entered a variant of this model in the ILSVRC-2012 competition
and achieved a winning top-5 test error rate of 15.3%, compared to 26.2%
achieved by the second-best entry.
Context: classic image tagging reference
Results: exceptional, big step forward.
Architecture: rather sophisticated for the time
Approach 1: relevant training shortcut
Approach 2: new regularisation technique
More results: also great on a similar dataset
Findings
• 37.5% top-1 error: not perfect, nor industrially usable on its own
• 17.0% top-5 error: good enough to suggest labels and make a human classifier more efficient
• Largest convolutional network at the time, limited overfitting
• Not:
• New technique overall: minor training improvements on 5-layer ConvNet
• Hopeful that deeper approaches would work
• Exploiting Residual learning differently
• No discussion on scoring quality estimate to prioritise ground-truth feedback
Dataset
• ImageNet Large-Scale Visual Recognition Challenge (ILSVRC)
• 15 million high-resolution images
• 22k hierarchical labels via Amazon’s Mech. Turk
• 1.2 M training, 50k validation & 150k testing images
• Tests on a subset with roughly 1000 images in each of 1000 categories
• Down-sampled images to 256 × 256
• Rectangular images: rescaled shorter side to 256 & cropped
middle
• Subtracting the mean activity over the training set from each
pixel
• No other pre-processing
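A minimal numpy sketch of that pre-processing, assuming nearest-neighbour resizing for brevity and a mean image pre-computed over the training set:

```python
import numpy as np

def preprocess(img, mean_image, size=256):
    """Rescale the shorter side to `size` (nearest-neighbour, for brevity),
    crop the central size x size patch, subtract the training-set mean activity."""
    h, w, _ = img.shape
    scale = size / min(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    rows = np.clip((np.arange(new_h) / scale).astype(int), 0, h - 1)
    cols = np.clip((np.arange(new_w) / scale).astype(int), 0, w - 1)
    resized = img[rows][:, cols]
    top, left = (new_h - size) // 2, (new_w - size) // 2
    crop = resized[top:top + size, left:left + size].astype(np.float32)
    return crop - mean_image               # mean_image: 256x256x3, computed over the training set

rng = np.random.default_rng(0)
photo = rng.integers(0, 256, (375, 500, 3), dtype=np.uint8)    # stand-in for a raw rectangular photo
mean_image = np.zeros((256, 256, 3), dtype=np.float32)         # assumption: precomputed elsewhere
x = preprocess(photo, mean_image)                              # 256 x 256 x 3, mean-subtracted
```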
Novelty
Larger but fairly standard network,
Parallel processing
Methodology & issues
• Object recognition too complex: Convolutional neural network
• Flexible (depth and breadth) & Relevant (stationarity & locality)
• Fewer parameters than other NNs, easier to train
• Still prohibitively expensive at large scale on high-resolution images
• Current GPUs & highly-optimized convolution
• C++ implementation of the ConvNet publicly available at
code.google.com/archive/p/cuda-convnet
• Best results ever on ILSVRC-2010 & -2012
• Unusual features for performance & faster training
• New technique to prevent overfitting
• Fewer layers, worse performance
Architecture 1: key features
• Non-linearity: Rectified Linear Unit (ReLU): f(x) = max(0, x)
• Trains an order of magnitude faster than saturating alternatives
• Parallel processing: more memory allows larger networks
• 2 x NVIDIA GTX 580 3GB GPUs , suitable for convolutional structure
• Normalisation: local inhibition
b^i_x,y = a^i_x,y / [ k + α · Σ_{j ∈ i ± n/2} (a^j_x,y)² ]^β
• AlexNet uses k = 2, n = 5, α = 10⁻⁴, and β = 0.75
• Overlapping pooling: stride s = 2, window z = 3, so neighbouring windows overlap
• First convolutional layer: 11 × 11 kernels with stride 4; second: 5 × 5 kernels
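A minimal numpy sketch of the normalisation above, with the paper's constants as defaults (the activation shape is an illustrative assumption):

```python
import numpy as np

def local_response_norm(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Local response normalisation across channels ("local inhibition"):
    each activation is damped by the summed squares of its n neighbouring kernel maps."""
    C = a.shape[0]                                  # a: (channels, height, width)
    b = np.empty_like(a)
    for i in range(C):
        lo, hi = max(0, i - n // 2), min(C, i + n // 2 + 1)
        denom = (k + alpha * np.sum(a[lo:hi] ** 2, axis=0)) ** beta
        b[i] = a[i] / denom
    return b

acts = np.random.default_rng(0).random((96, 55, 55))   # e.g. first-layer activations
normed = local_response_norm(acts)                      # AlexNet's constants are the defaults above
```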
Architecture 2: Network & GPU separation
[Figure: the layer stack split between GPU 1 and GPU 2, which communicate only at certain layers]
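As a back-of-the-envelope check of the architecture just described (a sketch, with layer sizes as given in the paper and the two-GPU split approximated by halving the channels of the layers that stay on one GPU), a few lines of Python land close to the 60 million parameters quoted in the abstract:

```python
def conv_params(out_ch, in_ch, k):
    return out_ch * (in_ch * k * k) + out_ch      # weights + biases

def fc_params(n_out, n_in):
    return n_out * n_in + n_out

# conv2, conv4 and conv5 only see the half of the previous layer
# that lives on the same GPU (hence the halved in_ch below)
total = (conv_params(96, 3, 11)            # conv1: 96 filters of 11x11x3
         + 2 * conv_params(128, 48, 5)     # conv2: split across the two GPUs
         + conv_params(384, 256, 3)        # conv3: connected across both GPUs
         + 2 * conv_params(192, 192, 3)    # conv4: split
         + 2 * conv_params(128, 192, 3)    # conv5: split
         + fc_params(4096, 6 * 6 * 256)    # fc6
         + fc_params(4096, 4096)           # fc7
         + fc_params(1000, 4096))          # fc8: 1000-way softmax
print(f"{total:,} parameters")             # roughly 61 million, close to the abstract's 60M
```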
Avoiding overfitting & Training
Sample augmentation & Drop-out
Image augmentation
• Image translations and horizontal reflections
• At test time: ten 224 × 224 patches (four corners & centre, plus their horizontal reflections),
averaging the softmax predictions of the ten patches
• Change intensity and colour of the illumination (to which object identity is invariant)
• PCA on the set of RGB values in training set;
add multiples of the found principal components
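A minimal numpy sketch of the PCA colour augmentation; the pixel sample and the image are stand-ins, and the multiples are drawn, as in the paper, from a Gaussian with standard deviation 0.1:

```python
import numpy as np

rng = np.random.default_rng(0)
train_pixels = rng.random((100_000, 3))             # stand-in for all training-set RGB values

# PCA on the RGB values of the training set
cov = np.cov(train_pixels, rowvar=False)            # 3x3 covariance of the colour channels
eigvals, eigvecs = np.linalg.eigh(cov)

def colour_jitter(img, std=0.1):
    """Add a random multiple of each principal component, scaled by its eigenvalue,
    to every pixel; one RGB offset is drawn per image."""
    alphas = rng.normal(0.0, std, 3)
    shift = eigvecs @ (alphas * eigvals)
    return img + shift                               # broadcasts over height x width

augmented = colour_jitter(rng.random((256, 256, 3)))
```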
Drop out
• Setting to zero the output of each hidden neuron with probability 0.5
• Applied to the first two fully-connected layers
• “Dropped out” neurons take no part in the forward pass nor in back-propagation
• Reduces complex co-adaptations of neurons
• One neuron cannot rely on the presence of particular other neurons
• Learn more robust features by relying on different random subsets of neurons
• Doubles training time
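A minimal numpy sketch of that drop-out scheme: zero each hidden activation with probability 0.5 during training, and scale by 0.5 at test time so that expected activations match (the layer size is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(a, p=0.5, training=True):
    """Zero each hidden activation with probability p during training;
    at test time keep every neuron and scale its output by (1 - p)."""
    if training:
        mask = rng.random(a.shape) >= p
        return a * mask
    return a * (1.0 - p)

hidden = rng.random(4096)                  # e.g. activations of a fully-connected layer
train_out = dropout(hidden, training=True)
test_out = dropout(hidden, training=False)
```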
Training
• Training: stochastic gradient descent
• Batch size of 128 examples
• Large momentum of 0.9
• Small decay of 0.0005
• Initialisation: not all zero, so the ReLUs receive a signal
• w₀ ~ Gaussian(0, 0.01)
• biases b₀ = 1 in the 2nd, 4th & 5th convolutional & fully-connected hidden layers
• Learning rate: 0.01 overall
• Divided by 10 when validation error was not improving
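A minimal numpy sketch of the update rule and schedule above; the gradients, the epoch count and the points where the learning rate drops are placeholders, not the paper's actual run:

```python
import numpy as np

lr, momentum, decay = 0.01, 0.9, 0.0005
w = np.random.default_rng(0).normal(0, 0.01, 1000)   # weights, initialised as above
v = np.zeros_like(w)                                  # momentum buffer

def sgd_step(w, v, grad, lr):
    """Update rule: v := 0.9 v - 0.0005 lr w - lr grad ; w := w + v."""
    v[:] = momentum * v - decay * lr * w - lr * grad
    w += v
    return w, v

# toy training loop; drop the learning rate by 10x when validation error stalls
for epoch in range(90):
    grad = np.zeros_like(w)            # placeholder: average gradient over a batch of 128
    w, v = sgd_step(w, v, grad, lr)
    if epoch in (30, 60, 80):          # assumption: stand-in for "validation error not improving"
        lr /= 10
```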
Very promising outcome
Good results, meaningful kernels, representative feature vectors
Debatable misattributions
Testing results
ILSVRC-2010
Model                                                 Top-1    Top-5
Sparse coding (Berg, Deng & Fei-Fei, ImageNet 2010)   47.1%    28.2%
SIFT + FVs (Sánchez & Perronnin, CVPR 2011)           45.7%    25.7%
CNN                                                   37.5%    17.0%

ILSVRC-2012
Model                      Top-1 (val)   Top-5 (val)   Top-5 (test)
SIFT + FVs (ILSVRC-2012)   —             —             26.2%
1 CNN                      40.7%         18.2%         —
5 CNNs                     38.1%         16.4%         16.4%
1 CNN*                     39.0%         16.6%         —
7 CNNs*                    36.7%         15.4%         15.3%
*: pre-trained on ImageNet 2011 Fall release
Kernel learned (first layer for both GPUs)
Qualitative approach
Ambiguity in the image set; quality of the learned feature vectors
Discussion in the paper
• Very little self-criticism: industrial result
• Some suggestions
• performance degrades if a single convolutional layer is removed (2% top-1)
• [no] unsupervised pre-training even though we expect that it will help
• Perspectives
• we still have many orders of magnitude to go in order to match the infero-
temporal pathway of the human visual system
• we would like to use very large and deep convolutional nets on video
sequences where the temporal structure provides very helpful information
Information for reproduction
A lot given
• Image set public
• Cuda code published
• Standard Training/test
• Meta-parameters set
• Extensive supplement data
• Sample of closest vector images
Missing
• Variance between iterations
• Training issues, exploding gradient
• Kernel for each layer
Progress so far
• Importance of AlexNet >>
• Scientific Context
• Neural nets in 2012 >>
• Convolutional nets >>
• KSH ’12 findings >>
• Limits
• Critics & costs >>
• Further works >>
• Industrial impact >>
Critics and costs of the approach
Feasible does not mean cheap
Cost of computation
• Dedicated amateur could reproduce
• Better results mainly by brute-force
• Complexity justifies Computing-as-a-service
• Centralisation of the ability to recognize images
Finding image dataset to train with
• Who needs to flag so many images?
• Applications are specialized (logistical chain, faces)
therefore potential users have the right dataset
• Non-image sets easier to find (e.g. speech, or text to classifier)
Unbalanced detection
• CNNs are not great with unbalanced training sets
• Better results with Kalman filters & SVM
• Radiology: Healthy scans vs. potential cancer
• Astronomy: Galaxy vs. gravitational lenses
Further work
20,000 citations & more developments
Deeper neural nets
• More processing power, better results
• Complexity of training follows
• Fit representation of all weights in memory
• Computation of gradient along many dimensions
• Well-informed training sets are expensive to gather
• Reach a depth where convergence is hard to achieve
• Shortcut connections past some layers allow starting with simpler inferences
• Capsules: complementary structural elements from layers
Computation framework
• Meta-parameters become the problem
• ‘Neural network architects’
• Need to express structure in a coherent syntax using framework
• Caffe, PyTorch, Theano, TensorFlow
• Handle engineering challenge, like parallelisation
Deconvolution & Adversarial approaches
Labels to image suggestion
• Generative adversarial network: two responding networks
• Flag generated images to improve quality
• Deep Dream: amplify the minor elements the network responds to most
• ML hacking: minor perturbations crafted to cause consistent errors
Industrial impact
Applications and challenges from the paper
Recognising images
• Image search as a product
• Medical images to diagnosis
• Industrial applications
• Object categorization,
position in a logistical chain
• Photos as an economic proxy,
e.g. parking lots for activity
• Face labelling & social ties
• Opening more images to
copyright abuses
• Re-think of detection processes
Concentration of diagnostic tool
• Automated flagging of
inappropriate content
• Discrimination from photos
Beyond direct application
• More inference
• Separate edges, attention
• 2D to 3D representation, position
• Imagine what is hidden behind
• Image coloration
• Photos transformation
• Style transfer: artist, season, light
• Adversarial generation
• Psychedelic augmentation
• Video processing
• Still selection
• Video editing
• Classifying behavior
• Beyond photos
• 2, 3 dimensional signal
• Music, voice generation
• Recurrent Neural network with
Long Short-Term Memory:
• Text translation, generation
• Speech processing
Outline
• Importance of AlexNet >>
• Scientific Context
• Neural nets in 2012 >>
• Convolutional nets >>
• KSH ’12 findings >>
• Limits
• Critics & costs >>
• Further works >>
• Industrial impact >>
Any questions?
bertil.hatt@ensae.org
+44 (0)7 48 12 799 38
Highly recommended:
• playground.tensorflow.org
• Tutorial videos 3b1b.co

Editor’s notes

  1. Watson & Crick’s DNA structure has 5k citations
  2. Watson & Crick’s DNA structure has 5k citations
  3. LeCun: applied gradients to neural nets. ImageNet: not published; the challenge is the reference for an emerging technique; a large training set is essential
  4. Clusters of neurons matching features of increasing levels of abstraction
  5. Given enough parameters, with non-linearities. Chain-rule explosion
  6. We could represent the whole math with matrix products. Actually, code that implements this, like cuda-convnet (the engine of AlexNet), was published at the same time. But I want to illustrate how complicated the math really is
  7. Given enough parameters, with non-linearities
  8. Clusters of neurons matching features of increasing levels of abstraction
  9. For each convolutional layer, you can have one, two or three of the slice types, each with their own parameters
  10. Explicit reference to the biological neuronal structure isolated in the 60s. LeCun 1989: local connections, layers of increasing abstraction, spatial invariance. Only a local part is connected and “slides over”; otherwise, combinatorial explosion
  11. Once we know that a specific feature is in the original input volume (there will be a high activation value), its exact location is not as important as its location relative to the other features
  12. For each convolutional layer, you can have one, two or three of the slice types, each with their own parameters
  13. Any question about the context of the paper so far? Do both ideas of what is a convolutional net, and what is back propagation make sense to you?
  14. Ok, so let’s move on to the core of this presentation: The findings of the paper itself. What is interesting is that this paper was not seen as spectacular at the time
  15. Overall, very exhaustive documentation, which explains a lot of the success. The rest of the industry might not have done so well
  16. Support vector machines can more easily separate the part of the feature space that we are trying to detect, without over-weighting rare events when maximising the cost function