Modern Convolutional Neural Network
techniques for image segmentation
Deep Learning Journal Club
Gioele Ciaparrone
Michele Curci
November 30, 2016
University of Salerno
Index
1. Introduction
2. The Inception architecture
3. Fully convolutional networks
4. Hypercolumns
5. Conclusion
Introduction
CNN recap
• Sequence of convolutional and pooling layers
• Rectifier activation function
• Fully connected layers at the end
• Softmax function for classification
Convolution I
Convolution II
Valid padding (left) and same padding (right) convolutions
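The difference between the two padding modes can be sketched by computing the output size they produce. The function name and the 5-wide input with a 3-wide kernel are illustrative assumptions, since the original figure is not preserved here.

```python
def conv_output_size(n, k, padding="valid", stride=1):
    """Spatial output size of a convolution over an n-wide input with a k-wide kernel."""
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return (n - k) // stride + 1
    if padding == "same":
        # Zero padding chosen so that, at stride 1, the output size equals the input size.
        return -(-n // stride)  # ceil(n / stride)
    raise ValueError(f"unknown padding mode: {padding}")


print(conv_output_size(5, 3, "valid"))  # 3
print(conv_output_size(5, 3, "same"))   # 5
```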
LeNet-5 (1989-1998)
• First CNN (1989) proven to work well, used for handwritten Zip
code recognition [1]
• Refined through the years until the LeNet-5 version (1998) [2]
LeNet-5 interactive visualization [3]
It’s possible to interact with the network in 3D, manually drawing a digit
to be classified, clicking on the neurons to get info about the parameters
and the connected units, or rotating and zooming the network:
http://scs.ryerson.ca/~aharley/vis/conv/
AlexNet (2012) [5]
• After a long hiatus in which deep learning was largely ignored [4], neural networks
received attention once again after Alex Krizhevsky won the ILSVRC 2012 by a wide
margin with AlexNet
• Structure very similar to LeNet-5, but with some key additions:
a very efficient GPU implementation, ReLU neurons and dropout
The Inception architecture
Motivations
• Increasing model size tends to improve quality
• More computational resources are needed
• Computational efficiency and low parameter count are still important
• Mobile vision and embedded systems
• Big Data
Going Deeper with Convolutions [6]
• The Inception module addresses this problem by making better use of
computing resources
• Proposed in 2014 by Christian Szegedy and other Google researchers
• Used in the GoogLeNet architecture, which won both the ILSVRC
2014 classification and detection challenges
Inception module I
• Visual information is processed at various scales and then aggregated
• Since pooling operations are beneficial in CNNs, a parallel pooling
path has been added
• Problems:
• 3x3 and 5x5 convolutions can be very expensive on top of a layer
with lots of filters
• The number of filters substantially increases for each Inception layer
added, leading to a computational blow-up
Inception module II
• Adding the 1x1 convolutions before the bigger convolutions reduces
dimensionality
• The same is done after the pooling layer
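A rough sketch of why the 1x1 reduction helps: it counts multiplications for a 5x5 convolution applied directly versus after a 1x1 bottleneck. The grid size and channel counts are hypothetical values chosen only for illustration, and biases and activations are ignored.

```python
def conv_multiplies(h, w, c_in, c_out, k):
    """Multiplication count of a stride-1, same-padded k x k convolution."""
    return h * w * c_in * c_out * k * k


# Hypothetical layer sizes (assumptions, not taken from the slides).
H = W = 28
C_IN, C_REDUCED, C_OUT = 192, 16, 32

# 5x5 convolution applied directly to all input channels.
direct = conv_multiplies(H, W, C_IN, C_OUT, 5)

# 1x1 convolution down to C_REDUCED channels, then the 5x5 convolution.
reduced = (conv_multiplies(H, W, C_IN, C_REDUCED, 1)
           + conv_multiplies(H, W, C_REDUCED, C_OUT, 5))

print(direct, reduced, round(direct / reduced, 1))
```

Under these assumed sizes the bottleneck version needs roughly a tenth of the multiplications, which is the effect the slide describes.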
GoogLeNet I
• GoogLeNet is a particular incarnation of the Inception architecture
• 22 layers deep counting only layers with parameters (27 including pooling)
• 9 Inception modules
• 2 auxiliary classifiers to mitigate the vanishing gradient problem and for
regularization
• Designed with computational efficiency in mind
• Inference can be run on devices with limited computational
resources, especially memory
• 7 of these networks used in an ensemble for the ILSVRC 2014
classification task
GoogLeNet II
GoogLeNet III
GoogLeNet - Training
• Trained with the DistBelief distributed machine learning system
• Asynchronous stochastic gradient descent with 0.9 momentum
• Image sampling methods changed substantially in the months before the
competition
• Already converged models were trained further with other options
• Models were trained on crops of different sizes
• There is no definitive guidance on the single most effective way to
train these networks
GoogLeNet - ILSVRC 2014 Results
Classification (above) and object detection (below) results.
DeepDream
Google’s DeepDream uses a GoogLeNet to produce “machine dreams”
Inception-v2 and Inception-v3
• The authors of the Inception module later presented optimized
versions of the architecture, called Inception-v2 and Inception-v3 [7]
• These significantly improve on GoogLeNet's ILSVRC 2014
results
• The improvements are based on several design principles:
• Avoid representational bottlenecks
• Spatial aggregation over lower-dimensional embeddings usually
causes little loss of representational power
• Balance the width and depth of the network
Convolution factorization I
• Factorizing convolutions reduces the number of parameters
without losing much expressiveness
• For example, a 5x5 convolution can be factorized into a pair of 3x3
convolutions
• It is also possible to factorize an NxN convolution into a 1xN convolution
followed by an Nx1 convolution
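The savings from both factorizations can be checked with a small count of kernel weights. This is a sketch that counts weights per input/output channel pair only, assuming the channel count stays the same across the stacked layers; the helper names are made up for illustration.

```python
def kernel_weights(shapes):
    """Total weights (per input/output channel pair) of stacked kernels given as (height, width)."""
    return sum(h * w for h, w in shapes)


def receptive_field(sizes):
    """Receptive field along one axis of stacked stride-1 kernels with the given sizes."""
    return 1 + sum(k - 1 for k in sizes)


# One 5x5 kernel versus two stacked 3x3 kernels: same 5x5 receptive field, fewer weights.
print(kernel_weights([(5, 5)]), kernel_weights([(3, 3), (3, 3)]))  # 25 18
print(receptive_field([5]), receptive_field([3, 3]))               # 5 5

# One NxN kernel versus a 1xN kernel followed by an Nx1 kernel, here with N = 7.
N = 7
print(kernel_weights([(N, N)]), kernel_weights([(1, N), (N, 1)]))  # 49 14
```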
Convolution factorization II
The original Inception module (left) and the new factorized module
(right).
Efficient grid size reduction - problem
• Suppose we want to pass from a $d \times d$ grid with $k$ filters to a $\frac{d}{2} \times \frac{d}{2}$
grid with $2k$ filters
• We need to compute a stride-1 convolution and then a pooling
• Computational cost dominated by convolutions: $2d^2k^2$ operations
• Inverting the order, the number of operations is reduced to $2\left(\frac{d}{2}\right)^2 k^2$,
but we violate the bottleneck principle
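The two operation counts above can be compared numerically. The sketch below drops the constant kernel-size factor, as the slide does, and uses hypothetical values d = 32 and k = 64 purely for illustration.

```python
def conv_cost(grid, c_in, c_out):
    """Cost proportional to grid^2 * c_in * c_out (kernel-size constant dropped)."""
    return grid * grid * c_in * c_out


d, k = 32, 64  # hypothetical grid size and filter count

# Convolve on the full d x d grid, then pool: 2 * d^2 * k^2
conv_then_pool = conv_cost(d, k, 2 * k)

# Pool first, then convolve on the (d/2) x (d/2) grid: 2 * (d/2)^2 * k^2
pool_then_conv = conv_cost(d // 2, k, 2 * k)

print(conv_then_pool, pool_then_conv, conv_then_pool // pool_then_conv)
```

Pooling first is four times cheaper, but the convolution then only sees the already reduced representation, which is the bottleneck the slide warns about.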
Efficient grid size reduction - solution
• The solution is an Inception module with convolution and pooling
blocks with stride 2
• Computationally efficient and no representational bottleneck
introduced
The new architecture
• Using various modified Inception modules, here is the new
Inception-v2 architecture
Inception-v2: modules used
n = 7
Inception-v2: training and observations
• The network was trained on the ILSVRC 2012 images using
stochastic gradient descent and the TensorFlow library
• Experiments showed the two auxiliary classifiers to have less
impact on training convergence than expected
• In the early training phases, model performance was not affected
by the presence of the auxiliary classifiers: they only improved
performance near the end of training
• Removing the lower auxiliary classifier didn’t have any adverse effect
• The main classifier performs better if batch normalization or dropout
is added to the auxiliary ones
• The model was also trained and tested on smaller receptive fields
with only a small loss of top-1 accuracy (76.6% for a 299x299 RF vs.
75.2% for a 79x79 RF). This is important for post-classification of detections
Inception-v2 to Inception-v3 results (single model)
• Each row’s Inception-v2 model adds one feature to the
previous row’s model
• The model in the last row is referred to as Inception-v3
Inception-v3 vs other models (single and ensemble)
Tables: single-model results and ensemble results
• On the ILSVRC 2012 dataset, there is a significant improvement
versus state-of-the-art models, both with a single model and with an
ensemble of models
• Note that the ensemble errors here are validation errors (except for
the one marked with ’*’, which is a test error)
Fully convolutional networks
Semantic segmentation
• Image segmentation is the process of partitioning an image into
multiple segments (sets of pixels, or superpixels)
• Semantic segmentation partitions an image into
semantically meaningful parts and classifies each part into one of
a set of predetermined classes
• The same result can be achieved with pixel-wise
classification, i.e. assigning a class to each pixel
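Pixel-wise classification can be sketched as taking, at every pixel, the class with the highest score. The function name, the score layout `scores[class][row][col]` and the tiny two-class example are assumptions made for illustration.

```python
def segment(scores):
    """Turn per-class score maps scores[c][y][x] into a label map labels[y][x]."""
    n_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(n_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


# Hypothetical scores for 2 classes on a 2 x 3 image.
scores = [
    [[0.9, 0.2, 0.1], [0.8, 0.7, 0.3]],  # class 0
    [[0.1, 0.8, 0.9], [0.2, 0.3, 0.7]],  # class 1
]
print(segment(scores))  # [[0, 1, 1], [0, 0, 1]]
```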
Fully convolutional networks
• Shelhamer et al. [8] showed that fully convolutional networks trained
pixels-to-pixels exceed the previous state of the art in semantic segmentation
• The fully convolutional networks they proposed take input of
arbitrary size and produce correspondingly sized output to make dense
predictions
Convolutionalization of a classic net I
• Typical recognition nets (AlexNet, GoogLeNet, etc.) take fixed-sized
inputs and produce non-spatial outputs
• The fully connected layers have fixed dimensions and drop the
spatial coordinates
• However we can view these fully connected layers as convolutions
that cover their entire input regions
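The equivalence in the last bullet can be sketched directly: a fully connected unit over a fixed-size patch is a convolution whose kernel is as large as that patch, and sliding the same weights over a larger input yields a map of outputs. The function names and the tiny 2x2 weights are illustrative assumptions.

```python
def dense(patch, weights, bias):
    """Fully connected unit: weighted sum over every position of a fixed-size patch."""
    return bias + sum(
        weights[i][j] * patch[i][j]
        for i in range(len(weights))
        for j in range(len(weights[0]))
    )


def conv_valid(image, weights, bias):
    """The same weights applied as a stride-1 'valid' convolution (cross-correlation)."""
    kh, kw = len(weights), len(weights[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [dense([row[x:x + kw] for row in image[y:y + kh]], weights, bias)
         for x in range(out_w)]
        for y in range(out_h)
    ]


weights, bias = [[1.0, 0.0], [0.0, -1.0]], 0.5   # hypothetical parameters
patch = [[1.0, 2.0], [3.0, 4.0]]                 # input of the size the unit was built for
image = [[1.0, 2.0, 3.0], [4.0, 6.0, 5.0], [7.0, 8.0, 10.0]]  # larger input

print(dense(patch, weights, bias))        # single, non-spatial output
print(conv_valid(patch, weights, bias))   # 1 x 1 map holding the same value
print(conv_valid(image, weights, bias))   # 2 x 2 map, one output per patch position
```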
Convolutionalization of a classic net II
• These fully convolutional networks take input of any size and output
classification maps
• The resulting maps are equivalent to evaluating the original
network on particular input patches
• Producing a 10x10 output grid this way is more than 5 times faster than
evaluating the original network patch by patch, at both learning time and
inference time
• Note that the output dimensions are typically reduced by
subsampling
• So output interpolation is needed to obtain dense predictions
• The interpolation is obtained through backwards (transposed) convolutions
Backwards strided convolution
Upsampling from 3x3 grid to 5x5
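A minimal sketch of a backwards strided convolution that maps a 3x3 grid to a 5x5 grid. The stride of 2, the 3x3 kernel, the crop of 1 and the fixed bilinear weights are assumptions, since the original figure is not preserved; in a network the kernel weights could be learned instead.

```python
def transposed_conv2d(x, kernel, stride=2, padding=1):
    """Scatter each input value, scaled by the kernel, onto a strided output grid."""
    n, k = len(x), len(kernel)
    full = (n - 1) * stride + k
    out = [[0.0] * full for _ in range(full)]
    for i in range(n):
        for j in range(n):
            for a in range(k):
                for b in range(k):
                    out[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]
    # Crop `padding` rows and columns from every border.
    end = full - padding
    return [row[padding:end] for row in out[padding:end]]


# Bilinear interpolation weights (an assumed, fixed choice of kernel).
bilinear = [[0.25, 0.5, 0.25],
            [0.5, 1.0, 0.5],
            [0.25, 0.5, 0.25]]

coarse = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0],
          [7.0, 8.0, 9.0]]

dense_map = transposed_conv2d(coarse, bilinear)
for row in dense_map:
    print(row)
```

With this kernel the original values reappear at every second position and the positions in between hold averages of their neighbours.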
Architecture I
• Coarse, high-level information and fine, local information are fused by
combining higher and lower layers
• 3 network variants fusing different layers were tested
Architecture II
• 3 proven classification architectures were transformed to fully
convolutional: AlexNet, VGG16 and GoogLeNet
• Each net’s final classifier layer is discarded and all the fully
connected layers are converted to convolutions
• A 1x1 convolution with 21 channels (the number of classes in the
PASCAL VOC 2011 dataset, including background) is added at the end,
followed by a backwards convolution layer
Architecture III
• The original nets were first pre-trained using image classification
• Then they were transformed to fully convolutional for fine tuning
using whole images (using SGD with momentum)
• The best results were obtained with FCN-VGG16
• Training on whole images proved to be as effective as sampling
patches
Architecture comparison
• The first models (FCN-32s) didn’t fuse different layers, so the
resulting output is very coarse
• They then fused lower layers with the last one (as shown earlier) to
obtain better results (mean IU 62.7 for FCN-8s vs. 59.4 for
FCN-32s)
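For reference, the mean IU (intersection over union) figure quoted above averages, over classes, the ratio between correctly labelled pixels and the union of predicted and true pixels of that class. The sketch below works on flat lists of labels for a single image; skipping classes absent from both lists is an assumption, and benchmark scores are normally accumulated over a whole dataset.

```python
def mean_iu(pred, truth, n_classes):
    """Mean intersection over union of two flat label lists."""
    ious = []
    for c in range(n_classes):
        inter = sum(p == c and t == c for p, t in zip(pred, truth))
        union = sum(p == c or t == c for p, t in zip(pred, truth))
        if union:  # assumption: ignore classes that appear in neither list
            ious.append(inter / union)
    return sum(ious) / len(ious)


truth = [0, 0, 1, 1]
pred = [0, 1, 1, 1]
# Class 0: 1 / 2, class 1: 2 / 3, so the mean is 7 / 12.
print(mean_iu(pred, truth, n_classes=2))
```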
Results comparison I
• The model reaches state-of-the-art performance on semantic
segmentation
• Also the model is much faster at inference time than previous
architectures
Results comparison II
Hypercolumns
Hypercolumns I
• The last layer of a CNN captures general features of the image, but
is too coarse spatially to allow precise localization
• Earlier layers instead may be precise in localization but will not
capture semantics
• Hariharan et al. [9] presented the hypercolumn concept, which puts
together the information from both higher and lower layers to obtain
better results on 3 fine-grained localization tasks:
• Simultaneous detection and segmentation
• Keypoint localization
• Part labeling
Hypercolumns II
• The hypercolumn corresponding to a given input location is defined
as the outputs of all units above that location at all layers of the
CNN, stacked into one vector
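The definition can be sketched as follows: for one input location, look up the activation at the corresponding position in every channel of every layer and stack the values into a single vector. The function name and data layout are hypothetical, and the nearest-neighbour lookup is an assumption made for brevity; a smoother interpolation of the coarser maps could be used instead.

```python
def hypercolumn(feature_maps, y, x, out_size):
    """Stack the activations above location (y, x) from all layers into one vector.

    feature_maps: list of layers; each layer is a list of channels; each channel
    is a square 2D list whose resolution may differ between layers.
    """
    column = []
    for layer in feature_maps:
        for channel in layer:
            size = len(channel)
            # Nearest-neighbour mapping from the output grid to this layer's grid.
            cy = y * size // out_size
            cx = x * size // out_size
            column.append(channel[cy][cx])
    return column


# Hypothetical network: a fine 4x4 layer with 1 channel, a coarse 2x2 layer with 2 channels.
fine = [[[r * 4 + c for c in range(4)] for r in range(4)]]
coarse = [[[10, 20], [30, 40]], [[-1, -2], [-3, -4]]]

print(hypercolumn([fine, coarse], y=3, x=0, out_size=4))  # [12, 30, -3]
```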
Problem setting I
• Input: a set of detections (subjected to non-maximum suppression),
each with a bounding box, a category label and a score
• Depending on the task, for each detection we want to:
• segment out the object
• segment its parts
• predict its keypoints
• Whichever the task, the bounding boxes are slightly expanded and a
50x50 heatmap is predicted on each of them
Problem setting II
• The information encoded in each heatmap and the number of
heatmaps depend on the chosen task:
• For segmentation, the heatmap encodes the probability that a
particular location is inside the object
• For part labeling a separate heatmap is predicted for each part,
where each heatmap is the probability a location belongs to that part
• For keypoint localization a separate heatmap is predicted for each
keypoint, with each heatmap encoding the probability that the
keypoint is at a particular location
• The heatmaps are finally resized to the size of the expanded
bounding boxes
• So all the tasks are solved by assigning a probability to each of the
50x50 locations
Problem setting III
• For each of the 50x50 locations and for each category a classifier
should be trained
• But doing so has 3 problems:
• The amount of data that each classifier sees during training is
heavily reduced
• Training so many classifiers is computationally expensive
• While the classifier should vary according to the location, adjacent
pixels should be classified similarly
• The solution is to train a coarse K × K (usually K = 5 or K = 10)
grid of classifiers and interpolate between them
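One way to picture the interpolation: each of the 50x50 locations blends the outputs of its nearest classifiers in the K × K grid, with weights that depend on its position. The sketch below uses bilinear weights, linear classifiers with a sigmoid output, and hypothetical function names; all of these are assumptions rather than the authors' exact formulation.

```python
import math


def interpolation_weights(x, y, size=50, K=5):
    """Bilinear weights of the K x K classifiers for location (x, y); assumes K >= 2."""
    def axis(p):
        g = (p + 0.5) / size * K - 0.5       # position in classifier-grid coordinates
        g = min(max(g, 0.0), K - 1.0)        # clamp to the grid
        lo = min(int(math.floor(g)), K - 2)  # index of the lower neighbour
        return lo, g - lo
    (r, fr), (c, fc) = axis(y), axis(x)
    return {
        (r, c): (1 - fr) * (1 - fc),
        (r, c + 1): (1 - fr) * fc,
        (r + 1, c): fr * (1 - fc),
        (r + 1, c + 1): fr * fc,
    }


def predict(feature, classifiers, x, y, size=50, K=5):
    """Probability at (x, y): weighted blend of the neighbouring classifiers' outputs."""
    prob = 0.0
    for (r, c), alpha in interpolation_weights(x, y, size, K).items():
        w, b = classifiers[r][c]
        score = sum(wi * fi for wi, fi in zip(w, feature)) + b
        prob += alpha * (1.0 / (1.0 + math.exp(-score)))
    return prob


# Hypothetical grid of classifiers over 2-dimensional features.
K = 5
classifiers = [[([0.1 * r, -0.1 * c], 0.0) for c in range(K)] for r in range(K)]
print(predict([1.0, 2.0], classifiers, x=12, y=37))
```

Only K × K classifiers are trained, each sees data from many locations, and neighbouring pixels receive similar predictions because their interpolation weights change smoothly.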
Network architecture
Diagram (figure not preserved): for each of the 3 combined layers, conv → upsample;
the results then pass through a sigmoid and the classifier interpolation
Note: inverting the order of upsampling and convolutions (which calculate
the K × K grids) and computing them separately for each of the 3
combined layers reduces the computational cost
Bounding box refining
• A technique called rescoring is used to improve the box selection
SDS results
Keypoint prediction results
Part labeling results
Conclusion
• We have seen how Inception modules make it possible to train deeper and
better networks in a computationally efficient manner
• We have then observed how to transform a classification CNN into a
fully convolutional network for pixel-wise classification
• We have learned the hypercolumn technique, which combines high- and
low-level information to improve accuracy on various fine-grained
localization tasks
Thank you for your patience! :)
References
[1] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, “Backpropagation applied to handwritten zip code recognition,” Neural Computation, vol. 1(4), pp. 541–551, 1989.
[2] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, pp. 2278–2324, 1998.
[3] A. W. Harley, “An interactive node-link visualization of convolutional neural networks,” in ISVC, pp. 867–877, 2015.
[4] A. Kurenkov, “A ’brief’ history of neural nets and deep learning, part 4.” http://www.andreykurenkov.com/writing/a-brief-history-of-neural-nets-and-deep-learning-part-4/.
[5] A. Krizhevsky, I. Sutskever, and G. Hinton, “Imagenet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems, vol. 25, pp. 1106–1114, 2012.
[6] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” CoRR, vol. abs/1409.4842, 2014.
[7] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” CoRR, vol. abs/1512.00567, 2015.
[8] E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks for semantic segmentation,” CoRR, vol. abs/1605.06211, 2016.
[9] B. Hariharan, P. A. Arbeláez, R. B. Girshick, and J. Malik, “Hypercolumns for object segmentation and fine-grained localization,” CoRR, vol. abs/1411.5752, 2014.

  • 11. Motivations • Increasing model size tends to improve quality • More computational resources are needed • Computational efficiency and low parameter count are still important • Mobile vision and embedded systems • Big Data 11
  • 12. Going Deeper with Convolutions [6] • The Inception module solves this problem making a better use of the computing resources • Proposed in 2014 by Christian Szegedy and other Google researchers • Used in the GoogLeNet architecture that won both the ILSVRC 2014 classification and detection challanges 12
  • 13. Inception module I • Visual information is processed at various scales and then aggregated • Since pooling operations are beneficial in CNNs, a parallel pooling path has been added • Problems: • 3x3 and 5x5 convolutions can be very expensive on top of a layer with lots of filters • The number of filters substantially increases for each Inception layer added, leading to a computational blow up 13
  • 14. Inception module II • Adding the 1x1 convolutions before the bigger convolutions reduces dimensionality • The same is done after the pooling layer 14
  • 15. GoogLeNet I • GoogLeNet is a particular incarnation of the Inception architecture • 22 convolutional layers (27 including pooling) • 9 Inception modules • 2 auxiliary classifiers to solve the vanishing gradient problem and for regularization • Designed with computational efficiency in mind • Inference can be run on devices with limited computational resources, especially memory • 7 of these networks used in an ensemble for the ILSVRC 2014 classification task 15
  • 18. GoogLeNet - Training • Trained with the DistBelief distributed machine learning system • Asynchronous stochastic gradient descent with 0.9 momentum • Image sampling methods have changed many times before the competition • Converged models were trained on with other options • Models were trained on crops of different size • There isn’t a definitive guidance to the most effective single way to train these networks 18
  • 19. GoogLeNet - ILSVRC 2014 Results Classification (above) and object detection (below) results. 19
  • 20. DeepDream Google’s DeepDream uses a GoogLeNet to produce “machine dreams” 20
  • 21. Inception-v2 and Inception-v3 • The Inception module authors later presented new optimized versions of the architecture, called Inception-v2 and Inception-v3 [7] • They significantly improved on the GoogLeNet ILSVRC 2014 results • The improvements were based on various key principles: • Avoid representational bottlenecks • Spatial aggregation on lower dimensional embeddings usually doesn't cause a significant loss of representational power • Balance the width and depth of the network 21
  • 22. Convolution factorization I • Factorizing convolutions reduces the number of parameters without losing much expressiveness • For example, a 5x5 convolution can be factorized into a pair of 3x3 convolutions • It is also possible to factorize an NxN convolution into a 1xN convolution followed by an Nx1 convolution 22
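The savings from factorization follow from counting weights. The sketch below is a minimal illustration, assuming the same number of input and output channels (64, an arbitrary choice) and ignoring biases; it also checks that two stacked 3x3 convolutions see the same 5x5 input window.

```python
def conv_weights(kernel_h, kernel_w, channels):
    """Weights of a convolution with `channels` inputs and outputs (no bias)."""
    return kernel_h * kernel_w * channels * channels

def stacked_receptive_field(kernel_sizes):
    """Receptive field of stride-1 convolutions applied one after another."""
    field = 1
    for k in kernel_sizes:
        field += k - 1
    return field

C = 64  # assumed channel count, for illustration only

full_5x5 = conv_weights(5, 5, C)
two_3x3 = 2 * conv_weights(3, 3, C)

N = 7
full_nxn = conv_weights(N, N, C)
asymmetric = conv_weights(1, N, C) + conv_weights(N, 1, C)

print(two_3x3 / full_5x5)     # fraction of weights kept by the 3x3 pair
print(asymmetric / full_nxn)  # fraction kept by the 1xN + Nx1 pair
```

The ratios do not depend on the channel count: the 3x3 pair keeps 18/25 of the weights, and the 1xN plus Nx1 pair keeps 2/N of them.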
  • 23. Convolution factorization II The original Inception module (left) and the new factorized module (right). 23
  • 24. Efficient grid size reduction - problem • Suppose we want to pass from a $d \times d$ grid with $k$ filters to a $\frac{d}{2} \times \frac{d}{2}$ grid with $2k$ filters • We need to compute a stride-1 convolution and then a pooling • Computational cost dominated by convolutions: $2d^2k^2$ operations • Inverting the order, the number of operations is reduced to $2\left(\frac{d}{2}\right)^2 k^2$, but we violate the bottleneck principle 24
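The two orderings on this slide differ by a constant factor, which a short count makes explicit. The sketch below ignores the kernel-size factor, as the slide does, and uses arbitrary illustrative values for the grid size d and the filter count k.

```python
def conv_cost(grid_size, in_filters, out_filters):
    """Operation count of a convolution, ignoring the constant kernel-size factor."""
    return grid_size * grid_size * in_filters * out_filters

d, k = 32, 64  # illustrative values, not taken from the slides

conv_then_pool = conv_cost(d, k, 2 * k)        # convolution on the full d x d grid
pool_then_conv = conv_cost(d // 2, k, 2 * k)   # convolution on the pooled d/2 x d/2 grid

print(conv_then_pool, pool_then_conv, conv_then_pool // pool_then_conv)
```

Pooling first is four times cheaper for any d and k, but the pooled d/2 x d/2 grid with only k filters is the representational bottleneck the slide warns about.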
  • 25. Efficient grid size reduction - solution • The solution is an Inception module with convolution and pooling blocks with stride 2 • Computationally efficient and no representational bottleneck introduced 25
  • 26. The new architecture • Using various modified Inception modules, here is the new Inception-v2 architecture 26
  • 28. Inception-v2: training and observations • The network was trained on the ILSVRC 2012 images using stochastic gradient descent and the TensorFlow library • Experiments showed that the two auxiliary classifiers have less impact on training convergence than expected • In the early training phases, the model performance was not affected by the presence of the auxiliary classifiers: they only improved the performance near the end of training • Removing the lower auxiliary classifier didn't have any effect • The main classifier performs better if batch normalization or dropout is added to the auxiliary ones • The model was also trained and tested on smaller receptive fields with only a small loss of top-1 accuracy (76.6% for 299x299 RF vs. 75.2% on 79x79 RF). This is important for post-classification of detections 28
  • 29. Inception-v2 to Inception-v3 results (single model) • Each row’s Inception-v2 model adds a feature with respect to the previous row’s model • The last line’s model is referred to as the Inception-v3 model 29
  • 30. Inception-v3 vs other models (single and ensemble) Single model results Ensemble results • On the ILSVRC 2012 dataset, there is a significant improvement over state-of-the-art models, both with a single model and with an ensemble of models • Note that the ensemble errors here are validation errors (except for the one marked with '*', which is a test error) 30
  • 32. Semantic segmentation • Image segmentation is the process of partitioning an image into multiple segments (sets of pixels or super-pixels) • Semantic segmentation partitions an image into semantically meaningful parts and classifies each part into one of a set of pre-determined classes • The same result can be achieved with pixel-wise classification, i.e. assigning a class to each pixel 32
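Pixel-wise classification can be sketched as an argmax over per-class score maps. The sketch below assumes the scores are already available as nested lists indexed as `scores[class][row][column]`; the function name and the toy scores are hypothetical.

```python
def pixelwise_labels(scores):
    """Assign to each pixel the class with the highest score at that pixel."""
    n_classes = len(scores)
    height = len(scores[0])
    width = len(scores[0][0])
    return [
        [max(range(n_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]

# Toy score maps for 2 classes on a 2x2 image (made-up numbers).
toy_scores = [
    [[0.9, 0.2], [0.4, 0.1]],  # class 0
    [[0.1, 0.8], [0.6, 0.9]],  # class 1
]
label_map = pixelwise_labels(toy_scores)
print(label_map)
```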
  • 33. Fully convolutional networks • Shelhamer et al. [8] showed that fully convolutional networks trained pixels-to-pixels exceed the state-of-the-art in semantic segmentation • The fully convolutional networks they proposed take input of arbitrary size and produce same-sized output to make dense predictions 33
  • 34. Convolutionalization of a classic net I • Typical recognition nets (AlexNet, GoogLeNet, etc.) take fixed-sized inputs and produce non-spatial outputs • The fully connected layers have fixed dimensions and drop the spatial coordinates • However we can view these fully connected layers as convolutions that cover their entire input regions 34
  • 35. Convolutionalization of a classic net II • These fully convolutional networks take input of any size and output classification maps • The resulting maps are equivalent to the evaluation of the original network on particular input patches • The new network is more than 5 times faster than the original network both at learning time and at inference time (considering a 10x10 output grid) • Note that the output dimensions are typically reduced by subsampling • So output interpolation is needed to obtain dense predictions • The interpolation is obtained through backwards convolutions 35
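The equivalence between a fully connected layer and a convolution covering its whole input can be checked on a toy example. The sketch below is a single-channel, single-output illustration with made-up integer weights. A real network has many channels and outputs, but the indexing argument is the same.

```python
def fully_connected(patch, weights, bias):
    """Fully connected unit applied to a flattened fixed-size patch."""
    flat = [value for row in patch for value in row]
    return sum(w * v for w, v in zip(weights, flat)) + bias

def as_convolution(image, weights, bias, size):
    """The same weights reshaped to a size x size kernel and slid over the image."""
    kernel = [weights[r * size:(r + 1) * size] for r in range(size)]
    out_h = len(image) - size + 1
    out_w = len(image[0]) - size + 1
    return [
        [sum(kernel[i][j] * image[y + i][x + j]
             for i in range(size) for j in range(size)) + bias
         for x in range(out_w)]
        for y in range(out_h)
    ]

weights, bias = [1, -2, 3, 4], 5          # hypothetical parameters for a 2x2 input
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # larger 3x3 input

score_map = as_convolution(image, weights, bias, size=2)
top_left_patch = [row[0:2] for row in image[0:2]]
print(score_map, fully_connected(top_left_patch, weights, bias))
```

Each entry of `score_map` equals the fully connected unit evaluated on the corresponding 2x2 patch, computed in one pass over the larger image.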
  • 36. Backwards strided convolution Upsampling from 3x3 grid to 5x5 36
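A backwards (transposed) strided convolution can be sketched directly: each input value scatters a scaled copy of the kernel into the output. The configuration below (stride 2, a 3x3 kernel, one border pixel cropped) is one assumed setting that maps a 3x3 grid to 5x5; the slide's figure may use a different one. The fixed bilinear kernel is an illustrative choice; in a network the kernel would be learned.

```python
def transposed_conv2d(grid, kernel, stride, padding):
    """Transposed convolution of a square grid with a square kernel."""
    n_in, k = len(grid), len(kernel)
    full = (n_in - 1) * stride + k
    out = [[0.0] * full for _ in range(full)]
    for y in range(n_in):
        for x in range(n_in):
            # Scatter one scaled copy of the kernel per input value.
            for i in range(k):
                for j in range(k):
                    out[y * stride + i][x * stride + j] += grid[y][x] * kernel[i][j]
    end = full - padding
    return [row[padding:end] for row in out[padding:end]]

# Bilinear interpolation kernel (fixed here for illustration).
bilinear = [[0.25, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 0.25]]
coarse = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

dense = transposed_conv2d(coarse, bilinear, stride=2, padding=1)
for row in dense:
    print(row)
```

With this kernel the original values reappear at every second position and the positions in between hold the averages of their neighbours.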
  • 37. Architecture I • Coarse and local information is fused combining lower and higher layers • 3 network types with different layers fused were tested 37
  • 38. Architecture II • 3 proven classification architectures were transformed to fully convolutional: AlexNet, VGG16 and GoogLeNet • Each net’s final classifier layer is discarded and all the fully connected layers are converted to convolutions • A 1x1 convolution with 21 channels (the number of classes in the PASCAL VOC 2011 dataset) is added to the end, followed by a backwards convolution layer 38
  • 39. Architecture III • The original nets were first pre-trained using image classification • Then they were transformed to fully convolutional for fine tuning using whole images (using SGD with momentum) • The best results were obtained with FCN-VGG16 • Training on whole images proved to be as effective as sampling patches 39
  • 40. Architecture comparison • The first models (FCN-32s) didn't fuse different layers, so the resulting output is very coarse • They then fused lower layers with the last one (as shown earlier) to obtain better results (mean IU 62.7 for FCN-8s vs. 59.4 for FCN-32s) 40
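The mean IU (intersection over union) metric used in this comparison can be sketched as follows. This minimal version takes flattened per-pixel label lists and skips classes that appear in neither the prediction nor the ground truth; that handling of absent classes is an assumption of this sketch.

```python
def mean_iu(predicted, target, n_classes):
    """Mean over classes of intersection / union between predicted and true pixels."""
    ious = []
    for c in range(n_classes):
        pairs = list(zip(predicted, target))
        intersection = sum(1 for p, t in pairs if p == c and t == c)
        union = sum(1 for p, t in pairs if p == c or t == c)
        if union > 0:  # skip classes absent from both label maps
            ious.append(intersection / union)
    return sum(ious) / len(ious)

# Toy example with 4 pixels and 2 classes (made-up labels).
score = mean_iu(predicted=[0, 0, 1, 1], target=[0, 1, 1, 1], n_classes=2)
print(score)
```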
  • 41. Results comparison I • The model reaches state-of-the-art performance on semantic segmentation • Also the model is much faster at inference time than previous architectures 41
  • 44. Hypercolumns I • The last layer of a CNN captures general features of the image, but is too coarse spatially to allow precise localization • Earlier layers instead may be precise in localization but will not capture semantics • Hariharan et al. [9] presented the hypercolumn concept, which puts together the information from both higher and lower layers to obtain better results on 3 fine-grained localization tasks: • Simultaneous detection and segmentation • Keypoint localization • Part labeling 44
  • 45. Hypercolumns II • The hypercolumn corresponding to a given input location is defined as the outputs of all units above that location at all layers of the CNN, stacked into one vector 45
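The hypercolumn definition can be sketched as a lookup across layers of different resolution. The sketch below assumes square feature maps stored as `layer[channel][row][column]` and uses a nearest-neighbour lookup to map an input location to each layer, a simplification chosen for brevity rather than the interpolation a real implementation would likely use.

```python
def hypercolumn(feature_maps, y, x, input_size):
    """Stack the activations of every channel of every layer above input pixel (y, x)."""
    column = []
    for layer in feature_maps:
        size = len(layer[0])  # spatial size of this layer's (square) maps
        layer_y = y * size // input_size
        layer_x = x * size // input_size
        column.extend(channel[layer_y][layer_x] for channel in layer)
    return column

# Toy network: a fine 4x4 layer with 1 channel and a coarse 2x2 layer with 2 channels.
fine = [[[r * 4 + c for c in range(4)] for r in range(4)]]
coarse = [[[10, 20], [30, 40]], [[50, 60], [70, 80]]]

vector = hypercolumn([fine, coarse], y=3, x=1, input_size=4)
print(vector)
```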
  • 46. Problem setting I • Input: a set of detections (subjected to non-maximum suppression), each with a bounding box, a category label and a score • Depending on the task, for each detection we want to: • segment out the object • segment its parts • predict its keypoints • Whatever the task, the bounding boxes are slightly expanded and a 50x50 heatmap is predicted on each of them 46
  • 47. Problem setting II • The information encoded in each heatmap and the number of heatmaps depend on the chosen task: • For segmentation, the heatmap encodes the probability that a particular location is inside the object • For part labeling a separate heatmap is predicted for each part, where each heatmap encodes the probability that a location belongs to that part • For keypoint localization a separate heatmap is predicted for each keypoint, with each heatmap encoding the probability that the keypoint is at a particular location • The heatmaps are finally resized to the size of the expanded bounding boxes • So all the tasks are solved by assigning a probability to each of the 50x50 locations 47
  • 48. Problem setting III • For each of the 50x50 locations and for each category a classifier should be trained • But doing so has 3 problems: • The amount of data that each classifier sees during training is heavily reduced • Training so many classifiers is computationally expensive • While the classifier should vary according to the location, two adjacent pixels should be classified similarly • The solution is to train a coarse K × K (usually K = 5 or K = 10) grid of classifiers and interpolate between them 48
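Interpolating between a coarse grid of classifiers can be sketched with bilinear weights. The sketch below assumes the K classifiers along each axis sit at the centers of equal cells over the 50 locations and that locations outside the outermost centers use the nearest classifier. These placement choices and the function names are assumptions made for illustration.

```python
def interpolation_weights(pos, size, k):
    """Linear interpolation weights of k >= 2 classifiers for one location along an axis."""
    cell = size / k
    g = (pos + 0.5) / cell - 0.5        # location in classifier-grid coordinates
    g = min(max(g, 0.0), k - 1.0)       # clamp to the outermost classifiers
    lo = min(int(g), k - 2)
    frac = g - lo
    weights = [0.0] * k
    weights[lo] = 1.0 - frac
    weights[lo + 1] = frac
    return weights

def interpolated_score(y, x, classifier_scores, size=50):
    """Blend the K x K classifier outputs computed for the pixel at (y, x)."""
    k = len(classifier_scores)
    wy = interpolation_weights(y, size, k)
    wx = interpolation_weights(x, size, k)
    return sum(wy[a] * wx[b] * classifier_scores[a][b]
               for a in range(k) for b in range(k))

# Outputs of a 5 x 5 grid of classifiers on one pixel's features (made-up values).
scores = [[0.1 * (a + b) for b in range(5)] for a in range(5)]
print(interpolated_score(9, 9, scores))
```

Adjacent locations get nearly identical weights, so they are classified similarly, while only K x K classifiers have to be trained.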
  • 49. Network architecture [Figure labels: conv (x3), upsample (x3), sigmoid, classifier interpolation] Note: inverting the order of upsampling and convolutions (which compute the K × K grids) and computing them separately for each of the 3 combined layers reduces the computational cost 49
  • 50. Bounding box refining • A special technique is used to improve the box selection, called rescoring 50
  • 55. Conclusion • We have seen how the Inception modules make it possible to train deeper and better networks in a computationally efficient manner • We have then observed how to transform a classification CNN into a fully convolutional network for pixel-wise classification • We have learned the hypercolumn technique to combine high and low level information to improve the accuracy on various fine-grained localization tasks 55
  • 56. Thank you for your patience! :) 56
  • 57. References I [1] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, “Backpropagation applied to handwritten zip code recognition,” Neural Computation, vol. 1(4), pp. 541–551, 1989. [2] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, pp. 2278–2324, 1998. [3] A. W. Harley, “An interactive node-link visualization of convolutional neural networks,” in ISVC, pp. 867–877, 2015. [4] A. Kurenkov, “A ‘brief’ history of neural nets and deep learning, part 4.” http://www.andreykurenkov.com/writing/a-brief-history-of-neural-nets-and-deep-learning-part-4/. 57
  • 58. References II [5] A. Krizhevsky, I. Sutskever, and G. Hinton, “Imagenet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems, vol. 25, pp. 1106–1114, 2012. [6] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” CoRR, vol. abs/1409.4842, 2014. [7] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” CoRR, vol. abs/1512.00567, 2015. [8] E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks for semantic segmentation,” CoRR, vol. abs/1605.06211, 2016. 58
  • 59. References III [9] B. Hariharan, P. A. Arbeláez, R. B. Girshick, and J. Malik, “Hypercolumns for object segmentation and fine-grained localization,” CoRR, vol. abs/1411.5752, 2014. 59