Deep learning is enabling a wide range of computer vision applications, from advanced driver assistance systems to sophisticated medical diagnostic devices. However, designing and deploying these applications involves many challenges, such as handling large datasets, developing optimized models, performing GPU computing effectively, and deploying deep learning models efficiently to embedded boards like NVIDIA Jetson. This session illustrates how MATLAB supports all phases of this workflow, from algorithm design to automatic generation of portable and optimized CUDA code, helping engineers and scientists address the challenges commonly observed in deep learning workflows.
Deep Learning Applications in Computer Vision
- Classification (example label: HIGHWAY_SCENE)
- Semantic Segmentation
- Rain Detection and Removal
- Human Aware Navigation for Robots
End-to-End Application: Lane Detection
Transfer Learning: AlexNet (1000-class classification) -> lane detection CNN
Pipeline: Image -> CNN -> parabolic lane coefficients in world coordinates (left lane coefficients, right lane coefficients) -> post-processing (find left/right lane points)
Output of CNN is lane parabola coefficients according to: y = ax^2 + bx + c
MATLAB: A SINGLE PLATFORM FOR DEEP LEARNING TRAINING & DEPLOYMENT
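A minimal sketch of how the network output could be turned into lane points in the post-processing step: it evaluates y = ax^2 + bx + c for each lane with polyval. The coefficient values, the variable names, and the 3 m to 30 m look-ahead range are illustrative assumptions, not values from the talk.

```matlab
% Hypothetical network output: [a b c] for the left lane, then [a b c] for
% the right lane (six values in total).
laneCoeffs = [0.001 0.02 1.8, 0.001 0.02 -1.8];

leftCoeffs  = laneCoeffs(1:3);
rightCoeffs = laneCoeffs(4:6);

% Distances ahead of the vehicle at which to sample the lanes
% (assumed range, in meters).
x = linspace(3, 30, 10);

% polyval evaluates a*x.^2 + b*x + c for coefficients ordered [a b c].
yLeft  = polyval(leftCoeffs,  x);
yRight = polyval(rightCoeffs, x);

% Lateral distance between the two boundaries at each sample point.
% It is constant here because the two example lanes differ only in c.
laneWidth = yLeft - yRight;
disp(laneWidth(1))
```

The sampled (x, y) pairs are in the same coordinate frame as the coefficients, so they would still need to be projected into the image before being drawn on a camera frame.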
Deep Learning Challenges
Big Data
- Handling large amounts of data
- Labeling thousands of images & videos
Training and Testing Deep Neural Networks
- Accessing reference models from research
- Understanding network behavior
- Tuning hyperparameters and refining architectures
- Training takes hours to days
Seamless Deployment onto Embedded Hardware
Real-world systems use more than deep learning
Deep learning frameworks do not include "classical" computer vision
Not a deep learning expert
Access and Handle Large Sets of Images
Easily manage large sets of images
- Single line of code to access images
- Operates on disk, database, big-data file system
imageData = imageDatastore('vehicles')
Organize images in folders (~10,000 images, 5 folders)
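A short sketch of the folder-per-class pattern described above. So that it runs without the slide's 'vehicles' data, it first writes a tiny synthetic image set to a temporary folder; the folder name, class names, image sizes, and the 80/20 split are all assumptions made for illustration.

```matlab
% Build a small stand-in dataset: one subfolder per class (assumed layout).
root = fullfile(tempdir, 'vehicles_demo');
classes = {'car', 'truck'};
for k = 1:numel(classes)
    classDir = fullfile(root, classes{k});
    if ~exist(classDir, 'dir')
        mkdir(classDir);
    end
    for n = 1:5
        img = uint8(255 * rand(32, 32, 3));   % random placeholder image
        imwrite(img, fullfile(classDir, sprintf('img%d.png', n)));
    end
end

% One call indexes every image; labels are taken from the folder names.
imds = imageDatastore(root, ...
    'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');

% Check how many images were found per class.
disp(countEachLabel(imds))

% Hold out 20% of each class for validation (split ratio is an assumption).
[imdsTrain, imdsVal] = splitEachLabel(imds, 0.8, 'randomized');

% Images are read from disk only when requested.
firstImage = readimage(imdsTrain, 1);
```

Because the datastore holds file locations rather than pixel data, the same code shape applies whether the folder holds ten images or tens of thousands.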
Handle big image collections without big changes
Images in local directory
Images on HDFS
Generate Training Data from Labeled Images
Ground truth exported from the Ground Truth Labeler app: labeled lane boundaries in image coordinates
Parabolic lane boundary modeling:
>> findParabolicLaneBoundaries
Lane boundary models in world coordinates: these correspond to the coefficients (a, b, c) of the parabolas representing the left and right lanes.
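To illustrate what "coefficients of a parabola" means for one labeled boundary, the sketch below fits a second-degree polynomial with polyfit. This is a plain least-squares stand-in, not the findParabolicLaneBoundaries function named on the slide, and the sample points are synthetic values chosen for the example.

```matlab
% Hypothetical boundary points for one lane, already converted to world
% coordinates: x is distance ahead (m), y is lateral offset (m).
x = [5 10 15 20 25 30];
y = 0.002 * x.^2 + 0.01 * x + 1.8;   % synthetic points on a known parabola

% Least-squares fit of y = a*x^2 + b*x + c; polyfit returns [a b c].
coeffs = polyfit(x, y, 2);
fprintf('a = %.4f, b = %.4f, c = %.4f\n', coeffs);
```

Repeating the fit for the left and right boundaries of each frame would give the six regression targets per training image.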
Transfer Learning Workflow
- Load pretrained network (early layers and last layers; trained on 1 million images, 1000s of classes)
- Replace final layers (new layers; fewer classes, learn faster)
- Train network (100s of images, 10s of classes; training options)
- Predict and assess network accuracy (trained network)
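The four steps above can be sketched as follows for the lane detection case. The regression head with six outputs is inferred from the statement that the CNN outputs parabola coefficients for two lanes; the random placeholder data, the learning rates, and the variable names are assumptions, and the code needs the toolbox's AlexNet support package installed.

```matlab
% 1. Load pretrained network.
net = alexnet;

% 2. Replace final layers: drop the last three layers (final fully
%    connected layer, softmax, classification output) and add a
%    regression head. Six outputs = [a b c] for the left and right lanes.
layersTransfer = net.Layers(1:end-3);
layers = [
    layersTransfer
    fullyConnectedLayer(6, 'WeightLearnRateFactor', 10, 'BiasLearnRateFactor', 10)
    regressionLayer];

% Placeholder training set (assumption): random images and coefficients,
% used only so that the sketch runs end to end.
XTrain = rand(227, 227, 3, 16, 'single');   % AlexNet expects 227x227x3 input
YTrain = rand(16, 6);

% 3. Train network with a small learning rate so the pretrained layers
%    change slowly while the new layers learn faster.
options = trainingOptions('sgdm', ...
    'InitialLearnRate', 1e-4, ...
    'MaxEpochs', 1, ...
    'MiniBatchSize', 8);
laneNet = trainNetwork(XTrain, YTrain, layers, options);

% 4. Predict and assess: here only the output shape is checked.
predicted = predict(laneNet, XTrain(:, :, :, 1));
disp(size(predicted))
```

With real data, the assessment step would compare predicted and labeled coefficients on a held-out set, for example with a root-mean-square error per coefficient.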
Import Pre-Trained Models and Network Architectures
Pretrained Models
- AlexNet
- VGG-16
- VGG-19
- GoogLeNet
- ResNet-50
- Inception-v3
- ResNet-101
Import Models from Frameworks
- Caffe Model Importer (including Caffe Model Zoo)
  - importCaffeLayers
  - importCaffeNetwork
- TensorFlow-Keras Model Importer
  - importKerasLayers
  - importKerasNetwork
Download from within MATLAB
net = alexnet;
net = vgg16;
net = vgg19;
net = googlenet;
net = resnet50;
net = inceptionv3;
net = resnet101;
Visualizations for Understanding Network Behavior
- Custom visualizations
  - Example: Class Activation Maps (http://cnnlocalization.csail.mit.edu/Zhou_Learning_Deep_Features_CVPR_2016_paper.pdf)
- Filters
- Activations
- Deep Dream
Augment Training Images
imageAugmenter = imageDataAugmenter('RandRotation',[-180 180])
Rotation
Reflection
Scaling
Shearing
Translation
Colour pre-processing
Resize / Random crop / Centre crop
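A sketch combining several of the augmentations listed above with on-the-fly resizing. The parameter ranges and the placeholder images are assumptions. Note that augmentedImageDatastore is, to the best of my knowledge, available from R2018a onward; earlier releases used a different wrapper, so treat this as illustrative rather than as the code from the talk.

```matlab
% Random rotation, reflection, scaling, shearing and translation.
augmenter = imageDataAugmenter( ...
    'RandRotation',     [-180 180], ...
    'RandXReflection',  true, ...
    'RandXScale',       [0.9 1.1], ...
    'RandYScale',       [0.9 1.1], ...
    'RandXShear',       [-10 10], ...
    'RandXTranslation', [-10 10], ...
    'RandYTranslation', [-10 10]);

% Placeholder data (assumption): eight random 64x64 RGB images, two classes.
X = rand(64, 64, 3, 8, 'single');
Y = categorical(randi(2, 8, 1));

% Resize every image to the network input size and augment it when read.
augimds = augmentedImageDatastore([227 227], X, Y, ...
    'DataAugmentation', augmenter);

% Read one mini-batch; each read applies fresh random transformations.
batch = read(augimds);
imageColumn = batch{:, 1};          % first table column holds the images
firstImage  = imageColumn{1};
disp(size(firstImage))
```

A rotation range of [-180 180] suits data with no preferred orientation; for forward-facing road scenes a much narrower range is likely more realistic.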
Training Deep Neural Networks
trainingOptions
- Plot training metrics
  - Training accuracy, smoothed training accuracy, validation accuracy
  - Training loss, smoothed training loss, and validation loss
- Debug training
  - Stop and check current state
  - Save / load checkpoint networks
  - Custom output function (stopping condition, visualization, etc.)
- Bayesian optimization for hyperparameter tuning
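The plotting, checkpoint, and custom-output-function items above map onto trainingOptions name-value pairs, sketched below. The checkpoint folder, the loss threshold, and the name of the stopping function are hypothetical choices for this example.

```matlab
% Folder for checkpoint networks (assumed location).
checkpointDir = fullfile(tempdir, 'lane_checkpoints');
if ~exist(checkpointDir, 'dir')
    mkdir(checkpointDir);
end

% Custom stopping condition: trainNetwork passes an info struct to the
% output function and stops training when the function returns true.
lossThreshold = 0.01;   % hypothetical threshold
stopWhenLossIsSmall = @(info) info.State == "iteration" ...
    && ~isempty(info.TrainingLoss) ...
    && info.TrainingLoss < lossThreshold;

options = trainingOptions('sgdm', ...
    'Plots',          'training-progress', ...   % live metric plots
    'CheckpointPath', checkpointDir, ...         % save a network each epoch
    'OutputFcn',      stopWhenLossIsSmall);
```

Validation curves appear in the same plot once a 'ValidationData' argument is supplied; it is left out here because the sketch carries no data.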
MATLAB Provides Evaluation Frameworks for Different Classes of Deep Learning Problems
Deep Learning on CPU, GPU, Multi-GPU and Clusters
- Single CPU
- Single CPU, single GPU
- Single CPU, multiple GPUs
- On-prem server with GPUs
- Cloud GPUs (AWS, Azure, etc.)
Deep Learning on Cloud Whitepaper
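A small sketch of how the hardware choice can be expressed in code: the training call stays the same and only the 'ExecutionEnvironment' option changes. The selection logic is an assumption for illustration, and gpuDeviceCount requires Parallel Computing Toolbox.

```matlab
% Pick an execution environment from the number of GPUs MATLAB can see.
numGPUs = gpuDeviceCount;   % needs Parallel Computing Toolbox
if numGPUs > 1
    env = 'multi-gpu';
elseif numGPUs == 1
    env = 'gpu';
else
    env = 'cpu';
end

options = trainingOptions('sgdm', 'ExecutionEnvironment', env);
fprintf('Training would run on: %s\n', options.ExecutionEnvironment);
```

For clusters and cloud machines, the 'parallel' value of the same option uses the current parallel pool instead of local devices.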
Training in MATLAB is fast
MATLAB is more than 4x faster than TensorFlow
AlexNet CNN architecture trained on the ImageNet dataset, using a batch size of 32, on a Windows 10 desktop with a single NVIDIA GPU (Titan Xp). TensorFlow version 1.2.0.
Algorithm Design to Embedded Deployment Workflow
Conventional Approach
1. Desktop GPU, high-level language: deep learning framework; large, complex software stack
2. Desktop GPU, C++: C/C++, low-level APIs, application-specific libraries
3. Embedded GPU, C++: C/C++, target-optimized libraries; optimize for memory & speed
Challenges
- Integrating multiple libraries and packages
- Verifying and maintaining multiple implementations
- Algorithm & vendor lock-in
GPU Coder for Deployment: New Product in R2017b
GPU Coder: accelerated implementation of parallel algorithms on GPUs
- Neural Networks (deep learning, machine learning): 7x faster than state-of-art
- Image Processing and Computer Vision (image filtering, feature detection/extraction): 700x faster than CPUs for feature extraction
- Signal Processing and Communications (FFT, filtering, cross correlation): 20x faster than CPUs for FFTs
Algorithm Design to Embedded Deployment Workflow with GPU Coder
Test stages
1. Functional test: MATLAB algorithm (functional reference)
2. Deployment unit-test: desktop GPU, C++
3. Deployment integration-test: desktop GPU, C++
4. Real-time test: embedded GPU
Build types
- .mex: call CUDA from MATLAB directly
- .lib: call CUDA from (C++) hand-coded main()
- Cross-compiled .lib: call CUDA from (C++) hand-coded main()
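A sketch of the first build type (.mex, called from MATLAB directly), following the documented GPU Coder pattern of an entry-point function that loads the network once. The file names laneNet.mat and detect_lane.m are hypothetical, the two parts must be saved as separate files, and running it requires GPU Coder, a supported NVIDIA GPU, and the CUDA toolchain.

```matlab
%% File 1: detect_lane.m (entry-point function, hypothetical name)
function out = detect_lane(in) %#codegen
    % Load the trained network once and reuse it across calls.
    persistent laneNet;
    if isempty(laneNet)
        % 'laneNet.mat' is an assumed MAT-file holding the trained network.
        laneNet = coder.loadDeepLearningNetwork('laneNet.mat');
    end
    out = predict(laneNet, in);
end

%% File 2: build script, run from the MATLAB command line
% cfg = coder.gpuConfig('mex');      % build type: MEX, callable from MATLAB
% cfg.TargetLang = 'C++';
% codegen -config cfg detect_lane -args {ones(227,227,3,'single')} -report
%
% % Functional test: compare generated code against the MATLAB reference.
% in = ones(227, 227, 3, 'single');
% outMex = detect_lane_mex(in);
```

Switching the first build line to coder.gpuConfig('lib') produces a static library instead, which is the form that a hand-coded C++ main() would link against in the later stages.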
End-to-End Application: Lane Detection
https://tinyurl.com/ybaxnxjg
AlexNet Inference on NVIDIA Titan Xp
[Chart: frames per second vs. batch size for MATLAB GPU Coder (R2017b), TensorFlow (1.2.0), Caffe2 (0.8.1), MXNet (0.10), and MATLAB (R2017b); annotated speedups of 2x, 5x and 7x]
Testing platform: CPU Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz; GPU Pascal Titan Xp; cuDNN v5
AlexNet Inference on NVIDIA GPUs
[Chart: memory usage (GB, axis 0 to 9) vs. batch size (1, 16, 32, 64), showing CPU resident memory and GPU peak memory (nvidia-smi) for Py-Caffe, GPU Coder, TensorFlow, MATLAB w/ PCT, and C++-Caffe]
Testing platform: CPU Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50 GHz; GPU Tesla K40c
Design Deep Neural Networks in MATLAB and Deploy with GPU Coder
Design Deep Learning & Vision Algorithms
Highlights
- Manage large image sets
- Easy access to models like AlexNet, GoogLeNet
- Pre-built training frameworks
- Automate ground truth labeling with apps
High Performance Deployment
Highlights
- Automate optimized CUDA code generation with GPU Coder
- Deployed models up to 4.5x faster than Caffe2 and 7x faster than TensorFlow