AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
November 2016
MAC205
Deep Learning at Cloud Scale
Improving Video Discoverability by Scaling Up Caffe on AWS
Andres Rodriguez, PhD, Solutions Architect, Intel Corporation
Juan Carlos Riveiro, CEO, Vilynx

Content Outline
• Deep learning overview and usages
• Worked example for fine-tuning a NN
• Some theory behind deep learning
• Vilynx – video discoverability

Deep Learning
• A branch of machine learning
• Data is passed through multiple non-linear transformations
• Goal: Learn the parameters of the transformation that minimize a cost function

Why Now?
Bigger Data
• Image: 1000 KB / picture
• Audio: 5000 KB / song
• Video: 5,000,000 KB / movie
Better Hardware
• Transistor density doubles every 18 months
• Cost / GB in 1995: $1000.00
• Cost / GB in 2015: $0.03
Smarter Algorithms
• Advances in algorithm innovation, including neural networks, leading to better accuracy in training models

Types of Deep Learning
• Supervised learning
• Data -> Labels
• Unsupervised learning
• No labels; Clustering; Reducing dimensionality
• Reinforcement learning
• Reward actions (e.g., robotics)
http://ode.engin.umich.edu/presentations/idetc2014/img/image_feature_learning_clear.png

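Training
[Figure: data goes through the network by forward propagation to produce an output (person 0.10, cat 0.15, dog 0.20, …, bike 0.05). The output is compared with the expected label (person 0, cat 1, dog 0, …, bike 0) to give a penalty (error or cost), which is sent back through the network by back propagation.]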
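The penalty in the training diagram can be illustrated with a small sketch. The slide does not say which cost is used, so the softmax and cross-entropy below are assumptions, and the raw scores and function names are hypothetical values chosen only for illustration.

```python
import math

def softmax(scores):
    """Turn raw network scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, expected):
    """Penalty between predicted probabilities and a one-hot expected label."""
    return -sum(t * math.log(p) for p, t in zip(probs, expected) if t > 0)

labels = ["person", "cat", "dog", "bike"]
expected = [0, 1, 0, 0]             # the image is a cat
scores = [0.5, 2.0, 1.0, -1.0]      # hypothetical forward-propagation output
probs = softmax(scores)
penalty = cross_entropy(probs, expected)

for label, p in zip(labels, probs):
    print(f"{label}: {p:.2f}")
print(f"penalty: {penalty:.3f}")
```

During training, back propagation would use the gradient of this penalty to adjust the weights; at inference time only the forward pass is run.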

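[Figure: training and inference. Forward propagation alone, from data to output (person 0.10, cat 0.15, dog 0.20, bike 0.05), is labeled inference. Training also compares the output with the expected label (person 0, cat 1, dog 0, …, bike 0) to give a penalty (error or cost) and runs back propagation.]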

Deep Learning Use Cases
• Fraud / face detection
• Gaming, check processing
• Computer server monitoring
• Financial forecasting and prediction
• Network intrusion detection
• Recommender systems
• Personal assistant
• Automatic speech recognition
• Natural language processing
• Image & video recognition/tagging
• Targeted ads
Industries: Cloud Service Providers, Financial Services, Healthcare, Automotive

Optimized Deep Learning Environment
Fuel the development of vertical solutions
Deliver excellent deep learning environment
Develop deep networks across frameworks
Maximum performance on Intel architecture
EC2
Intel® Math Kernel Library (Intel® MKL)

Elastic Compute Cloud (EC2)
C4 Instances
• “Highest performing processors and the lowest price/compute performance in EC2”¹
• Vilynx
  • Deep learning for video content extraction
  • Supports various companies: CBS, TBS, etc.
• Pikazo app
  • Transforms photos into artistic renderings
¹ https://aws.amazon.com/ec2/instance-types/
https://www.stlmag.com/news/st-louis-app-pikazo-will-turn-your-profile-picture/

Elastic Compute Cloud (EC2)
C4 Instances
c4.8xlarge On-Demand:
• $1.675/hr
GoogLeNet inference:
• batch size 32
• 237 images/sec = 4.2 ms/image
• 1 million images cost $1.96
Spot prices are cheaper
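The per-image latency and the cost per million images follow directly from the price and throughput on this slide. A minimal sketch of that arithmetic (the constant and function names are hypothetical; the numbers are the slide's):

```python
PRICE_PER_HOUR_USD = 1.675   # c4.8xlarge On-Demand price from the slide
IMAGES_PER_SECOND = 237.0    # GoogLeNet inference throughput from the slide

def ms_per_image(images_per_second):
    """Average time spent on one image, in milliseconds."""
    return 1000.0 / images_per_second

def cost_for_images(num_images, images_per_second, price_per_hour):
    """Instance cost in USD to process num_images at a steady throughput."""
    hours = num_images / images_per_second / 3600.0
    return hours * price_per_hour

latency_ms = ms_per_image(IMAGES_PER_SECOND)
cost_usd = cost_for_images(1_000_000, IMAGES_PER_SECOND, PRICE_PER_HOUR_USD)

print(f"{latency_ms:.1f} ms/image")            # about 4.2 ms
print(f"${cost_usd:.2f} per million images")   # about $1.96
```

This assumes the instance is busy for the whole run and is billed exactly for the time used, which is a simplification.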
c4.8xlarge MXNet Inference (Images/Sec)
Network        No MKL   MKL
AlexNet        6.1      679.5
GoogLeNet v1   2.4      262.5
ResNet-50      1.2      79.7
GoogLeNet v3   0.8      73.9
Configuration: OS: Linux version 3.13.0-86-generic (buildd@lgw01-51) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #131-Ubuntu SMP Thu May 12 23:33:13 UTC 2016. MXNet tip of tree: commit de41c736422d730e7cfad72dd6afc229ce08cf90, Tue Nov 1 11:43:04 2016 +0800. MKL 2017 Gold update 1.
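The chart figures above can be turned into speedup ratios. This sketch assumes the first row of bar labels (6.1, 2.4, 1.2, 0.8) is the "No MKL" series and the second (679.5, 262.5, 79.7, 73.9) is the "MKL" series, in the order the networks are listed; that pairing is a reading of the chart, not something the slide states.

```python
# Images/sec on c4.8xlarge as read from the chart: (No MKL, MKL)
images_per_sec = {
    "AlexNet": (6.1, 679.5),
    "GoogLeNet v1": (2.4, 262.5),
    "ResNet-50": (1.2, 79.7),
    "GoogLeNet v3": (0.8, 73.9),
}

# Speedup is MKL throughput divided by non-MKL throughput.
speedup = {
    name: mkl / no_mkl
    for name, (no_mkl, mkl) in images_per_sec.items()
}

for name, ratio in speedup.items():
    print(f"{name}: {ratio:.0f}x")
```

Under that reading, the ratios range from roughly 66x (ResNet-50) to roughly 111x (AlexNet).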

Intel® Math Kernel Library 2017 (Intel® MKL 2017)
• Optimized for EC2 instances with Intel® Xeon® CPUs
• Optimized for common deep learning operations
• GEMM (useful in RNNs and fully connected layers)
• Convolutions
• Pooling
• ReLU
• Batch normalization
[Figures: Recurrent NN; Convolutional NN]

Naïve Convolution
https://en.wikipedia.org/wiki/Convolutional_neural_network
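The slide shows only a figure, so the following is a minimal sketch of what a naïve convolution loop can look like, assuming a single-channel input, stride 1, no padding, and no kernel flip (the usual convention in CNN frameworks). The function name and the sample data are hypothetical.

```python
def naive_conv2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products.

    image and kernel are lists of rows (lists of numbers).
    Output size is (H - KH + 1) x (W - KW + 1).
    """
    height, width = len(image), len(image[0])
    k_height, k_width = len(kernel), len(kernel[0])
    out_height = height - k_height + 1
    out_width = width - k_width + 1
    output = [[0.0] * out_width for _ in range(out_height)]
    for y in range(out_height):
        for x in range(out_width):
            acc = 0.0
            # Every output pixel re-reads a full kernel-sized window.
            for ky in range(k_height):
                for kx in range(k_width):
                    acc += image[y + ky][x + kx] * kernel[ky][kx]
            output[y][x] = acc
    return output

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
result = naive_conv2d(image, kernel)
print(result)
```

The four nested loops touch memory with little reuse between iterations, which is the behavior a cache-friendly layout tries to improve.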

Cache Friendly Convolution
arxiv.org/pdf/1602.06709v1.pdf

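Gradient Descent
$$J(\mathbf{w}^{(0)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(0)}, \mathbf{x}_i)$$
[Plot: cost curve over $\mathbf{w}$, with $\mathbf{w}^{(0)}$ marked]

Gradient Descent
$$J(\mathbf{w}^{(0)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(0)}, \mathbf{x}_i)$$
$$\frac{dJ(\mathbf{w}^{(0)})}{d\mathbf{w}}$$
[Plot: cost curve over $\mathbf{w}$, with $\mathbf{w}^{(0)}$ marked]

Gradient Descent
$$J(\mathbf{w}^{(0)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(0)}, \mathbf{x}_i)$$
$$\mathbf{w}^{(1)} = \mathbf{w}^{(0)} - \frac{dJ(\mathbf{w}^{(0)})}{d\mathbf{w}}$$
[Plot: cost curve over $\mathbf{w}$, with $\mathbf{w}^{(0)}$ marked]

Gradient Descent
$$J(\mathbf{w}^{(0)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(0)}, \mathbf{x}_i)$$
$$\mathbf{w}^{(1)} = \mathbf{w}^{(0)} - \alpha \frac{dJ(\mathbf{w}^{(0)})}{d\mathbf{w}}$$
where $\alpha$ is the learning rate
[Plot: cost curve over $\mathbf{w}$, with $\mathbf{w}^{(0)}$ marked]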

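Gradient Descent
$$J(\mathbf{w}^{(0)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(0)}, \mathbf{x}_i)$$
$$\mathbf{w}^{(1)} = \mathbf{w}^{(0)} - \alpha \frac{dJ(\mathbf{w}^{(0)})}{d\mathbf{w}}$$
[Plot: $\mathbf{w}^{(0)}$ and $\mathbf{w}^{(1)}$ marked; annotation: "too small"]

Gradient Descent
$$J(\mathbf{w}^{(0)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(0)}, \mathbf{x}_i)$$
$$\mathbf{w}^{(1)} = \mathbf{w}^{(0)} - \alpha \frac{dJ(\mathbf{w}^{(0)})}{d\mathbf{w}}$$
[Plot: $\mathbf{w}^{(0)}$ and $\mathbf{w}^{(1)}$ marked; annotation: "too large"]

Gradient Descent
$$J(\mathbf{w}^{(0)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(0)}, \mathbf{x}_i)$$
$$\mathbf{w}^{(1)} = \mathbf{w}^{(0)} - \alpha \frac{dJ(\mathbf{w}^{(0)})}{d\mathbf{w}}$$
[Plot: $\mathbf{w}^{(0)}$ and $\mathbf{w}^{(1)}$ marked; annotation: "good enough"]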

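Gradient Descent
$$J(\mathbf{w}^{(1)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(1)}, \mathbf{x}_i)$$
$$\mathbf{w}^{(2)} = \mathbf{w}^{(1)} - \alpha \frac{dJ(\mathbf{w}^{(1)})}{d\mathbf{w}}$$
[Plot: $\mathbf{w}^{(1)}$ and $\mathbf{w}^{(2)}$ marked]

Gradient Descent
$$J(\mathbf{w}^{(2)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(2)}, \mathbf{x}_i)$$
$$\mathbf{w}^{(3)} = \mathbf{w}^{(2)} - \alpha \frac{dJ(\mathbf{w}^{(2)})}{d\mathbf{w}}$$
[Plot: $\mathbf{w}^{(2)}$ and $\mathbf{w}^{(3)}$ marked]

Gradient Descent
$$J(\mathbf{w}^{(3)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(3)}, \mathbf{x}_i)$$
$$\mathbf{w}^{(4)} = \mathbf{w}^{(3)} - \alpha \frac{dJ(\mathbf{w}^{(3)})}{d\mathbf{w}}$$
[Plot: $\mathbf{w}^{(3)}$ and $\mathbf{w}^{(4)}$ marked]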
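The update rule on these slides can be tried out on a toy problem. This is a minimal sketch, assuming a scalar weight and a squared-error cost, cost(w, x) = (w - x)^2, which the slides do not specify; the data, starting point, and learning rates are hypothetical and chosen to show the "too small", "too large", and "good enough" cases.

```python
def cost_gradient(w, xs):
    """dJ/dw for J(w) = sum over i of (w - x_i)^2."""
    return sum(2.0 * (w - x) for x in xs)

def gradient_descent(w0, xs, alpha, steps):
    """Repeat w(k+1) = w(k) - alpha * dJ(w(k))/dw and return every iterate."""
    history = [w0]
    w = w0
    for _ in range(steps):
        w = w - alpha * cost_gradient(w, xs)
        history.append(w)
    return history

xs = [1.0, 2.0, 3.0]   # J(w) is minimized at the mean, w = 2
w_start = 10.0

too_small = gradient_descent(w_start, xs, alpha=0.001, steps=20)
too_large = gradient_descent(w_start, xs, alpha=0.4, steps=20)
good_enough = gradient_descent(w_start, xs, alpha=0.1, steps=20)

print(f"alpha=0.001 -> w = {too_small[-1]:.3f}")    # barely moved
print(f"alpha=0.4   -> w = {too_large[-1]:.3f}")    # overshoots and diverges
print(f"alpha=0.1   -> w = {good_enough[-1]:.6f}")  # close to 2
```

For this particular cost, each step multiplies the distance to the minimum by (1 - 6 * alpha), which is why 0.4 diverges while 0.1 converges quickly.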

Transfer learning via fine-tuning
• First few layers are usually very similar within a domain
• Last layers are task specific
• Take a trained model and fine-tune it for a particular task
http://vision.stanford.edu/Datasets/collage_s.png
https://www.kaggle.com/c/dogs-vs-cats
http://adas.cvc.uab.es/task-cv2016/papers/0026.pdf

Fine-tuning steps
• Install Intel-Optimized Caffe (or your favorite framework)
  • https://software.intel.com/en-us/articles/training-and-deploying-deep-learning-networks-with-caffe-optimized-for-intel-architecture
• Download a pre-trained model
  • http://dl.caffe.berkeleyvision.org/bvlc_reference_caffenet.caffemodel
• Modify the training model (next slide)

Fine-tuning: ILSVRC -> DogsVsCats
# Original network (trained on ILSVRC)
layer {
  name: "data"
  type: "Data"
  data_param {
    source: "ilsvrc12_train_lmdb"
    ...
  }
  ...
}
...
layer {
  name: "fc8"
  type: "InnerProduct"
  inner_product_param {
    num_output: 1000
    ...
  }
}
# Fine-tuned network (DogsVsCats): new data source and a renamed last layer,
# so its weights are not copied from the pre-trained model
layer {
  name: "data"
  type: "Data"
  data_param {
    source: "dogs_cats_train_lmdb"
    ...
  }
  ...
}
...
layer {
  name: "fc8-ft"
  type: "InnerProduct"
  inner_product_param {
    num_output: 2
    ...
  }
}
>> # From the command line
>> caffe train -solver solver.prototxt -weights trainedModel.caffemodel

Fine-tuning guidelines
• Freeze all but the last layer (unfreeze more layers if the new dataset is very different)
  • Set lr_mult=0 in the local learning rates of the frozen layers
• Earlier layer weights won't change very much
• Drop the initial learning rate (in the solver.prototxt) by 10x
Replace the 1000-unit layer with a 2-unit layer
Train the (4096+1) x 2 weights
http://www.mdpi.com/remotesensing/remotesensing-07-14680/article_deploy/html/images/remotesensing-07-14680-g002-1024.png
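The numbers in these guidelines can be checked with a short sketch. It assumes the "+1" in "4096+1" is the bias term of each output unit, and it relies on Caffe scaling each parameter's learning rate as the solver's base_lr times the layer's lr_mult. The function names and the base_lr value are hypothetical.

```python
def trainable_parameters(num_inputs, num_outputs):
    """Weights plus one bias per output unit of a fully connected layer."""
    return (num_inputs + 1) * num_outputs

def effective_learning_rate(base_lr, lr_mult):
    """Per-parameter learning rate: solver base_lr scaled by the layer's lr_mult."""
    return base_lr * lr_mult

original_fc8 = trainable_parameters(4096, 1000)   # 1000-class ILSVRC layer
fine_tuned_fc8 = trainable_parameters(4096, 2)    # dogs vs. cats layer

base_lr = 0.001  # hypothetical value, e.g. an original 0.01 dropped by 10x
frozen_lr = effective_learning_rate(base_lr, lr_mult=0)
last_layer_lr = effective_learning_rate(base_lr, lr_mult=1)

print(original_fc8)     # 4097000
print(fine_tuned_fc8)   # 8194
print(frozen_lr, last_layer_lr)
```

With every other layer frozen, only the 8194 parameters of the new 2-unit layer are updated.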

Demo
• Fine-tune trained model for dog vs cats

Juan Carlos Riveiro: CEO and Cofounder

Hello. We're Vilynx, the video personalization company
We select relevant content targeted to individual needs, solving the content discovery problem.
How?
Building the biggest dataset for video deep learning by auto-tagging selected video scenes in real time and leveraging the web and social media to continuously update the tags.
Benefit?
Increased views, time spent watching videos, and in-video search.
Markets: Media, Smart Phones, Drones, Security, Robots, Smart Cities.

Outstanding Tech Team: Experienced and Very Successful
Juan Carlos Riveiro, CEO
More than 100 patents in Signal Processing, Data statistics/algorithms and Machine Learning. Founder and CEO of Gigle Networks (acquired by Broadcom), CTO & VP of R&D at DS2 (acquired by Marvell).
Elisenda Bou, CTO
PhD from UPC and MIT and expert on Machine Learning and Complex SW Architectures. Worked on adaptive satellite control systems and recipient of the 2013 Google Faculty Research Awards.
José Cordero Rama
MS for Deep Learning at UPC/BSC. Data Scientist at King, Bdigital and Gen-Med.
Joan Capdevila, PhD
MS and PhD for Machine Learning at Georgia Tech and UPC/BSC. Data Scientist at AIS and Accenture.
Jordi Pont-Tuset, PhD
PostDoc on Machine Learning at ETH Zurich. PhD on Image Segmentation at UPC. Disney Research.
Asier Aduriz
Computer Science and Telecom Engineering degree at UPC (Top 1% of class). Engineer at CERN.
Dèlia Fernàndez
MS on Deep Learning at Columbia University. Signal Processing Researcher at Northeastern University. Data Scientist at InnoTech.
David Varas, PhD
PhD for Video Object Tracking at UPC. Adjunct Professor on Computer Vision & Statistical Signal Processing at UPC.

Vilynx: Indexing Visual Knowledge
8 cameras/car
Smart Cities
Connecting Everything
VR/AR Changing Everything
A camera at every corner in London
Drones everywhere (Amazon)
1000+ hours of video uploaded to the internet every minute
How is all this visual content going to be indexed?
Just like the internet before Google

The Vilynx Knowledge Graph
The average vocabulary of a 5-year-old is 5000 words
• 4800 words/concepts
• 1.8 tags per video
• 8M videos
The average vocabulary of an adult is 30,000 words
• 2M words/concepts
• 50 tags per video
• 10M videos

First Market driven by Video Content Producers
Media companies need content personalization to drive audience through multiple channels

Some Customer Examples:
http://www.cbs.com/shows/the-late-show-with-stephen-colbert/
https://www.americasgreatestmakers.com/
http://www.vanitatis.elconfidencial.com/

Vilynx Products
Inputs: Videos; Audience Data; Contextual Data (Social Networks, YouTube, Web)
Outputs: Key 5 sec clips; Intelligent Auto Tagging
Applications: Video Thumbnails, Social Sharing, Recommendations, Video Search, Ad Market
• Better video discovery
• Native Ad integration
• Programmatic Ad matching
• More video views and longer engagement times
• VOD & Live Events
• Drive branding
• Amplification with keyword recommendation
• Drive click-through rates
• Better user experience

Vilynx | Workflow
Machine Learning or Deep Learning
1. We ingest customer videos and the contextual information around them.
2. We then take cues from around the Web and social networks.
3. This combined input is fed to the most advanced convolutional deep neural network in the industry.
4. The outputs are video previews optimized to engage your audience and rich metadata that can further drive your video content.
98% accuracy in finding the relevant parts of the video
CTR increase of between 50% and 500% (customer validated)

Advancing Deep Learning Networks:
Move from simple classification to indexing all visual content
• A training data set of video moments that includes:
  • 10M (and growing) tagged 5 sec video moments; ImageNet for video has only 4000 moments
  • 2M contextual tags (and growing)
• Continuously updated training set of new tags by crawling social media and the web
• Real-time unsupervised training of the network to autonomously learn and identify new patterns

Demo Results
• Fine-tune dogs vs cats classifier results

Call to action
• Use Intel Optimized Frameworks for workloads
• https://github.com/intel/caffe
• https://github.com/dmlc/mxnet
• https://github.com/intel/theano
• https://github.com/intel/torch
• other frameworks coming soon…
• Deep learning tutorial
• https://software.intel.com/en-us/articles/training-and-deploying-deep-learning-networks-with-caffe-optimized-for-intel-architecture
• Distributed training of deep networks on AWS
• https://software.intel.com/en-us/articles/distributed-training-of-deep-networks-on-amazon-web-services-aws

Legal Notices & Disclaimers
This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice.
Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at
intel.com, or from the OEM or retailer. No computer system can be absolutely secure.
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual
performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance
and benchmark results, visit http://www.intel.com/performance.
Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may
affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.
Statements in this document that refer to Intel’s plans and expectations for the quarter, the year, and the future, are forward-looking statements that involve a
number of risks and uncertainties. A detailed discussion of the factors that could affect Intel’s results and plans is included in Intel’s SEC filings, including the
annual report on Form 10-K.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current
characterized errata are available on request.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm
whether referenced data are accurate.
Intel, the Intel logo, Pentium, Celeron, Atom, Core, Xeon and others are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
© 2016 Intel Corporation.

Thank you!
(huge) contributions from:
Joseph Spisak, Elisenda Bou, Hendrik Van der Meer, Zhenlin Luo, Ravi Panchumarthy,
Ryan Saffores, Niv Sundaram, and many more.

Remember to complete
your evaluations!
