Deep learning continues to push the state of the art in domains such as video analytics, computer vision, and speech recognition. Deep networks combine high representational power with feature learning and abstraction. That capability comes at the cost of a significant increase in required compute power, which makes the AWS cloud an excellent environment for training. Innovators in this space are applying deep learning to a variety of applications. One such innovator, Vilynx, a startup based in Palo Alto, realized that the current pre-roll advertising-based models for mobile video weren’t returning the levels of engagement that publishers wanted. In this session, we explain the algorithmic challenges of scaling across multiple nodes, and what Intel is doing on AWS to overcome them. We describe the benefits of using AWS CloudFormation to set up a distributed training environment for deep networks. We also showcase Vilynx’s contributions to video discoverability, and explain how Vilynx uses AWS tools to understand video content. This session is sponsored by Intel.
2. Content Outline
• Deep learning overview and usages
• Worked example for fine-tuning a NN
• Some theory behind deep learning
• Vilynx: video discoverability
3. Deep Learning
• A branch of machine learning
• Data is passed through multiple non-linear transformations
• Goal: Learn the parameters of the transformations that minimize a cost function
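As a minimal sketch of these bullets, the Python snippet below passes a toy batch through a stack of non-linear transformations and evaluates a cost for the current parameters. The layer sizes, the tanh non-linearity, the random data, and the mean-squared-error cost are illustrative assumptions, not choices taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 8 samples with 4 features each, and one target value per sample.
x = rng.normal(size=(8, 4))
y = rng.normal(size=(8, 1))

# One (weights, bias) pair per transformation; the sizes are arbitrary.
sizes = [4, 5, 3, 1]
params = [
    (rng.normal(scale=0.5, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]

def forward(params, x):
    """Pass the data through each transformation in turn."""
    h = x
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)      # affine map followed by a non-linearity
    W, b = params[-1]
    return h @ W + b                # final layer is left linear

def cost(params, x, y):
    """Mean squared error between the network output and the targets."""
    return float(np.mean((forward(params, x) - y) ** 2))

initial_cost = cost(params, x, y)
print(f"cost with random parameters: {initial_cost:.4f}")
```

Training then means adjusting the entries of `params` so that `cost` goes down.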
4. Why Now?
Bigger Data
• Image: 1000 KB / picture
• Audio: 5000 KB / song
• Video: 5,000,000 KB / movie
Better Hardware
• Transistor density doubles every 18 months
• Cost / GB in 1995: $1000.00
• Cost / GB in 2015: $0.03
Smarter Algorithms
• Advances in algorithm innovation, including neural networks, leading to better accuracy in training models
5. Types of Deep Learning
• Supervised learning
• Data -> Labels
• Unsupervised learning
• No labels; Clustering; Reducing dimensionality
• Reinforcement learning
• Reward actions (e.g., robotics)
http://ode.engin.umich.edu/presentations/idetc2014/img/image_feature_learning_clear.png
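6. Training
[Figure: data flows through the network by forward propagation to produce an output (person 0.10, cat 0.15, dog 0.20, …, bike 0.05). The output is compared with the expected label (person 0, cat 1, dog 0, …, bike 0) to compute a penalty (error or cost), which is sent back through the network by back propagation.]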
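7. Training and inference
[Figure: inference is forward propagation only, from data to an output (person 0.10, cat 0.15, dog 0.20, …, bike 0.05). Training also compares the output with the expected label (person 0, cat 1, dog 0, …, bike 0), computes a penalty (error or cost), and applies back propagation.]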
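The training and inference diagrams can be sketched in a few lines of Python. This is only an illustration, assuming a single linear layer with a softmax output and a cross-entropy penalty. The four class names come from the slides, while the feature size, the learning rate, the number of steps, and the random input are made up for the example.

```python
import numpy as np

classes = ["person", "cat", "dog", "bike"]
rng = np.random.default_rng(0)

x = rng.normal(size=8)                       # one sample with 8 made-up features
expected = np.array([0.0, 1.0, 0.0, 0.0])    # one-hot label: the sample is a cat
W = rng.normal(scale=0.1, size=(8, 4))       # weights to be learned
b = np.zeros(4)

def softmax(z):
    e = np.exp(z - z.max())                  # subtract the max for stability
    return e / e.sum()

def forward(x, W, b):
    """Forward propagation: data -> class probabilities."""
    return softmax(x @ W + b)

def penalty(output, expected):
    """Cross-entropy cost between the output and the expected label."""
    return float(-np.sum(expected * np.log(output)))

learning_rate = 0.1
losses = []
for step in range(50):
    output = forward(x, W, b)
    losses.append(penalty(output, expected))
    # Back propagation: for softmax with cross-entropy, the gradient with
    # respect to the pre-softmax scores is (output - expected).
    grad_scores = output - expected
    W -= learning_rate * np.outer(x, grad_scores)
    b -= learning_rate * grad_scores

# Inference: forward propagation only, with no penalty and no weight update.
prediction = classes[int(np.argmax(forward(x, W, b)))]
print(f"penalty: {losses[0]:.3f} -> {losses[-1]:.3f}, prediction: {prediction}")
```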
8. Deep Learning Use Cases
• Fraud / face detection
• Gaming, check processing
• Computer server monitoring
• Financial forecasting and prediction
• Network intrusion detection
• Recommender systems
• Personal assistant
• Automatic speech recognition
• Natural language processing
• Image & video recognition/tagging
• Targeted ads
Industries: Cloud Service Providers, Financial Services, Healthcare, Automotive
9. Optimized Deep Learning Environment
Fuel the development of vertical solutions
Deliver excellent deep learning environment
Develop deep networks across frameworks
Maximum performance on Intel architecture
EC2
Intel® Math Kernel Library (Intel® MKL)
10. Elastic Compute Cloud (EC2)
C4 Instances
• “Highest performing processors and the lowest price/compute performance in EC2” [1]
• Vilynx
• Deep learning for video content extraction
• Supports various companies: CBS, TBS, etc.
• Pikazo app
• Transforms photos into artistic renders
[1] https://aws.amazon.com/ec2/instance-types/
https://www.stlmag.com/news/st-louis-app-pikazo-will-turn-your-profile-picture/
11. Elastic Compute Cloud (EC2)
C4 Instances
c4.8xlarge On-Demand:
• $1.675/hr
GoogLeNet inference:
• batch size 32
• 237 images/sec = 4.2 ms/image
• 1 million images cost $1.96
Spot prices are cheaper
OS: Linux version 3.13.0-86-generic (buildd@lgw01-51) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #131-Ubuntu SMP Thu May 12 23:33:13
UTC 2016. MxNet Tip of tree: commit de41c736422d730e7cfad72dd6afc229ce08cf90, Tue Nov 1 11:43:04 2016 +0800. MKL 2017 Gold update 1
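The cost figure on this slide follows directly from the hourly price and the measured throughput. The short Python calculation below reproduces it, assuming the throughput is sustained for the whole run and ignoring startup time, storage, and data transfer.

```python
# Figures quoted on the slide.
price_per_hour = 1.675       # c4.8xlarge On-Demand price in USD
images_per_second = 237      # GoogLeNet inference throughput at batch size 32

ms_per_image = 1000 / images_per_second
hours_per_million = 1_000_000 / images_per_second / 3600
cost_per_million = hours_per_million * price_per_hour

print(f"{ms_per_image:.1f} ms per image")
print(f"{hours_per_million:.2f} hours per million images")
print(f"${cost_per_million:.2f} per million images")
```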
c4.8xlarge MXNet Inference (Images/Sec)
Network        No MKL   MKL
AlexNet        6.1      679.5
GoogLeNet v1   2.4      262.5
ResNet-50      1.2      79.7
GoogLeNet v3   0.8      73.9
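The chart figures translate into per-network speedups with a simple division. The snippet below is a small sketch of that arithmetic. It assumes the smaller value for each network is the "No MKL" series and the larger one is the "MKL" series, which is a reading of the flattened chart rather than something the slide states in text.

```python
# Images/sec read off the chart (c4.8xlarge, MXNet inference).
throughput = {
    "AlexNet":      {"no_mkl": 6.1, "mkl": 679.5},
    "GoogLeNet v1": {"no_mkl": 2.4, "mkl": 262.5},
    "ResNet-50":    {"no_mkl": 1.2, "mkl": 79.7},
    "GoogLeNet v3": {"no_mkl": 0.8, "mkl": 73.9},
}

# Speedup is the ratio of MKL throughput to non-MKL throughput.
speedup = {net: v["mkl"] / v["no_mkl"] for net, v in throughput.items()}

for net, factor in speedup.items():
    print(f"{net}: about {factor:.0f}x more images/sec with MKL")
```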
12. Intel® Math Kernel Library 2017 (Intel® MKL 2017)
• Optimized for EC2 instances with Intel® Xeon® CPUs
• Optimized for common deep learning operations
• GEMM (useful in RNNs and fully connected layers)
• Convolutions
• Pooling
• ReLU
• Batch normalization
[Figures: Recurrent NN; Convolutional NN]
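To make the operation list concrete, here is a plain NumPy sketch of three of the primitives: a fully connected layer expressed as a single GEMM, ReLU, and 2x2 max pooling. It only illustrates what the operations compute. It does not call Intel MKL directly (whether NumPy itself is linked against MKL depends on how it was built), and all shapes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fully_connected(x, W, b):
    """A fully connected layer is one GEMM, (batch x in) @ (in x out), plus a bias."""
    return x @ W + b

def relu(x):
    """ReLU keeps positive values and zeroes out the rest."""
    return np.maximum(x, 0.0)

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on a (height, width) feature map."""
    h, w = fmap.shape
    blocks = fmap[: h - h % 2, : w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

# Fully connected layer followed by ReLU on a batch of 32 inputs.
batch = rng.normal(size=(32, 4096))
W = rng.normal(scale=0.01, size=(4096, 2))
b = np.zeros(2)
activations = relu(fully_connected(batch, W, b))

# Pooling on a small 4x4 feature map holding the values 0..15.
fmap = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool_2x2(fmap)
print(activations.shape)
print(pooled)
```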
25. Transfer learning via fine-tuning
• First few layers are usually very similar within a domain
• Last layers are task specific
• Take a trained model and fine-tune it for a particular task
http://vision.stanford.edu/Datasets/collage_s.png
https://www.kaggle.com/c/dogs-vs-cats
http://adas.cvc.uab.es/task-cv2016/papers/0026.pdf
26. Fine-tuning steps
• Install Intel-Optimized Caffe (or your favorite framework)
• https://software.intel.com/en-us/articles/training-and-deploying-deep-learning-networks-with-caffe-optimized-for-intel-architecture
• Download a pre-trained model
• http://dl.caffe.berkeleyvision.org/bvlc_reference_caffenet.caffemodel
• Modify the training model (next slide)
28. Fine-tuning guidelines
• Freeze all but the last layer (or train more layers if the new dataset is very different)
• Set lr_mult=0 in the local learning rates of the frozen layers
• Earlier layer weights won't change very much
• Drop the initial learning rate (in the solver.prototxt) by 10x
• Replace the 1000-unit output layer with a 2-unit layer
• Train the (4096 + 1) x 2 weights of the new layer (4096 inputs plus one bias per output unit)
http://www.mdpi.com/remotesensing/remotesensing-07-14680/article_deploy/html/images/remotesensing-07-14680-g002-1024.png
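These guidelines can be illustrated without any framework. The NumPy sketch below is a stand-in, not Caffe code: a fixed random projection plays the role of the frozen pre-trained layers (the effect of lr_mult=0), and only a new 2-unit output layer with (4096 + 1) x 2 parameters is trained. The raw input size, the synthetic data, the learning rate, and the iteration count are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_classes = 4096, 2

# Stand-in for the frozen, pre-trained layers: weights that are never updated.
W_frozen = rng.normal(scale=0.05, size=(64, n_features))

def frozen_features(inputs):
    """Frozen layers followed by ReLU; W_frozen receives no gradient updates."""
    return np.maximum(inputs @ W_frozen, 0.0)

# Synthetic two-class data (hypothetical stand-in for dog and cat images).
inputs = rng.normal(size=(200, 64))
labels = (inputs[:, 0] > 0).astype(int)
targets = np.eye(n_classes)[labels]

# New 2-unit layer replacing the original 1000-unit output layer.
W_new = np.zeros((n_features, n_classes))
b_new = np.zeros(n_classes)
trainable = W_new.size + b_new.size      # (4096 + 1) * 2 = 8194

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

features = frozen_features(inputs)       # computed once, since the layers are frozen
learning_rate = 0.001                    # assumed: 10x below a hypothetical 0.01
losses = []
for _ in range(100):
    probs = softmax(features @ W_new + b_new)
    losses.append(float(-np.mean(np.sum(targets * np.log(probs + 1e-12), axis=1))))
    grad = (probs - targets) / len(inputs)   # gradient w.r.t. the output scores
    W_new -= learning_rate * features.T @ grad
    b_new -= learning_rate * grad.sum(axis=0)

print(f"trainable parameters: {trainable}")
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because the frozen features never change, they are computed once outside the loop, which is also why fine-tuning only the last layer is cheap compared with training the whole network.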
29. Demo
• Fine-tune trained model for dog vs cats
http://vision.stanford.edu/Datasets/collage_s.png
https://www.kaggle.com/c/dogs-vs-cats
31. Hello. We're Vilynx, the video personalization company
We select the relevant content targeted to individual needs, solving the content discovery problem.
How?
Building the biggest dataset for video deep learning by auto-tagging selected video scenes in real time and leveraging the web and social media to continuously update the tags.
Benefit?
Increase views, time spent watching videos, and in-video search.
Markets: Media, Smart Phones, Drones, Security, Robots, Smart Cities.
32. Outstanding Tech Team: Experienced and Very Successful
Juan Carlos Riveiro, CEO
More than 100 patents in Signal Processing, Data
statistics/algorithms and Machine Learning.
Founder and CEO of Gigle Networks (Acquired by
Broadcom),
CTO & VP of R&D at DS2 (Acquired by Marvell).
Elisenda Bou, CTO
PhD from UPC and MIT and expert on Machine Learning
and Complex SW Architectures. Worked on adaptive
satellite control systems and recipient of the 2013 Google
Faculty Research Awards.
José Cordero Rama
MS for Deep Learning at UPC/BSC
Data Scientist at King, Bdigital and Gen-Med
Joan Capdevila, PhD
MS and PhD for Machine Learning
At Georgia Tech and UPC/BSC
Data Scientist at AIS and Accenture
Jordi Pont-Tuset, PhD
PostDoc on Machine Learning at ETH Zurich
PhD on Image Segmentation at UPC
Disney Research
Asier Aduriz
Computer Science and Telecom Engineering
degree at UPC (Top 1% of class)
Engineer at CERN.
Dèlia Fernàndez
MS on Deep Learning at Columbia University
Signal Processing Researcher at Northeastern University
Data Scientist at InnoTech
David Varas, PhD
PhD for Video Object Tracking at UPC
Adjunct Professor on Computer Vision &
Statistical Signal Processing at UPC
33. Vilynx: Indexing Visual Knowledge
• 8 cameras/car
• Smart Cities
• Connecting Everything
• VR/AR Changing Everything
• A camera on every corner in London
• Drones everywhere (Amazon)
• More than 1,000 hours of video uploaded to the internet every minute
How is all this visual content going to be indexed?
Just like the internet before Google
34. The Vilynx Knowledge Graph
The average vocabulary of a 5-year-old is 5,000 words
• 4800 words/concepts
• 1.8 tags per video
• 8M videos
The average vocabulary of an adult is 30,000 words
• 2M words/concepts
• 50 tags per video
• 10M videos
35. First Market driven by Video Content Producers
Media companies need content personalization to drive audience
through multiple channels
37. Vilynx Products
Inputs: Videos; Audience Data; Contextual Data (Social Networks, YouTube, Web)
Outputs: Key 5 sec clips; Intelligent Auto Tagging
Applications: Video Thumbnails; Social Sharing; Recommendations; Video Search; Ad Market
• Better video discovery
• Native ad integration
• Programmatic ad matching
• More video views and longer engagement times
• VOD & live events
• Drive branding
• Amplification with keyword recommendation
• Drive click-through rates
• Better user experience
38. Vilynx | Workflow
Machine Learning or Deep Learning
1. We ingest customer videos and the contextual information around them.
2. We then take cues from around the Web and social networks.
3. This combined input is fed to the most advanced convolutional deep neural network in the industry.
4. The outputs are video previews optimized to engage your audience and rich metadata that can further drive your video content.
98% accuracy in finding the relevant parts of the video
CTR increase of between 50% and 500% (customer validated)
39. Advancing Deep Learning Networks:
Move from simple classification to indexing all visual content
A training dataset of video moments that includes:
• 10M (and growing) tagged 5 sec video moments; ImageNet for video has only 4000 moments
• 2M contextual tags (and growing)
• Continuously updated training set of new tags from crawling social media and the web
• Real-time unsupervised training of the network to autonomously learn and identify new patterns
41. Call to action
• Use Intel Optimized Frameworks for workloads
• https://github.com/intel/caffe
• https://github.com/dmlc/mxnet
• https://github.com/intel/theano
• https://github.com/intel/torch
• other frameworks coming soon…
• Deep learning tutorial
• https://software.intel.com/en-us/articles/training-and-deploying-deep-learning-networks-with-caffe-optimized-for-intel-architecture
• Distributed training of deep networks on AWS
• https://software.intel.com/en-us/articles/distributed-training-of-deep-networks-on-amazon-web-services-aws
43. Thank you!
(huge) contributions from:
Joseph Spisak, Elisenda Bou, Hendrik Van der Meer, Zhenlin Luo, Ravi Panchumarthy,
Ryan Saffores, Niv Sundaram, and many more..