Intelligent Thumbnail Selection
Kamil Sindi, Lead Data Scientist
JW Player

Building a TensorFlow-based model that extracts the "best" frames from a video, which are then used as auto-generated thumbnails and thumbstrips. We used transfer learning on Google's InceptionV3 model, which was pretrained on ImageNet data and retrained on JW Player's thumbnail library.

1. Intelligent Thumbnail Selection
Kamil Sindi, Lead Data Scientist
2. JW Player
1. Company
a. Open-source video player
b. Hosting platform
c. 5% of global internet video traffic
d. 150+ person team
2. Data Team
a. Handling 5MM events per minute
b. Storing 1TB+ per day
c. Stack: Storm (Trident), Kafka, Luigi, Elasticsearch, Spark, AWS, MySQL
Customers
3. Thumbnails are Important
● Your video's first impression
● Types: Upload, Manual, Auto (default)
● Manual >> Auto in Play Rate
● Current Auto is the frame at the 10th second
● Many big publishers only use Manual
● 90% of Thumbnails are Auto! :-(
Source: tastingtable.com (2016-10-12)
4. What’s a “Good” Thumbnail?
It’s subjective to the viewer!
Common themes:
● Not blurry
● Balanced brightness
● Centered objects
● Large text overlay
● Relevant to subject
Source: Big Buck Bunny, Blender Studios
5. Manually Creating a Model is Hard
● Which features to extract?
● How to describe those features?
● How to weight features?
● How to penalize overfitting of models?
● Many techniques: SIFT, SURF, HOG?
Need to be an expert in Computer Vision :-(
So many image features: edge detection, color histograms, pixel segmentation, ...
6. Deep Learning
● Learn features implicitly
● Learn from examples
● Techniques to avoid overfitting
● Success in a lot of applications:
○ Image classification
○ Image captioning
○ Machine translation
○ Speech-to-Text
7. Inception
● Learn multiple models in parallel; concatenate their outputs (“modules”)
● Factoring convolutions (“towers”): e.g. 1x1 convs followed by 3x3
● Parameter reduction: GoogLeNet (5MM) vs. AlexNet (60MM), VGG (200MM)
● Auxiliary classifiers for regularization
● Residual connections (InceptionV4)
● Depthwise separable convolutions (Xception)
https://www.udacity.com/course/deep-learning--ud730
https://arxiv.org/abs/1409.4842
Source: Rethinking the Inception Architecture for Computer Vision
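For concreteness, an Inception-style module can be sketched as below. This is an illustration in Keras with made-up filter counts, not InceptionV3's exact configuration:

```python
# Sketch of an Inception-style module: parallel "towers" whose outputs are
# concatenated along the channel axis. Filter counts are illustrative only.
from tensorflow.keras import layers

def inception_module(x):
    # 1x1 "bottleneck" convs reduce channels before the larger kernels
    tower1 = layers.Conv2D(64, 1, padding="same", activation="relu")(x)

    tower2 = layers.Conv2D(48, 1, padding="same", activation="relu")(x)
    tower2 = layers.Conv2D(64, 3, padding="same", activation="relu")(tower2)

    tower3 = layers.Conv2D(48, 1, padding="same", activation="relu")(x)
    tower3 = layers.Conv2D(64, 5, padding="same", activation="relu")(tower3)

    pooled = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    pooled = layers.Conv2D(32, 1, padding="same", activation="relu")(pooled)

    # Concatenate the parallel outputs along the channel axis
    return layers.concatenate([tower1, tower2, tower3, pooled])
```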
8. 1x1 Convolutions: what’s the point?
1. Dimensionality reduction: fewer channels, strides, feature pooling
2. Parameter reduction: faster, less overfitting
3. “Cheap” nonlinearity: 1x1 + 3x3 is non-linear
4. Cross-channel ⊥ spatial correlations
Figures: 1x1 convolution with strides; pooling with 1x1 convolution
Source: http://iamaaditya.github.io/2016/03/one-by-one-convolution/
“In Convolutional Nets, there is no such thing as ‘fully-connected layers’. There are only convolution layers with 1x1 convolution kernels.” – Yann LeCun
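A quick back-of-the-envelope for point 2, with channel counts chosen purely for illustration:

```python
# Direct 3x3 conv mapping 256 -> 256 channels vs. a 1x1 bottleneck to 64
# channels first (illustrative sizes, not from the talk).
direct = 3 * 3 * 256 * 256                        # 589,824 weights
bottleneck = 1 * 1 * 256 * 64 + 3 * 3 * 64 * 256  # 16,384 + 147,456 = 163,840
print(direct / bottleneck)  # ~3.6x fewer parameters, plus an extra ReLU
```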
9. InceptionV3 Architecture
https://research.googleblog.com/2016/03/train-your-own-image-classifier-with.html
Example output: Dog (0.80), Cat (0.05), Rat (0.01), ...
10. Transfer Learning
● Use pre-trained model (1,000,000 images, 1,000 categories)
○ Cheaper (no GPU required)
○ Faster
○ Prevents overfitting
● Penultimate (“Bottleneck”) layer contains the image’s “essence” (CNN codes); acts as a feature extractor
● Just add a linear classifier (Softmax; lin-SVM) to the Bottleneck
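A minimal sketch of this recipe, assuming tf.keras and scikit-learn (the data-loading helper is hypothetical):

```python
# Pre-trained InceptionV3 as a fixed feature extractor; a linear classifier
# is then fit on the 2048-dim bottleneck vectors.
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.applications.inception_v3 import preprocess_input
from sklearn.linear_model import LogisticRegression

# include_top=False drops the 1,000-way ImageNet head; pooling="avg"
# returns one bottleneck vector per image.
base = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

def bottleneck_features(images):
    """images: float32 array of shape (n, 299, 299, 3), values in [0, 255]."""
    return base.predict(preprocess_input(images))

# X, y = load_labeled_frames()  # hypothetical: frames + Manual/Auto labels
# clf = LogisticRegression(max_iter=1000).fit(bottleneck_features(X), y)
```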
11. Fine Tuning + Tips
● Change classification layer + backprop layers back
● Idea: early layers learn basic filters; later layers are more dataset-specific
● Generally use a pre-trained model regardless of data size or similarity

When to use transfer learning (TL) vs. fine tuning (FT), by data size per class:

Data size (per class)    < 500        > 500                   > 5,000
Similar to original      Too small    TL                      TL + FT earlier layers
Not similar              Too small    TL on earlier layers    TL + FT entire network
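As a sketch of the fine-tuning recipe in the table's rightmost column (the 30-layer cutoff and learning rate are assumptions, not values from the talk):

```python
# Replace the classifier head, freeze the early layers, and backprop only
# through the later, more dataset-specific ones.
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras import layers, models, optimizers

base = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

# New binary head: Manual-like vs. Auto-like
out = layers.Dense(1, activation="sigmoid")(base.output)
model = models.Model(base.input, out)

# Freeze everything, then unfreeze the last few layers; the cutoff is a
# tunable assumption.
for layer in base.layers:
    layer.trainable = False
for layer in base.layers[-30:]:
    layer.trainable = True

# A small learning rate avoids destroying the pre-trained filters.
model.compile(optimizer=optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])
```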
12. Other Applications of Transfer Learning
Image Captioning: Google “Show and Tell”
https://github.com/tensorflow/models/tree/master/im2txt
Image Search
http://www.slideshare.net/ScottThompson90/applying-transfer-learning-in-tensorflow
13. Training: Thesis
Train to differentiate between Manual and Auto
● Manual thumbnails are (usually) better than Auto
● Select Manual with high views and play rate; Auto selection is random but low plays
● We have a lot of examples: 10K+ manual
● We used InceptionV3 pre-trained on ImageNet
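In code, the selection rule might look like the sketch below; the metadata fields and thresholds are hypothetical stand-ins, since the talk only states the criteria:

```python
# Sketch of assembling training examples from video metadata (hypothetical
# field names `thumbnail_type`, `play_rate`, `plays` and thresholds).
def build_dataset(videos):
    positives, negatives = [], []
    for v in videos:
        if v.thumbnail_type == "manual" and v.play_rate > 0.2:
            positives.append(v.thumbnail)   # Manual with high views/play rate
        elif v.thumbnail_type == "auto" and v.plays < 100:
            negatives.append(v.thumbnail)   # Auto with low plays
    return positives, negatives
```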
14. Training: Examples
Positive (Manual) vs. Negative (Auto) example thumbnails
15. Video Pre-Filter
Use FFMPEG to select top 100 frame candidates (see the sketch after this list)
Methods:
● Color histogram changes to avoid dupes
● Coded macroblock information
● Remove “black” frames
● Measure motion vectors
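A minimal sketch of two of these filters using FFmpeg's built-in `select` (scene-change score, which is histogram-based) and `blackdetect` filters; the thresholds are illustrative, not the values used at JW Player:

```python
# Invoke FFmpeg from Python to pre-filter candidate frames.
import subprocess

def extract_candidates(video_path, out_pattern="cand_%04d.jpg"):
    # Keep frames whose scene-change score exceeds 0.3, skipping near-dupes.
    subprocess.run([
        "ffmpeg", "-i", video_path,
        "-vf", "select='gt(scene,0.3)'",
        "-vsync", "vfr", out_pattern,
    ], check=True)

def find_black_frames(video_path):
    # blackdetect logs intervals that are almost entirely black, so those
    # candidates can be dropped before scoring.
    subprocess.run([
        "ffmpeg", "-i", video_path,
        "-vf", "blackdetect=d=0.1:pix_th=0.10",
        "-f", "null", "-",
    ], check=True)
```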
16. Motion Vectors
Source: Sintel, Blender Studios
17. Engineering
18. Demo: Evaluation Tool
19. Demo: Examples
Original Auto (frame at the 10th second) vs. top-scored frames from the new model
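Scoring and ranking candidates might look like this sketch, reusing the fine-tuned `model` and `preprocess_input` from above (`load_frame` is a hypothetical image loader):

```python
import numpy as np

def top_frames(frame_paths, model, k=5):
    batch = np.stack([load_frame(p) for p in frame_paths])   # (n, 299, 299, 3)
    scores = model.predict(preprocess_input(batch)).ravel()  # P(Manual-like)
    best = np.argsort(scores)[::-1][:k]                      # highest first
    return [(frame_paths[i], float(scores[i])) for i in best]
```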
20. What’s Next
● Refinements:
○ Fine tuning to earlier layers
○ Other models: ResNetV2, Xception
○ Pre-Filtering: adaptive, hardware accel.
● Products:
○ New auto thumbnails
○ Thumbstrips
21. Resources
Blog Posts:
● https://research.googleblog.com/2016/03/train-your-own-image-classifier-with.html
● https://github.com/tensorflow/models/tree/master/inception
● http://iamaaditya.github.io/2016/03/one-by-one-convolution/
● http://www.slideshare.net/ScottThompson90/applying-transfer-learning-in-tensorflow
● https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
● http://cs231n.github.io/transfer-learning/
● https://research.googleblog.com/2015/10/improving-youtube-video-thumbnails-with.html
● https://pseudoprofound.wordpress.com/2016/08/28/notes-on-the-tensorflow-implementation-of-inception-v3/
● https://adeshpande3.github.io/adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html
Papers:
● Rethinking the Inception Architecture for Computer Vision. https://arxiv.org/abs/1512.00567
● Xception: Deep Learning with Depthwise Separable Convolutions. https://arxiv.org/abs/1610.02357
● CNN Features off-the-shelf: an Astounding Baseline for Recognition. https://arxiv.org/abs/1403.6382
● DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition. https://arxiv.org/abs/1310.1531
● How transferable are features in deep neural networks? https://arxiv.org/abs/1411.1792
