Nir Yungster and Kamil Sindi explain how the company systematically improves the performance of its recommendation models while navigating the many engineering challenges and unique needs of the diverse publishers it serves.
4. ● Open source video player + video platform
● 5% of all video plays on the web
● Per month:
○ 40Bn plays
○ 100 TB events
● 15K customers
5. [Timeline: PLAYER → PLATFORM]
● 2008: The fastest online video player
● 2011: Video Management and Delivery
● 2014: Dashboards, Audience Measurement
● 2016: Data-driven products (e.g. Recommendations)
8. ● 20K requests per second
● Support legacy endpoints
○ Non-recommendations playlists
● Business rule features (e.g. sunrise, sunset, geo block; see the sketch after this list)
● Include video metadata in response (conversions, manifest, etc.)
● Pass product “sniff test”
● Rudimentary A/B testing using click-through rates
○ Beat random
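The business-rule bullet above boils down to a candidate filter. A minimal sketch, assuming hypothetical metadata fields `sunrise`/`sunset` (availability window) and `geo_blocked` (ISO country codes); this is illustrative, not JW Player's implementation:

```python
from datetime import datetime, timezone

def passes_business_rules(video: dict, viewer_country: str) -> bool:
    """Drop candidates that violate simple business rules.

    `sunrise`/`sunset` bound when the video may be shown and
    `geo_blocked` lists countries where it may not; all field
    names are illustrative assumptions.
    """
    now = datetime.now(timezone.utc)
    sunrise, sunset = video.get("sunrise"), video.get("sunset")
    if sunrise is not None and now < sunrise:
        return False  # not yet published
    if sunset is not None and now >= sunset:
        return False  # expired
    if viewer_country in video.get("geo_blocked", ()):
        return False  # geo-blocked for this viewer
    return True

video = {"sunset": datetime(2030, 1, 1, tzinfo=timezone.utc),
         "geo_blocked": {"DE"}}
print(passes_business_rules(video, "US"))  # True
print(passes_business_rules(video, "DE"))  # False
```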
10. ● Association → Association Rule Mining
○ Viewers who watched X also watched Y
● Content → BM25 (think tf-idf)
○ Elasticsearch
● Trending
○ Exponentially weighted moving average of plays (sketched below)
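A minimal sketch of the trending layer: an exponentially weighted moving average of plays. The hourly cadence and decay factor `alpha` are illustrative assumptions:

```python
def update_trending_score(prev_score: float, plays_this_hour: int,
                          alpha: float = 0.1) -> float:
    """One EWMA step: blend this hour's plays into the running score.

    `alpha` controls how quickly old plays decay; 0.1 is an
    illustrative value, not the production setting.
    """
    return alpha * plays_this_hour + (1 - alpha) * prev_score

# Example: a video whose plays spike and then fall off.
score = 0.0
for plays in [10, 500, 480, 90, 20]:
    score = update_trending_score(score, plays)
    print(round(score, 1))
```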
11. [Figure: example recommendations for the video “Top ten Snowboarding Destinations in Colorado, 2018”, with each slot labeled by the layer that produced it: similar titles, highly co-watched, or trending]
Rec 1: “Best hotels in Boulder”
Rec 2: “Amazing 1080”
Rec 3: “Best ski slopes in Colorado”
Rec 4: “Snowboarding is fun!”
Rec 5: “Top Snowboarding schools”
Rec 6: “Kardashian Katastrophe!”
Rec 7: “Cats on Skis”
13. ✓ 20K requests per second
✓ Support legacy endpoints
✓ Business rule features (e.g. sunset, sunrise, geo block)
✓ Include video metadata in response (conversions, manifest, etc.)
○ Use log-based architecture to sync from various sources
✓ Pass product “sniff test”
✓ Rudimentary A/B testing
○ Beat random when looking at Overlay Click-Through Rate
○ Bested competitors in customer-led A/B tests
14. How can we drive more value to customers?
How can we continue to grow competitive advantage?
17. Americans spend 2+ hrs on social media
Our publishers are fighting for time
Recommendations can drive viewer time by either:
● More Time per Session
● More Sessions (higher retention)
18. ● Keep viewers in a consistent variant (see the bucketing sketch below) to measure:
○ Time/session
○ Viewer retention
A/B results (JW model vs random):
● 50% more time per session on recommended content
● 10% higher viewer retention (D1, D7)
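Keeping a viewer in the same variant across sessions is typically done by hashing a stable viewer ID. A minimal sketch under that assumption; the function and salt scheme are hypothetical, not JW Player's implementation:

```python
import hashlib

def assign_variant(viewer_id: str, experiment: str,
                   variants=("control", "jw_model")) -> str:
    """Deterministically bucket a viewer into an experiment variant.

    Hashing (experiment, viewer_id) keeps each viewer in the same
    variant across sessions, which is what makes time/session and
    retention measurable.
    """
    digest = hashlib.sha256(f"{experiment}:{viewer_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

print(assign_variant("viewer-123", "recs-vs-random"))  # stable across calls
```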
19. We can now run experiments and understand their impact on viewer time.
Hypothesis: “If we boost recently produced content, recs will be more relevant”
Experiment: What happens to time spent?
20. Recommendation Algorithm (hypothesis) | Experiment Result
● Swap in Word2Vec title similarity instead of tf-idf (sketched below): 2 weeks
● Boost recent content: 3 weeks
● Try trending only: 1 week
● Try different ordering of layers: 2 weeks
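A minimal sketch of the first experiment's Word2Vec title similarity, using gensim; the toy corpus, hyperparameters, and whitespace tokenization are all illustrative assumptions:

```python
import numpy as np
from gensim.models import Word2Vec

# Toy corpus of tokenized titles; production would train on far more.
titles = [
    "top ten snowboarding destinations in colorado".split(),
    "best ski slopes in colorado".split(),
    "top snowboarding schools".split(),
]
model = Word2Vec(sentences=titles, vector_size=32, min_count=1, epochs=50)

def title_vector(tokens: list) -> np.ndarray:
    """Average the word vectors of the tokens found in the vocabulary."""
    return np.mean([model.wv[t] for t in tokens if t in model.wv], axis=0)

def title_similarity(a: str, b: str) -> float:
    """Cosine similarity between two titles' averaged word vectors."""
    va, vb = title_vector(a.split()), title_vector(b.split())
    return float(np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb)))

print(title_similarity("best ski slopes in colorado",
                       "top snowboarding schools"))
```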
22. ● Time spent in a session aggregates behavior over a sequence of recommendations
○ Predicting that directly is hard
● Pick a closely related metric that measures the effectiveness of a single recommendation
○ Time watched, percent watched?
○ Probability of an “engaged watch”
23. Pairwise Empirical Engagement Rate (PEER Score), for an ordered pair (video 1 → video 2):
PEER Score = Wilson Score(% of video 2 watches >= 30 seconds)
Metric for a list of recommended videos V:
nDCG(V), where PEER is the relevance metric (both sketched below)
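A minimal sketch of PEER and the list-level metric built on it. The slide doesn't say which Wilson bound is used, so the lower bound of the Wilson interval is assumed; the example counts are made up:

```python
import math

def wilson_lower_bound(successes: int, trials: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a binomial proportion."""
    if trials == 0:
        return 0.0
    p = successes / trials
    center = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials ** 2))
    return (center - margin) / (1 + z * z / trials)

def peer_score(engaged_watches: int, total_watches: int) -> float:
    """PEER: Wilson score of the rate of video-2 watches >= 30 seconds."""
    return wilson_lower_bound(engaged_watches, total_watches)

def ndcg(relevances: list) -> float:
    """nDCG of a recommended list, with PEER as the relevance value."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))
    ideal = sum(rel / math.log2(i + 2)
                for i, rel in enumerate(sorted(relevances, reverse=True)))
    return dcg / ideal if ideal > 0 else 0.0

# PEER for each recommended slot, then the list-level nDCG.
peers = [peer_score(80, 100), peer_score(10, 100), peer_score(45, 100)]
print(round(ndcg(peers), 3))
```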
26. ● Algorithm performance
○ Association vs Content
○ Optimal Training Window
● Publishers with viral events that affect results
○ Test results change with such events
● Publisher quirks
○ Player, Recommendations implementation
28. ● Algorithmic Perspective
○ More Context
○ Personalization
○ Progress in deep learning for recs
● Implementation / Maintainability
○ Single Unified Model (for widely varying publishers)
○ Flexible inputs (Anything2Vec)
29. ● Built and A/B tested a Tensorflow model that performs on par with our current algorithms
● Same context, unpersonalized
● AWS SageMaker used for training on GPUs, serving the model via Tensorflow Serving
● Trained using triplet loss to learn video embeddings (sketched below)
[Figure: triplet loss pulls an anchor toward a positive example and away from a negative example]
Reference: FaceNet: A Unified Embedding for Face Recognition and Clustering (2015)
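A minimal sketch of FaceNet-style triplet loss in Tensorflow; the embedding dimension, margin, and random toy inputs are illustrative assumptions, not the production model:

```python
import tensorflow as tf

def triplet_loss(anchor, positive, negative, margin: float = 0.2):
    """Pull the anchor toward the positive example, away from the negative."""
    pos_dist = tf.reduce_sum(tf.square(anchor - positive), axis=1)
    neg_dist = tf.reduce_sum(tf.square(anchor - negative), axis=1)
    return tf.reduce_mean(tf.maximum(pos_dist - neg_dist + margin, 0.0))

def toy_embeddings(batch: int = 4, dim: int = 32):
    """Random L2-normalized embeddings standing in for video embeddings."""
    return tf.math.l2_normalize(tf.random.normal([batch, dim]), axis=1)

print(triplet_loss(toy_embeddings(), toy_embeddings(), toy_embeddings()).numpy())
```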
30. ● Modeling
○ Score individual videos vs. learn to rank
○ How to choose positive & negative training samples?
○ Relevance metric for hyperparameter tuning
● Architecture
○ API traffic
○ Viewer profile service
○ Tensorflow is free, but scaling it is not
31. ● “Just build” can work great for an MVP recommender
● Offline testing is critical for algorithmic improvement
● Finding the right offline metric is key
32. Data Science
Graham Edge
Matthew Yu
Rik Heijdens
Bobby Han
Engineering
Doug Shore
Alex Halter
Linda Cai
Dan Meng
Leo Yu
Franklin Dement