Nir Yungster and Kamil Sindi explain how the company systematically improves the performance of its recommendation models while navigating the many engineering challenges and unique needs of the diverse publishers it serves.
4. ● Open source video player + video platform
● 5% of all video plays on the web
● Per month:
○ 40Bn plays
○ 100 TB events
● 15K customers
5. [Timeline: PLAYER → PLATFORM]
● 2008: The fastest online video player
● 2011: Video Management and Delivery
● 2014: Dashboards, Audience Measurement
● 2016: Data-driven products (e.g. Recommendations)
8. ● 20K requests per second
● Support legacy endpoints
○ Non-recommendations playlists
● Business rule features (e.g. sunrise, sunset, geo block; see the sketch after this list)
● Include video metadata in response (conversions, manifest, etc.)
● Pass product “sniff test”
● Rudimentary A/B testing using click-through rates
○ Beat random
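The business-rule bullet above boils down to a candidate filter. A minimal sketch, assuming hypothetical metadata fields `sunrise`/`sunset` (availability window) and `geo_blocked` (ISO country codes); this is illustrative, not JW Player's implementation:

```python
from datetime import datetime, timezone

def passes_business_rules(video: dict, viewer_country: str) -> bool:
    """Drop candidates that violate simple business rules.

    `sunrise`/`sunset` bound when the video may be shown and
    `geo_blocked` lists countries where it may not; all field
    names are illustrative assumptions.
    """
    now = datetime.now(timezone.utc)
    sunrise, sunset = video.get("sunrise"), video.get("sunset")
    if sunrise is not None and now < sunrise:
        return False  # not yet published
    if sunset is not None and now >= sunset:
        return False  # expired
    if viewer_country in video.get("geo_blocked", ()):
        return False  # geo-blocked for this viewer
    return True

video = {"sunset": datetime(2030, 1, 1, tzinfo=timezone.utc),
         "geo_blocked": {"DE"}}
print(passes_business_rules(video, "US"))  # True
print(passes_business_rules(video, "DE"))  # False
```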
10. ● Association → Association Rule Mining
○ Viewers who watched X also watched Y
● Content → BM25 (think tf-idf)
○ Elasticsearch
● Trending
○ Exponentially weighted moving average of plays (sketched below)
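A minimal sketch of the trending layer: an exponentially weighted moving average of plays. The hourly cadence and decay factor `alpha` are illustrative assumptions:

```python
def update_trending_score(prev_score: float, plays_this_hour: int,
                          alpha: float = 0.1) -> float:
    """One EWMA step: blend this hour's plays into the running score.

    `alpha` controls how quickly old plays decay; 0.1 is an
    illustrative value, not the production setting.
    """
    return alpha * plays_this_hour + (1 - alpha) * prev_score

# Example: a video whose plays spike and then fall off.
score = 0.0
for plays in [10, 500, 480, 90, 20]:
    score = update_trending_score(score, plays)
    print(round(score, 1))
```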
11. [Figure: example recommendations for the video “Top ten Snowboarding Destinations in Colorado, 2018”, with each slot labeled by the layer that produced it: similar titles, highly co-watched, or trending]
Rec 1: “Best hotels in Boulder”
Rec 2: “Amazing 1080”
Rec 3: “Best ski slopes in Colorado”
Rec 4: “Snowboarding is fun!”
Rec 5: “Top Snowboarding schools”
Rec 6: “Kardashian Katastrophe!”
Rec 7: “Cats on Skis”
13. ✓ 20K requests per second
✓ Support legacy endpoints
✓ Business rule features (e.g. sunset, sunrise, geo block)
✓ Include video metadata in response (conversions, manifest, etc.)
○ Use log-based architecture to sync from various sources
✓ Pass product “sniff test”
✓ Rudimentary A/B testing
○ Beat random when looking at Overlay Click-Through Rate
○ Bested competitors in customer-led A/B tests
14. How can we drive more value to customers?
How can we continue to grow competitive advantage?
17. Americans spend 2+ hrs on social media
Our publishers are fighting for time
Recommendations can drive viewer time by either:
● More Time per Session
● More Sessions (higher retention)
18. ● Keep viewers in a consistent variant (see the bucketing sketch below) to measure:
○ Time/session
○ Viewer retention
A/B results (JW model vs random):
● 50% more time per session on recommended content
● 10% higher viewer retention (D1, D7)
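Keeping a viewer in the same variant across sessions is typically done by hashing a stable viewer ID. A minimal sketch under that assumption; the function and salt scheme are hypothetical, not JW Player's implementation:

```python
import hashlib

def assign_variant(viewer_id: str, experiment: str,
                   variants=("control", "jw_model")) -> str:
    """Deterministically bucket a viewer into an experiment variant.

    Hashing (experiment, viewer_id) keeps each viewer in the same
    variant across sessions, which is what makes time/session and
    retention measurable.
    """
    digest = hashlib.sha256(f"{experiment}:{viewer_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

print(assign_variant("viewer-123", "recs-vs-random"))  # stable across calls
```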
19. We can now run experiments and understand their impact on viewer time.
Hypothesis: “If we boost recently produced content, recs will be more relevant”
Experiment: What happens to time spent?
20. Recommendation Algorithm (hypothesis) | Experiment Result
● Swap in Word2Vec title similarity instead of tf-idf (sketched below): 2 weeks
● Boost recent content: 3 weeks
● Try trending only: 1 week
● Try different ordering of layers: 2 weeks
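A minimal sketch of the first experiment's Word2Vec title similarity, using gensim; the toy corpus, hyperparameters, and whitespace tokenization are all illustrative assumptions:

```python
import numpy as np
from gensim.models import Word2Vec

# Toy corpus of tokenized titles; production would train on far more.
titles = [
    "top ten snowboarding destinations in colorado".split(),
    "best ski slopes in colorado".split(),
    "top snowboarding schools".split(),
]
model = Word2Vec(sentences=titles, vector_size=32, min_count=1, epochs=50)

def title_vector(tokens: list) -> np.ndarray:
    """Average the word vectors of the tokens found in the vocabulary."""
    return np.mean([model.wv[t] for t in tokens if t in model.wv], axis=0)

def title_similarity(a: str, b: str) -> float:
    """Cosine similarity between two titles' averaged word vectors."""
    va, vb = title_vector(a.split()), title_vector(b.split())
    return float(np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb)))

print(title_similarity("best ski slopes in colorado",
                       "top snowboarding schools"))
```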
22. ● Time spent in a session aggregates behavior over a sequence of recommendations
○ Predicting that directly is hard
● Pick a closely related metric that measures the effectiveness of a single recommendation
○ Time watched, percent watched?
○ Probability of an “engaged watch”
23. Pairwise Empirical Engagement Rate (PEER Score), for an ordered pair (video 1 → video 2):
PEER Score = Wilson Score(% of video 2 watches >= 30 seconds)
Metric for a list of recommended videos V:
nDCG(V), where PEER is the relevance metric (both sketched below)
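A minimal sketch of PEER and the list-level metric built on it. The slide doesn't say which Wilson bound is used, so the lower bound of the Wilson interval is assumed; the example counts are made up:

```python
import math

def wilson_lower_bound(successes: int, trials: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a binomial proportion."""
    if trials == 0:
        return 0.0
    p = successes / trials
    center = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials ** 2))
    return (center - margin) / (1 + z * z / trials)

def peer_score(engaged_watches: int, total_watches: int) -> float:
    """PEER: Wilson score of the rate of video-2 watches >= 30 seconds."""
    return wilson_lower_bound(engaged_watches, total_watches)

def ndcg(relevances: list) -> float:
    """nDCG of a recommended list, with PEER as the relevance value."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))
    ideal = sum(rel / math.log2(i + 2)
                for i, rel in enumerate(sorted(relevances, reverse=True)))
    return dcg / ideal if ideal > 0 else 0.0

# PEER for each recommended slot, then the list-level nDCG.
peers = [peer_score(80, 100), peer_score(10, 100), peer_score(45, 100)]
print(round(ndcg(peers), 3))
```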
26. ● Algorithm performance
○ Association vs Content
○ Optimal Training Window
● Publishers with viral events that affect results
○ Test results change with such events
● Publisher quirks
○ Player, Recommendations implementation
28. ● Algorithmic Perspective
○ More Context
○ Personalization
○ Progress in deep learning for recs
● Implementation / Maintainability
○ Single Unified Model (for widely varying publishers)
○ Flexible inputs (Anything2Vec)
29. ● Built and A/B tested a Tensorflow model that performs on par with our current algorithms
● Same context, unpersonalized
● AWS SageMaker used for training on GPUs, serving the model via Tensorflow Serving
● Trained using triplet loss to learn video embeddings (sketched below)
[Figure: triplet loss pulls an anchor toward a positive example and away from a negative example]
Reference: FaceNet: A Unified Embedding for Face Recognition and Clustering (2015)
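A minimal sketch of FaceNet-style triplet loss in Tensorflow; the embedding dimension, margin, and random toy inputs are illustrative assumptions, not the production model:

```python
import tensorflow as tf

def triplet_loss(anchor, positive, negative, margin: float = 0.2):
    """Pull the anchor toward the positive example, away from the negative."""
    pos_dist = tf.reduce_sum(tf.square(anchor - positive), axis=1)
    neg_dist = tf.reduce_sum(tf.square(anchor - negative), axis=1)
    return tf.reduce_mean(tf.maximum(pos_dist - neg_dist + margin, 0.0))

def toy_embeddings(batch: int = 4, dim: int = 32):
    """Random L2-normalized embeddings standing in for video embeddings."""
    return tf.math.l2_normalize(tf.random.normal([batch, dim]), axis=1)

print(triplet_loss(toy_embeddings(), toy_embeddings(), toy_embeddings()).numpy())
```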
30. ● Modeling
○ Score individual videos vs. learn to rank
○ How to choose positive & negative training samples?
○ Relevance metric for hyperparameter tuning
● Architecture
○ API traffic
○ Viewer profile service
○ Tensorflow is free, but scaling it is not
31. ● “Just build” can work great for an MVP recommender
● Offline testing is critical for algorithmic improvement
● Finding the right offline metric is key
32. Data Science
Graham Edge
Matthew Yu
Rik Heijdens
Bobby Han
Engineering
Doug Shore
Alex Halter
Linda Cai
Dan Meng
Leo Yu
Franklin Dement