This document discusses Pinterest's ads marketplace and optimization strategies. It provides an overview of Pinterest's ads delivery funnel, including retrieval, ranking, and auction. It then discusses predicting relevance and engagement through human labels, deep learning models, and multi-task learning. It also covers auction design principles and candidate retrieval using a two-tower deep learning approach. The goal is to maximize long-term value for users, advertisers, and Pinterest across different surfaces and ad formats.
12. Surfaces
User intent varies across surfaces, so the relevance baseline for ads varies as well.
Surface → Context:
● Home feed → Mixed
● Related Pins → Single Pin
● Search (Visual) → Query
13. Two-sided Marketplace
● Mission: maximize long-term value for Pinners, Partners, and Pinterest across the different surfaces, formats, and advertising objectives
● Put Pinners First: strong focus on the relevance of ads shown to Pinners
14. Mission
To provide long-term value for Pinners, Partners, and Pinterest.
Revenue = # of users × visits per user × pages per visit × ads per page × ad engagement × cost per engagement
Relationship between: Pinners and Pinterest; Partners and Pinterest
15. Optimization Framework
Eren Manavoglu - AdKDD, 2019 [3]
● Assumption 1: Satisfied users will engage more with the product
○ Short term user satisfaction can be a proxy for long-term engagement
● Assumption 2: Satisfied advertisers will increase spend
○ Short term advertiser satisfaction can be a proxy for long-term advertiser spend
● Constrained optimization problem based on short-term observations
○ Can be solved via Lagrangian relaxation (sketched below)
maximize Revenue
s.t. User satisfaction ≥ Ku
Advertiser satisfaction ≥ Ka
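A sketch of that relaxation, in notation of my own (not from the talk): introduce multipliers λu, λa ≥ 0 for the two satisfaction constraints and maximize the unconstrained Lagrangian instead:

```latex
% Lagrangian relaxation of the marketplace objective (illustrative notation):
% maximize Revenue s.t. U >= K_u and A >= K_a becomes
\max_{\pi}\; \mathrm{Revenue}(\pi)
  + \lambda_u \left( U(\pi) - K_u \right)
  + \lambda_a \left( A(\pi) - K_a \right),
\qquad \lambda_u, \lambda_a \ge 0
```

The multipliers act as internal prices on user and advertiser satisfaction, which is one way to motivate adding a quality term alongside the partner bid in the utility function later in the deck.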
16. The Ads Delivery Funnel
Retrieval → Ranking → Auction
Put Pinners First!
And the role of relevance?
19. Generalized Second Price Auction
Per-slot auction (a code sketch follows the steps):
1. Apply quality floor for each ad candidate
2. Compute utility score for each ad candidate
3. Rank candidates by utility
4. Allocate the winning candidate to the ad slot on the page
5. Calculate price = max(competing price, reserve price)
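A minimal Python sketch of the per-slot steps above (class and field names are illustrative assumptions, not Pinterest's implementation):

```python
# Sketch of one slot of the auction described above. Assumes quality,
# utility, and price are already computed per candidate.

from dataclasses import dataclass

@dataclass
class AdCandidate:
    ad_id: str
    quality: float   # per-ad quality score, used for the floor
    utility: float   # partner bid combined with quality bid, used for ranking
    price: float     # price the candidate would pay, e.g. its eCPM

def run_slot_auction(candidates, quality_floor, reserve_price):
    # 1. Apply the quality floor to each ad candidate.
    eligible = [c for c in candidates if c.quality >= quality_floor]
    if not eligible:
        return None, 0.0
    # 2-3. Utility is assumed precomputed; rank candidates by it.
    ranked = sorted(eligible, key=lambda c: c.utility, reverse=True)
    # 4. Allocate the top-ranked candidate to the slot.
    winner = ranked[0]
    # 5. Second-price flavor: charge max(competing price, reserve price).
    competing = ranked[1].price if len(ranked) > 1 else 0.0
    return winner, max(competing, reserve_price)
```

In a generalized second price auction this is repeated per slot, with each winner removed from the candidate pool before the next slot is allocated.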
20. User satisfaction
● With the product
○ Monthly active users, visits per user, time spent, saves, ...
● With ads
○ Ad engagement, Ad relevance, diversity of ads
First- and second-order effects:
1. Seeing an ad, and how many ads
2. Quality of ads
(Diagram: ad allocation → ads ranking → quality floor)
Put Pinners First!
21. Ads Ranking
Partner Bid (predicted eCPM, $):
● Partner Bid = Bid, if objective is awareness
● Partner Bid = Bid × pCTR, if objective is traffic
● Partner Bid = Bid × pCVR, if objective is conversion
Quality Bid (shadow bid, $):
● Quality Bid = Relevance ⨁ Engagement ⨁ Diversity
Utility = Partner Bid ⊙ Quality Bid
(A code sketch of the partner bid cases follows.)
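A hedged sketch of the partner bid cases (function and argument names are illustrative assumptions):

```python
# Partner bid as an eCPM-style quantity, depending on advertiser objective.

def partner_bid(bid: float, objective: str,
                pctr: float = 0.0, pcvr: float = 0.0) -> float:
    if objective == "awareness":
        return bid            # pay per impression
    if objective == "traffic":
        return bid * pctr     # bid per click × predicted CTR
    if objective == "conversion":
        return bid * pcvr     # bid per conversion × predicted CVR
    raise ValueError(f"unknown objective: {objective}")
```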
22. Quality Bid
Goal: Capture the (incremental) value of seeing an ad
● Predicted incremental long term value (LTV)...
● Displacement cost: loss in quality when inserting ads
○ Millions of ads competing with Billions of organic pins
Quality Bid = Relevance ⨁ Engagement ⨁ Diversity
● Components in the quality bid should not correlate, or only weakly correlate, with the action the advertiser is paying for
● Components in the quality bid should be complementary
23. Relevance [4,5]
● Pre-click relevance
● Post-click relevance
Engagement
● Clicks
● Good clicks
● Saves
● Hides
Diversity [6]
● Cost of repeat impressions, deduplication, and ceilings
Relevance ⨁ Engagement ⨁ Diversity
Correlation with clicks (based on <query, pin> pairs in Search):
● Good clicks → Strong
● Saves → Moderate
● Hides → Negative, Moderate
Correlation with relevance:
● Clicks → Weak
● Good clicks → Weak
● Saves → Weak
● Hides → No correlation
26. Pre-click Relevance
Objective: predict the probability P(R | A, X) that an ad A is relevant, given a user request context X
Context
● Surface → Home feed, Search, and Related Pins
● Ad Format → Regular, Video, Carousel
● Ad Type → Standard vs Shopping
27. Establishing a Ground Truth?
Relevance
● Explicit feedback
○ Collect editorial judgements for Search and Related Pins
○ Collect feedback from users through user surveys
● Leverage implicit feedback?
○ Reduce reliance on editorial data
■ Save $$, address scale
○ Clicks, good clicks, hides, saves
○ Odds ratio OR(X, Y), as sketched below
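A minimal sketch of how such an odds ratio can be computed from implicit feedback counts (the 2×2 layout and count names are my assumptions; the talk only names OR(X, Y)):

```python
# Odds ratio between a binary implicit signal X (e.g. "ad was clicked")
# and a binary outcome Y (e.g. "pair judged relevant"), from a 2x2
# contingency table of counts.

def odds_ratio(n11: int, n10: int, n01: int, n00: int) -> float:
    """
    n11: X=1, Y=1    n10: X=1, Y=0
    n01: X=0, Y=1    n00: X=0, Y=0
    """
    # Haldane-Anscombe correction: add 0.5 to avoid division by zero.
    return ((n11 + 0.5) * (n00 + 0.5)) / ((n10 + 0.5) * (n01 + 0.5))

# OR >> 1 suggests the implicit signal is informative about relevance.
print(odds_ratio(n11=900, n10=100, n01=300, n00=700))
```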
28. Relevance Models based on Human Labels
Pipeline: Scrape queries and pool ads → Collect editorial judgements → Train and deploy model
29. Relevance Models based on Human Labels
Sampling of queries
● Head vs tail queries → power law distribution
● Strata of interests: country, topical coverage, ad demand coverage
● Seasonality of queries
30. Relevance Models based on Human Labels
Sampling of candidate ads
● Remove auction survivor bias
● Advertiser budget skew → distribution of ads shown
● Seasonality of ads
31. Relevance Models based on Human Labels
How to “Incrementally” update models and metrics?
● Model refinement, measuring relevance in experiments, and overall user satisfaction
32. Relevance Models based on Human Labels
Editors
● In-house vs Mechanical Turk
Training
● Instructions and tooling to ensure a certain quality baseline
33. Relevance Models based on Human Labels
Rating scheme
● 5-point Likert scale vs binary judgements
Inter-assessor agreement and label cleaning
● Clear understanding of user intent?
○ Low inter-assessor agreement → wasted $, loss in predictive power?
34. Relevance Models based on Human Labels
Labels
● Align with objective function and intended use cases? Trimmer vs ranking in utility
● Weighting to reflect head vs tail queries?
● Class imbalance across queries
Model health
● Continuous model training and deployment
Evaluating relevance (an nDCG@K sketch follows below)
● Offline vs online
● First-page precision vs nDCG@K
● How and where: measuring relevance in different parts of the funnel [5]
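A minimal sketch of the nDCG@K metric mentioned above (standard formulation, not Pinterest-specific code):

```python
# nDCG@K for graded relevance labels (e.g. a 5-point scale), using the
# common 2^rel - 1 gain and log2 position discount.

import math

def dcg_at_k(gains, k):
    return sum((2 ** g - 1) / math.log2(i + 2)
               for i, g in enumerate(gains[:k]))

def ndcg_at_k(gains_in_ranked_order, k):
    ideal = dcg_at_k(sorted(gains_in_ranked_order, reverse=True), k)
    return dcg_at_k(gains_in_ranked_order, k) / ideal if ideal else 0.0

# Editorial labels of the top ads, in the order the model ranked them:
print(ndcg_at_k([3, 2, 3, 0, 1], k=5))
```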
36. Evolution of Engagement Model Architectures
● Hybrid model: GBDT + LR/FTRL
○ Practical lessons from predicting clicks on ads at Facebook [7]
○ Ad Click Prediction: a View from the Trenches [8]
● Wide and Deep
○ Wide & Deep Learning for Recommender Systems [9]
○ Deep CTR prediction for display advertising [10]
○ Starting point for this talk: “simple” DNN / LR
● “Deep and Light” [11]
○ AutoML
○ Multi-tower
○ Multi-task
○ Platt Scaling calibration
37. Managing Complexity
● Surface: Home Feed, Search, Related Pins
● Ad Product: Awareness, Traffic (CPC / autobid), Conversion (CPA / autobid), Catalog Sales, Video View
● Ad Format: Image, Video, Collection
● Engagement Type: Click, Good click, Video View, Save, Hide, Add to cart, Checkout, ...
● Ad Type: Standard, Shopping
38. Motivation for Deep and Light
● Simplify the system
○ ~60 models → 3x2 models
● Improve performance
○ Transfer knowledge across objectives
● Increase dev velocity
○ Feature engineering, automated model deployment, signal quality
● Save infra cost
Organizations grow with the complexity of the systems under their control
44. Calibration Challenges
● Ads and Calibration
● Metric: Calibration = ∑ click predictions / # of clicks
● Wide & Deep observations
○ Very large and sparse feature space
○ Large amount of data needed to converge
○ Handling transient trends w.r.t. changes in ad inventory and user behavior
○ DNN better able to capture higher-order feature interactions
● On calibration of modern neural networks [15]
○ Temperature Scaling → a variant of Platt Scaling
45. Lightweight Calibration - Platt Scaling
● Transform the output of a classification model into a calibrated probability
○ Fit a logistic regression to the classifier's scores (sketched below)
● Includes small set of signals
○ Contextual - country, time of day, device
○ Ad format - image, video
○ User - language, …
● Lean and dense → fast convergence, hourly updates (vs daily updates for the DNN)
● Reduction in calibration error of 80%
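A minimal sketch of Platt-style calibration as described above, assuming access to the DNN's raw scores and a few dense contextual signals (feature names and shapes are illustrative):

```python
# Fit a logistic regression on top of the base model's score plus a few
# dense signals; its output is the calibrated click probability.

import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_platt_calibrator(raw_scores, dense_signals, clicked):
    """
    raw_scores:    (n,) uncalibrated DNN scores (e.g. logits)
    dense_signals: (n, d) e.g. country, hour of day, device, ad format
    clicked:       (n,) binary click labels
    """
    X = np.column_stack([raw_scores, dense_signals])
    return LogisticRegression().fit(X, clicked)

def calibrated_ctr(calibrator, raw_scores, dense_signals):
    X = np.column_stack([raw_scores, dense_signals])
    return calibrator.predict_proba(X)[:, 1]
```

Because the calibrator is lean and dense, it can be refit hourly on fresh traffic while the underlying DNN updates daily.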
46. Negative downsampling to balance labels [7]
p: DNN model prediction, w: negative downsampling rate, q: calibrated prediction
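From [7], the calibrated prediction that undoes negative downsampling is:
q = p / (p + (1 − p) / w)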
● In the context of multi-task learning
○ Heads likely to have different rates: CTR > Save > Hide
○ Solution: downsample for one head and rescale the multiplier, based on the downsample rate w and the relative engagement rates
Miscalibration due to label selection bias in experimentation
● A new model changes the impression distribution
● This biases experimental results when the calibration model is trained on production traffic
● Solution: train a calibration model for each DNN variant on its own traffic
50. After: Two-Tower DNN [16, 17, 18]
Two towers, one per side: the user tower maps the user request context through Input Layer → Representation Layer → Summarization Layer → Latent Cross Layer → Fully Connected Layers to a User Embedding; the ad tower maps the ad context through the same layer stack to an Ad Embedding.
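A minimal two-tower sketch in PyTorch, under heavy assumptions (layer sizes are illustrative, a plain MLP stands in for the representation/summarization/latent-cross layers, and the dot-product scoring is one common choice):

```python
# Each tower maps its features to an embedding; the score is the dot
# product of user and ad embeddings.

import torch
import torch.nn as nn

class Tower(nn.Module):
    def __init__(self, input_dim: int, embed_dim: int = 64):
        super().__init__()
        # Stand-in for the representation/summarization/latent-cross stack.
        self.mlp = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.mlp(x)

class TwoTowerModel(nn.Module):
    def __init__(self, user_dim: int, ad_dim: int, embed_dim: int = 64):
        super().__init__()
        self.user_tower = Tower(user_dim, embed_dim)
        self.ad_tower = Tower(ad_dim, embed_dim)

    def forward(self, user_feats, ad_feats):
        u = self.user_tower(user_feats)   # (batch, embed_dim)
        a = self.ad_tower(ad_feats)       # (batch, embed_dim)
        return (u * a).sum(dim=-1)        # dot-product score per pair

model = TwoTowerModel(user_dim=100, ad_dim=80)
scores = model(torch.randn(4, 100), torch.randn(4, 80))
```

The property that serving exploits (slide 52): the towers only interact at the final dot product, so ad embeddings can be precomputed and cached.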
51. Model Training
Classification vs regression setup:
● Features (both): content features, user features, ad performance features, context features
● Labels
○ Classification positives: engaged ads
○ Classification negatives: impressed with no engagement; dark traffic @ ranking and retrieval stages
○ Regression targets: predicted L2 ranking scores
● Loss function
○ Classification: log loss, softmax loss, ...
○ Regression: log(MAE), Huber loss, ...
(Diagram: model training feeds the funnel: Ads Index → Retrieval → Ranking → Auction)
52. Serving at Scale: Standard vs Shopping Ads
The user tower (Input Layer → Representation Layer → Summarization Layer → Latent Cross Layer → Fully Connected Layers) computes the User Embedding online from the user context; Ad Embeddings are precomputed and cached.
● Standard ads: apply targeting constraints and score against the cached ad embeddings
● Shopping ads: approximate nearest neighbor search over the cached ad embeddings (ANN: HNSW; see the sketch below)
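A minimal sketch of HNSW-based ANN retrieval over cached ad embeddings, using the open-source hnswlib library (index parameters and sizes are illustrative, not Pinterest's settings):

```python
# Build an HNSW index over cached ad embeddings, then retrieve the
# top-k ads for a user embedding by maximum inner product.

import numpy as np
import hnswlib

dim, num_ads = 64, 100_000
ad_embeddings = np.random.rand(num_ads, dim).astype(np.float32)
ad_ids = np.arange(num_ads)

index = hnswlib.Index(space="ip", dim=dim)       # inner-product space
index.init_index(max_elements=num_ads, ef_construction=200, M=16)
index.add_items(ad_embeddings, ad_ids)
index.set_ef(100)                                # recall/latency trade-off

user_embedding = np.random.rand(dim).astype(np.float32)
labels, distances = index.knn_query(user_embedding, k=10)
print(labels[0])                                 # top-10 candidate ad ids
```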
53. Funnel Optimization?
The same two-tower architecture (each tower: Input Layer → Representation Layer → Summarization Layer → Latent Cross Layer → Fully Connected Layers, producing the User and Ad Embeddings from user and ad context), now with two heads: Engagement and Relevance.
● Multi-task learning extensions?
● Engagement vs Relevance?
● Lightweight utility function?
○ Bid aware?
● Closing the feedback loop?
54. Special Thanks
Ernest Wang, Xiaofang Chen, Ahmed Thabet,
Joshua Cherry, Sayantan Mukhopadhyay,
Holly Capell, Ashim Datta,
Benjamin Weitz, Sindhu Vijaya Raghavan,
Mao Ye, Jiajing Xu, Hari Venkatesan,
Alexandrin Popescul, Ryan Galgon
Entire XFN Ads Quality and Ads Infra teams!
55. References
[1] R. Ying, et al. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. KDD, 2018.
[2] A. Zhai, et al. Learning a Unified Embedding for Visual Search at Pinterest. KDD, 2019.
[3] E. Manavoglu. From the Clouds to the Trenches: Learning to Manage the Marketplace. AdKDD, 2019.
[4] M. Roman. Contextual Relevance in Ads Ranking. Pinterest Techblog, 2020.
[5] R. van Zwol. Relevance in a 2-Sided Marketplace. Integrity workshop at WSDM, 2020.
[6] A. Datta, J. Llaca Ojinaga, K. McLaughlin. Optimizing Ads Marketplace Diversity. 2020.
[7] X. He, et al. Practical Lessons from Predicting Clicks on Ads at Facebook. AdKDD, 2014.
[8] H.B. McMahan, et al. Ad Click Prediction: a View from the Trenches. KDD, 2013.
[9] H. Cheng, et al. Wide & Deep Learning for Recommender Systems. DLRS, 2016.
56. References
[10] J. Chen, et al. Deep CTR Prediction in Display Advertising. ACM Multimedia, 2016.
[11] E. Wang. How We Use AutoML, Multi-task Learning and Multi-tower Models for Pinterest Ads. Pinterest Techblog, 2020.
[12] R. Wang, et al. Deep & Cross Network for Ad Click Predictions. AdKDD, 2017.
[13] J. Zhuang. PinText: A Multitask Text Embedding System in Pinterest. KDD, 2019.
[14] R. Caruana. Multitask Learning. Machine Learning, 1997.
[15] C. Guo, et al. On Calibration of Modern Neural Networks. ICML, 2017.
[16] X. Chen. Deep Neural Network Based Ads Retrieval. Ads Modeling and Marketplace Workshop, 2020.
[17] P. Covington, J. Adams, E. Sargin. Deep Neural Networks for YouTube Recommendations. RecSys, 2016.
[18] D. Dilipkumar, J. Chen. A SplitNet Architecture for Ad Candidate Ranking. Twitter Engineering Blog, 2019.