This document discusses Pinterest's ads marketplace and optimization strategies. It provides an overview of Pinterest's ads delivery funnel, including retrieval, ranking, and auction. It then discusses predicting relevance and engagement through human labels, deep learning models, and multi-task learning. It also covers auction design principles and candidate retrieval using a two-tower deep learning approach. The goal is to maximize long-term value for users, advertisers, and Pinterest across different surfaces and ad formats.
12. Surfaces
User intent varies across surfaces, so the relevance baseline for ads varies as well.
Surface → Context:
● Home feed → Mixed
● Related Pins → Single Pin
● Search (Visual) → Query
13. Two-sided Marketplace
● Mission: maximize long-term value for Pinners, Partners, and Pinterest across the different surfaces, formats, and advertising objectives
● Put Pinners First: strong focus on the relevance of ads shown to Pinners
14. Mission
To provide long-term value for Pinners, Partners, and Pinterest.
Revenue = # of users × visits per user × pages per visit × ads per page × ad engagement × cost per engagement
Relationship between: Pinners and Pinterest; Partners and Pinterest
15. Optimization Framework
Eren Manavoglu - AdKDD, 2019 [3]
● Assumption 1: Satisfied users will engage more with the product
○ Short term user satisfaction can be a proxy for long-term engagement
● Assumption 2: Satisfied advertisers will increase spend
○ Short term advertiser satisfaction can be a proxy for long-term advertiser spend
● Constrained optimization problem based on short-term observations
○ Can be solved via Lagrangian relaxation (sketched below)
maximize Revenue
s.t. User satisfaction ≥ Ku
Advertiser satisfaction ≥ Ka
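A sketch of that relaxation, in notation of my own (not from the talk): introduce multipliers λu, λa ≥ 0 for the two satisfaction constraints and maximize the unconstrained Lagrangian instead:

```latex
% Lagrangian relaxation of the marketplace objective (illustrative notation):
% maximize Revenue s.t. U >= K_u and A >= K_a becomes
\max_{\pi}\; \mathrm{Revenue}(\pi)
  + \lambda_u \left( U(\pi) - K_u \right)
  + \lambda_a \left( A(\pi) - K_a \right),
\qquad \lambda_u, \lambda_a \ge 0
```

The multipliers act as internal prices on user and advertiser satisfaction, which is one way to motivate adding a quality term alongside the partner bid in the utility function later in the deck.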
16. The Ads Delivery Funnel
Retrieval → Ranking → Auction
Put Pinners First!
And the role of relevance?
19. Generalized Second Price Auction
Per-slot auction (a code sketch follows the steps):
1. Apply quality floor for each ad candidate
2. Compute utility score for each ad candidate
3. Rank candidates by utility
4. Allocate the winning candidate to the ad slot on the page
5. Calculate price = max(competing price, reserve price)
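A minimal Python sketch of the per-slot steps above (class and field names are illustrative assumptions, not Pinterest's implementation):

```python
# Sketch of one slot of the auction described above. Assumes quality,
# utility, and price are already computed per candidate.

from dataclasses import dataclass

@dataclass
class AdCandidate:
    ad_id: str
    quality: float   # per-ad quality score, used for the floor
    utility: float   # partner bid combined with quality bid, used for ranking
    price: float     # price the candidate would pay, e.g. its eCPM

def run_slot_auction(candidates, quality_floor, reserve_price):
    # 1. Apply the quality floor to each ad candidate.
    eligible = [c for c in candidates if c.quality >= quality_floor]
    if not eligible:
        return None, 0.0
    # 2-3. Utility is assumed precomputed; rank candidates by it.
    ranked = sorted(eligible, key=lambda c: c.utility, reverse=True)
    # 4. Allocate the top-ranked candidate to the slot.
    winner = ranked[0]
    # 5. Second-price flavor: charge max(competing price, reserve price).
    competing = ranked[1].price if len(ranked) > 1 else 0.0
    return winner, max(competing, reserve_price)
```

In a generalized second price auction this is repeated per slot, with each winner removed from the candidate pool before the next slot is allocated.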
20. User satisfaction
● With the product
○ Monthly active users, visits per user, time spent, saves, ...
● With ads
○ Ad engagement, Ad relevance, diversity of ads
First- and second-order effects:
1. Seeing an ad, and how many ads
2. Quality of ads
(Diagram: ad allocation → ads ranking → quality floor)
Put Pinners First!
21. Ads Ranking
Partner Bid (predicted eCPM, $):
● Partner Bid = Bid, if objective is awareness
● Partner Bid = Bid × pCTR, if objective is traffic
● Partner Bid = Bid × pCVR, if objective is conversion
Quality Bid (shadow bid, $):
● Quality Bid = Relevance ⨁ Engagement ⨁ Diversity
Utility = Partner Bid ⊙ Quality Bid
(A code sketch of the partner bid cases follows.)
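A hedged sketch of the partner bid cases (function and argument names are illustrative assumptions):

```python
# Partner bid as an eCPM-style quantity, depending on advertiser objective.

def partner_bid(bid: float, objective: str,
                pctr: float = 0.0, pcvr: float = 0.0) -> float:
    if objective == "awareness":
        return bid            # pay per impression
    if objective == "traffic":
        return bid * pctr     # bid per click × predicted CTR
    if objective == "conversion":
        return bid * pcvr     # bid per conversion × predicted CVR
    raise ValueError(f"unknown objective: {objective}")
```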
22. Quality Bid
Goal: Capture the (incremental) value of seeing an ad
● Predicted incremental long term value (LTV)...
● Displacement cost: loss in quality when inserting ads
○ Millions of ads competing with Billions of organic pins
Quality Bid = Relevance ⨁ Engagement ⨁ Diversity
● Components in the quality bid should not correlate, or only weakly correlate, with the action the advertiser is paying for
● Components in the quality bid should be complementary
23. Relevance [4,5]
● Pre-click relevance
● Post-click relevance
Engagement
● Clicks
● Good clicks
● Saves
● Hides
Diversity [6]
● Cost of repeat impressions, deduplication, and ceilings
Relevance ⨁ Engagement ⨁ Diversity
Correlation with clicks (based on <query, pin> pairs in Search):
● Good clicks → Strong
● Saves → Moderate
● Hides → Negative, Moderate
Correlation with relevance:
● Clicks → Weak
● Good clicks → Weak
● Saves → Weak
● Hides → No correlation
26. Pre-click Relevance
Objective: predict the probability P(R | A, X) that an ad A is relevant, given a user request context X
Context
● Surface → Home feed, Search, and Related Pins
● Ad Format → Regular, Video, Carousel
● Ad Type → Standard vs Shopping
27. Establishing a Ground Truth?
Relevance
● Explicit feedback
○ Collect editorial judgements for Search and Related Pins
○ Collect feedback from users through user surveys
● Leverage implicit feedback?
○ Reduce reliance on editorial data
■ Save $$, address scale
○ Clicks, good clicks, hides, saves
○ Odds ratio OR(X, Y), as sketched below
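A minimal sketch of how such an odds ratio can be computed from implicit feedback counts (the 2×2 layout and count names are my assumptions; the talk only names OR(X, Y)):

```python
# Odds ratio between a binary implicit signal X (e.g. "ad was clicked")
# and a binary outcome Y (e.g. "pair judged relevant"), from a 2x2
# contingency table of counts.

def odds_ratio(n11: int, n10: int, n01: int, n00: int) -> float:
    """
    n11: X=1, Y=1    n10: X=1, Y=0
    n01: X=0, Y=1    n00: X=0, Y=0
    """
    # Haldane-Anscombe correction: add 0.5 to avoid division by zero.
    return ((n11 + 0.5) * (n00 + 0.5)) / ((n10 + 0.5) * (n01 + 0.5))

# OR >> 1 suggests the implicit signal is informative about relevance.
print(odds_ratio(n11=900, n10=100, n01=300, n00=700))
```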
28. Relevance Models based on Human Labels
Pipeline: Scrape queries and pool ads → Collect editorial judgements → Train and deploy model
29. Relevance Models based on Human Labels
Sampling of queries
● Head vs tail queries → power law distribution
● Strata of interests: country, topical coverage, ad demand coverage
● Seasonality of queries
30. Relevance Models based on Human Labels
Sampling of candidate ads
● Remove auction survivor bias
● Advertiser budget skew → distribution of ads shown
● Seasonality of ads
31. Relevance Models based on Human Labels
How to “Incrementally” update models and metrics?
● Model refinement, measuring relevance in experiments, and overall user satisfaction
32. Relevance Models based on Human Labels
Editors
● In-house vs Mechanical Turk
Training
● Instructions and tooling to ensure a certain quality baseline
33. Relevance Models based on Human Labels
Rating scheme
● 5-point Likert scale vs binary judgements
Inter-assessor agreement and label cleaning
● Clear understanding of user intent?
○ Low inter-assessor agreement → wasted $, loss in predictive power?
34. Relevance Models based on Human Labels
Labels
● Align with objective function and intended use cases? Trimmer vs ranking in utility
● Weighting to reflect head vs tail queries?
● Class imbalance across queries
Model health
● Continuous model training and deployment
Evaluating relevance (an nDCG@K sketch follows below)
● Offline vs online
● First-page precision vs nDCG@K
● How and where: measuring relevance in different parts of the funnel [5]
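A minimal sketch of the nDCG@K metric mentioned above (standard formulation, not Pinterest-specific code):

```python
# nDCG@K for graded relevance labels (e.g. a 5-point scale), using the
# common 2^rel - 1 gain and log2 position discount.

import math

def dcg_at_k(gains, k):
    return sum((2 ** g - 1) / math.log2(i + 2)
               for i, g in enumerate(gains[:k]))

def ndcg_at_k(gains_in_ranked_order, k):
    ideal = dcg_at_k(sorted(gains_in_ranked_order, reverse=True), k)
    return dcg_at_k(gains_in_ranked_order, k) / ideal if ideal else 0.0

# Editorial labels of the top ads, in the order the model ranked them:
print(ndcg_at_k([3, 2, 3, 0, 1], k=5))
```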
36. Evolution of Engagement Model Architectures
● Hybrid model: GBDT + LR/FTRL
○ Practical lessons from predicting clicks on ads at Facebook [7]
○ Ad Click Prediction: a View from the Trenches [8]
● Wide and Deep
○ Wide & Deep Learning for Recommender Systems [9]
○ Deep CTR prediction for display advertising [10]
○ Starting point for this talk: “simple” DNN / LR
● “Deep and Light” [11]
○ AutoML
○ Multi-tower
○ Multi-task
○ Platt Scaling calibration
37. Managing Complexity
● Surface: Home Feed, Search, Related Pins
● Ad Product: Awareness, Traffic (CPC / autobid), Conversion (CPA / autobid), Catalog Sales, Video View
● Ad Format: Image, Video, Collection
● Engagement Type: Click, Good click, Video View, Save, Hide, Add to cart, Checkout, ...
● Ad Type: Standard, Shopping
38. Motivation for Deep and Light
● Simplify the system
○ ~60 models → 3x2 models
● Improve performance
○ Transfer knowledge across objectives
● Increase dev velocity
○ Feature engineering, automated model deployment, signal quality
● Save infra cost
Organizations grow with the complexity of the systems under their control
44. Calibration Challenges
● Ads and Calibration
● Metric: Calibration = ∑ click predictions / # of clicks
● Wide & Deep observations
○ Very large and sparse feature space
○ Large amount of data needed to converge
○ Handling transient trends w.r.t. changes in ad inventory and user behavior
○ DNN better able to capture higher-order feature interactions
● On calibration of modern neural networks [15]
○ Temperature Scaling → a variant of Platt Scaling
45. Lightweight Calibration - Platt Scaling
● Transform the output of a classification model into a calibrated probability
○ Fit a logistic regression to the classifier's scores (sketched below)
● Includes small set of signals
○ Contextual - country, time of day, device
○ Ad format - image, video
○ User - language, …
● Lean and dense → fast convergence, hourly updates (vs daily updates for the DNN)
● Reduction in calibration error of 80%
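A minimal sketch of Platt-style calibration as described above, assuming access to the DNN's raw scores and a few dense contextual signals (feature names and shapes are illustrative):

```python
# Fit a logistic regression on top of the base model's score plus a few
# dense signals; its output is the calibrated click probability.

import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_platt_calibrator(raw_scores, dense_signals, clicked):
    """
    raw_scores:    (n,) uncalibrated DNN scores (e.g. logits)
    dense_signals: (n, d) e.g. country, hour of day, device, ad format
    clicked:       (n,) binary click labels
    """
    X = np.column_stack([raw_scores, dense_signals])
    return LogisticRegression().fit(X, clicked)

def calibrated_ctr(calibrator, raw_scores, dense_signals):
    X = np.column_stack([raw_scores, dense_signals])
    return calibrator.predict_proba(X)[:, 1]
```

Because the calibrator is lean and dense, it can be refit hourly on fresh traffic while the underlying DNN updates daily.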
46. Negative downsampling to balance labels [7]
p: DNN model prediction, w: negative downsampling rate, q: calibrated prediction
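From [7], the calibrated prediction that undoes negative downsampling is:
q = p / (p + (1 − p) / w)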
● In the context of multi-task learning
○ Heads likely to have different rates: CTR > Save > Hide
○ Solution: downsample for one head and rescale the multiplier, based on the downsample rate w and the relative engagement rates
Miscalibration due to label selection bias in experimentation
● A new model changes the impression distribution
● This biases experimental results when the calibration model is trained on production traffic
● Solution: train a calibration model for each DNN variant on its own traffic
50. After: Two-Tower DNN [16, 17, 18]
Two towers, one per side: the user tower maps the user request context through Input Layer → Representation Layer → Summarization Layer → Latent Cross Layer → Fully Connected Layers to a User Embedding; the ad tower maps the ad context through the same layer stack to an Ad Embedding.
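A minimal two-tower sketch in PyTorch, under heavy assumptions (layer sizes are illustrative, a plain MLP stands in for the representation/summarization/latent-cross layers, and the dot-product scoring is one common choice):

```python
# Each tower maps its features to an embedding; the score is the dot
# product of user and ad embeddings.

import torch
import torch.nn as nn

class Tower(nn.Module):
    def __init__(self, input_dim: int, embed_dim: int = 64):
        super().__init__()
        # Stand-in for the representation/summarization/latent-cross stack.
        self.mlp = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.mlp(x)

class TwoTowerModel(nn.Module):
    def __init__(self, user_dim: int, ad_dim: int, embed_dim: int = 64):
        super().__init__()
        self.user_tower = Tower(user_dim, embed_dim)
        self.ad_tower = Tower(ad_dim, embed_dim)

    def forward(self, user_feats, ad_feats):
        u = self.user_tower(user_feats)   # (batch, embed_dim)
        a = self.ad_tower(ad_feats)       # (batch, embed_dim)
        return (u * a).sum(dim=-1)        # dot-product score per pair

model = TwoTowerModel(user_dim=100, ad_dim=80)
scores = model(torch.randn(4, 100), torch.randn(4, 80))
```

The property that serving exploits (slide 52): the towers only interact at the final dot product, so ad embeddings can be precomputed and cached.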
51. Model Training
Classification vs regression setup:
● Features (both): content features, user features, ad performance features, context features
● Labels
○ Classification positives: engaged ads
○ Classification negatives: impressed with no engagement; dark traffic @ ranking and retrieval stages
○ Regression targets: predicted L2 ranking scores
● Loss function
○ Classification: log loss, softmax loss, ...
○ Regression: log(MAE), Huber loss, ...
(Diagram: model training feeds the funnel: Ads Index → Retrieval → Ranking → Auction)
52. Serving at Scale: Standard vs Shopping Ads
The user tower (Input Layer → Representation Layer → Summarization Layer → Latent Cross Layer → Fully Connected Layers) computes the User Embedding online from the user context; Ad Embeddings are precomputed and cached.
● Standard ads: apply targeting constraints and score against the cached ad embeddings
● Shopping ads: approximate nearest neighbor search over the cached ad embeddings (ANN: HNSW; see the sketch below)
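A minimal sketch of HNSW-based ANN retrieval over cached ad embeddings, using the open-source hnswlib library (index parameters and sizes are illustrative, not Pinterest's settings):

```python
# Build an HNSW index over cached ad embeddings, then retrieve the
# top-k ads for a user embedding by maximum inner product.

import numpy as np
import hnswlib

dim, num_ads = 64, 100_000
ad_embeddings = np.random.rand(num_ads, dim).astype(np.float32)
ad_ids = np.arange(num_ads)

index = hnswlib.Index(space="ip", dim=dim)       # inner-product space
index.init_index(max_elements=num_ads, ef_construction=200, M=16)
index.add_items(ad_embeddings, ad_ids)
index.set_ef(100)                                # recall/latency trade-off

user_embedding = np.random.rand(dim).astype(np.float32)
labels, distances = index.knn_query(user_embedding, k=10)
print(labels[0])                                 # top-10 candidate ad ids
```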
53. Funnel Optimization?
The same two-tower architecture (each tower: Input Layer → Representation Layer → Summarization Layer → Latent Cross Layer → Fully Connected Layers, producing the User and Ad Embeddings from user and ad context), now with two heads: Engagement and Relevance.
● Multi-task learning extensions?
● Engagement vs Relevance?
● Lightweight utility function?
○ Bid aware?
● Closing the feedback loop?
54. Special Thanks
Ernest Wang, Xiaofang Chen, Ahmed Thabet,
Joshua Cherry, Sayantan Mukhopadhyay,
Holly Capell, Ashim Datta,
Benjamin Weitz, Sindhu Vijaya Raghavan,
Mao Ye, Jiajing Xu, Hari Venkatesan,
Alexandrin Popescul, Ryan Galgon
Entire XFN Ads Quality and Ads Infra teams!
55. References
[1] R. Ying, et al. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. KDD, 2018.
[2] A. Zhai, et al. Learning a Unified Embedding for Visual Search at Pinterest. KDD, 2019.
[3] E. Manavoglu. From the Clouds to the Trenches: Learning to Manage the Marketplace. AdKDD, 2019.
[4] M. Roman. Contextual Relevance in Ads Ranking. Pinterest Techblog, 2020.
[5] R. van Zwol. Relevance in a 2-Sided Marketplace. Integrity workshop at WSDM, 2020.
[6] A. Datta, J. Llaca Ojinaga, K. McLaughlin. Optimizing Ads Marketplace Diversity. 2020.
[7] X. He, et al. Practical Lessons from Predicting Clicks on Ads at Facebook. AdKDD, 2014.
[8] H.B. McMahan, et al. Ad Click Prediction: a View from the Trenches. KDD, 2013.
[9] H. Cheng, et al. Wide & Deep Learning for Recommender Systems. DLRS, 2016.
56. References
[10] J. Chen, et al. Deep CTR Prediction in Display Advertising. ACM Multimedia, 2016.
[11] E. Wang. How We Use AutoML, Multi-task Learning and Multi-tower Models for Pinterest Ads. Pinterest Techblog, 2020.
[12] R. Wang, et al. Deep & Cross Network for Ad Click Predictions. AdKDD, 2017.
[13] J. Zhuang. PinText: A Multitask Text Embedding System in Pinterest. KDD, 2019.
[14] R. Caruana. Multitask Learning. Machine Learning, 1997.
[15] C. Guo, et al. On Calibration of Modern Neural Networks. ICML, 2017.
[16] X. Chen. Deep Neural Network Based Ads Retrieval. Ads Modeling and Marketplace Workshop, 2020.
[17] P. Covington, J. Adams, E. Sargin. Deep Neural Networks for YouTube Recommendations. RecSys, 2016.
[18] D. Dilipkumar, J. Chen. A SplitNet Architecture for Ad Candidate Ranking. Twitter Engineering Blog, 2019.