An Adaptive Proportional Value-per-Click Agent for Bidding in Ad Auctions
Trading Agent Design and Analysis Workshop 2011

Kyriakos C. Chatzidimitriou AUTH/CERTH
Lampros C. Stavrogiannis Univ. of Southampton
Andreas L. Symeonidis AUTH/CERTH
Pericles A. Mitkas AUTH/CERTH

Introduction
• Basic idea: based on the working paper by Dr. Yevgeniy Vorobeychik describing the QuakTAC 2009 entry
• Since this initial work, we have:
– Conducted more game-theoretic experiments
– Improved conversion estimation
– Improved user distribution estimation
– Included an adaptive component
• Ended up with (more or less) the same "Ultimate Answer to the Ultimate Question of Life, the Universe, and Everything" for the TAC Ad Auctions game:

0.3

TADA@IJCAI 2011 Mertacor

Basic Strategy: VPC
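$$\mathrm{bid}_q^{d+1} = \alpha \cdot \hat{v}_q^{d+1}$$
$$\hat{v}_q^{d+1} = \widehat{\Pr}\{\text{conversion} \mid \text{click}\}_q \cdot E[\text{revenue} \mid \text{conversion}]_q$$
$$\widehat{\Pr}\{\text{conversion} \mid \text{click}\}_q = \text{focusedPercentage}_q \cdot \Pr\{\text{conversion} \mid \text{focused}\}_q\big(\hat{I}_{d+1}\big)$$
Components detailed below: A) E[revenue | conversion]_q, B) focusedPercentage_q, C) Î_{d+1}, D) α.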
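The bid is a product of the estimated components. A minimal Python sketch, assuming each component has already been estimated as in A) to C) and that `pr_conv_focused` already includes the capacity effect of Î_{d+1}; the function name, argument names, and example numbers are hypothetical.

```python
def vpc_bid(alpha: float,
            focused_percentage: float,
            pr_conv_focused: float,
            expected_revenue: float) -> float:
    """bid = alpha * v_hat, where
    v_hat = Pr{conversion | click} * E[revenue | conversion] and
    Pr{conversion | click} = focusedPercentage * Pr{conversion | focused}(I_hat).
    Assumption: pr_conv_focused is already evaluated at I_hat_{d+1}."""
    pr_conv_click = focused_percentage * pr_conv_focused
    value_per_click = pr_conv_click * expected_revenue
    return alpha * value_per_click


# Hypothetical inputs: 60% focused clickers, 20% conversion chance, $10 revenue.
print(round(vpc_bid(0.3, 0.6, 0.2, 10.0), 4))  # 0.36
```

With α = 0.3 the agent bids 30% of its estimated value per click, i.e. the "shading" studied in the game-theoretic analysis.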


A) Expected Revenue
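• Depends solely on the Manufacturer's Specialty (MS), where USP is the unit sales price and MSB the manufacturer specialty bonus:

$$E[\text{revenue} \mid \text{conversion}]_q = \begin{cases} USP\,(3 + MSB)/3 & \text{MS not defined in } q \\ USP\,(1 + MSB) & \text{MS matched in } q \\ USP & \text{MS not matched in } q \end{cases}$$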
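The three cases translate directly into code. A sketch, assuming USP = 10 and MSB = 0.5 as defaults (our assumption about the TAC AA parameters; check the game specification). The first case equals (USP·(1 + MSB) + 2·USP)/3, which corresponds to assuming a 1/3 chance that the user prefers our manufacturer; that reading is ours, not the slide's.

```python
from typing import Optional

# Assumed defaults (not stated on the slide): unit sales price and
# manufacturer specialty bonus.
USP = 10.0
MSB = 0.5


def expected_revenue(query_manufacturer: Optional[str],
                     our_specialty: str,
                     usp: float = USP,
                     msb: float = MSB) -> float:
    """E[revenue | conversion] depending on the query's manufacturer."""
    if query_manufacturer is None:
        # MS not defined in q: (usp*(1+msb) + 2*usp) / 3 == usp*(3+msb)/3
        return usp * (3 + msb) / 3
    if query_manufacturer == our_specialty:
        # MS matched in q: the specialty bonus always applies
        return usp * (1 + msb)
    # MS not matched in q: no bonus
    return usp


print(expected_revenue("pg", "pg"), expected_revenue("flat", "pg"))  # 15.0 10.0
```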


B) Focused Percentage
• Monte Carlo simulations
• First method (Vorobeychik)
– focusedPercentage_q = conversions_q / [clicks_q · Pr(conversion_q)]
– Averaged over each query class (F0, F1, F2)
• Second method (2011)
– Use the server source files
– Markov chain (MC) user states per product (×9): NS (non-searching), IS (informational search), F0, F1, F2 (focused levels 0 to 2), T (transacted)
– focusedPercentage_q = F_i,q / (F_i,q + IS_q), where F_i is the focus level matching query q
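Both estimators are simple ratios once the underlying counts are available (from game logs for the first method, from simulated user-state counts for the second). A sketch; the function names and all example numbers are hypothetical, and producing the counts (the Monte Carlo simulation itself) is not shown.

```python
from statistics import mean
from typing import Dict, List


def focused_pct_method1(conversions: float, clicks: float,
                        pr_conversion: float) -> float:
    """First method: conversions_q / (clicks_q * Pr(conversion_q))."""
    return conversions / (clicks * pr_conversion)


def focused_pct_method2(focused_count: float, info_search_count: float) -> float:
    """Second method: F_i / (F_i + IS) for the focus level matching the query."""
    return focused_count / (focused_count + info_search_count)


def class_average(per_query: Dict[str, float],
                  query_class: Dict[str, str]) -> Dict[str, float]:
    """Average per-query estimates over their query class (F0, F1, F2)."""
    groups: Dict[str, List[float]] = {}
    for query, value in per_query.items():
        groups.setdefault(query_class[query], []).append(value)
    return {cls: mean(values) for cls, values in groups.items()}


# Hypothetical counts: both methods give 0.6 here.
print(focused_pct_method1(30, 100, 0.5), focused_pct_method2(120, 80))
```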


[Figure: graph for query (pg, null)]


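C) I_d Estimation

$$\hat{I}_{d+1} = \gamma^{\left(c_{d-3} + c_{d-2} + c_{d-1} + \hat{c}_d + \hat{c}_{d+1} - C^{cap}\right)}$$

where c_d are the conversions on day d, hats denote estimates, and C^cap is the capacity.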
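The capacity factor needs three observed and two predicted days of conversions. A sketch under two assumptions not on the slide: γ = 0.995, and the exponent is clamped at zero (without the clamp, windows below capacity would give a factor above 1). The function and parameter names are hypothetical.

```python
from typing import Sequence


def capacity_factor(past_conversions: Sequence[float],
                    predicted_today: float,
                    predicted_tomorrow: float,
                    capacity: float,
                    gamma: float = 0.995) -> float:
    """I_hat_{d+1} = gamma ** (c_{d-3} + c_{d-2} + c_{d-1}
                             + c_hat_d + c_hat_{d+1} - C_cap).
    past_conversions holds [c_{d-3}, c_{d-2}, c_{d-1}].
    Assumptions: gamma = 0.995 and the exponent is clamped at 0."""
    if len(past_conversions) != 3:
        raise ValueError("expected the last three days of conversions")
    window = sum(past_conversions) + predicted_today + predicted_tomorrow
    excess = max(0.0, window - capacity)  # assumption: no boost below capacity
    return gamma ** excess


print(capacity_factor([100, 100, 100], 80, 70, 450))  # 1.0 (exactly at capacity)
```

Exceeding a capacity of 450 by 60 conversions, for example, gives 0.995^60 ≈ 0.74, which lowers the conversion probability and hence the bid.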

• kNN
– Inspired by the periodic behavior of conversions
– Time-series matching, using Euclidean distance as the similarity criterion
– k = 5, t = 5, N = 600
• Heuristic baseline
– Deliberately underestimates conversions, so that the agent bids higher: c_d = (c_{d-1} + c_{d-2} + c_{d-3}) / 4
• Aggregate
– ĉ_d = (kNN + Baseline) / 2
– ĉ_{d+1} = ((kNN + Baseline) / 2) / 2
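The forecasting pipeline can be sketched as follows. We assume t is the length of the matched window in days, k the number of neighbours, and that the history consists of windows cut from logged conversion series (N may be the number of stored series; the sketch does not enforce it). The helper names and the example series are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sliding_windows(series: Sequence[float], t: int) -> List[List[float]]:
    """Cut a logged conversion series into windows of t + 1 consecutive days."""
    return [list(series[i:i + t + 1]) for i in range(len(series) - t)]


def knn_forecast(recent: Sequence[float],
                 windows: Sequence[Sequence[float]],
                 k: int = 5) -> float:
    """Average the day-after value of the k windows whose first t days are
    closest (Euclidean distance) to the t most recent days."""
    t = len(recent)
    if not windows:
        raise ValueError("no historical windows to match against")
    scored = sorted(
        ((math.dist(recent, w[:t]), w[t]) for w in windows),
        key=lambda pair: pair[0],
    )
    nearest = scored[:k]
    return sum(nxt for _, nxt in nearest) / len(nearest)


def baseline(last_three: Sequence[float]) -> float:
    """Heuristic baseline: divide by 4 rather than 3 to underestimate."""
    return sum(last_three) / 4


def forecast(recent: Sequence[float],
             windows: Sequence[Sequence[float]],
             k: int = 5) -> Tuple[float, float]:
    """Return (c_hat_d, c_hat_{d+1}) following the Aggregate rule."""
    c_d = (knn_forecast(recent, windows, k) + baseline(recent[-3:])) / 2
    return c_d, c_d / 2


# Hypothetical logged series with a 5-day period.
history = sliding_windows([10, 20, 30, 40, 50] * 4, t=5)
print(forecast([10, 20, 30, 40, 50], history, k=3))  # (20.0, 10.0)
```

In the example the three exact matches all predict 10 for the next day, while the baseline gives (30 + 40 + 50)/4 = 30, so ĉ_d = 20 and ĉ_{d+1} = 10.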

kNN example

[Figure: kNN example of cyclic conversion behavior, annotated with: no ad display, no conversions, low bid, high bid, low/high conversion prob., low/high VPC, high conversions]

• 5-day-long pulses
• Pulse height and width are related to factors such as the user distribution at the time and the competition
• Large peaks in daily profits come from VPC "catching the wave"

Rest of the strategy
• Budget unconstrained
• Hard-coded ad selection strategy
– F0 => generic
– F2 => targeted if the user preference is matched
– F1 => targeted if one of the preferences is matched, else generic
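A sketch of the hard-coded rule. Two assumptions are ours, not the slide's: "matched" is read as the query attribute(s) matching the agent's manufacturer and component specialties, and F2 falls back to a generic ad when there is no match (the slide gives no else branch for F2). The function and argument names are hypothetical.

```python
from typing import Optional


def select_ad(query_manufacturer: Optional[str],
              query_component: Optional[str],
              our_manufacturer: str,
              our_component: str) -> str:
    """Return "generic" or "targeted" according to the query's focus level."""
    specified = [v for v in (query_manufacturer, query_component) if v is not None]
    if not specified:                       # F0: no preference in the query
        return "generic"
    manu_ok = query_manufacturer == our_manufacturer
    comp_ok = query_component == our_component
    if len(specified) == 2:                 # F2: both attributes specified
        # Assumption: generic ad when the preference is not matched
        return "targeted" if manu_ok and comp_ok else "generic"
    # F1: exactly one attribute specified
    return "targeted" if manu_ok or comp_ok else "generic"


print(select_ad("pg", None, "pg", "tv"))  # targeted
```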


Simulation-based Game-Theoretic Analysis
• One-shot Bayesian game
• Myopic linear strategies b = α · vpc -> find the optimal shading factor α
• Iterative best response to find a symmetric Bayes-Nash equilibrium (BNE)
• Compute the most profitable single deviation from a homogeneous set of opponents, and repeat until self-play is a best response -> BNE
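The iterative best-response loop can be sketched over a grid of α values. In the actual analysis the payoff comes from simulated games; here `toy_payoff` is a made-up stand-in (not from the paper) chosen only so that the loop has a fixed point, and the grid and function names are hypothetical.

```python
from typing import Callable, Sequence

Payoff = Callable[[float, float], float]  # payoff(my_alpha, opponents_alpha)


def best_response(opp_alpha: float, grid: Sequence[float], payoff: Payoff) -> float:
    """Most profitable single deviation against homogeneous opponents."""
    return max(grid, key=lambda a: payoff(a, opp_alpha))


def iterative_best_response(start: float, grid: Sequence[float],
                            payoff: Payoff, max_iter: int = 50) -> float:
    """Iterate best responses until self-play is a best response (symmetric BNE)."""
    current = start
    for _ in range(max_iter):
        nxt = best_response(current, grid, payoff)
        if nxt == current:
            return current
        current = nxt
    raise RuntimeError("no symmetric equilibrium found on this grid")


def toy_payoff(a: float, b: float) -> float:
    # Hypothetical stand-in for simulated profits; NOT from the paper.
    return a * (0.8 - a) - 0.2 * a * b


grid = [round(0.1 * i, 1) for i in range(11)]
print(iterative_best_response(1.0, grid, toy_payoff))  # 0.4 (path 1.0 -> 0.3 -> 0.4)
```

The fixed point of the toy payoff has no meaning for TAC; only the structure of the loop mirrors the procedure above.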


D) alpha
• Vorobeychik
– "α = 0.2, 0.3 more robust to aggressive opponents"
– The best values found previously (α = 0.1, 0.2, in 2009) are not profitable on the 2010 platform
• We re-ran the algorithm under the 2010 specifications
– α = 0.3 is the optimal value (iterations: 1 -> 0.4 -> 0.3)


• Instead of a single α: (α_F0, α_F1, α_F2) × (α_C,LOW, α_C,MED, α_C,HIGH)

• Starting from the optimal α = 0.3, explore all possible deviations for each α, first for the query levels, then for the capacity levels

• 0.3 appears to be optimal in all cases

• Intermediate points do not yield different results (0.3 is still the best)
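The one-parameter-at-a-time exploration can be sketched as a greedy deviation search. Assumptions: the six α values are kept in a dictionary, `evaluate` stands for a simulation-based profit estimate (hypothetical here), and a deviation is kept only if it strictly improves profit. How the query-level and capacity-level α values combine into a bid is not shown.

```python
from typing import Callable, Dict, Sequence


def deviation_search(params: Dict[str, float],
                     order: Sequence[str],
                     candidates: Sequence[float],
                     evaluate: Callable[[Dict[str, float]], float]) -> Dict[str, float]:
    """Try every candidate value for one parameter at a time, in `order`,
    keeping a deviation only if it strictly improves `evaluate`."""
    best = dict(params)
    best_score = evaluate(best)
    for name in order:
        for value in candidates:
            trial = dict(best, **{name: value})
            score = evaluate(trial)
            if score > best_score:
                best, best_score = trial, score
    return best


# Query levels first, then capacity levels, all starting at 0.3.
ORDER = ["F0", "F1", "F2", "C_LOW", "C_MED", "C_HIGH"]
start = {name: 0.3 for name in ORDER}
candidates = [round(0.1 * i, 1) for i in range(1, 6)]


def toy_profit(p: Dict[str, float]) -> float:
    # Hypothetical stand-in for simulated profit, peaking at 0.3 everywhere.
    return -sum((v - 0.3) ** 2 for v in p.values())


print(deviation_search(start, ORDER, candidates, toy_profit) == start)  # True
```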

Adaptive component
• Problem statement
We want to capture cases where, given the current environment (competition conditions), an α other than 0.3 yields a competitive advantage
• The GT analysis provides "a good starting point"
• Model it as an associative k-armed bandit problem with optimistic initial values and an ε-greedy action selection strategy

State, Action, Reward
• State
– Quantized VPC (×11)
– Capacity (×3)
– Query type (×3)
– Manufacturer specialty bonus (×2)
– Component specialty bonus (×2)
• Action: α ∈ {0.28, 0.29, 0.3, 0.31, 0.32}
• Reward: r = daily profit
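The state space has 11 × 3 × 3 × 2 × 2 = 396 states, each with its own set of five action values. A minimal sketch of an associative bandit with optimistic initial values and ε-greedy selection; the VPC bin edges, the optimistic value, ε, and the sample-average update rule are our assumptions, and all names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Tuple

ACTIONS: List[float] = [0.28, 0.29, 0.30, 0.31, 0.32]


def make_state(vpc: float, max_vpc: float, capacity_level: str,
               query_type: str, has_msb: bool, has_csb: bool) -> Tuple:
    """Assumption: VPC is cut into 11 equal-width bins on [0, max_vpc]."""
    vpc_bin = min(int(vpc / max_vpc * 11), 10)
    return (vpc_bin, capacity_level, query_type, has_msb, has_csb)


class AssociativeBandit:
    """One set of action-value estimates per state, optimistic initial
    values, and epsilon-greedy action selection."""

    def __init__(self, actions: List[float] = ACTIONS, epsilon: float = 0.1,
                 optimistic_value: float = 1e6, seed: Optional[int] = None):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.q: Dict[Hashable, List[float]] = defaultdict(
            lambda: [optimistic_value] * len(self.actions))
        self.n: Dict[Hashable, List[int]] = defaultdict(
            lambda: [0] * len(self.actions))
        self.rng = random.Random(seed)

    def select(self, state: Hashable) -> int:
        """Index of the chosen alpha: random with prob. epsilon, else greedy."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.actions))
        values = self.q[state]
        return values.index(max(values))

    def update(self, state: Hashable, action_index: int, reward: float) -> None:
        """Incremental sample-average update with the observed daily profit."""
        self.n[state][action_index] += 1
        step = 1.0 / self.n[state][action_index]
        self.q[state][action_index] += step * (reward - self.q[state][action_index])
```

Because every untried action starts at a very high value, the greedy choice first cycles through all five α values in a state before exploiting the most profitable one.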


Experiment (1/2)
• Self-play
– 210 games
– All capacities set to 450 (MEDIUM)
• The standard agent is unbeatable in self-play, since it was constructed that way

Agent Name         Score
Mertacor-Std-1     53.042
Mertacor-Std-2     52.763
Mertacor-kNN-1     52.673
Mertacor-kNN-2     52.703
Mertacor-RL-1      52.270
Mertacor-RL-2      52.233
Mertacor-Full-1    51.673
Mertacor-Full-2    51.899


Experiment (2/2)
• Mix things up: include more agents with different strategies
– 250 games
– All capacities set to 450 (MEDIUM)
• Better estimation led to better performance
• Adaptiveness is suited to even more complicated environments (capacity- and strategy-wise)

Agent Name          Score
Mertacor-kNN        53.223
Mertacor-Std        52.245
Schlemazl (2010)    51.975
Mertacor-Full       51.796
Mertacor-RL         51.790
Epflagent (2010)    49.232
Tau (2010)          45.987
Crocodile (2010)    45.858


Also tested/under development (2011)
• Daily Campaign Budget Threshold algorithms
– Estimation
– Simulation
• Particle Filtering for user state estimation
– TacTex


Conclusions & Future Work
• α = 0.3 is a very strong result and hard to beat
• Better estimates for B) user state and C) I_d could further improve performance
• On-line learning is still in a very crude form; not yet satisfactory, but it seems a reasonable thing to do
• Competition-wise: fitted-Q learning from data logs

Thank you for your attention

Questions?
