1) The document summarizes an adaptive bidding agent for ad auctions that estimates values like conversion rates and user distributions to determine bids.
2) It employs a proportional value-per-click strategy and explores different techniques for estimating quantities such as the percentage of focused users and the capacity-dependent factor I_d, which is driven by recent conversions.
3) Experimental results show the agent performing well against other agents. An adaptive variant, which learns the bidding parameter from the competitive environment, performs comparably to the optimized static parameter.
An Adaptive Proportional Value-per-Click Agent for Ad Auctions
1. An Adaptive Proportional Value-per-Click Agent for Bidding in Ad Auctions
Trading Agent Design and Analysis Workshop 2011
Kyriakos C. Chatzidimitriou AUTH/CERTH
Lampros C. Stavrogiannis Univ. of Southampton
Andreas L. Symeonidis AUTH/CERTH
Pericles A. Mitkas AUTH/CERTH
2. Introduction
• Basic idea: working paper by Dr. Yevgeniy Vorobeychik on the QuakTAC 2009 entry
• Since this initial work, we have:
– Conducted more Game-Theoretic experiments
– Improved conversions estimation
– Improved user distribution estimation
– Included an adaptive component
• Ended up with (more or less) the same “Ultimate Answer to the Ultimate Question of Life, The Universe, and Everything” for the TAC Ad Auctions Game: 0.3
TADA@IJCAI 2011 Mertacor 2
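3. Basic Strategy: VPC
\[ bid^{q}_{d+1} = \alpha \cdot \hat{v}^{q}_{d+1} \]
\[ \hat{v}^{q}_{d+1} = \widehat{\Pr}^{q}\{conversion \mid click\} \cdot E^{q}[revenue \mid conversion] \]
\[ \widehat{\Pr}^{q}\{conversion \mid click\} = \widehat{focusedPercentage}^{q} \cdot \Pr^{q}\{conversion \mid focused\}(\hat{I}_{d+1}) \]
Labelled terms: A) E[revenue | conversion], B) focusedPercentage, C) I_{d+1}, D) α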
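The three relations above can be chained into a single bid computation. The following is a minimal sketch rather than the agent's actual code: the function name, argument names and example numbers are hypothetical, and it assumes for simplicity that the capacity factor I just scales a baseline conversion probability for focused users.

```python
def vpc_bid(alpha, focused_percentage, baseline_conv_prob,
            capacity_factor, expected_revenue):
    """Proportional value-per-click bid: bid = alpha * estimated value per click."""
    # ASSUMPTION: Pr{conversion | focused}(I) is modelled as baseline * I.
    pr_conv_given_focused = baseline_conv_prob * capacity_factor
    # Pr{conversion | click} = focusedPercentage * Pr{conversion | focused}(I)
    pr_conv_given_click = focused_percentage * pr_conv_given_focused
    # v = Pr{conversion | click} * E[revenue | conversion]
    value_per_click = pr_conv_given_click * expected_revenue
    return alpha * value_per_click


# Illustrative numbers only (not taken from the slides, except alpha = 0.3).
example_bid = vpc_bid(alpha=0.3, focused_percentage=0.5,
                      baseline_conv_prob=0.2, capacity_factor=1.0,
                      expected_revenue=10.0)
```

With these inputs the estimated value per click is 1.0, so the bid is 30% of it.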
4. A) Expected Revenue
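• Depends solely on the Manufacturer’s Specialty (MS)
\[ E[revenue \mid conversion] = \begin{cases} USP \cdot (3 + MSB) / 3 & \text{MS not defined in } q \\ USP \cdot (1 + MSB) & \text{MS matched in } q \\ USP & \text{MS not matched in } q \end{cases} \]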
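The three cases translate directly into a small function. This is a sketch under stated assumptions: the function and argument names are hypothetical, a query with no manufacturer is represented by `None`, and the USP and MSB values in the example are illustrative rather than taken from the game specification.

```python
def expected_revenue(usp, msb, query_manufacturer, specialty):
    """E[revenue | conversion] for a query, given the agent's manufacturer specialty."""
    if query_manufacturer is None:
        # Manufacturer not defined in the query: USP * (3 + MSB) / 3
        return usp * (3 + msb) / 3
    if query_manufacturer == specialty:
        # Manufacturer matches the specialty: bonus applies
        return usp * (1 + msb)
    # Manufacturer defined but not matched: plain unit sales profit
    return usp


# Illustrative values: USP = 10, MSB = 0.5, hypothetical manufacturer names.
undefined_case = expected_revenue(10.0, 0.5, None, "acme")
matched_case = expected_revenue(10.0, 0.5, "acme", "acme")
unmatched_case = expected_revenue(10.0, 0.5, "other", "acme")
```

Note that the first case equals the average of one matched and two unmatched outcomes, (USP(1 + MSB) + 2 USP) / 3.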
5. B) Focused Percentage
• Monte Carlo Simulations
• First Method (Vorobeychik)
– focusedPercentage_query = conversions_query / [clicks_query * Pr(conversion_query)]
– Average over query class (F0, F1, F2)
• Second Method (2011)
– Use server source files
– MC states (NS, IS, F0, F1, F2, T) per product (x9)
– focusedPercentage_query = Fi_query / (Fi_query + IS_query)
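Both estimators are simple ratios, sketched below. The function names are hypothetical, the user counts would come from the simulated state populations, and returning `None` when the denominator is zero is a choice made here for illustration, not something the slides specify.

```python
def focused_percentage_from_conversions(conversions, clicks, conv_prob):
    """First method: conversions / (clicks * Pr(conversion))."""
    denominator = clicks * conv_prob
    if denominator == 0:
        return None  # ASSUMPTION: no estimate without clicks
    return conversions / denominator


def focused_percentage_from_states(focused_users, is_users):
    """Second method: Fi / (Fi + IS), using simulated user-state counts."""
    total = focused_users + is_users
    if total == 0:
        return None  # ASSUMPTION: no estimate without users in either state
    return focused_users / total


# Illustrative numbers only.
first_estimate = focused_percentage_from_conversions(conversions=5, clicks=50,
                                                     conv_prob=0.2)
second_estimate = focused_percentage_from_states(focused_users=300,
                                                 is_users=700)
```

For the first method, the per-query estimates would then be averaged within each query class (F0, F1, F2).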
7. C) I_d Estimation
\[ \hat{I}_{d+1} = g^{\left( c_{d-3} + c_{d-2} + c_{d-1} + \hat{c}_{d} + \hat{c}_{d+1} - C^{cap} \right)^{+}} \]
• kNN
– Inspired by periodic conversions behavior
– Time series matching using Euclidean Distance as a
similarity criterion
– k = 5, t = 5, N = 600
• Heuristic Baseline
– Underestimate conversions in order to bid higher: c_d = (c_{d-1} + c_{d-2} + c_{d-3}) / 4
• Aggregate
– c_d = (kNN + Baseline) / 2
– c_{d+1} = ((kNN + Baseline) / 2) / 2
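The estimators on this slide can be sketched as follows. This is not the authors' implementation: all function names are hypothetical, t is read here as the length of the matched window (the slide does not define t or N), the kNN forecast is taken as the mean successor of the k closest windows, and the values of g, the capacity and the conversion counts are illustrative.

```python
import math


def knn_forecast(history, library, k=5, t=5):
    """Forecast the next value by matching the last t values against past windows."""
    if len(history) < t:
        raise ValueError("need at least t past values")
    query = history[-t:]
    candidates = []
    for series in library:
        for start in range(len(series) - t):
            window = series[start:start + t]
            # Euclidean distance as the similarity criterion
            candidates.append((math.dist(query, window), series[start + t]))
    if not candidates:
        raise ValueError("library has no window of length t with a successor")
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    return sum(successor for _, successor in nearest) / len(nearest)


def baseline_forecast(history):
    """Heuristic baseline: sum of the last three days divided by 4 (underestimates)."""
    return sum(history[-3:]) / 4


def aggregate_forecasts(knn_value, baseline_value):
    """c_d is the mean of both estimators; c_{d+1} is half of c_d."""
    c_d = (knn_value + baseline_value) / 2
    return c_d, c_d / 2


def capacity_factor(g, history, c_d, c_d1, capacity):
    """I_{d+1} = g ** max(0, c_{d-3} + c_{d-2} + c_{d-1} + c_d + c_{d+1} - capacity)."""
    window_total = sum(history[-3:]) + c_d + c_d1
    return g ** max(0.0, window_total - capacity)


# Tiny illustrative example with k = 1 so the result is easy to follow.
library = [[1, 2, 3, 4, 5, 6, 7, 8]]
history = [0, 2, 3, 4, 5, 6]
knn_value = knn_forecast(history, library, k=1, t=5)
c_d, c_d1 = aggregate_forecasts(knn_value, baseline_forecast(history))
factor = capacity_factor(g=0.995, history=[100, 100, 100],
                         c_d=100, c_d1=60, capacity=450)
```

In the last call the five-day window sums to 460, which is 10 above the capacity of 450, so the factor is 0.995 raised to the power 10. Any window total at or below capacity gives a factor of 1.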
9. Cyclic behavior
• 5-day long pulses
• Pulse height and width are related to factors such as the user distribution at the time and the competition
• Large peaks in daily profits come from “catching the wave”
[Figure: cycle diagram. Recoverable node labels: no ad display, no conversions, high conversion prob., high VPC, high bid, ranking, high conversions, low conversion prob., low VPC, low bid]
10. Rest of the strategy
• Budget unconstrained
• Hard-coded ad selection strategy
– F0 => generic
– F2 => if user preference matched => targeted
– F1 => if one of the preferences is matched =>
targeted, else generic
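The hard-coded rules can be written as a small decision function. This sketch rests on several assumptions: the names are hypothetical, "preference" is read as a preferred manufacturer and component to compare against the query, an unspecified query field is `None`, and an F2 query that is not matched falls back to a generic ad (the slide does not state that case).

```python
def select_ad(query_manufacturer, query_component,
              preferred_manufacturer, preferred_component):
    """Return 'targeted' or 'generic' following the F0/F1/F2 rules."""
    manufacturer_match = (query_manufacturer is not None
                          and query_manufacturer == preferred_manufacturer)
    component_match = (query_component is not None
                       and query_component == preferred_component)
    specified = sum(field is not None
                    for field in (query_manufacturer, query_component))
    if specified == 0:
        # F0: nothing specified in the query
        return "generic"
    if specified == 2:
        # F2: both fields specified; target only on a full match (ASSUMPTION)
        return "targeted" if manufacturer_match and component_match else "generic"
    # F1: one field specified; target if it matches a preference
    return "targeted" if manufacturer_match or component_match else "generic"


# Hypothetical manufacturer and component names.
f0_choice = select_ad(None, None, "acme", "tv")
f1_choice = select_ad("acme", None, "acme", "tv")
f2_choice = select_ad("acme", "tv", "acme", "tv")
```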
11. Simulation-based Game Theoretical Analysis
• One-shot Bayesian game
• Myopic linear strategies b = α ∙ vpc -> find
optimal shading, α
• Iterative best response to find a symmetric
Bayes-Nash equilibrium
• Most profitable single deviation from a
homogeneous set of opponents until self-play
is best response -> BNE
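The iterative best-response loop can be sketched as below. In the actual analysis the payoff of a deviation would come from game simulations; here `toy_payoff` is a made-up stand-in whose fixed point is placed at 0.3 by construction, so the output illustrates the mechanics of the loop and is not a reproduction of any result. All names are hypothetical.

```python
def iterated_best_response(payoff, candidates, start, max_rounds=50):
    """Repeat: find the most profitable single deviation against a homogeneous
    field of opponents, adopt it, and stop once self-play is a best response."""
    incumbent = start
    path = [incumbent]
    for _ in range(max_rounds):
        best = max(candidates, key=lambda alpha: payoff(alpha, incumbent))
        if payoff(best, incumbent) <= payoff(incumbent, incumbent):
            return incumbent, path  # no profitable deviation: equilibrium candidate
        incumbent = best
        path.append(incumbent)
    return incumbent, path


def toy_payoff(deviation, incumbent):
    """SYNTHETIC payoff: peaks at a best response that contracts towards 0.3."""
    best_response = 0.3 + 0.15 * (incumbent - 0.3)
    return -(deviation - best_response) ** 2


candidate_alphas = [round(0.1 * i, 1) for i in range(1, 11)]  # 0.1 ... 1.0
equilibrium, path = iterated_best_response(toy_payoff, candidate_alphas, start=1.0)
```

With a stochastic simulator the two payoff evaluations in the stopping test would need to be averaged over enough games for the comparison to be meaningful.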
12. D) alpha
• Vorobeychik
– “α = 0.2, 0.3 more robust to aggressive opponents”
– The best values found previously (α = 0.1, 0.2 in 2009) are not profitable on the 2010 platform
• We have re-run the algorithm under the 2010 specs
– α = 0.3 is the optimal value (1 -> 0.4 -> 0.3)
13. Simulation-based Game Theoretical Analysis
• Instead of α -> (α_F0, α_F1, α_F2) x (α_C_LOW, α_C_MED, α_C_HIGH)
• Start from optimal α = 0.3, explore all possible
deviations for each α, first for query levels then
capacity levels
• 0.3 seems to be optimal in all cases
• Points in between do not yield different results (0.3
still the best)
16. Adaptive component
• Problem Statement
We want to capture the case where, given the current environment (competition conditions), an α other than 0.3 yields a competitive advantage
• GT analysis “a good starting point”
• Model it as an associative k-armed bandit problem with optimistic initial values and an ε-greedy action selection strategy
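An associative bandit of this kind keeps one value estimate per (context, α) pair. The sketch below is a generic textbook-style formulation and not the agent's implementation: the class name, the context encoding, the constant step size and the reward scaling to [0, 1] are all assumptions made for illustration.

```python
import random


class EpsilonGreedyAlphaSelector:
    """Associative k-armed bandit over candidate alpha values."""

    def __init__(self, alphas, epsilon=0.1, optimistic_value=1.0,
                 step_size=0.5, seed=None):
        self.alphas = list(alphas)
        self.epsilon = epsilon
        self.optimistic_value = optimistic_value  # initial estimate for untried arms
        self.step_size = step_size                # ASSUMPTION: constant step size
        self.values = {}                          # (context, alpha) -> estimate
        self.rng = random.Random(seed)

    def estimate(self, context, alpha):
        return self.values.get((context, alpha), self.optimistic_value)

    def select(self, context):
        # Explore with probability epsilon, otherwise pick the greedy arm.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.alphas)
        return max(self.alphas, key=lambda a: self.estimate(context, a))

    def update(self, context, alpha, reward):
        # Move the estimate a fraction of the way towards the observed reward.
        old = self.estimate(context, alpha)
        self.values[(context, alpha)] = old + self.step_size * (reward - old)


# Deterministic walk-through (epsilon = 0) with made-up rewards in [0, 1].
selector = EpsilonGreedyAlphaSelector([0.2, 0.3, 0.4], epsilon=0.0)
choices = []
for reward in (0.0, 0.8, 0.2):
    alpha = selector.select("medium-capacity")
    choices.append(alpha)
    selector.update("medium-capacity", alpha, reward)
final_choice = selector.select("medium-capacity")
```

Because untried arms start at the optimistic value, each α is tried once before the selector settles on the arm with the best observed reward.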
18. Experiment (1/2)
• Self-play
– 210 games
– All capacities to 450 (MEDIUM)
• The standard agent is unbeatable since it is created that way

Agent Name         Score
Mertacor-Std-1     53.042
Mertacor-Std-2     52.763
Mertacor-kNN-1     52.673
Mertacor-kNN-2     52.703
Mertacor-RL-1      52.270
Mertacor-RL-2      52.233
Mertacor-Full-1    51.673
Mertacor-Full-2    51.899
19. Experiment (2/2)
• Mix things up: include more agents with different strategies
– 250 games
– All capacities to 450 (MEDIUM)
• Better estimation leads to better performance
• Adaptiveness is suited for even more complicated environments (capacity and strategy wise)

Agent Name          Score
Mertacor-kNN        53.223
Mertacor-Std        52.245
Schlemazl (2010)    51.975
Mertacor-Full       51.796
Mertacor-RL         51.790
Epflagent (2010)    49.232
Tau (2010)          45.987
Crocodile (2010)    45.858
20. 2011
Also tested/under development
• Daily Campaign Budget Threshold algorithms
– Estimation
– Simulation
• Particle Filtering for user state estimation
– TacTex
21. Conclusions & Future Work
• α = 0.3 is a very powerful conclusion/hard to
beat
• Better estimates for B) user state and C) Id
could further improve performance
• On-line learning still in very crude form – Not
yet satisfied but seems a reasonable thing to
do
• Competition-wise: fitted-Q learning from data
logs