Keynote by Michael Wellman at The 15th International Conference on Principles and Practice of Multi-Agent Systems (PRIMA-2012) during the Knowledge Technology Week (KTW2012). September 3 - 7, 2012. Kuching, Sarawak, Malaysia
Empirical Game-Theoretic Analysis for Practical Strategic Reasoning
1. M. Wellman 6 Sep 12
Empirical Game-Theoretic Analysis for Practical Strategic Reasoning
Michael P. Wellman
University of Michigan
Planning in Strategic Environments
• Planning problem
  – find agent behavior satisfying/optimizing objectives wrt environment
  – strives for rationality
[Diagram: Agent1 and Agent2 acting in a shared Environment]
• When environment contains other agents
  – model them as rational planners as well
  – problem is a game
  – search now multi-dimensional, different (global) objective
PRIMA-12 1
Real-World Games
• complex dynamics and uncertainty
• rich strategy space
  – strategy: obs* × time → action
• severely incomplete information
  – interdependent types (signals)
  – info partially revealed over time
➙ analytic game-theoretic solutions few and far between
two approaches
1. analyze (stylized) approximations
  – one-shot, complete info…
2. simulation-based methods
  – search
  – empirical: statistics, machine learning,…
Empirical Game-Theoretic Analysis (EGTA)
• Game described procedurally, no directly usable analytical form
• Parametrize strategy space based on agent architecture
• Selectively explore strategy/profile space
• Induce game model (payoff function) from simulation data
➙ Empirical game
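The steps in the EGTA bullets can be sketched in a few lines. This is a minimal illustration, not the method used in the talk: the `simulate` function, its payoff table, the noise level, and the strategy names `A` and `B` are all hypothetical stand-ins for a procedurally defined game. The sketch samples every profile of a small strategy set, averages the payoffs into an empirical game, and scores each profile by its regret (the most any one player could gain by deviating unilaterally).

```python
import random
from itertools import product
from statistics import mean

def simulate(profile, rng):
    """Hypothetical noisy simulator: one payoff sample per player."""
    # Illustrative payoff table for a 2-player, 2-strategy game (assumed numbers).
    base = {("A", "A"): (3, 3), ("A", "B"): (0, 4),
            ("B", "A"): (4, 0), ("B", "B"): (1, 1)}
    return tuple(u + rng.gauss(0, 0.5) for u in base[profile])

def estimate_empirical_game(strategies, n_samples, rng):
    """Estimate each profile's payoff vector as the mean of simulator samples."""
    game = {}
    for profile in product(strategies, repeat=2):
        samples = [simulate(profile, rng) for _ in range(n_samples)]
        game[profile] = tuple(mean(s[i] for s in samples) for i in range(2))
    return game

def regret(game, strategies, profile):
    """Largest gain available to any single player from a unilateral deviation."""
    worst = 0.0
    for i in range(2):
        for s in strategies:
            deviation = list(profile)
            deviation[i] = s
            gain = game[tuple(deviation)][i] - game[profile][i]
            worst = max(worst, gain)
    return worst

rng = random.Random(0)
strategies = ["A", "B"]
game = estimate_empirical_game(strategies, n_samples=200, rng=rng)
regrets = {p: regret(game, strategies, p) for p in game}
best_profile = min(regrets, key=regrets.get)  # candidate pure-strategy equilibrium
```

A profile with zero regret in the estimated game is a pure-strategy Nash equilibrium of the empirical game; whether it is also an equilibrium of the underlying game depends on sampling error.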
EGTA Process
1. Parametrize strategy space
2. Estimate empirical game
3. Solve empirical game
[Diagram: Strategy Set → Profile Space → Simulator → Payoff Data → Empirical Game → Game Analysis (NE)]
Simulation-Based Game Modeling
[Figure: simulations fill in a payoff matrix; surviving entries: 5,1; 0,2; 6,8; …]
TAC Supply Chain Mgmt Game
[Diagram: suppliers, manufacturers, and customer linked by message flows]
• suppliers: Pintel, IMD (CPU); Basus, Macrostar (Motherboard); Mec, Queenmax (Memory); Watergate, Mintor (Hard Disk)
• manufacturers: Manufacturer 1 through Manufacturer 6
• suppliers ↔ manufacturers: component RFQs, supplier offers, component orders
• manufacturers ↔ customer: PC RFQs, PC bids, PC orders
• 10 component types, 16 PC types
• 220 simulation days, 15 seconds per day
Two-Strategy Game (Unpreempted)
Two-Strategy Game (Unpreempted)
Three-Strategy Game: Deviations
Ranking Strategies
• Often want to know: which is “better”, strategy A or strategy B?
• Problem:
  – Depends on what other agents do
  – Cannot evaluate independent of strategic context
• Which context?
  – Self-play
  – Fixed proportions of other agents
  – Equilibrium (NE Regret)
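A small sketch shows how the choice of context can reverse a ranking. It assumes a symmetric two-player game with made-up payoffs, and it reads "NE regret" of a strategy as the equilibrium payoff minus that strategy's payoff when the other agent plays the equilibrium; that reading, the payoff numbers, and the supplied equilibrium mixture are assumptions for illustration.

```python
def payoff_vs_mixture(u, s, mixture):
    """Expected payoff of pure strategy s when the other agent plays `mixture`."""
    return sum(p * u[s][t] for t, p in mixture.items())

def ne_regret(u, mixture):
    """Equilibrium payoff minus each strategy's payoff against the equilibrium."""
    eq_payoff = sum(p * payoff_vs_mixture(u, s, mixture) for s, p in mixture.items())
    return {s: eq_payoff - payoff_vs_mixture(u, s, mixture) for s in u}

# u[s][t]: payoff to a player using s against an opponent using t (assumed numbers).
u = {"C": {"C": 3.0, "D": 0.0},
     "D": {"C": 4.0, "D": 1.0}}
equilibrium = {"D": 1.0}  # assumed given; all-D is the equilibrium of this game

self_play = {s: u[s][s] for s in u}           # context 1: everyone plays s
regrets = ne_regret(u, equilibrium)           # context 2: others play equilibrium

rank_self_play = sorted(u, key=self_play.get, reverse=True)  # higher payoff first
rank_ne_regret = sorted(u, key=regrets.get)                  # lower regret first
```

Here self-play favors `C` while the equilibrium context favors `D`, which is the point of the slide: a ranking is only meaningful relative to a stated strategic context.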
Ranking Strategies: TAC/SCM-07
[Figure panels: SCM-07 Tournament; SCM-07 EGTA]
from PR Jordan PhD Thesis, 2009
Role-Symmetric Games
32
Hierarchical Reduction
Deviation-Preserving Reduction
[Figure: side-by-side comparison; only the label “vs” survives]
Iterative EGTA Process
[Diagram: loop over Strategy Set, Profile Space, Select, Simulator, Payoff Data, Empirical Game, Game Analysis (NE), and Refine? (N → End), with feedback branches More Samples, More Strategies, Add Strategy, Strategy Space; annotated with the Game Model Induction Problem, the Sampling Control Problem, and the Strategy Exploration Problem]
Sampling Control Problem
• Revealed payoff model
  – sample provides exact payoff
  – minimum-regret-first search (MRFS)
    • attempts to refute best current candidate
• Noisy payoff model
  – sample drawn from payoff distribution
  – information gain search (IGS)
    • sample profile maximizing entropy difference wrt probability of being min-regret profile
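For the revealed payoff model, the idea of trying to refute the best current candidate can be sketched as follows. This is a simplified reading of minimum-regret-first search, not the published algorithm: the two-player game, the `payoff` and `reveal` functions, the order in which unexplored deviations are tried, and the stopping rule are all assumptions. Each evaluated profile carries a lower bound on its regret from the deviations seen so far; the search always expands the profile with the lowest bound.

```python
def payoff(x, y):
    """Hypothetical payoff to a player choosing x against an opponent choosing y."""
    return -abs(x - 1) - 0.1 * abs(x - y)

def reveal(profile):
    """Stand-in for one simulation run under the revealed payoff model."""
    x, y = profile
    return (payoff(x, y), payoff(y, x))

def deviations(profile, strategies):
    """Yield (player, deviated profile) for every unilateral deviation."""
    for i, current in enumerate(profile):
        for s in strategies:
            if s != current:
                yield i, profile[:i] + (s,) + profile[i + 1:]

def mrfs(reveal, strategies, start, budget):
    known = {start: reveal(start)}

    def regret_bound(p):
        # Lower bound on regret: best gain over the deviations evaluated so far.
        gains = [known[d][i] - known[p][i]
                 for i, d in deviations(p, strategies) if d in known]
        return max([0.0] + gains)

    while True:
        candidate = min(known, key=regret_bound)
        unexplored = [d for _, d in deviations(candidate, strategies)
                      if d not in known]
        # Stop when the candidate cannot be refuted further or the budget is spent.
        if not unexplored or len(known) >= budget:
            return candidate, regret_bound(candidate), len(known)
        known[unexplored[0]] = reveal(unexplored[0])

profile, regret, n_evaluated = mrfs(reveal, strategies=[0, 1, 2],
                                    start=(0, 0), budget=9)
```

When the returned candidate has no unexplored deviations, its bound is its exact regret in the revealed game; a value of zero means it is a pure-strategy equilibrium found without evaluating every profile.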
Finding Approximate PSNE
Iterative EGTA Process
[Diagram: the iterative EGTA loop, repeated]
Construct Empirical Game
• Simplest approach: direct estimation
  – employ control variates and other variance reduction techniques
[Diagram: payoff data from selected profiles, (s1,u(s1)) ... (sL,u(sL)) → ? → Empirical Game u(•)]
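As an illustration of the control-variate idea mentioned for direct estimation, the sketch below adjusts each payoff sample by an observable quantity whose true mean is known. Everything specific here is assumed: the `simulate` function, the "demand" variable, its mean of 10.0, and the linear payoff relationship are hypothetical and chosen only so the effect is visible.

```python
import random
from statistics import mean, variance

def simulate(rng):
    """Hypothetical simulator run: returns (payoff, demand).
    Demand is exogenous and its true mean (10.0) is known by construction."""
    demand = rng.gauss(10.0, 2.0)
    payoff = 5.0 + 3.0 * demand + rng.gauss(0.0, 1.0)
    return payoff, demand

def control_variate_estimate(samples, known_mean):
    """Mean payoff estimate using demand as a control variate."""
    ys = [y for y, _ in samples]
    xs = [x for _, x in samples]
    y_bar, x_bar = mean(ys), mean(xs)
    cov = sum((y - y_bar) * (x - x_bar) for y, x in samples) / (len(samples) - 1)
    beta = cov / variance(xs)  # estimated optimal adjustment coefficient
    adjusted = [y - beta * (x - known_mean) for y, x in samples]
    return mean(adjusted), variance(adjusted), variance(ys)

rng = random.Random(0)
samples = [simulate(rng) for _ in range(500)]
estimate, adjusted_var, raw_var = control_variate_estimate(samples, known_mean=10.0)
```

The adjusted samples have the same expectation as the raw payoffs (up to the small bias from estimating `beta` on the same data) but much lower variance, so fewer simulations are needed per profile for a given precision.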
Payoff Function Regression
FPSB2 Example, Si = [0,1]
• generate data (simulations)

         0      0.5    1
  0      3,3    1,4    1,1
  0.5    4,1    2,2    4,1
  1      1,1    1,0    3,3

• learn regression
• solve learned game
  – eq = (0.32,0.32)
Vorobeychik et al., ML 2007
Generalization Risk Approach
• Model variations
  – functional forms, relationship structures, parameters
  – strategy granularity
• Approach:
  – Treat candidate game model as a predictor for payoff data
  – Adopt loss function for predictor
  – Select model candidate minimizing expected loss
[Figure: Cross Validation. Observation Data split into Fold 1, Fold 2, Fold 3, each used in turn for Training or Validation]
Jordan et al., AAMAS-09
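The approach above can be illustrated with strategy granularity as the model variation. In this sketch the candidate game models are piecewise-constant payoff predictors over a one-dimensional strategy parameter in [0,1], differing only in the number of bins; the loss is squared prediction error, estimated by 3-fold cross validation. The data generator, the bin-based model family, and the candidate list are assumptions for illustration, not the models used in the cited work.

```python
import random
from statistics import mean

def fit_binned(train, k):
    """Piecewise-constant payoff model with k equal-width strategy bins on [0, 1]."""
    overall = mean(u for _, u in train)
    bins = [[] for _ in range(k)]
    for s, u in train:
        bins[min(int(s * k), k - 1)].append(u)
    levels = [mean(b) if b else overall for b in bins]  # empty bin: fall back
    return lambda s: levels[min(int(s * k), k - 1)]

def cv_loss(data, k, n_folds=3):
    """Mean squared validation error of the k-bin model across folds."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    losses = []
    for i, validation in enumerate(folds):
        train = [obs for j, fold in enumerate(folds) if j != i for obs in fold]
        model = fit_binned(train, k)
        losses.extend((model(s) - u) ** 2 for s, u in validation)
    return mean(losses)

rng = random.Random(1)
# Hypothetical observations: payoff steps from 1.0 to 2.0 at s = 0.5, plus noise.
strategies = [rng.random() for _ in range(120)]
data = [(s, (1.0 if s < 0.5 else 2.0) + rng.gauss(0.0, 0.3)) for s in strategies]

candidates = [1, 2, 4, 8, 32]  # granularity levels to compare
losses = {k: cv_loss(data, k) for k in candidates}
best_k = min(losses, key=losses.get)
```

Too coarse a model (one bin) cannot represent the payoff step, while a very fine one spreads the data thin; the validation loss trades these off without reference to the true payoff function.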
Iterative EGTA Process
[Diagram: the iterative EGTA loop, repeated]
Learning New Strategies: EGTA+RL
[Diagram: the iterative EGTA loop (Strategy Set, Profile Space, Select, Simulator, Payoff Data, Empirical Game, Game Analysis (NE), Refine?, More Samples, More Strategies) extended with an Online Learning branch. RL: best response to NE yields a New Strategy. Deviates? Y: Add new Strategy; N: Improve RL Model? (Y/N); End]
CDA Learning Problem Setup
State Space
• History of recent trades
  – H1: Moving average
  – H2: Frequency weighted ratio, threshold = V
  – H3: Frequency weighted ratio, threshold = A
• Quotes
  – Q1: Opposite role
  – Q2: Same role
• Time
  – T1: Total
  – T2: Since last trade
• Pending Trades
  – U: Number of trades left
  – V: Value of next unit to be traded
Actions
  – A: Offset from V
Rewards
  – R: Difference between unit valuation and trade price
EGTA/RL Round 1

  Strategies    Payoff   NE           Learning Strategy   Dev. Payoff
  Kaplan ZI     248.1    1.000 ZI     L1                  268.7
  ZIbtq L1      242.5    1.000 L1
EGTA/RL Round 2

  Strategies    Payoff   NE           Learning Strategy   Dev. Payoff
  Kaplan ZI     248.1    1.000 ZI     L1                  268.7
  ZIbtq L1      242.5    1.000 L1
  ZIP           248.0    1.000 ZIP    L2-L8               ---
  GD            248.6    1.000 GD     L9                  251.8
  L9            246.1    0.531 GD     L10                 252.1
                         0.469 L9
EGTA/RL Rounds 3+

  Strategies    Payoff   NE           Learning Strategy   Dev. Payoff
  …             …        …            …                   …
  L10           248.0    0.191 GD     L11                 251.0
                         0.809 L10
  L11           246.2    1.000 L11
  GDX           245.8    0.192 GDX    L12                 248.3
                         0.808 L11
  L12           245.8    0.049 L11    L13                 245.9
                         0.951 L12
  L13           245.6    0.872 L12    L14                 245.6
                         0.128 L13
  RB            245.6    0.872 L12
                         0.128 L13

Annotation: Final champion
Strategy Exploration Problem
• Premise:
  – Limited ability to cover profile space
  – Expectation to reasonably evaluate all considered strategies
• Need deliberate policy to decide which strategies to introduce
• RL for strategy exploration
  – attempt at best response to current equilibrium
  – is this a good heuristic (even assuming ideal BR calc?)
Exploration Policies
• RND: Random (uniform) selection
• Deviation-Based
  – DEV: Uniform among strategies that deviate from current equilibrium
  – BR: Best response to current equilibrium
  – BR+DEV: Alternate on successive iterations
  – ST(t): Softmax selection among deviators, proportional to gain
• MEMT:
  – Select strategy that maximizes the gain (regret) from deviating to a strategy outside the set from any mixture over the set.
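The deviation-based policies can be sketched as selection rules over the gains of candidate strategies against the current equilibrium. The payoffs, strategy names, and equilibrium payoff below are made up, and treating the parameter t in ST(t) as a softmax temperature (selection weight exp(gain / t)) is an assumption about the slide's notation.

```python
import math
import random

def deviation_gains(payoff_vs_eq, eq_payoff):
    """Gain over the equilibrium payoff for each candidate that beats it."""
    return {s: u - eq_payoff for s, u in payoff_vs_eq.items() if u > eq_payoff}

def softmax_probs(gains, temperature):
    """Selection probabilities with weight exp(gain / temperature)."""
    top = max(gains.values())  # subtract the max for numerical stability
    weights = {s: math.exp((g - top) / temperature) for s, g in gains.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

def select_strategy(policy, gains, rng, temperature=1.0):
    """Pick the next strategy to add under the DEV, BR, or ST policy."""
    deviators = sorted(gains)
    if policy == "DEV":   # uniform among deviators
        return rng.choice(deviators)
    if policy == "BR":    # best response to the current equilibrium
        return max(deviators, key=gains.get)
    if policy == "ST":    # softmax among deviators
        probs = softmax_probs(gains, temperature)
        return rng.choices(deviators, weights=[probs[s] for s in deviators])[0]
    raise ValueError(f"unknown policy: {policy}")

# Hypothetical payoffs of candidate strategies against the current equilibrium.
payoff_vs_eq = {"s1": 250.0, "s2": 248.0, "s3": 245.0}
gains = deviation_gains(payoff_vs_eq, eq_payoff=246.0)

rng = random.Random(0)
br_choice = select_strategy("BR", gains, rng)
st_choice = select_strategy("ST", gains, rng, temperature=1.0)
sharp = softmax_probs(gains, temperature=0.1)   # close to BR
flat = softmax_probs(gains, temperature=10.0)   # close to DEV
```

Under this reading, a low temperature makes ST behave like BR and a high temperature makes it behave like DEV, so the ST variants interpolate between the two.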
[Figure: CDA↓4. Expected Regret (log scale, 10^-3 to 10^3) vs Step (1 to 13) for policies MEMT, DEV, RND, BR, ST10, ST1, ST0.1]
EGTA Applications
• Market games
  – TAC: Travel, Supply Chain, Ad Auction
  – Canonical auctions: SimAAs, CDAs, SimSPSBs,…
  – Equity premium in financial trading
• Other domains
  – Privacy: information sharing attacks
  – Networking: routing, wireless AP selection
  – Credit network formation
• Mechanism design
Conclusion: EGTA Methodology
• Extends scope of GT to procedurally defined scenarios
• Embraces statistical underpinnings of strategic reasoning
• Search process:
  – GT for establishing salient strategic context
  – Strategy exploration:
    • e.g., RL to search for best response to that context
→ Principled approach to evaluate complex strategy spaces
• Growing toolbox of EGTA techniques