Empirical Game-Theoretic Analysis for Practical Strategic Reasoning

M. Wellman, 6 Sep 12

Empirical Game-Theoretic Analysis for Practical Strategic Reasoning

Michael P. Wellman
University of Michigan

Planning in Strategic Environments

•  Planning problem
   –  find agent behavior satisfying/optimizing objectives wrt environment
   –  strives for rationality

[Diagram: Agent1, Environment, Agent2]

•  When environment contains other agents
   –  model them as rational planners as well
   –  problem is a game
   –  search now multi-dimensional, different (global) objective

PRIMA-12


Real-World Games

complex dynamics and uncertainty

•  rich strategy space
   –  strategy: obs* × time → action
•  severely incomplete information
   –  interdependent types (signals)
   –  info partially revealed over time

➙ analytic game-theoretic solutions few and far between

two approaches

1.  analyze (stylized) approximations
   –  one-shot, complete info…
2.  simulation-based methods
   –  search
   –  empirical: statistics, machine learning,…

Empirical Game-Theoretic Analysis (EGTA)

•  Game described procedurally, no directly usable analytical form
•  Parametrize strategy space based on agent architecture
•  Selectively explore strategy/profile space
•  Induce game model (payoff function) from simulation data

Empirical game
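The last bullet above (inducing a payoff function from simulation data) can be sketched as follows. This is a minimal illustration, not the talk's implementation: the two strategies "A" and "B", the `TRUE_PAYOFF` table, and the `simulate` function are hypothetical stand-ins for a real procedural simulator, and the payoff estimate is a plain sample mean.

```python
import random
from collections import defaultdict
from itertools import combinations_with_replacement
from statistics import mean

# Hypothetical "true" payoffs hidden inside the toy simulator:
# TRUE_PAYOFF[(mine, other)] is my expected payoff.
TRUE_PAYOFF = {("A", "A"): 3.0, ("A", "B"): 0.0, ("B", "A"): 5.0, ("B", "B"): 1.0}

def simulate(profile, rng):
    """Toy simulator (assumption): one noisy payoff per player of a 2-player profile."""
    s1, s2 = profile
    return (TRUE_PAYOFF[(s1, s2)] + rng.gauss(0, 1),
            TRUE_PAYOFF[(s2, s1)] + rng.gauss(0, 1))

def estimate_empirical_game(strategies, n_players, samples, rng):
    """Map each unordered profile to the sample-mean payoff of each strategy in it."""
    game = {}
    for profile in combinations_with_replacement(strategies, n_players):
        observed = defaultdict(list)
        for _ in range(samples):
            for strategy, payoff in zip(profile, simulate(profile, rng)):
                observed[strategy].append(payoff)
        game[profile] = {s: mean(v) for s, v in observed.items()}
    return game

rng = random.Random(0)
empirical_game = estimate_empirical_game(["A", "B"], 2, 2000, rng)
for profile, payoffs in empirical_game.items():
    print(profile, payoffs)
```

The resulting dictionary is the empirical game: a payoff table estimated only from the profiles that were actually simulated.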


EGTA Process

1.  Parametrize strategy space
2.  Estimate empirical game
3.  Solve empirical game

[Diagram: Strategy Set, Profile Space, Simulator, Payoff Data, Empirical Game, Game Analysis (NE)]

Simulation-Based Game Modeling

[Figure: simulation results fill in payoff cells, e.g. 5,1; 0,2; 6,8; …]


TAC Supply Chain Mgmt Game

[Diagram: suppliers, manufacturers, customer]

suppliers:      CPU: Pintel, IMD
                Motherboard: Basus, Macrostar
                Memory: Mec, Queenmax
                Hard Disk: Watergate, Mintor
manufacturers:  Manufacturer 1 to Manufacturer 6
customer

suppliers and manufacturers exchange: component RFQs, supplier offers, component orders
manufacturers and customer exchange:  PC RFQs, PC bids, PC orders

10 component types
16 PC types
220 simulation days
15 seconds per day

Two-Strategy Game (Unpreempted)


Three-Strategy Game: Deviations


Ranking Strategies

•  Often want to know: which is “better”, strategy A or strategy B?
•  Problem:
   –  Depends on what other agents do
   –  Cannot evaluate independent of strategic context
•  Which context?
   –  Self-play
   –  Fixed proportions of other agents
   –  Equilibrium (NE Regret)
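One way to read the "Equilibrium (NE Regret)" context is: fix the other agents at an equilibrium mixture and score each strategy by how much it loses relative to the equilibrium payoff. The sketch below assumes a symmetric two-player game; the strategy names H, D, Z and all payoff numbers are made up for illustration and do not come from the talk.

```python
def ne_regret(payoff, mixture):
    """Regret of each strategy when everyone else plays the equilibrium mixture.

    payoff[s][t] is the payoff to s against an opponent playing t;
    mixture maps equilibrium strategies to their probabilities.
    """
    def against_mixture(s):
        return sum(p * payoff[s][t] for t, p in mixture.items())

    eq_value = sum(p * against_mixture(s) for s, p in mixture.items())
    return {s: eq_value - against_mixture(s) for s in payoff}

# Hypothetical payoffs: H and D form a hawk-dove game whose symmetric
# equilibrium mixes them 50/50; Z is a third strategy outside the support.
payoff = {
    "H": {"H": -1.0, "D": 2.0},
    "D": {"H": 0.0, "D": 1.0},
    "Z": {"H": -1.0, "D": 1.0},
}
mixture = {"H": 0.5, "D": 0.5}

regret = ne_regret(payoff, mixture)
ranking = sorted(regret, key=regret.get)  # lowest regret first
print(regret)   # {'H': 0.0, 'D': 0.0, 'Z': 0.5}
print(ranking)  # ['H', 'D', 'Z']
```

Strategies in the equilibrium support have zero regret by construction, so the ranking is informative mainly for strategies outside the support.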

Ranking Strategies: TAC/SCM-07

[Figure: two panels, SCM-07 Tournament and SCM-07 EGTA]

from PR Jordan PhD Thesis, 2009


Strategy Ranking (TAC Travel)

50 strategies ranked with respect to the final equilibrium context

[Figure: strategies 1 to 50 ordered by deviation gain; x-axis: Deviation Gain, from −1400 to 200]

from LJ Schvartzman PhD Thesis, 2009

Strategy Ranking (CDA)

strategy   NE1 regret   NE2 regret   symm. profile payoff
GDX         0            1.32        247.98
GD          0.49         3.26        248.57
RB          2.20         8.64        248.08
ZIP         2.90         9.86        247.95
Kaplan      4.56        24.55          2.02
ZIbtq      14.67        17.44        247.45
ZI         16.42        16.82        248.07


Strategy Ranking (SimSPSB)

SC Local: Heuristic search for optimal bid in response to self-confirming prices

[Figure: NE Regret of each strategy in five environments: U[6,4], U[5,5], U[5,8], H[5,3], H[5,5]]

Strategy families: SC Local, SC BidEval, Local, BidEval, AvgMU

Strategies:
   SCLocalBidSearchS5K6_HB
   SCLocalBidSearch_K16Z_HB
   SCLocalBidSearch_K16_HB
   SCBidXEvaluatorMixA_K16_HB
   SCBidEvaluatorMixA_K16_HB
   LocalBidSearch_K16_HB
   BidXEvaluatorMixA_K16_HB
   BidXEvaluatorMix3_K16_HB
   BidEvaluatorMixA
   BidEvaluatorMix_E8S32K8_HB
   AverageMU64Z_HB
   AverageMU64_HB

bottom line: SC Local most effective overall, robust across environments

Scaling #Players

[Figure: axes log(|G|) and #players; label: strategic complexity]


Improving Scalability

•  Exploit locality of interaction
   –  graphical games, MAIDs, action-graph games, …
•  Aggregate agents
   –  hierarchical reduction (Wellman et al. AAAI-05)
   –  clustering (Ficici et al. UAI-08)
   –  deviation-preserving reduction (Wiedenbeck & W, 2012)

Game Size

Number of profiles: $\binom{N + |S| - 1}{N}$

[Figure label: “profile”]
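To get a feel for the profile count above, here is a small calculation, assuming a symmetric game with N players and |S| strategies. The function name `num_profiles` is just for this illustration.

```python
from math import comb

def num_profiles(n_players, n_strategies):
    """Number of distinct profiles in a symmetric game: C(N + |S| - 1, N)."""
    return comb(n_players + n_strategies - 1, n_players)

print(num_profiles(12, 6))   # 6188
print(num_profiles(100, 2))  # 101
print(num_profiles(4, 6))    # 126
```

With 6 strategies, going from 12 players to 4 cuts the number of profiles to simulate from 6188 to 126, which is the motivation for aggregating agents.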


Player Reduction

≈

Hierarchical Reduction [Wellman, et al. 2005]


Twins Reduction [Ficici, et al. 2008]

[Figure: comparison diagram]


Deviation-Preserving Reduction

[Figure: comparison diagram]
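The slides show these reductions only as pictures, so the following is a sketch under the usual definitions from the cited papers, stated here as assumptions: in hierarchical reduction each of the n reduced players stands for N/n full-game agents, while in deviation-preserving reduction the player whose payoff is being evaluated controls one agent and the other n−1 reduced players share the remaining N−1 agents. Function and strategy names are hypothetical.

```python
from collections import Counter

def hr_full_profile(reduced_profile, n_full):
    """Hierarchical reduction (assumed definition): scale every strategy
    count by N/n to obtain the full-game profile to simulate."""
    n_reduced = sum(reduced_profile.values())
    if n_full % n_reduced:
        raise ValueError("n must divide N")
    scale = n_full // n_reduced
    return Counter({s: c * scale for s, c in reduced_profile.items()})

def dpr_full_profile(strategy, others, n_full):
    """Deviation-preserving reduction (assumed definition): the evaluated
    player keeps one agent; the others are scaled by (N-1)/(n-1)."""
    n_others = sum(others.values())
    if (n_full - 1) % n_others:
        raise ValueError("n-1 must divide N-1")
    scale = (n_full - 1) // n_others
    full = Counter({s: c * scale for s, c in others.items()})
    full[strategy] += 1
    return full

# 25 full-game agents reduced to 5 players.
print(hr_full_profile(Counter(A=3, B=2), 25))        # A: 15, B: 10
print(dpr_full_profile("A", Counter(A=2, B=2), 25))  # A: 13, B: 12
```

In the DPR mapping a single agent's switch of strategy in the reduced game corresponds to a single agent's switch in the full game, which is why deviation gains are preserved.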


[Figure: four panels comparing HR, DPR, and TR curves: 12p-6s Network Formation Game; 100p-2s Congestion Game; 12p-6s Local Effect Game; 12p-6s Congestion Game]


Role-Symmetric Games

Hierarchical Reduction


Reduction

[Figure: comparison diagram]


Iterative EGTA Process

[Diagram: Strategy Space, Strategy Set, Profile Space, Select, Simulator, Payoff Data, Empirical Game, Game Analysis (NE), Refine? (More Samples; More Strategies, Add Strategy; N: End)]

Subproblems labeled on the diagram:
•  Game Model Induction Problem
•  Sampling Control Problem
•  Strategy Exploration Problem

Sampling Control Problem

•  Revealed payoff model
   –  sample provides exact payoff
   –  minimum-regret-first search (MRFS)
      •  attempts to refute best current candidate
•  Noisy payoff model
   –  sample drawn from payoff distribution
   –  information gain search (IGS)
      •  sample profile maximizing entropy difference wrt probability of being min-regret profile

Min-Regret-First Search

Game (row payoff, column payoff):

        c1    c2    c3    c4
r1     9,5   3,3   2,5   4,8
r2     6,4   8,8   3,0   5,3
r3     2,2   2,1   3,2   4,6
r4     4,4   2,0   2,2   9,3

Start at an arbitrary profile. At each step, select a random deviation from the current best profile, evaluate it, and update the ε-bound of every evaluated profile.

Step   Evaluated   ε-bounds after the step
1      (r1,c1)     (r1,c1) 0
2      (r1,c2)     (r1,c1) 0; (r1,c2) 2
3      (r2,c1)     (r1,c1) 0; (r1,c2) 2; (r2,c1) 3
4      (r3,c1)     (r1,c1) 0; (r1,c2) 2; (r2,c1) 3; (r3,c1) 7
5      (r1,c4)     (r1,c1) 3; (r1,c2) 5; (r2,c1) 3; (r3,c1) 7; (r1,c4) 0
6      (r2,c4)     (r1,c1) 3; (r1,c2) 5; (r2,c1) 3; (r3,c1) 7; (r1,c4) 1; (r2,c4) 1
7      (r2,c2)     (r1,c1) 3; (r1,c2) 5; (r2,c1) 4; (r3,c1) 7; (r1,c4) 1; (r2,c4) 5; (r2,c2) 0
8      (r2,c3)     as step 7, plus (r2,c3) 8
9      (r3,c2)     as step 8, plus (r3,c2) 6
10     (r4,c2)     as step 9, plus (r4,c2) 6; (r2,c2) 0*

* NE Confirmed! All deviations from (r2,c2) have been evaluated and none is profitable.
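The walk-through above can be reproduced with a short sketch of minimum-regret-first search under the revealed payoff model. It is an illustrative reading of the slides, not the original implementation: tie-breaking, the handling of fully explored candidates, and all function names are assumptions. Profiles are zero-indexed, so (r2,c2) appears as (1, 1).

```python
import random

# Payoffs from the 4x4 example, zero-indexed: ROW[r][c] is the row player's
# payoff and COL[r][c] the column player's payoff.
ROW = [[9, 3, 2, 4], [6, 8, 3, 5], [2, 2, 3, 4], [4, 2, 2, 9]]
COL = [[5, 3, 5, 8], [4, 8, 0, 3], [2, 1, 2, 6], [4, 0, 2, 3]]

def deviations(profile):
    """All profiles reachable by one player changing strategy."""
    r, c = profile
    return ([(r2, c) for r2 in range(4) if r2 != r] +
            [(r, c2) for c2 in range(4) if c2 != c])

def epsilon_bound(profile, seen):
    """Largest gain among already-evaluated deviations (lower bound on regret)."""
    r, c = profile
    gains = [0]
    for r2, c2 in deviations(profile):
        if (r2, c2) in seen:
            gains.append(ROW[r2][c] - ROW[r][c] if c2 == c
                         else COL[r][c2] - COL[r][c])
    return max(gains)

def mrfs(start, rng):
    """Return (confirmed pure NE or None, number of profiles evaluated)."""
    evaluated = [start]
    while True:
        seen = set(evaluated)
        # Try to refute the candidate with the lowest ε-bound first.
        for cand in sorted(evaluated, key=lambda p: epsilon_bound(p, seen)):
            unexplored = [d for d in deviations(cand) if d not in seen]
            if unexplored:
                evaluated.append(rng.choice(unexplored))
                break
            if epsilon_bound(cand, seen) == 0:
                return cand, len(evaluated)  # every deviation checked, none gains
        else:
            return None, len(evaluated)      # everything refuted: no pure NE

ne, n_evaluated = mrfs((0, 0), random.Random(0))
print(ne, n_evaluated)
```

The search order depends on the random deviations chosen, so the number of evaluated profiles varies with the seed, but in this game the only profile that can be confirmed is (1, 1), i.e. (r2,c2).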


Finding Approximate PSNE


Construct Empirical Game

•  Simplest approach: direct estimation
   –  employ control variates and other variance reduction techniques

[Diagram: payoff data from selected profiles, (s1,u(s1)), ..., (sL,u(sL)), feed an unknown step "?" that yields the Empirical Game u(•)]
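As an illustration of the variance reduction mentioned above, the sketch below applies a control variate to direct payoff estimation. The simulator is a hypothetical stand-in: it assumes each run reports the agent's private valuation (uniform on [0,1], so its mean 0.5 is known) alongside a payoff that is correlated with it.

```python
import random
from statistics import mean, variance

def control_variate_estimate(payoffs, controls, control_mean):
    """Sample-mean payoff corrected by a control variate with known mean."""
    p_bar, c_bar = mean(payoffs), mean(controls)
    cov = sum((c - c_bar) * (p - p_bar)
              for c, p in zip(controls, payoffs)) / (len(payoffs) - 1)
    beta = cov / variance(controls)  # estimated optimal coefficient
    return p_bar - beta * (c_bar - control_mean)

def run_simulation(rng):
    """Toy simulator (assumption): payoff depends on a uniform valuation plus noise."""
    valuation = rng.random()
    payoff = 2.0 * valuation + rng.gauss(0, 0.1)  # true mean payoff is 1.0
    return payoff, valuation

rng = random.Random(1)
naive, corrected = [], []
for _ in range(200):                       # 200 independent estimation trials
    runs = [run_simulation(rng) for _ in range(50)]
    payoffs = [p for p, _ in runs]
    controls = [c for _, c in runs]
    naive.append(mean(payoffs))
    corrected.append(control_variate_estimate(payoffs, controls, 0.5))

print("variance of naive estimates:    ", variance(naive))
print("variance of corrected estimates:", variance(corrected))
```

Both estimators target the same mean payoff; the corrected one uses far fewer simulation runs for a given accuracy whenever the control is strongly correlated with the payoff.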

Payoff Function Regression

FPSB2 Example, Si = [0,1]

generate data (simulations)

         0     0.5    1
0       3,3   1,4   1,1
0.5     4,1   2,2   4,1
1       1,1   1,0   3,3

learn regression

solve learned game: eq = (0.32,0.32)

Vorobeychik et al., ML 2007
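The three steps above (generate data, learn regression, solve learned game) can be sketched end to end. Everything specific here is an assumption for illustration: the toy simulator is not FPSB2, the model is a quadratic in (si, sj) fitted by ordinary least squares, and the symmetric equilibrium is read off the fitted best-response condition.

```python
import random

def features(si, sj):
    return [1.0, si, sj, si * si, si * sj, sj * sj]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def fit_payoff_model(data):
    """Least-squares fit of u(si, sj) via the normal equations."""
    xtx = [[0.0] * 6 for _ in range(6)]
    xty = [0.0] * 6
    for si, sj, u in data:
        f = features(si, sj)
        for i in range(6):
            xty[i] += f[i] * u
            for j in range(6):
                xtx[i][j] += f[i] * f[j]
    return solve(xtx, xty)

def toy_simulator(si, sj, rng):
    """Hypothetical noisy payoff to the player using si against sj."""
    return -(si - 0.2 - 0.4 * sj) ** 2 + rng.gauss(0, 0.02)

rng = random.Random(0)
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
data = [(si, sj, toy_simulator(si, sj, rng))
        for si in grid for sj in grid for _ in range(40)]

theta = fit_payoff_model(data)
_, lin, _, quad, cross, _ = theta
# Best response solves lin + 2*quad*si + cross*sj = 0 (needs quad < 0);
# setting si = sj = k gives the symmetric equilibrium of the learned game.
k_star = -lin / (2 * quad + cross)
print(round(k_star, 3))
```

For this toy payoff the exact symmetric equilibrium is 1/3, so the printed value should land close to it; with a real simulator the quality of the answer depends on how well the chosen functional form fits.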


Generalization Risk Approach

•  Model variations
   –  functional forms, relationship structures, parameters
   –  strategy granularity
•  Approach:
   –  Treat candidate game model as a predictor for payoff data
   –  Adopt loss function for predictor
   –  Select model candidate minimizing expected loss

[Figure: Cross Validation. Observation Data is split into Fold 1, Fold 2, Fold 3, used for Training and Validation]

Jordan et al., AAMAS-09
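The selection procedure above can be sketched with 3-fold cross-validation and squared loss. The two candidate models (a constant payoff and a payoff linear in a one-dimensional strategy parameter) and the synthetic observation data are assumptions chosen to keep the example small.

```python
import random
from statistics import mean

def fit_constant(train):
    c = mean(y for _, y in train)
    return lambda x: c

def fit_linear(train):
    x_bar = mean(x for x, _ in train)
    y_bar = mean(y for _, y in train)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in train) /
             sum((x - x_bar) ** 2 for x, _ in train))
    return lambda x: y_bar + slope * (x - x_bar)

def cv_loss(fit, data, k=3):
    """Mean squared prediction error over k held-out folds."""
    folds = [data[i::k] for i in range(k)]
    losses = []
    for i in range(k):
        train = [d for j, fold in enumerate(folds) if j != i for d in fold]
        model = fit(train)
        losses += [(model(x) - y) ** 2 for x, y in folds[i]]
    return mean(losses)

# Hypothetical observation data: payoff as a noisy linear function of a
# strategy parameter in [0, 1].
rng = random.Random(0)
data = []
for _ in range(60):
    x = rng.random()
    data.append((x, 1.0 + 2.0 * x + rng.gauss(0, 0.1)))

candidates = {"constant": fit_constant, "linear": fit_linear}
losses = {name: cv_loss(fit, data) for name, fit in candidates.items()}
selected = min(losses, key=losses.get)
print(losses, selected)
```

The same loop applies unchanged to richer candidates (different functional forms or strategy granularities), since each candidate is only asked to predict held-out payoff observations.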


Learning New Strategies: EGTA+RL

[Diagram: the EGTA loop (Strategy Set, Profile Space, Select, Simulator, Payoff Data, Empirical Game, Game Analysis (NE), Refine?: More Strategies, More Samples) extended with Online Learning. RL: Best response to NE produces a New Strategy. Deviates? Y: Add new Strategy. N: Improve RL Model? Y: continue learning. N: End]

CDA Learning Problem Setup

State Space
   History of recent trades
      H1: Moving average
      H2: Frequency weighted ratio, threshold= V
      H3: Frequency weighted ratio, threshold= A
   Quotes
      Q1: Opposite role
      Q2: Same role
   Time
      T1: Total
      T2: Since last trade
   Pending Trades
      U: Number of trades left
      V: Value of next unit to be traded

Actions
   A: Offset from V

Rewards
   R: Difference between unit valuation and trade price


EGTA/RL Round 1

Strategies           Payoff   NE                     Learning Strategy   Dev. Payoff
Kaplan, ZI, ZIbtq    248.1    1.000 ZI               L1                  268.7
L1                   242.5    1.000 L1

EGTA/RL Round 2

Strategies           Payoff   NE                     Learning Strategy   Dev. Payoff
Kaplan, ZI, ZIbtq    248.1    1.000 ZI               L1                  268.7
L1                   242.5    1.000 L1
ZIP                  248.0    1.000 ZIP              L2-L8               ---
GD                   248.6    1.000 GD               L9                  251.8
L9                   246.1    0.531 GD, 0.469 L9     L10                 252.1

EGTA/RL Rounds 3+

Strategies   Payoff   NE                       Learning Strategy   Dev. Payoff
…            …        …                        …                   …
L10          248.0    0.191 GD, 0.809 L10      L11                 251.0
L11          246.2    1.000 L11
GDX          245.8    0.192 GDX, 0.808 L11     L12                 248.3
L12          245.8    0.049 L11, 0.951 L12     L13                 245.9
L13          245.6    0.872 L12, 0.128 L13     L14                 245.6
RB           245.6    0.872 L12, 0.128 L13

[Slide annotation: Final champion]

Strategy Exploration Problem

•  Premise:
   –  Limited ability to cover profile space
   –  Expectation to reasonably evaluate all considered strategies
•  Need deliberate policy to decide which strategies to introduce
•  RL for strategy exploration
   –  attempt at best response to current equilibrium
   –  is this a good heuristic (even assuming ideal BR calc?)


Example

Introduce strategies in order: A1, A2, A3, A4

        A1      A2      A3      A4
A1     1, 1    1, 2    1, 3    1, 4
A2     2, 1    2, 2    2, 3    2, 6
A3     3, 1    3, 2    3, 3    3, 8
A4     4, 1    6, 2    8, 3    4, 4

Regret may increase over subsequent steps!

Strategy Set      Candidate Eq.   Regret wrt True Game
{A1}              (A1,A1)         3
{A1,A2}           (A2,A2)         4
{A1,A2,A3}        (A3,A3)         5
{A1,A2,A3,A4}     (A4,A4)         0
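The regret column of the example can be checked mechanically. The sketch below takes the row player's payoffs from the 4x4 table (the game is symmetric), finds the symmetric pure equilibrium of each restricted game, and measures its regret in the full game. Helper names are illustrative.

```python
# U[i][j]: row player's payoff for playing A(i+1) against A(j+1),
# read from the payoff table of the example.
U = [
    [1, 1, 1, 1],
    [2, 2, 2, 2],
    [3, 3, 3, 3],
    [4, 6, 8, 4],
]

def restricted_symmetric_psne(n):
    """Symmetric pure NE of the game restricted to the first n strategies."""
    return [i for i in range(n)
            if all(U[i][i] >= U[k][i] for k in range(n))]

def true_regret(i):
    """Regret of the symmetric profile (Ai, Ai) in the full 4-strategy game."""
    return max(U[k][i] for k in range(4)) - U[i][i]

table = []
for n in range(1, 5):
    (eq,) = restricted_symmetric_psne(n)  # exactly one in this example
    table.append((n, f"(A{eq + 1},A{eq + 1})", true_regret(eq)))
    print(table[-1])
```

Each newly added strategy is a best response to the previous candidate equilibrium, yet regret with respect to the true game rises from 3 to 4 to 5 before dropping to 0.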

FPSB2 Regret Surface

[Figure: regret surface over ki and kj, each ranging from 0 to 1; legend: BR, E(DEV), DEV [MESH]]


Exploration Policies

•  RND: Random (uniform) selection
•  Deviation-Based
   –  DEV: Uniform among strategies that deviate from current equilibrium
   –  BR: Best response to current equilibrium
   –  BR+DEV: Alternate on successive iterations
   –  ST(t): Softmax selection among deviators, proportional to gain
•  MEMT:
   –  Select strategy that maximizes the gain (regret) from deviating to a strategy outside the set from any mixture over the set.
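A few of the policies above can be sketched as selection rules over deviation gains, i.e. each candidate strategy's payoff gain against the current equilibrium. The gain values and strategy names are invented, and the exact softmax form exp(gain / t) with temperature t is an assumption about what ST(t) denotes.

```python
import math
import random

def deviators(gains):
    """Candidates that strictly gain by deviating from the current equilibrium."""
    return sorted(s for s, g in gains.items() if g > 0)

def select_rnd(gains, rng):
    """RND: uniform over all candidates."""
    return rng.choice(sorted(gains))

def select_dev(gains, rng):
    """DEV: uniform over deviating candidates."""
    return rng.choice(deviators(gains))

def select_br(gains):
    """BR: the candidate with the largest gain."""
    return max(gains, key=gains.get)

def select_softmax(gains, t, rng):
    """ST(t): softmax over deviators with temperature t (assumed form)."""
    cands = deviators(gains)
    top = max(gains[s] for s in cands)
    # Subtracting the maximum keeps exp() from overflowing for small t.
    weights = [math.exp((gains[s] - top) / t) for s in cands]
    return rng.choices(cands, weights=weights)[0]

# Hypothetical deviation gains against the current equilibrium.
gains = {"s1": -2.0, "s2": 0.5, "s3": 3.5, "s4": 1.0}
rng = random.Random(0)
print(select_rnd(gains, rng), select_dev(gains, rng),
      select_br(gains), select_softmax(gains, 1.0, rng))
```

As t shrinks the softmax rule concentrates on the best response, and as t grows it approaches uniform selection among deviators, so ST(t) interpolates between BR and DEV.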

CDA↓4

[Figure: Expected Regret (log scale, 10^3 down to 10^-3) vs. Step (1 to 13) for policies MEMT, DEV, RND, BR, ST10, ST1, ST0.1]


EGTA Applications

•  Market games
   –  TAC: Travel, Supply Chain, Ad Auction
   –  Canonical auctions: SimAAs, CDAs, SimSPSBs,…
   –  Equity premium in financial trading
•  Other domains
   –  Privacy: information sharing attacks
   –  Networking: routing, wireless AP selection
   –  Credit network formation
•  Mechanism design

Conclusion: EGTA Methodology

•  Extends scope of GT to procedurally defined scenarios
•  Embraces statistical underpinnings of strategic reasoning
•  Search process:
   –  GT for establishing salient strategic context
   –  Strategy exploration:
      •  e.g., RL to search for best response to that context
   → Principled approach to evaluate complex strategy spaces
•  Growing toolbox of EGTA techniques
