Uncertainty Awareness in Integrating
Machine Learning and Game Theory
(Connections between machine learning and game theory, seen through uncertainty)
Rikiya Takahashi
SmartNews, Inc.
rikiya.takahashi@smartnews.com
Mar 5, 2017
Game Theory Workshop 2017
https://www.slideshare.net/rikija/uncertainty-awareness-in-integrating-machine-learning-and-game-theory
About Myself
● Rikiya TAKAHASHI, Ph.D. (高橋 力矢)
– Engineer in SmartNews, Inc., from 2015 to current
– Research Staff Member in IBM Research – Tokyo, from 2004 to 2015
● Research Interests: machine learning, reinforcement learning,
cognitive science, behavioral economics, complex systems
– Descriptive models about real human behavior
– Prescriptive decision making from descriptive models
– Robust algorithms working under high uncertainty
● Limited sample size, high dimensionality, high noise
Example of Previous Work
● Budget-Constrained Markov Decision Process for
Marketing-Mix Optimization (Takahashi+, 2013 & 2014)
[Figure: pipeline Historical Data → Consumer Segmentation → Time-Series Predictive Modeling → Optimal Marketing-Mix & Targeting Rules. The segmentation tree splits customers by rules such as "Revenues in past 16 weeks > $200?", "#purchases in past 8 weeks > 2?", "#browsing in past 4 weeks > 15?", and "#EMs in past 2 weeks > 2?" into strategic segments and micro-segments (MS #1 … MS #256); the predictive model relates stimuli (EM: e-mail, DM: direct mail, TM: tele-marketing, TV CM) and browsing to weekly purchase responses per segment over 2014/01/01 … 2014/12/31]
Example of Previous Work
● Travel-Time Distribution Prediction on a Large
Road Network (Takahashi+, 2012)
[Figure: pipeline Road Network & Travel Time Data by Taxi → Predictive Modeling of Travel Time Distribution → Route-Choice Recommendation or Traffic Simulation; a road network between A and B is modeled with per-link travel-time distributions ψ1(y), …, ψ6(y) over intersections and links]
Example of Previous Work
● Bayesian Discrete Choice Modeling for Irrational
Compromise Effect (Takahashi & Morimura, 2015)
– Explained later today
[Figure: preference reversal between choice sets {A, B, C} and {B, C, D} on the attributes inexpensiveness vs. product quality (the option with the highest share changes with the set), and the proposed dual-personality architecture: a Utility Calculator (UC) takes each option's vector of attributes, computes utility samples (e.g., u_iA = 3.26, u_iB = 3.33, u_iC = 2.30), and sends only these samples to a Decision Making System (DMS), which forms utility estimates from them]
Agenda
1. Uncertainty Awareness as an Essence in Data-Oriented Real-World Decision Making
2. From Machine Learning to Game Theory #1 – Linking Uncertainty with Bounded Rationality
3. From Machine Learning to Game Theory #2 – Open Questions Implied by Numerical Issues
Machine Learning (ML)
● A set of inductive disciplines for designing a probabilistic model and
estimating its parameters so as to maximize out-of-sample predictive accuracy
– Supervised learning: model and fit P(Y|X)
– Unsupervised learning: model and fit P(X)
● What machine learners care about
– Bias-variance trade-off
– Curse of dimensionality
Estimation via Bayes' theorem
● Basis behind most of today's ML algorithms
  – data: D, model parameter: θ, prior: p(θ)
● Bayesian estimation
  – posterior distribution: p(θ∣D) = p(D∣θ) p(θ) / ∫θ p(D∣θ) p(θ) dθ
  – predictive distribution: p(y*∣D) = ∫θ p(y*∣θ) p(θ∣D) dθ
● Maximum A Posteriori (MAP) estimation, an approximation of the above
  – posterior mode: θ̂ = argmaxθ [ log p(D∣θ) + log p(θ) ]
  – predictive distribution: p(y*∣D) ≃ p(y*∣θ̂)
● Q. Why place a prior?
  – A1. To quantify uncertainty as a posterior
  – A2. To avoid overfitting
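A minimal sketch (assumed toy numbers, not from the slides) contrasting the two routes above for the mean of a Gaussian with known noise: the MAP route keeps only the posterior mode, while the Bayesian route also keeps the posterior variance as a measure of uncertainty.

```python
# Prior: theta ~ N(0, tau^2); likelihood: y_i ~ N(theta, sigma^2) with known sigma.
import numpy as np

rng = np.random.default_rng(0)
sigma, tau = 1.0, 2.0                 # known noise std, prior std (assumed values)
y = rng.normal(1.5, sigma, size=5)    # small sample: uncertainty matters

# Conjugate Gaussian posterior: N(mu_post, var_post)
var_post = 1.0 / (len(y) / sigma**2 + 1.0 / tau**2)
mu_post = var_post * y.sum() / sigma**2

theta_map = mu_post                   # for a Gaussian, posterior mode = posterior mean
print("MAP / posterior mean:", theta_map)
print("posterior std (uncertainty kept by Bayesian estimation):", np.sqrt(var_post))
```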
E.g., Gaussian Process Regression (GPR)
● Bayesian Ridge Regression
  – Unlike MAP Ridge regression (dark gray), input-dependent uncertainty (light gray) is quantified.

prior: (f, f*) ~ N( 0_{n+1}, [ K      k* ;  k*^T   K(x*, x*) ] )
  where K = (K_ij ≡ K(x_i, x_j)),  k* = (K(x_1, x*), …, K(x_n, x*))^T,
        K(x, x') = exp(−γ ∥x − x'∥²)

data likelihood: (y, y*) ~ N( (f, f*), σ² I_{n+1} )

predictive distribution:
  y* | K, x*, X, y ~ N( k*^T (σ² I_n + K)^{−1} y,
                        K(x*, x*) − k*^T (σ² I_n + K)^{−1} k* + σ² )
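A minimal NumPy sketch of the GPR predictive distribution above, under the RBF kernel on the slide; the data points, γ, and σ values are assumed for illustration only.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    # K(x, x') = exp(-gamma * ||x - x'||^2), evaluated pairwise
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def gpr_predict(X, y, x_star, gamma=1.0, sigma=0.1):
    K = rbf(X, X, gamma)                              # n x n Gram matrix
    k_star = rbf(X, x_star[None, :], gamma)[:, 0]     # n-vector k_*
    A = np.linalg.solve(sigma**2 * np.eye(len(X)) + K,
                        np.column_stack([y, k_star]))
    mean = k_star @ A[:, 0]                           # k_*^T (sigma^2 I + K)^{-1} y
    var = (rbf(x_star[None, :], x_star[None, :], gamma)[0, 0]
           - k_star @ A[:, 1] + sigma**2)             # input-dependent predictive variance
    return mean, var

X = np.array([[0.0], [0.5], [1.0]]); y = np.array([0.0, 0.4, 0.8])
print(gpr_predict(X, y, np.array([0.25])))
```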
Gap between Deduction & Induction
Today's AI is integrating both.
Do not divide the work between
inductive & deductive researchers.
Deductive Mind
● Optimize decisions for
a given environment
● Casino owner's mentality
● Game theorist, probabilist,
operations researcher
Inductive Mind
● Estimate the environment
from observations
● Gambler's mentality
● Statistician, machine learner,
econometrician
Induction ↔ Deduction
● Typical Problem Solving in the Real World
  – Dataset D → Inductive Process (machine learning, statistics, econometrics, etc.)
    → Estimate of Environment Θ̂_D → Deductive Process (game theory, mathematical
    programming, Markov Decision Process, etc.) → Policy Decisions π̂_D
  – The estimate Θ̂_D is different from the true environment Θ.

∀i ∈ {1, …, n}   π̂_{D,i} = argmax_{π_i} R( π_i | {π̂_{D,j}}_{j≠i}, Θ̂_D )
Induction ↔ Deduction
● Same pipeline as above: Dataset D → Estimate of Environment Θ̂_D → Policy Decisions π̂_D
● How different is the estimation-based policy π̂_D from the true optimal policy π*?

∀i ∈ {1, …, n}   π*_i = argmax_{π_i} R( π_i | {π*_j}_{j≠i}, Θ )
Induction ↔ Deduction
● Typical Problem Solving in the Real World
  – Dataset D → Estimate of Environment Θ̂_D → Policy Decisions π̂_D
● State-of-the-art AI
  – Dataset D → Integration of Machine Learning and Optimization Algorithms
    (direct optimization) → Policy Decisions π̌_D, with Θ̌_D only as a by-product
See the Difference
● Typical Problem Solving in the Real World
  – Unnecessarily much effort in solving each subproblem; vulnerable to estimation error
  – Θ̂_D: accurately fitted to minimize prediction error for dataset D, although
    minimizing the error of this parameter is not the final goal
  – π̂_D: excessively optimized under a wrong assumption
● State-of-the-art AI
  – Less effort on needless intermediate estimation; robust to estimation error
  – Θ̌_D: fitted without minimizing the error for dataset D; often less complex than Θ̂_D
  – π̌_D: safely optimized, with less reliance on Θ̌_D
See the Difference
● Typical Problem Solving in the Real World
  – Solve a hard inductive problem, then solve another hard deductive problem
● State-of-the-art AI
  – Solve an easier problem that involves both induction & deduction
● Recommendation of simple solving
  – Gigerenzer & Taleb, https://www.youtube.com/watch?v=4VSqfRnxvV8
Optimization under Uncertainty
● Interval Estimation (e.g., Bayesian)
  – Quantify uncertainty
  – Optimize over all possible environments
● Minimal Estimation (e.g., Vapnik)
  – Omit the intermediate step
  – Solve the minimal optimization problem
● Both principles are effective in practice.
Vapnik's Principle (Vapnik, 1995)
When solving a problem of interest, do not solve a
more general problem as an intermediate step.
—Vladimir N. Vapnik
● E.g., classification or regression: predict Y given X
  – #1. Fit P(X,Y) and infer P(Y|X) by Bayes' theorem
  – #2. Only fit P(Y|X)
● #2 is better than #1 because it incurs less estimation error.
  – Particularly better when uncertainty is high: small sample size,
    high dimensionality, and/or high noise
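A minimal sketch of the comparison above, under illustrative assumptions: scikit-learn's GaussianNB as the generative route #1 (models P(X,Y) and infers P(Y|X)), LogisticRegression as the discriminative route #2 (models only P(Y|X)), and a small-sample, high-dimensional synthetic dataset where the extra estimation burden of route #1 tends to hurt.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB            # generative: models P(X, Y)
from sklearn.linear_model import LogisticRegression   # discriminative: models P(Y|X)

rng = np.random.default_rng(1)
n, d = 40, 50                                          # small sample, high dimension
w = rng.normal(size=d)
X = rng.normal(size=(2 * n, d))
y = (X @ w + 0.5 * rng.normal(size=2 * n) > 0).astype(int)
Xtr, ytr, Xte, yte = X[:n], y[:n], X[n:], y[n:]

for model in (GaussianNB(), LogisticRegression(max_iter=1000)):
    acc = model.fit(Xtr, ytr).score(Xte, yte)          # out-of-sample accuracy
    print(type(model).__name__, "test accuracy:", round(acc, 2))
```

Exact numbers vary with the random seed; the point is only that fitting the smaller object P(Y|X) is typically the safer bet under high uncertainty.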
Batch Reinforcement Learning
● A good example involving both inductive and deductive processes
● Also a good example of how to avoid needlessly hard estimation
● Basis behind the recent successes of the Deep Q-Network in playing games
  (Mnih+, 2013 & 2015) and of AlphaGo (Silver+, 2016)
Markov Decision Process
● Framework for long-term-optimal decision making
  – S: set of states, A: set of actions
  – P(s'|s,a): state-transition probability
  – r(s,a): immediate reward, γ ∈ [0,1]: discounting factor
  – Optimize policy π(a|s) for maximal cumulative reward

[Figure: example state transitions over t = 0, 1, 2, … among states such as
Gold / Silver / Normal Customer, with different reward streams under Action #1
(e.g., ordinary discount on a flight ticket) and Action #2 (e.g., free
business-class upgrade)]
Markov Decision Process
● Easy to solve if the environment is known
  – Via dynamic programming or linear programming,
    when P(s'|s,a) & r(s,a) are given with no uncertainty
  – Behave myopically as t → ∞
    ● For each state s, choose the action a that maximizes r(s,a).
  – At time (t−1), choose the action that maximizes the immediate reward at (t−1)
    plus the expected reward after time t over the state-transition distribution.
● What if the environment is unknown?
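A minimal sketch of solving a known MDP by value iteration, one of the dynamic-programming routes mentioned above; the 3-state, 2-action transition and reward numbers are made up for illustration.

```python
import numpy as np

n_s, n_a, gamma = 3, 2, 0.9
P = np.random.default_rng(2).dirichlet(np.ones(n_s), size=(n_s, n_a))  # P[s, a, s']
r = np.array([[1.0, 0.5],
              [0.2, 0.8],
              [0.0, 0.3]])                      # r[s, a]

V = np.zeros(n_s)
for _ in range(1000):
    Q = r + gamma * P @ V                       # Q[s, a] = r(s,a) + gamma * E[V(s')]
    V_new = Q.max(axis=1)                       # Bellman optimality backup
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
policy = Q.argmax(axis=1)                       # greedy policy w.r.t. Q
print("optimal values:", V.round(3), "optimal actions:", policy)
```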
Types of Reinforcement Learning
● Model-based ↔ Model-free
● On policy ↔ Off policy
● Value iteration ↔ policy search
● Model-based approach
– 1. System identification: estimate the MDP parameters
– 2. Sample multiple MDPs from the interval estimate
– 3. Solve every MDP & take the best action of the best MDP
● Optimism in the face of uncertainty
Model-free approach
● Remember: our aim is to obtain the optimal policy.
  In principle, there is no need to estimate the environment.
  – Act without fully identifying the system: as long as we choose
    the optimal action, it turns out right in the end.
● Even when doing estimation, utilize an intermediate statistic
  less complex than P(s'|s,a) & r(s,a).
Bellman Optimality Equation
● Policy is derived if we have an estimate of Q(s,a).
– Simpler than estimating P(s'|s,a) & r(s,a)
Q(s,a) = E[r(s,a)] + γ E_{P(s'|s,a)}[ max_{a'} Q(s',a') ]

π(a|s) = 1 if a = argmax_{a'} Q(s,a'), 0 otherwise

● Get an estimate Q̂(s,a) from episodes (s_i, a_i, s_i', r_i), i = 1, …, n
Fitted Q-Iteration (Ernst+, 2005)
● For k = 1, 2, …, iterate 1) value computation and 2) regression:

1)  ∀i ∈ {1, …, n}   v_i^(k) := r_i + γ Q̂_k^(1)( s_i', argmax_{a'} Q̂_k^(0)(s_i', a') )

2)  ∀f ∈ {0,1}   Q̂_{k+1}^(f) := argmin_{Q ∈ H} [ ½ Σ_{i ∈ J_f} ( v_i^(k) − Q(s_i, a_i) )² + R(Q) ]

  – H: hypothesis space of functions, Q_0 ≡ 0, R: regularization term
  – Indices 1…n are randomly split into sets J_0 and J_1, to avoid over-estimation
    of Q values (Double Q-Learning (Hasselt, 2010)).
● Related to Experience Replay in Deep Q-Network (Mnih+, 2013 & 2015)
  – See (Lange+, 2012) for more details.
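A minimal sketch of Fitted Q-Iteration with a tree-ensemble regressor on a made-up one-dimensional batch dataset; for brevity it keeps a single fitted Q model rather than the J_0/J_1 split on the slide, so the over-estimation safeguard is omitted.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(3)
n, n_a, gamma = 500, 2, 0.95
s = rng.uniform(-1, 1, size=(n, 1))                      # states
a = rng.integers(n_a, size=n)                            # actions taken in the batch
s_next = np.clip(s + (2 * a[:, None] - 1) * 0.1
                 + 0.05 * rng.normal(size=(n, 1)), -1, 1)
r = -np.abs(s_next[:, 0])                                # reward: stay near the origin

def q_values(model, states):
    # evaluate Q(s, a) for every action by stacking (s, a) as regression features
    return np.column_stack([
        model.predict(np.column_stack([states, np.full(len(states), act)]))
        for act in range(n_a)])

X = np.column_stack([s, a])
model, v = None, r.copy()                                # Q_0 = 0 => first targets are rewards
for k in range(20):
    model = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, v)
    v = r + gamma * q_values(model, s_next).max(axis=1)  # Bellman backup targets
print("greedy actions on a grid:",
      q_values(model, np.linspace(-1, 1, 5)[:, None]).argmax(axis=1))
```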
Policy Gradient
● Accurately fit the policy π_θ(a|s) while only roughly fitting Q(s,a)
  – More directness toward the final aim
  – Applicable to continuous-action problems

Policy Gradient Theorem (Sutton+, 2000):
  ∇_θ J(θ) = E_{π_θ}[ ∇_θ log π_θ(a|s) · Q^π(s,a) ]
  (gradient of performance = expectation over s and a of the log-policy gradient
   times the cumulative reward)

● Variations in providing the rough estimate of Q
  – REINFORCE (Williams, 1992): reward samples
  – Actor-Critic: regression models (e.g., Natural Gradient (Kakade, 2002), A3C (Mnih+, 2016))
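A minimal sketch of the REINFORCE variant above: a softmax policy over two actions in a one-step problem, where a sampled reward replaces Q in the policy-gradient estimator (the reward table and step size are assumed toy values).

```python
import numpy as np

rng = np.random.default_rng(4)
true_reward = np.array([0.2, 1.0])             # expected reward of each action (assumed)
theta = np.zeros(2)                            # softmax policy parameters

for step in range(2000):
    p = np.exp(theta - theta.max()); p /= p.sum()      # pi_theta(a)
    a = rng.choice(2, p=p)
    reward = true_reward[a] + 0.1 * rng.normal()
    grad_logp = -p; grad_logp[a] += 1.0                # d/dtheta log pi_theta(a)
    theta += 0.05 * reward * grad_logp                 # REINFORCE: reward sample replaces Q
print("learned action probabilities:",
      np.round(np.exp(theta) / np.exp(theta).sum(), 2))
```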
Functional Approximation in Practice
● Concrete functional form of Q(s,a) and/or π(a|s)
  – Q should be a universal function approximator: a class of functions that can
    approximate any function if sufficiently many parameters are introduced.
● Examples of universal approximators
  – Tree ensembles: Random Forest, Gradient Boosted Decision Trees
  – (Deep) Neural Networks
  – Mixtures of Radial Basis Functions (RBFs)
Functional Approximation in Practice
● Is any universal approximator OK? – No, unfortunately.
  – A universal approximator is merely asymptotically unbiased.
  – Better to also have
    ● Low variance in terms of the bias-variance trade-off
    ● Resistance to the curse of dimensionality
● One reason for deep learning's success
  – Flexibility to represent multi-modal functions with fewer parameters
    than nonparametric (RBF or tree) models
  – Techniques to stabilize numerical optimization
    ● AdaGrad or Adam, dropout, ReLU, batch normalization, etc.
Message
● Uncertainty awareness is essential in data-oriented decision making.
  – No division between induction and deduction
  – Removing needless intermediate estimation
  – Fitted Q-Iteration as an illustrative example
● Fewer parameters, less uncertainty
Agenda
1. Uncertainty Awareness as an Essence in Data-Oriented Real-World Decision Making
2. From Machine Learning to Game Theory #1 – Linking Uncertainty with Bounded Rationality
3. From Machine Learning to Game Theory #2 – Open Questions Implied by Numerical Issues
Shrinkage Matters in the Real World.
● Q. Why does a prior help avoid over-fitting?
  – A. Shrinkage towards the prior mean (e.g., 0 in Ridge regression)
● Over-optimization ↔ Over-rationalization?
  – (e.g., (Takahashi and Morimura, 2015))

[Figure: solutions of 2-dimensional OLS and Ridge regression in coefficient space;
Ridge is closer to the prior mean 0 than OLS, and the prior mean 0 is independent
of the training data]
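A minimal sketch of the shrinkage picture above on toy data: the ridge solution is pulled toward the prior mean 0 relative to OLS (λ is an assumed regularization strength).

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(30, 2)); w_true = np.array([2.0, -1.0])
y = X @ w_true + rng.normal(size=30)

w_ols = np.linalg.solve(X.T @ X, X.T @ y)                    # ordinary least squares
lam = 10.0                                                   # regularization strength (assumed)
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
print("OLS:  ", w_ols.round(3))
print("Ridge:", w_ridge.round(3), "<- closer to the prior mean 0 than OLS")
```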
Discrete Choice Modelling
Goal: predict the probability of choosing an option from a choice set.
Why solve this problem?
  Brand positioning among competitors
  Sales promotion (yet involving some abuse)
Random Utility Theory as a Rational Model
Each human is a rational maximizer of a random utility.
Theoretical basis behind many statistical marketing models:
  Logit models (e.g., (McFadden, 1980; Williams, 1977; McFadden and Train, 2000)),
  Learning to rank (e.g., (Chapelle and Harchaoui, 2005)),
  Conjoint analysis (Green and Srinivasan, 1978),
  Matrix factorization (e.g., (Lawrence and Urtasun, 2009)), ...
Complexity of Real Human’s Choice
An example of choosing a PC (Kivetz et al., 2004)
Each subject chooses 1 option from a choice set.

                A     B     C     D     E
  CPU [MHz]     250   300   350   400   450
  Mem. [MB]     192   160   128    96    64

  Choice Set    #subjects
  {A, B, C}     36:176:144
  {B, C, D}     56:177:115
  {C, D, E}     94:181:109

Can random utility theory still explain the preference reversals
(B ≻ C or C ≻ B)?
Similarity Effect (Tversky, 1972)
The top-share choice can change due to correlated utilities.
E.g., one color from {Blue, Red} or {Violet, Blue, Red}?
Attraction Effect (Huber et al., 1982)
Introduction of an absolutely-inferior option A′ (= decoy) causes an irregular
increase of option A's attractiveness,
despite the natural guess that the decoy never affects the choice:
  If D ≻ A, then D ≻ A ≻ A′.
  If A ≻ D, then A is superior to both A′ and D.
Compromise Effect (Simonson, 1989)
Moderate options within each choice set are preferred.
Different from a non-linear utility function involving diminishing returns
(e.g., √inexpensiveness + √quality).
Positioning of the Proposed Work
Sim.: similarity, Attr.: attraction, Com.: compromise

  Model     Sim.  Attr.  Com.  Mechanism                      Predict. for  Likelihood
                                                              Test Set      Maximization
  SPM       OK    NG     NG    correlation                    OK            MCMC
  MDFT      OK    OK     OK    dominance & indifference       OK            MCMC
  PD        OK    OK     OK    nonlinear pairwise comparison  OK            MCMC
  MMLM      OK    NG     OK    none                           OK            Non-convex
  NLM       OK    NG     NG    hierarchy                      NG            Non-convex
  BSY       OK    OK     OK    Bayesian                       OK            MCMC
  LCA       OK    OK     OK    loss aversion                  OK            MCMC
  MLBA      OK    OK     OK    nonlinear accumulation         OK            Non-convex
  Proposed  OK    NG     OK    Bayesian                       OK            Convex

MDFT: Multialternative Decision Field Theory (Roe et al., 2001)
PD: Proportional Difference Model (González-Vallejo, 2002)
MMLM: Mixed Multinomial Logit Model (McFadden and Train, 2000)
SPM: Structured Probit Model (Yai, 1997; Dotson et al., 2009)
NLM: Nested Logit Models (Williams, 1977; Wen and Koppelman, 2001)
BSY: Bayesian Model of (Shenoy and Yu, 2013)
LCA: Leaky Competing Accumulator Model (Usher and McClelland, 2004)
MLBA: Multiattribute Linear Ballistic Accumulator Model (Trueblood, 2014)
Key Idea #1: a Dual-Personality Model
Regard a human as an estimator of her/his own utility function.
Assumption 1: the DMS does not know the original utility function.
  1. The UC computes the sample value of every option's utility,
     and sends only these samples to the DMS.
  2. The DMS statistically estimates the utility function.
Utility Calculator as the Rational Personality
For every context i and option j, the UC computes a noiseless sample of utility
v_ij by applying the utility function f_UC : R^{d_X} → R.

  v_ij = f_UC(x_ij),    f_UC(x) ≜ b + w^T φ(x)

  b: bias term
  φ : R^{d_X} → R^d: mapping function
  w ∈ R^d: vector of coefficients
Key Idea #2: the DMS is a Bayesian Estimator
The DMS does not know f_UC but has the utility samples {v_ij}_{j=1}^{m[i]}.
Assumption 2: the DMS places a choice-set-dependent Gaussian Process (GP) prior
when regressing the utility function.

  μ_i ~ N( 0_{m[i]}, σ² K(X_i) ),   K(X_i) = ( K(x_ij, x_ij') ) ∈ R^{m[i]×m[i]}
  v_i ≜ (v_i1, …, v_im[i])^T ~ N( μ_i, σ² I_{m[i]} )

  μ_i ∈ R^{m[i]}: vector of utilities,  σ²: noise level,
  K(·,·): similarity function,  X_i ≜ (x_i1, …, x_im[i])^T with x_ij ∈ R^{d_X}

The posterior mean is given as

  u*_i ≜ E[μ_i | v_i, X_i, K] = K(X_i) ( I_{m[i]} + K(X_i) )^{−1} ( b 1_{m[i]} + Φ_i w ),

where Φ_i is the matrix whose j-th row is φ(x_ij).
Convex Optimization for Model Parameters
The likelihood of the entire model is tractable, assuming the choice is given by a
logit whose mean utility is the posterior mean u*_i.
Thus we can fit the function f_UC from the choice data.
Conveniently, MAP estimation of f_UC is convex for a fixed K:

  ( b̂, ŵ ) = argmax_{b,w}  Σ_{i=1}^{n} ℓ( b H_i 1_{m[i]} + H_i Φ_i w, y_i )  −  (c/2) ∥w∥²

  where ℓ(u*_i, y_i) ≜ log [ exp(u*_{i y_i}) / Σ_{j'=1}^{m[i]} exp(u*_{i j'}) ]
  and H_i ≜ K(X_i) ( I_{m[i]} + K(X_i) )^{−1}.
Irrationality as Bayesian Shrinkage
Implication from the posterior-mean utility in (1):
  Each option's utility is shrunk towards the prior mean 0.
  Shrinkage is strong for an option dissimilar to the others,
  due to its high posterior variance (= uncertainty).

  u*_i = K(X_i) ( I_{m[i]} + K(X_i) )^{−1} · ( b 1_{m[i]} + Φ_i w )    (1)
         [shrinkage factor]                   [vector of utility samples]

Context effects as Bayesian uncertainty aversion
  E.g., RBF kernel K(x, x') = exp(−γ ∥x − x'∥²)

[Figure: final evaluations of options A–D on the axis X1 = (5 − X2) under the
choice sets {A, B, C} and {B, C, D}; the same option receives different shrunken
evaluations depending on the choice set]
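A minimal sketch of Eq. (1) on assumed numbers: with an RBF kernel, the option that is dissimilar to the rest of the choice set has higher posterior variance and is therefore shrunk more strongly toward the prior mean 0, which can reverse the ranking of the raw utility samples.

```python
import numpy as np

def rbf(X, gamma=1.0):
    # similarity K(x, x') = exp(-gamma * ||x - x'||^2) within one choice set
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# attributes (inexpensiveness, quality); C is the extreme, dissimilar option (assumed)
X = np.array([[1.0, 4.0],    # A
              [2.0, 3.0],    # B
              [4.5, 0.5]])   # C
v = np.array([3.2, 3.3, 3.4])                     # utility samples from the UC (assumed)

K = rbf(X, gamma=0.5)
u_star = K @ np.linalg.solve(np.eye(3) + K, v)    # K(X)(I + K(X))^{-1} v, as in Eq. (1)
print("shrunken utilities:", u_star.round(3))     # C, highest raw utility, ends up lowest
```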
Recovered Context-Dependent Choice Criteria
For a speaker dataset: successfully captured a mixture of objective preference
and subjective context effects.

                  A     B     C     D     E
  Power [Watt]    50    75    100   125   150
  Price [USD]     100   130   160   190   220

  Choice Set    #subjects
  {A, B, C}     45:135:145
  {B, C, D}     58:137:111
  {C, D, E}     95:155: 91

[Figure: recovered evaluation curves over price for options A–E, showing the
objective evaluation and the context-dependent evaluations under {A,B,C},
{B,C,D}, and {C,D,E}]
[Figure: average test log-likelihood on the PC, SP, and SM datasets for LinLogit,
NpLogit, LinMix, NpMix, and the proposed GPUA]
A Result of p-beauty Contests by Real Humans
Guess 2/3 of the average of all votes (0–100). The observed means are far from
the Nash equilibrium 0 (Camerer et al., 2004; Ho et al., 2006).

Table: Average Choice in (2/3)-beauty Contests
  Subject Pool            Group Size   Sample Size   Mean[Y_i]
  Caltech Board           73           73            49.4
  80 year olds            33           33            37.0
  High School Students    20-32        52            32.5
  Economics PhDs          16           16            27.4
  Portfolio Managers      26           26            24.3
  Caltech Students        3            24            21.5
  Game Theorists          27-54        136           19.1
Modeling Bounded Rationality
Early stopping at step k: Level-k thinking or Cognitive Hierarchy Theory
(Camerer et al., 2004)
  Humans cannot predict the infinite future.
  Uses a non-stationary transitional state.
Randomization of utility via noise ε_it: Quantal Response Equilibrium
(McKelvey and Palfrey, 1995)

  ∀i ∈ {1, …, n}   Y_i^(t) | Y_{−i}^(t−1) = argmax_Y [ f_i(Y, Y_{−i}^(t−1)) + ε_it ]

Both methods essentially work as regularization of rationality:
shrinkage toward initial values or uniform choice probabilities.
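A minimal sketch of level-k reasoning in the (2/3)-beauty contest introduced earlier: each level best-responds to the previous level's mean guess, so early stopping at a finite k leaves the answer far from the Nash equilibrium 0.

```python
level_mean = 50.0                        # level-0: uniform guessing, mean 50
for k in range(1, 6):
    level_mean = (2.0 / 3.0) * level_mean    # level-k best response to level-(k-1)
    print(f"level-{k} guess: {level_mean:.1f}")
# as k -> infinity the guess converges to the Nash equilibrium 0
```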
Linking ML with Game Theory (GT) via the Shrinkage Principle
● Optimization without shrinkage
  – ML: Maximum-Likelihood estimation, optimal for the training data
    but with less generalization capability to test data
  – GT: Nash Equilibrium, optimal for the given game
    but less predictive of real-world decisions
● Optimization with shrinkage
  – ML: Bayesian estimation; shrinkage towards the prior causes suboptimality
    for the training data, but more generalization capability to test data
  – GT: Transitional State or Quantal Response Equilibrium; shrinkage towards
    uniform probabilities causes suboptimality for the given game, but is more
    predictive of real-world decisions
Early Stopping and Regularization
● ML as a dynamical system to find the optimal parameters
  [Figure: gradient-type trajectory in the space of parameters #1 and #2, starting
  at 0 and approaching the exact maximum-likelihood estimate (e.g., OLS) as
  t = 10, 20, 30, 50, …; an early-stopping estimate (e.g., Partial Least Squares)
  lies on the path, near the exact Bayesian estimate shrunk towards zero
  (e.g., Ridge regression)]
● GT as a dynamical system to find the equilibrium
  [Figure: iterated reasoning in the (2/3)-beauty contest, from mean = 50 at t = 0,
  through mean = 34 (t = 1) and mean = 15 (t = 2, a level-2 transitional state),
  toward the Nash equilibrium mean = 0 as t → ∞]
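A minimal sketch (illustrative assumptions only) of the left-hand picture above: gradient descent on the squared loss, stopped early, yields coefficients lying between 0 and the OLS solution, playing the same shrinkage role as a Bayesian (ridge) estimate.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(30, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(size=30)
w_ols = np.linalg.solve(X.T @ X, X.T @ y)          # exact maximum-likelihood (OLS) estimate

w, lr = np.zeros(2), 0.01                          # start at the prior mean 0
trajectory = {}
for t in range(1, 51):
    w -= lr * (X.T @ (X @ w - y))                  # gradient step on the squared loss
    if t in (10, 20, 30, 50):
        trajectory[t] = w.copy()                   # early-stopping (shrunken) estimates
for t, wt in trajectory.items():
    print(f"t={t}: {wt.round(3)}")
print("OLS (t -> infinity):", w_ols.round(3))
```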
Message
● Bayesian shrinkage ↔ Bounded rationality
– Dual-personality model for contextual effects
– Towards data-oriented & more realistic games:
export ML regularization techniques to GT
● Analyze dynamics or uncertainty-aware equilibria
– Early-stopped transitional state, or
– QRE with uncertainty on each player's utility function
Agenda
1. Uncertainty Awareness as an Essence in Data-Oriented Real-World Decision Making
2. From Machine Learning to Game Theory #1 – Linking Uncertainty with Bounded Rationality
3. From Machine Learning to Game Theory #2 – Open Questions Implied by Numerical Issues
Additional Implications from ML
● Multiple equilibria or saddle points?
● Equilibria or “typical” transitional states?
– Slow convergence
– Plateau of objective function
Recent history in ML
● ~20 years wasted on the local-optimality issue
  – Neural Networks (NNs) had long been criticized for the local optimality
    of their parameter fitting.
  – The ML community stuck with convex optimization approaches
    (e.g., Support Vector Machines (Vapnik, 1995)).
  – Most critical points in fitting high-dimensional NNs, however, turn out to be
    not local optima but saddle points (Bray & Dean, 2007; Dauphin+, 2014)!
  – After skipping saddle points by perturbation, most of the local optima
    empirically provide similar prediction capabilities.
● Please do not make the same mistake in multi-agent optimization problems (= games)!
Why most are saddle points?
● See the spectrum of the Hessian matrix of a non-linear function randomly drawn
  from a Gaussian process.
  – Local minimum: every eigenvalue is positive.
  – Local maximum: every eigenvalue is negative.
  – Saddle point: both positive & negative eigenvalues exist.
  [Figure: univariate and bivariate examples of a minimum, a maximum, and a saddle
  point; https://en.wikipedia.org/wiki/Saddle_point]
● In a high-dimensional function, the Hessian contains both positive and negative
  eigenvalues with high probability.
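A minimal sketch of the argument above, modeling the Hessian at a random critical point as a random symmetric matrix (a GOE-like assumption, not the Gaussian-process calculation of Bray & Dean): as the dimension grows, mixed eigenvalue signs, i.e. saddle points, become overwhelmingly likely.

```python
import numpy as np

rng = np.random.default_rng(6)
for d in (2, 10, 100):
    saddles = 0
    for _ in range(1000):
        A = rng.normal(size=(d, d))
        H = (A + A.T) / 2                      # random symmetric "Hessian"
        eig = np.linalg.eigvalsh(H)
        if (eig > 0).any() and (eig < 0).any():
            saddles += 1                       # mixed signs => saddle point
    print(f"d={d}: {saddles / 10:.1f}% of random critical points are saddles")
```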
Open Questions for Multiple Equilibria
● If a game is very complex, involving many parameters in its pay-off or utility functions, then
  – Are most of its critical points unstable saddle points?
  – Is the number of equilibria much smaller than our guess?
● If we obtain a few equilibria of such a complex game,
  – Do most of these equilibria have similar properties?
  – Do we really need to obtain the other equilibria?
See Dynamics:
“Typical” Transitional State?
● MLers are sensitive to the convergence rate in fitting.
  – We are in the finite-sample & high-dimensional world: asymptotics alone is
    powerless, and a computational estimate is not an equilibrium but a
    transitional state.
  [Figure: comparison of gradient-descent optimizers;
  http://sebastianruder.com/optimizing-gradient-descent/ (Kingma & Ba, 2015)]
See Dynamics:
“Typical” Transitional State?
● The mixing time of the Markov processes of some games is exponential
  in the number of players.
  – E.g., the Nash demand game (Axtell+, 2000): the equilibrium exhibits equality
    of wealth, while the transitional states exhibit severe inequality.
● What if the number of players is over thousands or millions?
  – Severe inequality most of the time
See Dynamics: Trapped in Plateau?
● Fitting a deep NN is often trapped in plateaus.
  – Natural gradient descent (Amari, 1997) is often used to quickly escape
    from a plateau.
  – In real-world games, are people trapped in plateaus rather than at equilibria?
  [Image: https://www.safaribooksonline.com/library/view/hands-on-machine-learning/9781491962282/ch04.html]
Conclusion
● Discussed how uncertainty should be incorporated
in inductive & deductive decision making.
– Quantifying uncertainty or simpler minimal estimation
● Linked Bayesian shrinkage with bounded rationality
– Towards data-oriented regularized equilibrium
● Implications from high-dimensional ML
– Saddle points, transitional state, and/or plateau
THANK YOU FOR ATTENDING!
Download this material from
https://www.slideshare.net/rikija/uncertainty-awareness-in-integrating-machine-learning-and-game-theory
References
Amari, S. (1997). Neural learning in structured parameter spaces - natural Riemannian gradient. In Advances in Neural Information Processing Systems 9, pages 127–133. MIT Press.
Axtell, R., Epstein, J., and Young, H. (2000). The emergence of classes in a multi-agent bargaining model. Working paper, Brookings Institution.
Bray, A. J. and Dean, D. S. (2007). Statistics of critical points of Gaussian fields on large-dimensional spaces. Physical Review Letters, 98:150201.
Bruza, P., Kitto, K., Nelson, D., and McEvoy, C. (2009). Is there something quantum-like about the human mental lexicon? Journal of Mathematical Psychology, 53(5):362–377.
Camerer, C. F., Ho, T. H., and Chong, J. (2004). A cognitive hierarchy model of games. Quarterly Journal of Economics, 119:861–898.
Chapelle, O. and Harchaoui, Z. (2005). A machine learning approach to conjoint analysis. In Advances in Neural Information Processing Systems 17, pages 257–264. MIT Press, Cambridge, MA, USA.
Clarke, E. H. (1971). Multipart pricing of public goods. Public Choice, 2:19–33.
Dauphin, Y. N., Pascanu, R., Gulcehre, C., Cho, K., Ganguli, S., and Bengio, Y. (2014). Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Advances in Neural Information Processing Systems 27, pages 2933–2941. Curran Associates, Inc.
de Barros, J. A. and Suppes, P. (2009). Quantum mechanics, interference, and the brain. Journal of Mathematical Psychology, 53(5):306–313.
Dotson, J. P., Lenk, P., Brazell, J., Otter, T., Maceachern, S. N., and Allenby, G. M. (2009). A probit model with structured covariance for similarity effects and source of volume calculations. http://ssrn.com/abstract=1396232.
González-Vallejo, C. (2002). Making trade-offs: A probabilistic and context-sensitive model of choice behavior. Psychological Review, 109:137–154.
Green, P. and Srinivasan, V. (1978). Conjoint analysis in consumer research: Issues and outlook. Journal of Consumer Research, 5:103–123.
Ho, T. H., Lim, N., and Camerer, C. F. (2006). Modeling the psychology of consumer and firm behavior with behavioral economics. Journal of Marketing Research, 43(3):307–331.
Huber, J., Payne, J. W., and Puto, C. (1982). Adding asymmetrically dominated alternatives: Violations of regularity and the similarity hypothesis. Journal of Consumer Research, 9:90–98.
Kakade, S. M. (2002). A natural policy gradient. In Advances in Neural Information Processing Systems 14, pages 1531–1538. MIT Press.
Kingma, D. and Ba, J. (2015). Adam: A method for stochastic optimization. In The International Conference on Learning Representations (ICLR), San Diego.
Kivetz, R., Netzer, O., and Srinivasan, V. S. (2004). Alternative models for capturing the compromise effect. Journal of Marketing Research, 41(3):237–257.
Lawrence, N. D. and Urtasun, R. (2009). Non-linear matrix factorization with Gaussian processes. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML 2009), pages 601–608, New York, NY, USA. ACM.
McFadden, D. and Train, K. (2000). Mixed MNL models for discrete response. Journal of Applied Econometrics, 15:447–470.
McFadden, D. L. (1980). Econometric models of probabilistic choice among products. Journal of Business, 53(3):13–29.
McKelvey, R. and Palfrey, T. (1995). Quantal response equilibria for normal form games. Games and Economic Behavior, 10:6–38.
Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In Proceedings of The 33rd International Conference on Machine Learning (ICML 2016), pages 1928–1937.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A., Veness, J., Bellemare, M., Graves, A., Riedmiller, M., Fidjeland, A., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., and Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518:529–533.
Mogiliansky, A. L., Zamir, S., and Zwirn, H. (2009). Type indeterminacy: A model of the KT (Kahneman–Tversky)-man. Journal of Mathematical Psychology, 53(5):349–361.
Roe, R. M., Busemeyer, J. R., and Townsend, J. T. (2001). Multialternative decision field theory: A dynamic connectionist model of decision making. Psychological Review, 108:370–392.
Shenoy, P. and Yu, A. J. (2013). A rational account of contextual effects in preference choice: What makes for a bargain? In Proceedings of the Cognitive Science Society Conference.
Silver, D., Huang, A., Maddison, C., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., and Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529:484–489.
Simonson, I. (1989). Choice based on reasons: The case of attraction and compromise effects. Journal of Consumer Research, 16:158–174.
Sutton, R. S., McAllester, D. A., Singh, S. P., and Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12, pages 1057–1063. MIT Press.
Takahashi, R. and Morimura, T. (2015). Predicting preference reversals via Gaussian process uncertainty aversion. In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS 2015), pages 958–967.
Trueblood, J. S. (2014). The multiattribute linear ballistic accumulator model of context effects in multialternative choice. Psychological Review, 121(2):179–205.
Tversky, A. (1972). Elimination by aspects: A theory of choice. Psychological Review, 79:281–299.
Usher, M. and McClelland, J. L. (2004). Loss aversion and inhibition in dynamical models of multialternative choice. Psychological Review, 111:757–769.
Wen, C.-H. and Koppelman, F. (2001). The generalized nested logit model. Transportation Research Part B, 35:627–641.
Williams, H. (1977). On the formulation of travel demand models and economic evaluation measures of user benefit. Environment and Planning A, 9(3):285–344.
Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3):229–256.
Yai, T. (1997). Multinomial probit with structured covariance for route choice behavior. Transportation Research Part B: Methodological, 31(3):195–207.
TEST BANK For Corporate Finance, 13th Edition By Stephen Ross, Randolph Weste...
 
20240429 Calibre April 2024 Investor Presentation.pdf
20240429 Calibre April 2024 Investor Presentation.pdf20240429 Calibre April 2024 Investor Presentation.pdf
20240429 Calibre April 2024 Investor Presentation.pdf
 
VIP Independent Call Girls in Andheri 🌹 9920725232 ( Call Me ) Mumbai Escorts...
VIP Independent Call Girls in Andheri 🌹 9920725232 ( Call Me ) Mumbai Escorts...VIP Independent Call Girls in Andheri 🌹 9920725232 ( Call Me ) Mumbai Escorts...
VIP Independent Call Girls in Andheri 🌹 9920725232 ( Call Me ) Mumbai Escorts...
 
Dharavi Russian callg Girls, { 09892124323 } || Call Girl In Mumbai ...
Dharavi Russian callg Girls, { 09892124323 } || Call Girl In Mumbai ...Dharavi Russian callg Girls, { 09892124323 } || Call Girl In Mumbai ...
Dharavi Russian callg Girls, { 09892124323 } || Call Girl In Mumbai ...
 
The Economic History of the U.S. Lecture 26.pdf
The Economic History of the U.S. Lecture 26.pdfThe Economic History of the U.S. Lecture 26.pdf
The Economic History of the U.S. Lecture 26.pdf
 
The Economic History of the U.S. Lecture 20.pdf
The Economic History of the U.S. Lecture 20.pdfThe Economic History of the U.S. Lecture 20.pdf
The Economic History of the U.S. Lecture 20.pdf
 
Basic concepts related to Financial modelling
Basic concepts related to Financial modellingBasic concepts related to Financial modelling
Basic concepts related to Financial modelling
 
Veritas Interim Report 1 January–31 March 2024
Veritas Interim Report 1 January–31 March 2024Veritas Interim Report 1 January–31 March 2024
Veritas Interim Report 1 January–31 March 2024
 

Uncertainty Awareness in Integrating Machine Learning and Game Theory

  • 1. Uncertainty Awareness in Integrating Machine Learning and Game Theory 不確実性を通して見る 機械学習とゲーム理論とのつながり Rikiya Takahashi SmartNews, Inc. rikiya.takahashi@smartnews.com Mar 5, 2017 Game Theory Workshop 2017 https://www.slideshare.net/rikija/uncertainty-awareness-in-integrating- machine-learning-and-game-theory
  • 2. About Myself ● Rikiya TAKAHASHI, Ph.D. (高橋 力矢) – Engineer in SmartNews, Inc., from 2015 to current – Research Staff Member in IBM Research – Tokyo, from 2004 to 2015 ● Research Interests: machine learning, reinforcement learning, cognitive science, behavioral economics, complex systems – Descriptive models about real human behavior – Prescriptive decision making from descriptive models – Robust algorithms working under high uncertainty ● Limited sample size, high dimensionality, high noise
  • 3. Example of Previous Work ● Budget-Constrained Markov Decision Process for Marketing-Mix Optimization (Takahashi+, 2013 & 2014) 2014/01/01 2014/01/08 … 2014/12/31 EM DM TM EM DM TM … EM DM TM Segment #1 … Segment #2 … … … Segment #N … EM: e-mail DM: direct mail TM: tele-marketing $$ E-mail TV CM Purchase prediction response stimulus Browsing Revenues in past 16 weeks > $200? #purchase in past 8 weeks > 2? #browsing in past 4 weeks > 15? No Yes Strategic Segment #1 MS #1 MS #2 #EMs in past 2 weeks > 2? No Yes MS #255 MS #256 #EMs in past 2 weeks > 2? No Yes ….............................................................. ... Historical Data Consumer Segmentation Time-Series Predictive Modeling Optimal Marketing-Mix & Targeting Rules
  • 4. Example of Previous Work ● Travel-Time Distribution Prediction on a Large Road Network (Takahashi+, 2012) A B rN/L rN/L rN/L rN/L rN/L rN/L ψ1 (y) ψ2 (y) ψ3 (y) ψ4 (y) ψ5 (y) ψ6 (y) intersection link 1 0 0 00.5 00.5 0 0.85 Road Network & Travel Time Data by Taxi Predictive Modeling of Travel Time Distribution Route-Choice Recommendation or Traffic Simulation
  • 5. Example of Previous Work ● Bayesian Discrete Choice Modeling for Irrational Compromise Effect (Takahashi & Morimura, 2015) – Explained later today A 0 B C D {A, B, C} {B, C, D} The option having the highest share inexpensiveness product quality Utility Calculator (UC) Decision Making System (DMS) Vector of attributes = A uiA =3.26 B uiB =3.33 C uiC =2.30 send samples utility A B utility sample utility estimate C
  • 6. Agenda 1.Uncertainty Awareness as an Essence in Data-Oriented Real-World Decision Making 2.From Machine Learning to Game Theory #1 – Linking Uncertainty with Bounded Rationality 3.From Machine Learning to Game Theory #2— Open Questions Implied by Numerical Issues
  • 7. Machine Learning (ML) ● Set of inductive disciplines to design probabilistic model and estimate its parameters that maximize out-of-sample predictive accuracy – Supervised learning: model and fit P(Y|X) – Unsupervised learning: model and fit P(X) ● What machine learners care about – Bias-variance trade-off – Curse of dimensionality
  • 8. Estimation via Bayes' theorem ● Basis behind today's most ML algorithm posterior distribution: p(θ∣D)= p(D∣θ ) p(θ) ∫θ p(D∣θ ) p(θ)d θ predictive distribution: p( y∗ ∣D)=∫θ p( y∗ ∣θ) p(θ∣D)d θ posterior mode: ̂θ =argmax θ [log p(D∣θ )+log p(θ )] predictive distribution: p( y∗ ∣D)≃p( y∗ ∣̂θ ) Maximum A Posteriori estimation Bayesian estimation p(θ ) approximation ● Q. Why placing a prior ? – A1. To quantify uncertainty as posterior – A2. To avoid overfitting data:D model parameter:θ
  • 9. E.g., Gaussian Process Regression (GPR) ● Bayesian Ridge Regression – Unlike MAP Ridge regression (dark gray), input- dependent uncertainty (light gray) is quantified. prior:( f f ∗)∼N (0n+1 , (K k∗ k∗ T K (x ∗ , x ∗ ))) where K =(Kij≡K (xi , x j )), k∗=(K (x1, x ∗ ),…, K (xn , x ∗ )) T , K (x , x ')=exp(−γ∥x−x'∥ 2 ) data likelihood:(y y ∗)∼N ((f f ∗),σ 2 In+1 ) predictive distribution: y ∗ ∣K , x ∗ , X , y ∼N (k∗ T (σ 2 I n+K ) −1 y , K (x ∗ , x ∗ )−k∗ T (σ 2 In+K) −1 k∗+σ 2 )
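As a companion to the GPR formulas on slide 9, here is a minimal numpy sketch of the predictive distribution, assuming an RBF kernel; the toy inputs, targets, and noise level are invented purely for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """K(x, x') = exp(-gamma * ||x - x'||^2), computed for all pairs."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Toy 1-D training data (illustrative values only).
X = np.array([[-2.0], [-0.5], [0.3], [1.5]])
y = np.sin(X).ravel()
sigma2 = 0.1                       # observation noise variance

K = rbf_kernel(X, X)
A_inv = np.linalg.inv(sigma2 * np.eye(len(X)) + K)   # (sigma^2 I_n + K)^{-1}

def gpr_predict(x_star):
    """Posterior mean and variance of y* at a new input x*."""
    k_star = rbf_kernel(X, x_star[None, :]).ravel()
    mean = k_star @ A_inv @ y
    var = (rbf_kernel(x_star[None, :], x_star[None, :])[0, 0]
           - k_star @ A_inv @ k_star + sigma2)
    return mean, var

# The predictive variance grows away from the training inputs:
# input-dependent uncertainty, unlike a MAP point estimate.
for x in (0.0, 5.0):
    m, v = gpr_predict(np.array([x]))
    print(f"x*={x:4.1f}  mean={m:7.3f}  variance={v:6.3f}")
```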
  • 10. Gap between Deduction & Induction Today's AI is integrating both. Do not divide the work between inductive & deductive researchers. Deductive Mind ● Optimize decisions for a given environment ● Casino owner's mentality ● Game theorist, probabilist, operations researcher Inductive Mind ● Estimate the environment from observations ● Gambler's mentality ● Statistician, machine learner, econometrician
  • 11. Induction ↔ Deduction: Typical Problem Solving in the Real World. Dataset D → [Inductive Process: machine learning, statistics, econometrics, etc.] → Estimate of Environment Θ̂_D → [Deductive Process: game theory, mathematical programming, Markov Decision Process, etc.] → Policy Decisions π̂_D. The estimate Θ̂_D is different from the true environment Θ. ∀i ∈ {1,…,n}: π̂_{D,i} = argmax_{π_i} R(π_i | {π̂_{D,j}}_{j≠i}, Θ̂_D)
  • 12. Induction ↔ Deduction: Typical Problem Solving in the Real World. Dataset D → [Inductive Process: machine learning, statistics, econometrics, etc.] → Estimate of Environment Θ̂_D → [Deductive Process: game theory, mathematical programming, Markov Decision Process, etc.] → Policy Decisions π̂_D. How different is the estimation-based policy π̂_D from the true optimal policy π*? ∀i ∈ {1,…,n}: π̂_{D,i} = argmax_{π_i} R(π_i | {π̂_{D,j}}_{j≠i}, Θ̂_D), whereas ∀i ∈ {1,…,n}: π*_i = argmax_{π_i} R(π_i | {π*_j}_{j≠i}, Θ)
  • 13. Induction ↔ Deduction: Typical Problem Solving in the Real World. Dataset D → Estimate of Environment Θ̂_D (inductive process) → Policy Decisions π̂_D (deductive process). State-of-the-art AI: Dataset D → Direct Optimization (integration of machine learning and optimization algorithms) → Policy Decisions π̌_D, with the environment estimate Θ̌_D obtained only as a by-product.
  • 14. See the Difference. Typical Problem Solving in the Real World: unnecessarily large effort in solving each subproblem; vulnerable to estimation error. Θ̂_D is accurately fitted to minimize the prediction error for dataset D, although minimizing the error of this parameter is not the goal; π̂_D is excessively optimized under a wrong assumption. State-of-the-art AI: less effort on needless intermediate estimation; robust to estimation error. Θ̌_D is fitted but does not minimize the error for dataset D, and is often less complex than Θ̂_D; π̌_D is safely optimized with less reliance on Θ̌_D.
  • 15. See the Difference. Typical Problem Solving in the Real World: solve a hard inductive problem, then solve another hard deductive problem. State-of-the-art AI: solve an easier problem that involves both induction & deduction. ● Recommendation of simple problem solving – Gigerenzer & Taleb, https://www.youtube.com/watch?v=4VSqfRnxvV8
  • 16. Optimization under Uncertainty ● Interval Estimation (e.g., Bayesian) – Quantify uncertainty – Optimize over all possible environments ● Minimal Estimation (e.g., Vapnik) – Omit intermediate step – Solve the minimal optimization problem ● Two principles are effective in practice.
  • 17. Vapnik's Principle (Vapnik, 1995) When solving a problem of interest, do not solve a more general problem as an intermediate step. —Vladimir N. Vapnik ● E.g., classification or regression: predict Y given X – #1. Fit P(X,Y) and infer P(Y|X) by Bayes’ theorem – #2. Only fit P(Y|X) ● #2 is better than #1 because it incurs less estimation error. – Better particularly when uncertainty is high: small sample size, high dimensionality, and/or high noise
  • 18. Batch Reinforcement Learning ● A good example of involving both inductive and deductive processes. ● Also a good example of how to avoid needlessly hard estimation. ● Basis behind the recent success of Deep Q-Network to play games (Mnih+, 2013 & 2015), and Alpha-Go (Silver+, 2016)
  • 19. Markov Decision Process ● Framework for long-term-optimal decision making – S: set of states, A: set of actions, P(s'|s,a): state-transition probability, r(s,a): immediate reward, γ ∈ [0,1]: discounting factor – Optimize policy π(a|s) for maximal cumulative reward. [Figure: State #1 (e.g., Gold Customer), State #2 (e.g., Silver Customer), State #3 (e.g., Normal Customer) over t=0,1,2, with rewards $, $$, $$$ under Action #1 (e.g., ordinary discount on flight ticket) vs. Action #2 (e.g., free business-class upgrade)]
  • 20. Markov Decision Process ● Easy to solve if the environment is known – Via dynamic programming or linear programming when P(s'|s,a) & r(s,a) are given with no uncertainty – Behave myopically at t → ∞ ● For each state s, choose the action a that maximizes r(s,a). – At time (t-1), choose the optimal action that maximizes the immediate reward at time (t-1) plus the expected reward after time t over the state-transition distribution. ● What if the environment is unknown?
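To make slide 20 concrete, below is a small value-iteration sketch in Python, assuming the transition probabilities and rewards are known exactly; the tiny random MDP at the bottom is purely illustrative.

```python
import numpy as np

def value_iteration(P, r, gamma=0.95, tol=1e-8):
    """P[a, s, s'] transition probs, r[s, a] immediate rewards (both known).

    Returns the optimal state values and a greedy deterministic policy.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Q[s, a] = r(s, a) + gamma * E_{P(s'|s,a)}[V(s')]
        Q = r + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

# Tiny illustrative MDP: 3 states, 2 actions, made-up dynamics and rewards.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(2, 3))   # each row sums to 1 over s'
r = rng.normal(size=(3, 2))
V, pi = value_iteration(P, r)
print("optimal values:", V, "greedy policy:", pi)
```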
  • 21. Types of Reinforcement Learning ● Model-based ↔ Model-free ● On policy ↔ Off policy ● Value iteration ↔ policy search ● Model-based approach – 1. System identification: estimate the MDP parameters – 2. Sample multiple MDPs from the interval estimate – 3. Solve every MDP & take the best action of best MDP ● Optimism in the face of uncertainty
  • 22. Model-free approach ● Remember: our aim is to get the optimal policy. No need to estimate the environment, in principle. – Act without fully identifying the system: as long as we choose the optimal action, it turns out right in the end. ● Even when doing estimation, utilize an intermediate statistic less complex than P(s'|s,a) & r(s,a).
  • 23. Bellman Optimality Equation ● Policy is derived if we have an estimate of Q(s,a). – Simpler than estimating P(s'|s,a) & r(s,a): Q(s,a) = E[r(s,a)] + γ E_{P(s'|s,a)}[max_{a'} Q(s',a')], and π(a|s) = 1 if a = argmax_{a'} Q(s,a'), 0 otherwise. ● Get an estimate Q̂(s,a) from episodes (s_i, a_i, s'_i, r_i), i = 1,…,n
  • 24. Fitted Q-Iteration (Ernst+, 2005) ● For k=1,2,... iterate 1) value computation and 2) regression as 1) ∀i ∈ {1,…,n}: v_i^{(k)} := r_i + γ Q̂_k^{(1)}(s'_i, argmax_{a'} Q̂_k^{(0)}(s'_i, a')) 2) ∀f ∈ {0,1}: Q̂_{k+1}^{(f)} := argmin_{Q∈H} [ (1/2) Σ_{i∈J_f} (v_i^{(k)} − Q(s_i, a_i))² + R(Q) ] – H: hypothesis space of functions, Q_0 ≡ 0, R: regularization term – Indices 1,…,n are randomly split into sets J_0 and J_1, to avoid over-estimation of Q values (Double Q-Learning (Hasselt, 2010)). ● Related to Experience Replay in Deep Q-Network (Mnih+, 2013 & 2015) – See (Lange+, 2012) for more details.
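A rough Python sketch of the Fitted Q-Iteration loop above, using extremely randomized trees as the regressor (in the spirit of Ernst+, 2005) and a two-fold split echoing the slide's J_0/J_1 trick; this is a simplified illustration under the assumption that scikit-learn is available, not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(episodes, n_actions, gamma=0.95, n_iter=30, seed=0):
    """episodes: list of (s, a, r, s_next) with s as 1-D feature vectors.

    Two regressors are fit on disjoint halves of the data (a double-Q-style
    split) to reduce over-estimation of the Q values.
    """
    rng = np.random.default_rng(seed)
    S = np.array([s for s, _, _, _ in episodes])
    A = np.array([a for _, a, _, _ in episodes])
    R = np.array([r for _, _, r, _ in episodes])
    S2 = np.array([s2 for _, _, _, s2 in episodes])
    X = np.column_stack([S, A])                 # regress Q on (state, action)
    folds = rng.integers(0, 2, size=len(episodes))

    models = [None, None]
    for _ in range(n_iter):
        if models[0] is None:
            targets = R.copy()                  # Q_0 == 0, so v_i = r_i
        else:
            # v_i = r_i + gamma * Q1(s'_i, argmax_a' Q0(s'_i, a'))
            q0 = np.stack([models[0].predict(np.column_stack([S2, np.full(len(S2), a)]))
                           for a in range(n_actions)], axis=1)
            best = q0.argmax(axis=1)
            q1 = np.stack([models[1].predict(np.column_stack([S2, np.full(len(S2), a)]))
                           for a in range(n_actions)], axis=1)
            targets = R + gamma * q1[np.arange(len(S2)), best]
        models = [ExtraTreesRegressor(n_estimators=50, random_state=seed).fit(
                      X[folds == f], targets[folds == f]) for f in (0, 1)]
    # The greedy policy picks a = argmax_a of the fitted Q at (s, a).
    return models
```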
  • 25. Policy Gradient ● Accurately fit the policy π_θ(a|s) while only roughly fitting Q(s,a) – More direct toward the final aim – Applicable to continuous-action problems ● Policy Gradient Theorem (Sutton+, 2000): ∇_θ J(θ) (gradient of performance) = E_{π_θ}[∇_θ log π_θ(a|s) Q^π(s,a)] (expectation over s and a of the log-policy gradient times the cumulative reward) ● Variations on providing the rough estimate of Q – REINFORCE (Williams, 1992): reward samples – Actor-Critic: regression models (e.g., Natural Gradient (Kakade, 2002), A3C (Mnih+, 2016))
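Below is a hedged numpy sketch of REINFORCE for a linear softmax policy over discrete actions, where the Monte-Carlo return plays the role of the rough Q(s,a) estimate in the policy gradient theorem; states, rewards, and dimensions are invented for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_gradient(theta, trajectory, gamma=0.99):
    """Monte-Carlo policy-gradient estimate (REINFORCE).

    theta: (n_actions, d) weights of a linear softmax policy pi_theta(a|s).
    trajectory: list of (state_features, action, reward) tuples.
    """
    # Returns G_t stand in for Q(s_t, a_t) in the policy gradient theorem.
    G, returns = 0.0, []
    for _, _, r in reversed(trajectory):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()

    grad = np.zeros_like(theta)
    for (x, a, _), G_t in zip(trajectory, returns):
        probs = softmax(theta @ x)          # pi_theta(.|s)
        grad_log = -np.outer(probs, x)      # grad log pi(a|s) for linear softmax
        grad_log[a] += x
        grad += grad_log * G_t
    return grad

# One gradient-ascent step on a fake trajectory (illustrative shapes only).
rng = np.random.default_rng(1)
theta = np.zeros((3, 4))
traj = [(rng.normal(size=4), rng.integers(3), rng.normal()) for _ in range(10)]
theta += 0.01 * reinforce_gradient(theta, traj)
```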
  • 26. Functional Approximation in Practice ● Concrete functional form of Q(s,a) and/or π(a|s) – Q should be a universal functional approximator: a class of functions that can approximate any function if sufficiently many parameters are introduced. ● Examples of universal approximators: Tree Ensembles (Random Forest, Gradient Boosted Decision Trees), (Deep) Neural Networks, Mixture of Radial Basis Functions (RBFs)
  • 27. Functional Approximation in Practice ● Is any universal approximator OK? – No, unfortunately. – A universal approximator is merely asymptotically unbiased. – Better to have ● Low variance in terms of the bias-variance trade-off ● Resistance to the curse of dimensionality ● One reason for deep learning's success – Flexibility to represent multi-modal functions with fewer parameters than nonparametric (RBF or tree) models – Techniques to stabilize numerical optimization ● AdaGrad or ADAM, dropout, ReLU, batch normalization, etc.
  • 28. Message ● Uncertainty awareness is essential in data-oriented decision making. – No division between induction and deduction – Removing needless intermediate estimation – Fitted Q-Iteration as an illustrative example ● Fewer parameters, less uncertainty
  • 29. Agenda 1.Uncertainty Awareness as an Essence in Data-Oriented Real-World Decision Making 2.From Machine Learning to Game Theory #1 – Linking Uncertainty with Bounded Rationality 3.From Machine Learning to Game Theory #2— Open Questions Implied by Numerical Issues
  • 30. Shrinkage Matters in the Real World. ● Q. Why does a prior help avoid over-fitting? – A. Shrinkage towards the prior mean (e.g., 0 in Ridge regression) ● Over-optimization ↔ Over-rationalization? – (e.g., (Takahashi and Morimura, 2015)) [Figure: solutions of a 2-dimensional OLS & Ridge regression in the (Coefficient #1, Coefficient #2) plane; the Ridge solution is closer to the prior mean 0 than OLS, and the prior mean 0 is independent of the training data]
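A small numpy illustration of the shrinkage sketched in the figure above, assuming a toy 2-dimensional regression with nearly collinear inputs; all numbers are arbitrary.

```python
import numpy as np

# Illustrative 2-D regression with correlated inputs and a small sample,
# to show Ridge coefficients being shrunk toward the prior mean 0.
rng = np.random.default_rng(0)
n = 30
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)        # nearly collinear features
X = np.column_stack([x1, x2])
y = 1.0 * x1 + 1.0 * x2 + rng.normal(scale=0.5, size=n)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
lam = 5.0                                        # strength of the Gaussian prior
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

print("OLS  :", beta_ols)    # large and unstable under collinearity
print("Ridge:", beta_ridge)  # pulled toward the prior mean 0
```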
  • 31. Discrete Choice Modelling Goal: predict the probability of choosing an option from a choice set. Why solve this problem? Brand positioning among competitors; sales promotion (yet involving some abuse).
  • 32. Random Utility Theory as a Rational Model Each human is a rational maximizer of random utility. Theoretical basis behind many statistical marketing models. Logit models (e.g., (McFadden, 1980; Williams, 1977; McFadden and Train, 2000)), Learning to rank (e.g., (Chapelle and Harchaoui, 2005)), Conjoint analysis (Green and Srinivasan, 1978), Matrix factorization (e.g., (Lawrence and Urtasun, 2009)), ...
  • 33. Complexity of Real Human’s Choice An example of choosing a PC (Kivetz et al., 2004). Each subject chooses 1 option from a choice set. Options A B C D E with CPU [MHz]: 250 300 350 400 450 and Mem. [MB]: 192 160 128 96 64. Choice Set → #subjects: {A, B, C} → 36:176:144, {B, C, D} → 56:177:115, {C, D, E} → 94:181:109. Can random utility theory still explain the preference reversals? B ≻ C or C ≻ B?
  • 34. Similarity Effect (Tversky, 1972) Top-share choice can change due to correlated utilities. E.g., one color from {Blue, Red} or {Violet, Blue, Red}?
  • 35. Attraction Effect (Huber et al., 1982) Introduction of an absolutely-inferior option A⁻ (=decoy) causes an irregular increase of option A’s attractiveness, despite the natural guess that a decoy never affects the choice. If D ≻ A, then D ≻ A ≻ A⁻. If A ≻ D, then A is superior to both A⁻ and D.
  • 36. Compromise Effect (Simonson, 1989) Moderate options within each choice set are preferred. Different from a non-linear utility function involving diminishing returns (e.g., √inexpensiveness + √quality).
  • 37. Positioning of the Proposed Work (Sim.: similarity, Attr.: attraction, Com.: compromise)
Model | Sim. | Attr. | Com. | Mechanism | Predict. for Test Set | Likelihood Maximization
SPM | OK | NG | NG | correlation | OK | MCMC
MDFT | OK | OK | OK | dominance & indifference | OK | MCMC
PD | OK | OK | OK | nonlinear pairwise comparison | OK | MCMC
MMLM | OK | NG | OK | none | OK | Non-convex
NLM | OK | NG | NG | hierarchy | NG | Non-convex
BSY | OK | OK | OK | Bayesian | OK | MCMC
LCA | OK | OK | OK | loss aversion | OK | MCMC
MLBA | OK | OK | OK | nonlinear accumulation | OK | Non-convex
Proposed | OK | NG | OK | Bayesian | OK | Convex
MDFT: Multialternative Decision Field Theory (Roe et al., 2001), PD: Proportional Difference Model (González-Vallejo, 2002), MMLM: Mixed Multinomial Logit Model (McFadden and Train, 2000), SPM: Structured Probit Model (Yai, 1997; Dotson et al., 2009), NLM: Nested Logit Models (Williams, 1977; Wen and Koppelman, 2001), BSY: Bayesian Model of (Shenoy and Yu, 2013), LCA: Leaky Competing Accumulator Model (Usher and McClelland, 2004), MLBA: Multiattribute Linear Ballistic Accumulator Model (Trueblood, 2014)
  • 38. Key Idea #1: a Dual Personality Model Regard a human as an estimator of her/his own utility function. Assumption 1: DMS does not know the original utility function. 1. UC computes the sample value of every option’s utility, and sends only these samples to DMS. 2. DMS statistically estimates the utility function.
  • 39. Utility Calculator as Rational Personality For every context i and option j, UC computes a noiseless sample of utility v_ij by applying the utility function f_UC: R^{d_X} → R. v_ij = f_UC(x_ij), f_UC(x) ≜ b + w^⊤ φ(x); b: bias term, φ: R^{d_X} → R^{d_φ}: mapping function, w ∈ R^{d_φ}: vector of coefficients
  • 40. Key Idea #2: DMS is a Bayesian estimator DMS does not know f_UC but has utility samples {v_ij}_{j=1}^{m[i]}. Assumption 2: DMS places a choice-set-dependent Gaussian Process (GP) prior on regressing the utility function: μ_i ~ N(0_{m[i]}, σ² K(X_i)), K(X_i) = (K(x_ij, x_ij')) ∈ R^{m[i]×m[i]}, v_i ≜ (v_i1, …, v_im[i])^⊤ ~ N(μ_i, σ² I_{m[i]}); μ_i ∈ R^{m[i]}: vector of utilities, σ²: noise level, K(·,·): similarity function, X_i ≜ (x_i1 ∈ R^{d_X}, …, x_im[i])^⊤. The posterior mean is given as u*_i ≜ E[μ_i | v_i, X_i, K] = K(X_i)(I_{m[i]} + K(X_i))^{-1}(b 1_{m[i]} + Φ_i w).
  • 41. Convex Optimization for Model Parameters Likelihood of the entire model is tractable, assuming the choice is given by a logit whose mean utility is the posterior mean u*_i. Thus we can fit the function f_UC from the choice data. Conveniently, MAP estimation of f_UC is convex for fixed K: (b̂, ŵ) = argmax_{b,w} Σ_{i=1}^{n} ℓ(b H_i 1_{m[i]} + H_i Φ_i w, y_i) − (c/2) ‖w‖², where ℓ(u*_i, y_i) ≜ log [ exp(u*_{i,y_i}) / Σ_{j'=1}^{m[i]} exp(u*_{i,j'}) ] and H_i ≜ K(X_i)(I_{m[i]} + K(X_i))^{-1}
  • 42. Irrationality as Bayesian Shrinkage Implication from the posterior-mean utility in (1): each option’s utility is shrunk toward the prior mean 0, with strong shrinkage for an option dissimilar to the others, due to its high posterior variance (=uncertainty). u*_i = K(X_i)(I_{m[i]} + K(X_i))^{-1} [shrinkage factor] · (b 1_{m[i]} + Φ_i w) [vector of utility samples]. (1) Context effects as Bayesian uncertainty aversion. E.g., RBF kernel K(x, x') = exp(−γ‖x − x'‖²). [Figure: final evaluation of options A–D with attributes X1 = 5 − X2, under choice sets {A,B,C} and {B,C,D}]
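The shrinkage formula (1) can be played with directly. The following numpy sketch uses made-up 1-D attributes and raw utilities for options A–D (not the paper's data) and shows that, after the shrinkage u* = K(I+K)^{-1}v, the moderate option of each choice set gains relative to the extreme one, reproducing a compromise-style preference reversal.

```python
import numpy as np

def rbf(x, gamma=0.5):
    """RBF similarity over 1-D attribute positions."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-gamma * d2)

def posterior_utility(attrs, raw_utility, gamma=0.5):
    """u* = K(X)(I + K(X))^{-1} v: the shrunk, context-dependent utility."""
    K = rbf(attrs, gamma)
    return K @ np.linalg.solve(np.eye(len(attrs)) + K, raw_utility)

# Options A-D on a 1-D attribute axis with raw utilities (illustrative numbers).
attrs = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}
raw = {"A": 1.0, "B": 1.1, "C": 1.2, "D": 1.3}   # D best in raw utility

for choice_set in (["A", "B", "C"], ["B", "C", "D"]):
    x = np.array([attrs[o] for o in choice_set])
    v = np.array([raw[o] for o in choice_set])
    print(choice_set, np.round(posterior_utility(x, v), 3))
# The extreme option of each set is the most dissimilar to the others, so it
# has the highest posterior variance and is shrunk hardest toward 0. As a
# result B overtakes C in {A, B, C} while C wins in {B, C, D}: the top option
# reverses between the two sets, i.e., a compromise effect.
```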
  • 43. Recovered Context-Dependent Choice Criteria For a speaker dataset: successfully captured a mixture of objective preference and subjective context effects. Options A B C D E with Power [Watt]: 50 75 100 125 150 and Price [USD]: 100 130 160 190 220. Choice Set → #subjects: {A, B, C} → 45:135:145, {B, C, D} → 58:137:111, {C, D, E} → 95:155:91. [Figures: recovered evaluation vs. price for options A–E (objective evaluation and the curves for {A,B,C}, {B,C,D}, {C,D,E}); average log-likelihood on the PC, SP, and SM datasets for LinLogit, NpLogit, LinMix, NpMix, and GPUA]
  • 44. A Result of the p-Beauty Contest by Real Humans Guess 2/3 of the mean of all votes (0-100). The mean is far from the Nash equilibrium 0 (Camerer et al., 2004; Ho et al., 2006). Table: Average Choice in (2/3)-Beauty Contests. Subject Pool | Group Size | Sample Size | Mean[Y_i]: Caltech Board | 73 | 73 | 49.4; 80 year olds | 33 | 33 | 37.0; High School Students | 20-32 | 52 | 32.5; Economics PhDs | 16 | 16 | 27.4; Portfolio Managers | 26 | 26 | 24.3; Caltech Students | 3 | 24 | 21.5; Game Theorists | 27-54 | 136 | 19.1
  • 45. Modeling Bounded Rationality Early stopping at step k: Level-k thinking or Cognitive Hierarchy Theory (Camerer et al., 2004). Humans cannot predict the infinite future; use a non-stationary transitional state. Randomization of utility via noise ε_it: Quantal Response Equilibrium (McKelvey and Palfrey, 1995). ∀i ∈ {1, …, n}: Y_i^{(t)} | Y_{−i}^{(t−1)} = argmax_Y [ f_i(Y, Y_{−i}^{(t−1)}) + ε_it ]. Both methods essentially work as regularization of rationality: shrinkage toward initial values or uniform choice probabilities.
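As a toy illustration of early stopping at step k, here is a minimal level-k sketch for the (2/3)-beauty contest; it ignores the mixing over levels used by Cognitive Hierarchy Theory and simply iterates the best response from a uniform level-0 guess.

```python
import numpy as np

def beauty_contest_level_k(k, p=2.0 / 3.0, level0_mean=50.0):
    """Mean guess of a level-k population in the p-beauty contest.

    Level-0 guesses uniformly on [0, 100] (mean 50); each higher level best
    responds to the level below, so the mean is multiplied by p per step.
    """
    return level0_mean * p ** k

for k in range(6):
    print(f"level {k}: mean guess = {beauty_contest_level_k(k):5.1f}")
# Early stopping at a finite k keeps the guess away from the Nash
# equilibrium 0, in line with the observed averages in the table above.
```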
  • 46. Linking ML with Game Theory (GT) via the Shrinkage Principle. ML, optimization without shrinkage: Maximum-Likelihood estimation, optimal for the training data but with less generalization capability to test data. ML, optimization with shrinkage: Bayesian estimation, where shrinkage towards the prior causes suboptimality for the training data but more generalization capability to test data. GT, optimization without shrinkage: Nash Equilibrium, optimal for the given game but less predictive of real-world decisions. GT, optimization with shrinkage: Transitional State or Quantal Response Equilibrium, where shrinkage towards uniform probabilities causes suboptimality for the given game but more predictive of real-world decisions.
  • 47. Early Stopping and Regularization. [Figure, left: ML as a dynamical system to find the optimal parameters, in the (Parameter #1, Parameter #2) plane; a gradient path starting at 0 with early-stopping estimates at t=10, 20, 30, 50 (e.g., Partial Least Squares), lying between the exact Bayesian estimate shrunk towards zero (e.g., Ridge regression) and the exact maximum-likelihood estimate (e.g., OLS). Right: GT as a dynamical system to find the equilibrium; in the 2/3-beauty contest the mean falls from 50 (t=0) to 34 (t=1) to 15 (t=2) and to 0 as t→∞ (Nash Equilibrium), with Level-2 corresponding to a transitional state.]
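The left panel's point, that an early-stopped gradient path behaves like a shrinkage estimator, can be reproduced with a few lines of numpy on invented data: the iterate starts at the prior mean 0 and approaches the OLS solution only as the number of steps grows.

```python
import numpy as np

# Gradient descent on the OLS objective, stopped early: stopping at a finite
# t leaves the estimate between 0 and the OLS solution, an illustrative
# parallel to a level-k trajectory that stops short of the Nash equilibrium.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
beta_true = np.array([2.0, -1.0])
y = X @ beta_true + rng.normal(scale=0.5, size=40)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta, lr = np.zeros(2), 0.01
for t in range(1, 51):
    beta -= lr * (X.T @ (X @ beta - y))      # gradient of 0.5 * ||y - X b||^2
    if t in (1, 10, 50):
        print(f"t={t:3d}: beta = {np.round(beta, 3)}")
print("OLS  :", np.round(beta_ols, 3))
```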
  • 48. Message ● Bayesian shrinkage ↔ Bounded rationality – Dual-personality model for contextual effects – Towards data-oriented & more realistic games: export ML regularization techniques to GT ● Analyze dynamics or uncertainty-aware equilibria – Early-stopped transitional state, or – QRE with uncertainty on each player's utility function
  • 49. Agenda 1.Uncertainty Awareness as an Essence in Data-Oriented Real-World Decision Making 2.From Machine Learning to Game Theory #1 – Linking Uncertainty with Bounded Rationality 3.From Machine Learning to Game Theory #2— Open Questions Implied by Numerical Issues
  • 50. Additional Implications from ML ● Multiple equilibria or saddle points? ● Equilibria or “typical” transitional states? – Slow convergence – Plateau of objective function
  • 51. Recent history in ML ● Roughly 20 years wasted on the local-optimality issue – Neural Networks (NNs) have been criticized for the local optimality of fitting their parameters. – The ML community stuck with convex optimization approaches (e.g., Support Vector Machines (Vapnik, 1995)). – Most solutions in fitting high-dimensional NNs, however, turn out to be not local optima but saddle points (Bray & Dean, 2007; Dauphin+, 2014)! – After skipping saddle points by perturbation, most of the local optima empirically provide similar prediction capabilities. ● Please do not make the same mistake in multi-agent optimization problems (=games)!
  • 52. Why are most critical points saddle points? ● See the spectrum of Hessian matrices of a non-linear function randomly drawn from a Gaussian process. Local minima: every eigenvalue is positive. Local maxima: every eigenvalue is negative. Saddle point: both positive & negative eigenvalues exist. [Figure: univariate and bivariate example functions; https://en.wikipedia.org/wiki/Saddle_point] ● In a high-dimensional function, the Hessian contains both positive & negative eigenvalues with high probability.
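A quick numerical check of this claim, assuming the Hessian at a critical point looks like a random symmetric matrix: as the dimension grows, the chance that all eigenvalues share one sign (a local minimum or maximum) collapses, so saddle points dominate. This is only a heuristic illustration of the Bray & Dean / Dauphin+ argument, not their derivation.

```python
import numpy as np

# Eigenvalue signs of random symmetric "Hessians" of growing dimension.
rng = np.random.default_rng(0)
for d in (2, 10, 100):
    all_same_sign = 0
    trials = 2000 if d <= 10 else 200
    for _ in range(trials):
        A = rng.normal(size=(d, d))
        H = (A + A.T) / np.sqrt(2 * d)       # random symmetric matrix
        eig = np.linalg.eigvalsh(H)
        all_same_sign += (eig > 0).all() or (eig < 0).all()
    print(f"d={d:4d}: fraction with all-same-sign eigenvalues = "
          f"{all_same_sign / trials:.3f}")
```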
  • 53. Open Questions for Multiple Equilibria ● If a game is very complex, involving lots of parameters in its pay-off or utility functions, then – Are most of its critical points unstable saddle points? – Is the number of equilibria much smaller than our guess? ● If we obtain a few equilibria of such a complex game, – Do most of such equilibria have similar properties? – Do we really have to obtain the other equilibria?
  • 54. See Dynamics: “Typical” Transitional State? ● MLers are sensitive to the convergence rate of fitting. – We are in the finite-sample & high-dimensional world: asymptotics alone is powerless, and the computational estimate is not an equilibrium but a transitional state. http://sebastianruder.com/optimizing-gradient-descent/ (Kingma & Ba, 2015)
  • 55. See Dynamics: “Typical” Transitional State? ● The mixing time of the Markov processes of some games is exponential in the number of players. – E.g., the Nash demand game (Axtell+, 2000): the equilibrium shows equality of wealth, while transitional states show severe inequality. ● What if #players is over thousands or millions? – Severe inequality most of the time
  • 56. See Dynamics: Trapped in a Plateau? ● Fitting of a deep NN is often trapped in plateaus. – Natural gradient descent (Amari, 1997) is often used to quickly escape from a plateau. – In real-world games, are people trapped in plateaus rather than equilibria? https://www.safaribooksonline.com/library/view/hands-on-machine-learning/9781491962282/ch04.html
  • 57. Conclusion ● Discussed how uncertainty should be incorporated in inductive & deductive decision making. – Quantifying uncertainty or simpler minimal estimation ● Linked Bayesian shrinkage with bounded rationality – Towards data-oriented regularized equilibrium ● Implications from high-dimensional ML – Saddle points, transitional state, and/or plateau
  • 58. THANK YOU FOR ATTENDING! Download this material from https://www.slideshare.net/rikija/uncertainty-awareness-in-integrating- machine-learning-and-game-theory
  • 59. References I Amari, S. (1997). Neural learning in structured parameter spaces - natural Riemannian gradient. In Advances in Neural Information Processing Systems 9, pages 127–133. MIT Press. Axtell, R., Epstein, J., and Young, H. (2000). The emergence of classes in a multi-agent bargaining model. Working papers, Brookings Institution - Working Papers. Bray, A. J. and Dean, D. S. (2007). Statistics of critical points of Gaussian fields on large-dimensional spaces. Physical Review Letters, 98:150201. Bruza, P., Kitto, K., Nelson, D., and McEvoy, C. (2009). Is there something quantum-like about the human mental lexicon? Journal of Mathematical Psychology, 53(5):362–377. Camerer, C. F., Ho, T. H., and Chong, J. (2004). A cognitive hierarchy model of games. Quarterly Journal of Economics, 119:861–898.
  • 60. References II Chapelle, O. and Harchaoui, Z. (2005). A machine learning approach to conjoint analysis. In Advances in Neural Information Processing Systems 17, pages 257–264. MIT Press, Cambridge, MA, USA. Clarke, E. H. (1971). Multipart pricing of public goods. Public Choice, 2:19–33. Dauphin, Y. N., Pascanu, R., Gulcehre, C., Cho, K., Ganguli, S., and Bengio, Y. (2014). Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Advances in Neural Information Processing Systems 27, pages 2933–2941. Curran Associates, Inc. de Barros, J. A. and Suppes, P. (2009). Quantum mechanics, interference, and the brain. Journal of Mathematical Psychology, 53(5):306–313.
  • 61. References III Dotson, J. P., Lenk, P., Brazell, J., Otter, T., Maceachern, S. N., and Allenby, G. M. (2009). A probit model with structured covariance for similarity effects and source of volume calculations. http://ssrn.com/abstract=1396232. González-Vallejo, C. (2002). Making trade-offs: A probabilistic and context-sensitive model of choice behavior. Psychological Review, 109:137–154. Green, P. and Srinivasan, V. (1978). Conjoint analysis in consumer research: Issues and outlook. Journal of Consumer Research, 5:103–123. Ho, T. H., Lim, N., and Camerer, C. F. (2006). Modeling the psychology of consumer and firm behavior with behavioral economics. Journal of Marketing Research, 43(3):307–331. Huber, J., Payne, J. W., and Puto, C. (1982). Adding asymmetrically dominated alternatives: Violations of regularity and the similarity hypothesis. Journal of Consumer Research, 9:90–98.
  • 62. References IV Kakade, S. M. (2002). A natural policy gradient. In Dietterich, T. G., Becker, S., and Ghahramani, Z., editors, Advances in Neural Information Processing Systems 14, pages 1531–1538. MIT Press. Kingma, D. and Ba, J. (2015). Adam: A method for stochastic optimization. In The International Conference on Learning Representations (ICLR), San Diego. Kivetz, R., Netzer, O., and Srinivasan, V. S. (2004). Alternative models for capturing the compromise effect. Journal of Marketing Research, 41(3):237–257. Lawrence, N. D. and Urtasun, R. (2009). Non-linear matrix factorization with Gaussian processes. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML 2009), pages 601–608, New York, NY, USA. ACM. McFadden, D. and Train, K. (2000). Mixed MNL models for discrete response. Journal of Applied Econometrics, 15:447–470.
  • 63. References V McFadden, D. L. (1980). Econometric models of probabilistic choice among products. Journal of Business, 53(3):13–29. McKelvey, R. and Palfrey, T. (1995). Quantal response equilibria for normal form games. Games and Economic Behavior, 10:6–38. Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In Proceedings of The 33rd International Conference on Machine Learning (ICML 2016), pages 1928–1937. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A., Veness, J., Bellemare, M., Graves, A., Riedmiller, M., Fidjeland, A., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., and Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518:529–533. Mogiliansky, A. L., Zamir, S., and Zwirn, H. (2009). Type indeterminacy: A model of the KT (Kahneman-Tversky)-man. Journal of Mathematical Psychology, 53(5):349–361.
  • 64. References VI Roe, R. M., Busemeyer, J. R., and Townsend, J. T. (2001). Multialternative decision field theory: A dynamic connectionist model of decision making. Psychological Review, 108:370–392. Shenoy, P. and Yu, A. J. (2013). A rational account of contextual effects in preference choice: What makes for a bargain? In Proceedings of the Cognitive Science Society Conference. Silver, D., Huang, A., Maddison, C., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., and Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529:484–489. Simonson, I. (1989). Choice based on reasons: The case of attraction and compromise effects. Journal of Consumer Research, 16:158–174.
  • 65. References VII Sutton, R. S., McAllester, D. A., Singh, S. P., and Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12, pages 1057–1063. MIT Press. Takahashi, R. and Morimura, T. (2015). Predicting preference reversals via Gaussian process uncertainty aversion. In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS 2015), pages 958–967. Trueblood, J. S. (2014). The multiattribute linear ballistic accumulator model of context effects in multialternative choice. Psychological Review, 121(2):179–205. Tversky, A. (1972). Elimination by aspects: A theory of choice. Psychological Review, 79:281–299. Usher, M. and McClelland, J. L. (2004). Loss aversion and inhibition in dynamical models of multialternative choice. Psychological Review, 111:757–769.
  • 66. References VIII Wen, C.-H. and Koppelman, F. (2001). The generalized nested logit model. Transportation Research Part B, 35:627–641. Williams, H. (1977). On the formulation of travel demand models and economic evaluation measures of user benefit. Environment and Planning A, 9(3):285–344. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3):229–256. Yai, T. (1997). Multinomial probit with structured covariance for route choice behavior. Transportation Research Part B: Methodological, 31(3):195–207.