http://www.yukawa.kyoto-u.ac.jp/contents/seminar/detail.php?SNUM=51633
Tatsuji Takahashi¹, Yu Kohno¹,²
Seminar on science of complex systems
(organized by Yukio-Pegio Gunji),
Yukawa Institute for Theoretical Physics,
Kyoto University,
Jan. 20, 2014
¹ Tokyo Denki University,
² JSPS (from Apr. 2014)
A toy model of human cognition: Utilizing fluctuation in uncertain and non-stationary environments
7. Contents
The loosely symmetric (LS) model
Cognitive properties or cognitive biases
Analysis of reconstruction of LS
Result: Efficacy in reinforcement learning
Utilization of fluctuation in non-stationary environments
14. A toy model of human cognition
Modeling that focuses on deviations from rational standards:
cognitive biases, the differences from “machines”
Principal properties implemented in a form as simple as possible,
so that the model can be analyzed and applied easily
Intuition of human beings
simple, again: not the policies (or strategies) that are learnt
through education and culture
20. LS as a toy model of cognition
We treat the loosely symmetric (LS) model proposed by Shinohara (2007). LS:
models cognitive biases
is merely a function over the co-occurrence information between two events
faithfully describes the causal intuition of humans,
which forms the basis of decision-making and action for adaptation in the world
28. The loosely symmetric (LS) model
A quasi-probability function LS(·|·), like conditional probability P(·|·).
Defined over the co-occurrence information of events p and q.
The relationship from p to q: LS(q|p).
LS describes the causal intuition of human beings the most faithfully
(among more than 40 existing models).

Co-occurrence table (prior event p in rows, posterior event q in columns):

          q     ¬q
  p       a     b
  ¬p      c     d

P(q|p) = a / (a + b)

LS(q|p) = (a + (b/(b+d))·d) / (a + (b/(b+d))·d + b + (a/(a+c))·c)
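Written out in code, the two definitions above read as follows (a minimal Python sketch; the guards for zero margins are our addition, not part of the slide's definition):

```python
def p_cond(a, b):
    """Conditional probability P(q|p) = a / (a + b)."""
    return a / (a + b) if a + b > 0 else 0.0

def ls(a, b, c, d):
    """Loosely symmetric model LS(q|p) (Shinohara, 2007).

    a, b, c, d are the co-occurrence counts of
    (p, q), (p, not-q), (not-p, q), (not-p, not-q).
    Returning 0.5 on an empty table is our guard, not the slide's.
    """
    bd = b / (b + d) * d if b + d > 0 else 0.0   # (b/(b+d))·d
    ac = a / (a + c) * c if a + c > 0 else 0.0   # (a/(a+c))·c
    den = a + bd + b + ac
    return (a + bd) / den if den > 0 else 0.5

# Same ratio a/(a+b) = 0.75, but LS also weighs the not-p column:
print(p_cond(6, 2), ls(6, 2, 2, 6))  # 0.75 vs. ~0.682
```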
35. The loosely symmetric (LS) model
Inductive inference of causal relationship:
how do humans form the intensity of the causal relationship from p to q,
when p is the candidate cause of the effect q in focus?
The functional form f(a, b, c, d) for the human causal intuition:

LS(q|p) = (a + (b/(b+d))·d) / (a + (b/(b+d))·d + b + (a/(a+c))·c)

Meta-analysis as in Hattori & Oaksford (2007): correlation r with human
causal judgments across eight experiments, for LS and for the ΔP rule:

  Experiment   AS95   BCC03.1   BCC03.3   H03    H06    LS00   W03.2   W03.6
  r for LS     0.95   0.98      0.98      0.98   0.97   0.85   0.95    0.85
  r for ΔP     0.88   0.92      0.84      0.00   0.71   0.88   0.28    0.46
41. In 2-armed bandit problems
LS used as the value function in reinforcement learning
(more on bandit problems later).
The agent evaluates the actions according to the causal intuition of humans.
Very good adaptation to the environment, both in the short term and the long term.
[Figure: accuracy rate (0.5 to 1.0) vs. step (1 to 1000, log scale) for LS, CP,
ToW(0.5), SM(0.3), and SM(0.7).]
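A minimal sketch of such an agent, under one assumption the slides leave implicit: for each arm, its own wins and losses supply a and b, and the other arm's supply c and d, with the higher LS value chosen greedily. The function names and parameters here are illustrative, not from the original:

```python
import random

def ls(a, b, c, d):
    """Loosely symmetric value (same definition as sketched above)."""
    bd = b / (b + d) * d if b + d > 0 else 0.0
    ac = a / (a + c) * c if a + c > 0 else 0.0
    den = a + bd + b + ac
    return (a + bd) / den if den > 0 else 0.5

def run_ls_agent(p=(0.4, 0.6), steps=1000, seed=0):
    """Greedy LS agent on a 2-armed Bernoulli bandit; returns accuracy."""
    rng = random.Random(seed)
    wins, losses = [0, 0], [0, 0]
    best = max(range(2), key=lambda i: p[i])
    correct = 0
    for _ in range(steps):
        # Arm i: a, b = own wins/losses; c, d = the other arm's (our assumption).
        values = [ls(wins[i], losses[i], wins[1 - i], losses[1 - i])
                  for i in range(2)]
        i = max(range(2), key=lambda k: values[k])  # greedy on LS values
        if rng.random() < p[i]:
            wins[i] += 1
        else:
            losses[i] += 1
        correct += (i == best)
    return correct / steps

print(run_ls_agent())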
51. The loosely symmetric (LS) model
From the analysis of LS, we found the following cognitive properties:
Ground-invariance (like visual attention, Takahashi et al., 2010)
Comparative valuation
psychology: Tversky & Kahneman, Science, 1974.
brain science: Daw et al., Nature, 2006.
Idiosyncratic, asymmetric risk attitude as in prospect theory
Kahneman & Tversky, Am. Psy., 1984; Boorman et al., Neuron, 2009.
Satisficing
Simon, Psy. Rev., 1956; Kolling et al., Science, 2012.
57. Principal human cognitive biases
Humans:
Satisficing: do not optimize but satisfice;
they become satisfied when the outcome is better than the reference level.
Comparative valuation: evaluate states and actions in a relative manner.
Asymmetric risk attitude: recognize gain and loss asymmetrically.
60. Satisficing
[Figure: arms A1 and A2 relative to a reference level, in two settings.]
All arms over the reference: no pursuit of arms beyond the given reference level.
All arms under the reference: search hard for an arm over the reference level.

Risk attitude (reliability consideration):
Risk-avoiding over the reference: between past records of winning (o) and
losing (x) with equal expected value 0.75, choose 15/20 rather than 3/4
(a comparison considering reliability).
Risk-seeking under the reference: between records with equal expected
value 0.25, gamble on 1/4 rather than 5/20 (the reflection effect).

Comparative evaluation: choosing A1 and losing lowers the value of A1 and
raises the value of A2 (see-saw), so the agent tries arms other than A1;
under absolute evaluation it would keep choosing A1.
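Both preferences follow from the LS value itself, under the same contingency assignment we assumed earlier (own record as a, b; the alternative's as c, d). A quick numerical check in Python:

```python
def ls(a, b, c, d):
    """Loosely symmetric value (same sketch as above)."""
    bd = b / (b + d) * d if b + d > 0 else 0.0
    ac = a / (a + c) * c if a + c > 0 else 0.0
    den = a + bd + b + ac
    return (a + bd) / den if den > 0 else 0.5

# Over the reference (both records have expected value 0.75):
print(ls(15, 5, 3, 1))   # 15/20 -> ~0.679
print(ls(3, 1, 15, 5))   # 3/4   -> ~0.523: the reliable 15/20 is preferred

# Under the reference (both records have expected value 0.25):
print(ls(5, 15, 1, 3))   # 5/20  -> ~0.321
print(ls(1, 3, 5, 15))   # 1/4   -> ~0.477: the gamble 1/4 wins (reflection)
```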
61. The generalized LS with variable reference (LSVR)
LSVR is a generalization of LS with an autonomously adjusted reference parameter.
[Figure: abstract image of the variable reference.]
66. n-armed bandit problem (nABP)
The simplest framework in reinforcement learning, exhibiting the
exploration-exploitation dilemma and the speed-accuracy tradeoff.
The task is to maximize the total reward acquired from n actions
(sources) with unknown reward distributions.
A one-armed bandit is a slot machine that gives a reward (win) or not (lose).
An n-armed bandit is a slot machine with n arms that have different
probabilities of winning.
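A minimal Bernoulli-bandit environment along these lines (a sketch; the uniform draw of the win probabilities follows the experimental setup quoted in the results below, and the class name is ours):

```python
import random

class BernoulliBandit:
    """n-armed bandit: arm i wins (reward 1) with probability p[i]."""

    def __init__(self, n, rng=None):
        self.rng = rng or random.Random()
        self.p = [self.rng.random() for _ in range(n)]  # uniform on [0, 1]

    def pull(self, i):
        """Return 1 (win) or 0 (lose) for arm i."""
        return 1 if self.rng.random() < self.p[i] else 0

    def best(self):
        """Index of the optimal arm (for scoring only; hidden from the agent)."""
        return max(range(len(self.p)), key=lambda i: self.p[i])

bandit = BernoulliBandit(n=100)
print(bandit.pull(0), bandit.best())
```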
71. Performance indices for nABP
Accuracy: the average percentage of choosing the optimal action.
Regret (expected loss): the difference between the accumulated rewards
actually acquired and those of the best possible sequence of actions
(one with accuracy = 1.0 all through the trial).
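Under these definitions, both indices can be computed from the trace of chosen arms; a sketch for the stationary case (in the non-stationary settings below, the optimal arm would have to be re-evaluated at every step):

```python
def accuracy(chosen, best):
    """Fraction of steps on which the optimal arm was chosen."""
    return sum(i == best for i in chosen) / len(chosen)

def expected_regret(chosen, p):
    """Sum over steps of p_best - p_chosen (expected-loss form of regret)."""
    p_best = max(p)
    return sum(p_best - p[i] for i in chosen)

p = [0.4, 0.6]
chosen = [0, 1, 1, 1]              # arms chosen on four steps
print(accuracy(chosen, 1))         # 0.75
print(expected_regret(chosen, p))  # 0.2
```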
72. Result
n = 100; the reward probability for each action is taken uniformly from [0, 1].
[Figure: accuracy rate (left) and expected loss (right) vs. steps (0 to 1e6)
for LS, LS-VR, and UCB1-tuned, each with γ = 0.999.]
Accuracy: highest. Regret: smallest.
The more actions there are, the better the performance of LSVR becomes.
Kohno & Takahashi, 2012; in prep.
74. Result in non-stationary environment 1
n = 16; the reward probability is taken from [0, 1].
The probabilities are totally reset every 10,000 steps.
[Figure: accuracy rate (left) and expected loss (right, 0 to 300) vs. steps
(0 to 50,000) for LS, LS-VR, and UCB1-tuned, each with γ = 0.999.]
Accuracy: highest. Regret: smallest.
Kohno & Takahashi, in prep.
80. Result in non-stationary environment 2
n = 20; the initial probability is taken from [0, 1]. The probability of
each action is reset with probability 0.0001 at each step.
[Figure: accuracy rate vs. steps (0 to 50,000) for LS, LS-VR, and UCB1-tuned,
each with γ = 0.999. Accuracy here is the rate of choosing the action that is
optimal at the time of the choice.]
Even when a not-well-tried action becomes the new optimum, the agent can
switch to the optimal action.
If the reward were given deterministically, this would be impossible.
Efficient search utilizing uncertainty and fluctuation in non-stationary
environments.
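The two reset schedules can be sketched as one environment class (an illustration of the setups described above; the class and parameter names are ours):

```python
import random

class NonStationaryBandit:
    """Bernoulli bandit whose win probabilities change over time.

    reset_every: redraw all probabilities every `reset_every` steps
                 (environment 1: reset_every=10_000).
    reset_prob:  each step, each arm's probability is independently
                 redrawn with this probability
                 (environment 2: reset_prob=0.0001).
    """

    def __init__(self, n, reset_every=None, reset_prob=0.0, rng=None):
        self.rng = rng or random.Random()
        self.p = [self.rng.random() for _ in range(n)]
        self.reset_every = reset_every
        self.reset_prob = reset_prob
        self.t = 0

    def pull(self, i):
        self.t += 1
        # Environment 1: synchronous reset of all arms on a fixed schedule.
        if self.reset_every and self.t % self.reset_every == 0:
            self.p = [self.rng.random() for _ in range(len(self.p))]
        # Environment 2: rare, independent per-arm resets.
        for k in range(len(self.p)):
            if self.rng.random() < self.reset_prob:
                self.p[k] = self.rng.random()
        return 1 if self.rng.random() < self.p[i] else 0

env1 = NonStationaryBandit(n=16, reset_every=10_000)
env2 = NonStationaryBandit(n=20, reset_prob=0.0001)
```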
85. Results
[Figure, stationary: accuracy rate vs. steps (0 to 1e6) for LS, LS-VR, and
UCB1-tuned, each with γ = 0.999.]
The more options there are, the better the performance of LSVR becomes.
[Figure, non-stationary 2: accuracy rate vs. steps (0 to 50,000).]
LSVR can trace the unobserved change, amplifying fluctuation.
[Figure, non-stationary, synchronous resets: accuracy rate vs. steps (0 to 50,000).]
LSVR can trace the change in non-stationary environments.
91. Discussion
The cognitive biases of humans, when combined:
Work effectively for adaptation under uncertainty.
Conflate an action and the set of the actions through comparative valuation.
Symbolize the whole situation into a virtual action.
Utilize the fluctuation that stems from uncertainty, enabling
adaptation to non-stationary environments.
94. Conflating part and whole
Comparative valuation conflates the information of a single action with
that of the whole set of actions.
This is universal in living systems, from slime molds (Latty & Beekman, 2011)
to neurons (Royer & Paré, 2003) to animals and human beings.
100. Relative evaluation is especially important
★ Relative evaluation:
★ is what even slime molds and real neural networks (conservation of
synaptic weights) do. Behavioral economics found that humans evaluate
actions and states comparatively.
★ weakens the dilemma between exploitation and exploration through a
see-saw-like competition among arms:
★ Through failure (low reward), a choice of the greedy action may quickly
trigger a switch to the previously second-best, non-greedy arm.
★ Through success (high reward), a choice of the greedy action may quickly
narrow the focus onto the currently greedy action, lessening the possibility
of choosing non-greedy arms by decreasing the values of the other arms.
[Figure: under absolute evaluation, choosing A1 and losing leaves the values
of A1 and A2 unchanged; under relative evaluation the values see-saw, so the
agent tries arms other than A1.]
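The see-saw can be checked directly on the LS values (same sketch and contingency assignment as above; the counts are hypothetical):

```python
def ls(a, b, c, d):
    """Loosely symmetric value (same sketch as above)."""
    bd = b / (b + d) * d if b + d > 0 else 0.0
    ac = a / (a + c) * c if a + c > 0 else 0.0
    den = a + bd + b + ac
    return (a + bd) / den if den > 0 else 0.5

# Before: A1 has 6 wins / 2 losses, A2 has 2 wins / 2 losses.
print(ls(6, 2, 2, 2), ls(2, 2, 6, 2))  # ~0.667, ~0.462
# After one loss on A1 (6/3 vs. 2/2): A1 drops and A2 rises, the see-saw.
print(ls(6, 3, 2, 2), ls(2, 2, 6, 3))  # ~0.615, ~0.478
```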
104. Symbolization of the whole and comparative valuation with multiple actions
[Figure: n slot machines A1, A2, ..., An, plus a virtual machine Ag
representing the whole.]
106. Comparative valuation with a virtual action representing the whole
[Figure: the virtual machine Ag, representing the whole, is compared
(“>” or “<”?) with each of the arms A1, A2, ..., An.]
119. Conclusion
The cognitive biases that look irrational are, when appropriately combined
as in humans, actually rational for adapting to uncertain environments and
for survival through evolution.
Applicable in engineering, to machine learning and robot control.
Implications for brain science (the brain as a machine-learning device):
Modeling PFC and vmPFC.
Brain science and the three cognitive biases:
Satisficing: Kolling et al., Science, 2012.
Comparative valuation of state-action value: Daw et al., Nature, 2006.
Idiosyncratic risk evaluation: Boorman et al., Neuron, 2009.
125. Applications of bandit problems
★ Monte-Carlo tree search over game trees (Go AI)
★ Online advertisement
★ e.g., A/B testing
★ Design of medical treatment
★ Reinforcement learning