A toy model of human cognition:
Utilizing fluctuation in uncertain and non-stationary environments

Tatsuji Takahashi¹, Yu Kohno¹,²
Seminar on science of complex systems (organized by Yukio-Pegio Gunji), Yukawa Institute for Theoretical Physics, Kyoto University, Jan. 20, 2014
¹Tokyo Denki University, ²JSPS (from Apr. 2014)

Contents
The loosely symmetric (LS) model
Cognitive properties or cognitive biases
Analysis of reconstruction of LS
Result: Efficacy in reinforcement learning
Utilization of fluctuation in non-stationary environments

A toy model of human cognition
Modeling focusing on deviations from rational standards: cognitive biases
  the differences from "machines"
Principal properties implemented in a form as simple as possible
  so that it can be analyzed and applied easily
Intuition of human beings
  as simple, again: not the policy (or strategy) that is learnt through education and culture

LS as a toy model of cognition
We treat the loosely symmetric (LS) model proposed by Shinohara (2007). LS:
  models cognitive biases
  is merely a function over co-occurrence information between two events
  faithfully describes the causal intuition of humans
    which forms the basis of decision-making and action for adaptation in the world

The loosely symmetric (LS) model
A quasi-probability function LS(-|-) like conditional probability P(-|-).
Defined over the co-occurrence information of events p and q.
The relationship from p to q: LS(q|p).
LS describes the causal intuition of human beings the most faithfully (among more than 40 existing models).

Co-occurrence table (prior event in rows, posterior event in columns):

              q    ¬q
        p     a     b
       ¬p     c     d

\( P(q|p) = \dfrac{a}{a+b} \)

\( \mathrm{LS}(q|p) = \dfrac{a + \frac{b}{b+d}\,d}{\left(a + \frac{b}{b+d}\,d\right) + \left(b + \frac{a}{a+c}\,c\right)} \)

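A minimal Python sketch of both quantities, computed directly from the four counts. The names `cp` and `ls` are ours, not from the slides, and all four counts are assumed positive so no denominator vanishes:

```python
def cp(a, b):
    """Conditional probability P(q|p) from the 2x2 co-occurrence counts."""
    return a / (a + b)

def ls(a, b, c, d):
    """Loosely symmetric model LS(q|p) (Shinohara, 2007).

    a = #(p, q), b = #(p, not-q), c = #(not-p, q), d = #(not-p, not-q).
    Counts are assumed positive so every denominator is nonzero.
    """
    bd = b / (b + d) * d  # d discounted by the weight b/(b+d)
    ac = a / (a + c) * c  # c discounted by the weight a/(a+c)
    return (a + bd) / (a + bd + b + ac)
```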
The loosely symmetric (LS) model
Inductive inference of causal relationship:
How do humans form the intensity of the causal relationship from p to q, when p is the candidate cause of the effect q in focus?
The functional form of f(a, b, c, d) for the human causal intuition.

Meta-analysis as in Hattori & Oaksford (2007):

Experiment   AS95   BCC03.1   BCC03.3   H03    H06    LS00   W03.2   W03.6
r for LS     0.95   0.98      0.98      0.98   0.97   0.85   0.95    0.85
r for ΔP     0.88   0.92      0.84      0.00   0.71   0.88   0.28    0.46

In 2-armed bandit problems
LS used as the value function in reinforcement learning (more on bandit problems later):
  The agent evaluates the actions according to the causal intuition of humans.
  Very good adaptation to the environment, both in the short term and the long term.

[Figure: accuracy rate over steps 1 to 1000 (log scale), from 0.5 to 1.0, for LS compared with CP (conditional probability), ToWH0.5L, SMH0.3L, and SMH0.7L.]

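A sketch of one plausible wiring of LS as the value function in a 2-armed bandit, using the `ls` function above: each arm supplies its own (win, loss) counts as (a, b) and the other arm's as (c, d), with greedy selection. This is our reading of the slides (including the smoothing prior), not necessarily the authors' exact setup:

```python
import random

def ls_bandit(probs, steps=1000, prior=1.0):
    """Greedy 2-armed bandit using LS(q|p) as the action value."""
    wins = [prior, prior]    # a-counts, smoothed so denominators stay positive
    losses = [prior, prior]  # b-counts
    best = max(range(2), key=lambda i: probs[i])
    correct = 0
    for _ in range(steps):
        # Arm i supplies (a, b); the other arm supplies (c, d).
        values = [ls(wins[i], losses[i], wins[1 - i], losses[1 - i])
                  for i in range(2)]
        arm = max(range(2), key=lambda i: values[i])
        if random.random() < probs[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
        correct += (arm == best)
    return correct / steps  # accuracy over the run

# e.g. ls_bandit([0.6, 0.4]) should settle on the 0.6 arm.
```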
The loosely symmetric (LS) model
From the analysis of LS, we found the following cognitive properties:
Ground-invariance (like visual attention, Takahashi et al., 2010)
Comparative valuation
  psychology: Tversky & Kahneman, Science, 1974.
  brain science: Daw et al., Nature, 2006.
Idiosyncratic, asymmetric risk attitude as in the prospect theory
  Kahneman & Tversky, Am. Psy., 1984; Boorman et al., Neuron, 2009.
Satisficing
  Simon, Psy. Rev., 1956; Kolling et al., Science, 2012.

Principal human cognitive biases
Humans:
Satisficing: do not optimize but satisfice.
  become satisfied when it is better than the reference level
Comparative valuation: evaluate states and actions in a relative manner.
Asymmetric risk attitude: asymmetrically recognize gain and loss.

Satisficing
[Diagram: two arms A1 and A2 plotted against a reference level.]
All arms over the reference: no pursuit of arms over the given reference level.
All arms under the reference: search hard for an arm over the reference level.

Risk attitude (reliability consideration)
Risk-avoiding over the reference: with expected value 0.75, 15/20 wins (○×○○○ ×○○○○ ○○○×○ ○○×○×) and 3/4 wins (○×○○) have the same mean, but a comparison considering reliability chooses 15/20 over 3/4.
Risk-seeking under the reference (reflection effect): with expected value 0.25, 5/20 wins (×○××× ○×××× ×××○× ××○×○) and 1/4 wins (×○××) have the same mean, but the agent gambles on 1/4 rather than 5/20.

Comparative evaluation
With absolute valuation, the agent chooses A1 and loses; with comparative valuation, the values of A1 and A2 behave like a see-saw, so the agent tries arms other than A1.

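Both preferences in the figure fall directly out of the LS formula. A worked check with the `ls` sketch above, pairing each arm's counts with the other's:

```python
# Over the reference (both arms have mean 0.75): 15/20 beats 3/4.
ls(15, 5, 3, 1)   # ~0.68  value of the 15/20 arm
ls(3, 1, 15, 5)   # ~0.52  value of the 3/4 arm

# Under the reference (both arms have mean 0.25): 1/4 beats 5/20.
ls(5, 15, 1, 3)   # ~0.32  value of the 5/20 arm
ls(1, 3, 5, 15)   # ~0.48  value of the 1/4 arm
```

The preference flips between the two regimes with no extra parameter, which is the reflection effect described above.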
The generalized LS with variable reference (LSVR)
[Abstract image: variable reference.]
LSVR is a generalization of LS with an autonomously adjusted parameter of reference.

n-armed bandit problem (nABP)
The simplest framework in reinforcement learning, exhibiting the exploration-exploitation dilemma and the speed-accuracy tradeoff.
The task is to maximize the total reward acquired from n actions (sources) with unknown reward distributions.
A one-armed bandit is a slot machine that gives a reward (win) or not (lose).
An n-armed bandit is a slot machine with n arms that have different probabilities of winning.

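A minimal Bernoulli-bandit environment matching this description (class and method names are ours):

```python
import random

class BernoulliBandit:
    """n arms; arm i pays reward 1 with probability probs[i], else 0."""
    def __init__(self, n):
        self.probs = [random.random() for _ in range(n)]  # uniform on [0, 1]

    def pull(self, arm):
        return 1 if random.random() < self.probs[arm] else 0

    def best_arm(self):
        return max(range(len(self.probs)), key=lambda i: self.probs[i])
```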
Performance indices for nABP
Accuracy:
  the average percentage of choosing the optimal action
Regret (expected loss):
  the difference of the actually acquired accumulated rewards from the best possible sequence of actions (where accuracy = 1.0 all through the trial)

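In simulation, both indices are simple running sums over the chosen arms. A sketch, assuming the `BernoulliBandit` above and a policy object with `select()` and `update()` methods (an interface we assume for illustration):

```python
def evaluate(bandit, policy, steps):
    """Accuracy = fraction of optimal choices; regret = summed expected loss."""
    best = bandit.best_arm()
    hits, regret = 0, 0.0
    for _ in range(steps):
        arm = policy.select()                        # assumed policy interface
        policy.update(arm, bandit.pull(arm))
        hits += (arm == best)
        regret += bandit.probs[best] - bandit.probs[arm]
    return hits / steps, regret
```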
Result
n = 100; the reward probability for each action is taken uniformly from [0, 1].
[Figures: accuracy rate and expected loss over 10^6 steps for LS, LS-VR, and UCB1-tuned, each also with discounting γ = 0.999.]
Accuracy: highest. Regret: smallest.
The more actions there are, the better the performance of LSVR becomes.
Kohno & Takahashi, 2012; in prep.

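For reference, UCB1-tuned is the variance-aware bandit index of Auer et al. (2002); a compact sketch of the arm-selection rule (our implementation, not the code behind these plots):

```python
import math

def ucb1_tuned_select(means, mean_sqs, counts, t):
    """Arm index rule of UCB1-tuned (Auer, Cesa-Bianchi & Fischer, 2002)."""
    def index(i):
        if counts[i] == 0:
            return float("inf")  # pull every arm once first
        var_bound = (mean_sqs[i] - means[i] ** 2
                     + math.sqrt(2.0 * math.log(t) / counts[i]))
        return means[i] + math.sqrt(
            math.log(t) / counts[i] * min(0.25, var_bound))
    return max(range(len(counts)), key=index)
```

The caller maintains the running mean and mean of squared rewards per arm, with t the total number of pulls so far; we read the γ = 0.999 variants in the plots as adding a forgetting factor over old observations, which this sketch omits.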
Non-stationary bandits
The reward probabilities change while playing.

Result in non-stationary environment 1
n = 16; the reward probability is from [0, 1]. The probabilities are totally reset every 10,000 steps.
[Figures: accuracy rate and expected loss over 50,000 steps for LS, LS-VR, and UCB1-tuned, each also with γ = 0.999.]
Accuracy: highest. Regret: smallest.
Kohno & Takahashi, in prep.

Result in non-stationary environment 2
n = 20; the initial probability is from [0, 1]. The probability of each action is independently reset (redrawn) with probability 0.0001 per step.
[Figure: accuracy rate over 50,000 steps for LS, LS-VR, and UCB1-tuned, each also with γ = 0.999. Accuracy here is the rate of choosing the action that is optimal at the time of the choice.]
Even when a not-well-tried action becomes the new optimum, the model can switch to the optimal action.
If the reward were given deterministically, this would be impossible.
Efficient search utilizing uncertainty and fluctuation in non-stationary environments.

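A sketch of the two drift schemes, extending the `BernoulliBandit` above; the per-step reading of the 0.0001 reset is our interpretation:

```python
import random

class NonStationaryBandit(BernoulliBandit):
    """Drift schemes of environments 1 and 2 (our reading of the slides)."""
    def __init__(self, n, period=None, reset_prob=0.0):
        super().__init__(n)
        self.period = period          # env 1: redraw all probs every `period` steps
        self.reset_prob = reset_prob  # env 2: per-arm redraw probability per step
        self.t = 0

    def pull(self, arm):
        self.t += 1
        if self.period and self.t % self.period == 0:
            self.probs = [random.random() for _ in self.probs]  # total reset
        else:
            self.probs = [random.random() if random.random() < self.reset_prob
                          else p for p in self.probs]           # silent resets
        return super().pull(arm)

# env1 = NonStationaryBandit(16, period=10_000)
# env2 = NonStationaryBandit(20, reset_prob=0.0001)
```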
Results
Stationary: the more options there are, the better the performance of LSVR becomes.
Non-stationary 2: LSVR can trace the unobserved change, amplifying fluctuation.
Non-stationary, synchronous: LSVR can trace the change in non-stationary environments.
[Figures: accuracy rate for LS, LS-VR, and UCB1-tuned (each also with γ = 0.999) in the stationary setting (10^6 steps) and in the two non-stationary settings (50,000 steps).]

Discussion
The cognitive biases of humans, when combined:
  work effectively for adaptation under uncertainty;
  conflate an action and the set of the actions through comparative valuation;
  symbolize the whole situation into a virtual action;
  utilize fluctuation from uncertainty and enable adaptation to non-stationary environments.

Conflating part and whole
Comparative valuation conflates the information of an action and of the whole set of actions.
Universal in living systems, from slime molds (Latty & Beekman, 2011) to neurons (Royer & Paré, 2003) to animals and human beings.

Relative evaluation is especially important
★ Relative evaluation:
★ is what even slime molds and real neural networks (conservation of synaptic weights) do. Behavioral economics found that humans comparatively evaluate actions and states.
★ weakens the dilemma between exploitation and exploration through the see-saw-game-like competition among arms:
★ Through failure (low reward), a choice of the greedy action may quickly trigger a switch to the previously second-best, non-greedy arm.
★ Through success (high reward), a choice of the greedy action may quickly narrow focus onto the currently greedy action, lessening the possibility of choosing non-greedy arms by decreasing the values of the other arms.
[Diagram: if absolute, choose A1 and lose; if relative, the values of A1 and A2 form a see-saw, so arms other than A1 are tried.]

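The see-saw shows up numerically in the `ls` sketch above: a failure on A1 raises A2's value even though A2's own counts never change (made-up counts for illustration):

```python
# A1 has 3 wins / 1 loss; A2 has 1 win / 1 loss.
ls(1, 1, 3, 1)  # A2's value while A1 keeps winning: ~0.46
# A1 takes one more loss (3 wins / 2 losses):
ls(1, 1, 3, 2)  # A2's value rises to ~0.49 without A2 being played
```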
Symbolization of the whole and comparative valuation with multiple actions
[Diagram: n slot machines A1, A2, ..., An, plus a virtual machine Ag representing the whole.]

Comparative valuation with a virtual action representing the whole
[Diagram: each arm Ai is compared ("&gt;" or "&lt;"?) with the virtual machine Ag representing the whole.]

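One plausible reading of this construction, assuming the virtual action Ag simply pools the counts of all the other arms so each arm is LS-compared against "the rest" (our sketch; the slides do not spell out the definition):

```python
def ls_values(wins, losses):
    """Score each arm by LS against a virtual arm Ag pooling all the others."""
    total_w, total_l = sum(wins), sum(losses)
    return [ls(w, l, total_w - w, total_l - l)   # (a, b) vs pooled (c, d)
            for w, l in zip(wins, losses)]
```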
Conclusion
The cognitive biases that look irrational are, when appropriately combined together as in humans, actually rational for adapting to uncertain environments and for survival through evolution.
Applicable in engineering, in machine learning and robot control.
Implications for brain science (the brain as a machine-learning device):
  Modeling PFC and vmPFC.
Brain science and the three cognitive biases:
  Satisficing: Kolling et al., Science, 2012.
  Comparative valuation of state-action value: Daw et al., Nature, 2006.
  Idiosyncratic risk evaluation: Boorman et al., Neuron, 2009.

Applications of bandit problems
★ Monte-Carlo tree search over game trees (Go AI)
★ Online advertisement
  ★ e.g., A/B testing
★ Design of medical treatment
★ Reinforcement learning

Robotic motion learning
Learning giant-swing motion with no prior knowledge and under coarse-grained states, through trial and error.
[Figures: the real robot and a simulator with a free first joint and an active second joint; coarse-grained posture (P0-P23), velocity, and reward states; acquired reward per 1000 steps for LSQ vs. Q-learning, in a typical case and averaged over 100 trials.]
Uragami, D., Takahashi, T., Matsuo, Y., Cognitively inspired reinforcement learning architecture and its application to giant-swing motion control, BioSystems, 116, 1-9 (2014).
A toy model of human cognition: Utilizing fluctuation in uncertain and non-stationary environments

  • 1. A  toy  model  of     human  cognition:     ! Utilizing  fluctuation  in  uncertain   and  non-­‐‑stationary  environments 1 1,2 Tatsuji  Takahashi ,  Yu  Kohno   Seminar  on  science  of  complex  systems  (organized  by   Yukio-­‐‑Pegio  Gunji),  Yukawa  Institute  for  Theoretical   Physics,  Kyoto  University,  Jan.  20,  2014   1 Tokyo  Denki  University,  2JSPS  (from  Apr.,  2014)
  • 3. Contents The  loosely  symmetric  (LS)  model   !2
  • 4. Contents The  loosely  symmetric  (LS)  model   Cognitive  properties  or  cognitive  biases !2
  • 5. Contents The  loosely  symmetric  (LS)  model   Cognitive  properties  or  cognitive  biases Analysis  of  reconstruction  of  LS   !2
  • 6. Contents The  loosely  symmetric  (LS)  model   Cognitive  properties  or  cognitive  biases Analysis  of  reconstruction  of  LS   Result:  Efficacy  in  reinforcement  learning !2
  • 7. Contents The  loosely  symmetric  (LS)  model   Cognitive  properties  or  cognitive  biases Analysis  of  reconstruction  of  LS   Result:  Efficacy  in  reinforcement  learning Utilization  of  fluctuation  in  non-­‐‑ stationary  environments !2
  • 8. A  toy  model  of  human  cognition !3
  • 9. A  toy  model  of  human  cognition Modeling  focussing  on  deviations  from  rational  standards:   cognitive  biases !3
  • 10. A  toy  model  of  human  cognition Modeling  focussing  on  deviations  from  rational  standards:   cognitive  biases the  differences  from  “machines” !3
  • 11. A  toy  model  of  human  cognition Modeling  focussing  on  deviations  from  rational  standards:   cognitive  biases the  differences  from  “machines” Principal  properties  implemented  in  a  form  as  simple  as   possible !3
  • 12. A  toy  model  of  human  cognition Modeling  focussing  on  deviations  from  rational  standards:   cognitive  biases the  differences  from  “machines” Principal  properties  implemented  in  a  form  as  simple  as   possible so  that  it  can  be  analyzed  and  applied  easily !3
  • 13. A  toy  model  of  human  cognition Modeling  focussing  on  deviations  from  rational  standards:   cognitive  biases the  differences  from  “machines” Principal  properties  implemented  in  a  form  as  simple  as   possible so  that  it  can  be  analyzed  and  applied  easily Intuition  of  human  beings !3
  • 14. A  toy  model  of  human  cognition Modeling  focussing  on  deviations  from  rational  standards:   cognitive  biases the  differences  from  “machines” Principal  properties  implemented  in  a  form  as  simple  as   possible so  that  it  can  be  analyzed  and  applied  easily Intuition  of  human  beings as  simple,  again:  not  the  policy  (or  strategy)  that  is  learnt   through  education  and  culture !3
  • 15. LS  as  a  toy  model  of  cognition !4
  • 16. LS  as  a  toy  model  of  cognition We  treat  the  loosely  symmetric  (LS)  model   proposed  by  Shinohara  (2007).  LS:   !4
  • 17. LS  as  a  toy  model  of  cognition We  treat  the  loosely  symmetric  (LS)  model   proposed  by  Shinohara  (2007).  LS:   models  cognitive  biases !4
  • 18. LS  as  a  toy  model  of  cognition We  treat  the  loosely  symmetric  (LS)  model   proposed  by  Shinohara  (2007).  LS:   models  cognitive  biases merely  a  function  over  co-­‐‑occurrence  information   between  two  events !4
  • 19. LS  as  a  toy  model  of  cognition We  treat  the  loosely  symmetric  (LS)  model   proposed  by  Shinohara  (2007).  LS:   models  cognitive  biases merely  a  function  over  co-­‐‑occurrence  information   between  two  events faithfully  describes  the  causal  intuition  of  humans !4
  • 20. LS  as  a  toy  model  of  cognition We  treat  the  loosely  symmetric  (LS)  model   proposed  by  Shinohara  (2007).  LS:   models  cognitive  biases merely  a  function  over  co-­‐‑occurrence  information   between  two  events faithfully  describes  the  causal  intuition  of  humans which  form  the  basis  of  decision-­‐‑making  and  action  for   adaptation  in  the  world !4
  • 28. The loosely symmetric (LS) model
A quasi-probability function LS(-|-), analogous to the conditional probability P(-|-).
Defined over the co-occurrence information of events p and q; the relationship from p to q is LS(q|p).
LS describes the causal intuition of human beings most faithfully (among more than 40 existing models).
The 2x2 co-occurrence table (prior event p in the rows, posterior event q in the columns):

             q     ¬q
      p      a      b
     ¬p      c      d

P(q|p) = a / (a + b)

LS(q|p) = (a + (b/(b+d)) d) / (a + (b/(b+d)) d + b + (a/(a+c)) c)
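A minimal numeric sketch of the formula above (our Python rendering; the function names and the eps guards against empty table cells are our additions, not part of the original definition):

    # Loosely symmetric strength LS(q|p) for the 2x2 table:
    # a = (p, q), b = (p, not-q), c = (not-p, q), d = (not-p, not-q).
    def ls(a, b, c, d, eps=1e-9):
        num = a + (b / (b + d + eps)) * d          # a plus d weighted by b/(b+d)
        den = num + b + (a / (a + c + eps)) * c    # plus b plus c weighted by a/(a+c)
        return num / (den + eps)

    def cp(a, b):
        return a / (a + b)                          # plain conditional probability

    print(cp(8, 2), ls(8, 2, 2, 8))   # 0.8 vs. ~0.727: LS shades P(q|p)
                                      # with the complementary cells c and d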
  • 35. The loosely symmetric (LS) model
Inductive inference of causal relationships: how do humans form the intensity of the causal relationship from p to q, when p is the candidate cause of the effect q in focus?
The question is the functional form f(a, b, c, d) that captures human causal intuition.
Meta-analysis as in Hattori & Oaksford (2007); correlation r between model and human ratings per experiment:

Experiment   r for LS   r for ΔP
AS95         0.95       0.88
BCC03.1      0.98       0.98
BCC03.3      0.92       0.84
H03          0.98       0.00
H06          0.97       0.71
LS00         0.85       0.88
W03.2        0.95       0.28
W03.6        0.85       0.46
  • 41. In 2-armed bandit problems (more on bandit problems later)
LS used as the value function in reinforcement learning: the agent evaluates the actions according to the causal intuition of humans.
Very good adaptation to the environment, both in the short term and in the long term.
[Figure: accuracy rate vs. step (1 to 1000, log scale) for LS, CP, ToW(0.5), SM(0.3), and SM(0.7).]
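A minimal sketch of LS as a bandit value function (our assumption of the two-armed instantiation: each arm's own wins/losses fill the a/b cells and the rival arm's record fills c/d, so the valuation is comparative by construction):

    import random

    def ls(a, b, c, d, eps=1e-9):
        num = a + (b / (b + d + eps)) * d
        return num / (num + b + (a / (a + c + eps)) * c + eps)

    def run(p=(0.6, 0.4), steps=1000):
        wins, losses = [0, 0], [0, 0]
        for _ in range(steps):
            # Each arm is valued against the other arm's record.
            v = [ls(wins[i], losses[i], wins[1 - i], losses[1 - i]) for i in (0, 1)]
            arm = 0 if v[0] >= v[1] else 1      # greedy on the LS value
            if random.random() < p[arm]:
                wins[arm] += 1
            else:
                losses[arm] += 1
        return wins, losses

    print(run())   # the better arm should accumulate most of the pulls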
  • 51. The loosely symmetric (LS) model
From the analysis of LS, we found the following cognitive properties:
Ground-invariance (like visual attention; Takahashi et al., 2010)
Comparative valuation (psychology: Tversky & Kahneman, Science, 1974; brain science: Daw et al., Nature, 2006)
Idiosyncratic, asymmetric risk attitude as in prospect theory (Kahneman & Tversky, Am. Psy., 1984; Boorman et al., Neuron, 2009)
Satisficing (Simon, Psy. Rev., 1954; Kolling et al., Science, 2012)
  • 57. Principal human cognitive biases
Humans:
Satisficing: do not optimize but satisfice; they become satisfied once an outcome is better than the reference level.
Comparative valuation: evaluate states and actions in a relative manner.
Asymmetric risk attitude: recognize gain and loss asymmetrically.
  • 60. Satisficing, risk attitude, and comparative evaluation
Satisficing: when all arms are over the given reference level, there is no further pursuit of better arms; when all arms are under it, the agent searches hard for an arm over the reference level.
Risk attitude (reliability consideration): over the reference the agent is risk-avoiding, under the reference it is risk-seeking (the reflection effect). With expected value 0.75 it chooses a 15/20 win record over a 3/4 record; with expected value 0.25 it gambles on a 1/4 record rather than a 5/20 record. The comparison thus takes reliability (sample size) into account; a numeric check follows below.
Comparative evaluation: after choosing A1 and losing, the value of A2 rises like a see-saw, so arms other than A1 get tried; with absolute valuation this does not happen.
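A worked check of the reliability effect (our construction, reusing the two-armed instantiation sketched earlier in which the rival arm's record fills the c/d cells; plain LS has an implicit reference of 0.5):

    def ls(a, b, c, d):
        num = a + (b / (b + d)) * d
        return num / (num + b + (a / (a + c)) * c)

    # Over the reference: both arms have the same estimate P(win) = 0.75.
    print(ls(15, 5, 3, 1))   # ~0.679  (15 wins / 5 losses)
    print(ls(3, 1, 15, 5))   # ~0.523  (3 wins / 1 loss) -> prefer 15/20

    # Under the reference: both arms have the same estimate P(win) = 0.25.
    print(ls(5, 15, 1, 3))   # ~0.321  (5 wins / 15 losses)
    print(ls(1, 3, 5, 15))   # ~0.477  (1 win / 3 losses) -> gamble on 1/4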
  • 61. The generalized LS with variable reference (LSVR)
LSVR is a generalization of LS with an autonomously adjusted reference parameter.
[Figure: abstract image of the variable reference.]
  • 66. n-armed bandit problem (nABP)
The simplest framework in reinforcement learning, exhibiting the exploration-exploitation dilemma and the speed-accuracy tradeoff.
The task is to maximize the total reward acquired from n actions (sources) with unknown reward distributions.
A one-armed bandit is a slot machine that gives a reward (win) or not (lose); an n-armed bandit is a slot machine with n arms that have different probabilities of winning. A minimal environment sketch follows.
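A minimal Bernoulli n-armed bandit environment (an illustrative sketch; the class and method names are ours, not from the talk):

    import random

    class Bandit:
        def __init__(self, n):
            # Each arm's win probability is drawn uniformly from [0, 1],
            # matching the experimental setting reported later.
            self.p = [random.random() for _ in range(n)]

        def pull(self, arm):
            # Bernoulli reward: 1 (win) with probability p[arm], else 0.
            return 1 if random.random() < self.p[arm] else 0

        def best(self):
            # Index of the optimal arm (used only for evaluation).
            return max(range(len(self.p)), key=lambda i: self.p[i])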
  • 71. Performance indices for nABP
Accuracy: the average percentage of choosing the optimal action.
Regret (expected loss): the difference between the rewards accumulated by the best possible sequence of actions (accuracy = 1.0 throughout the trial) and the rewards actually acquired.
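A sketch of the two indices (our helper; it takes the true win probabilities and the sequence of chosen arms, and computes regret in expectation rather than from sampled rewards):

    def evaluate(probs, choices):
        best = max(range(len(probs)), key=lambda i: probs[i])
        # Accuracy: fraction of steps on which the optimal arm was chosen.
        accuracy = sum(1 for a in choices if a == best) / len(choices)
        # Expected regret: per-step shortfall relative to always
        # playing the best arm, accumulated over the trial.
        regret = sum(probs[best] - probs[a] for a in choices)
        return accuracy, regret

    print(evaluate([0.2, 0.8], [1, 1, 0, 1]))   # (0.75, 0.6)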
  • 72. Result
n = 100; the reward probability for each action is taken uniformly from [0, 1].
[Figure: accuracy rate and expected loss over 10^6 steps for LS, LS-VR, and UCB1-tuned (each with γ = 0.999).]
Accuracy: highest; regret: smallest. The more actions there are, the better the performance of LSVR becomes. (Kohno & Takahashi, 2012; in prep.)
  • 73. Non-stationary bandits
The reward probabilities change while playing.
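A sketch of the two non-stationary settings used in the next results (our reading of the slides: (1) all arm probabilities reset synchronously every fixed number of steps; (2) each arm independently resets with a small per-step probability):

    import random

    class NonStationaryBandit:
        def __init__(self, n, period=None, reset_prob=0.0):
            self.p = [random.random() for _ in range(n)]
            self.period, self.reset_prob, self.t = period, reset_prob, 0

        def pull(self, arm):
            self.t += 1
            if self.period and self.t % self.period == 0:
                # Setting (1): total reset, e.g., period=10_000.
                self.p = [random.random() for _ in self.p]
            for i in range(len(self.p)):
                if random.random() < self.reset_prob:
                    # Setting (2): asynchronous reset, e.g., reset_prob=0.0001.
                    self.p[i] = random.random()
            return 1 if random.random() < self.p[arm] else 0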
  • 74. Result in non-stationary environment 1
n = 16; the reward probabilities are taken from [0, 1] and totally reset every 10,000 steps.
[Figure: accuracy rate and expected loss over 50,000 steps for LS, LS-VR, and UCB1-tuned (each with γ = 0.999).]
Accuracy: highest; regret: smallest. (Kohno & Takahashi, in prep.)
  • 80. Result in non-stationary environment 2
n = 20; the initial probabilities are taken from [0, 1], and each action's probability is reset with probability 0.0001 per step.
Even when a not-well-tried action becomes the new optimum, the agent can switch to it. If the reward were given deterministically, this would be impossible: efficient search utilizes uncertainty and fluctuation in non-stationary environments.
[Figure: accuracy (the rate at which the currently optimal action is chosen) over 50,000 steps for LS, LS-VR, and UCB1-tuned (each with γ = 0.999).]
  • 85. Results
[Figures: accuracy rates for LS, LS-VR, and UCB1-tuned (γ = 0.999) in the stationary, asynchronously non-stationary, and synchronously non-stationary settings.]
Stationary: the more options there are, the better the performance of LSVR becomes.
Non-stationary 2: LSVR can trace the unobserved change, amplifying fluctuation.
Non-stationary, synchronous: LSVR can trace the change in non-stationary environments.
  • 91. Discussion
The cognitive biases of humans, when combined:
work effectively for adaptation under uncertainty;
conflate an action and the set of actions through comparative valuation;
symbolize the whole situation into a virtual action;
utilize the fluctuation arising from uncertainty and enable adaptation to non-stationary environments.
  • 94. Conflating part and whole
Comparative valuation conflates the information of an action with that of the whole set of actions.
This is universal in living systems, from slime molds (Latty & Beekman, 2011) to neurons (Royer & Paré, 2003) to animals and human beings.
  • 100. Relative evaluation is especially important
★ Relative evaluation:
★ is what even slime molds and real neural networks (conservation of synaptic weights) do; behavioral economics found that humans evaluate actions and states comparatively.
★ weakens the dilemma between exploitation and exploration through a see-saw-like competition among arms (a numeric check follows below):
★ through failure (low reward), choosing the greedy action may quickly trigger the next choice of the previously second-best, non-greedy arm;
★ through success (high reward), choosing the greedy action may quickly focus the agent on the currently greedy action, lessening the possibility of choosing non-greedy arms by decreasing their values.
[Figure: after choosing A1 and losing, absolute valuation leaves the value of A2 unchanged, while relative valuation raises it (see-saw), so arms other than A1 get tried.]
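A small numeric check of the see-saw (our construction, again under the two-armed instantiation where arm A1's record fills A2's c/d cells): a loss on A1 raises the LS value of the untouched arm A2.

    def ls(a, b, c, d):
        num = a + (b / (b + d)) * d
        return num / (num + b + (a / (a + c)) * c)

    # A2's own record: 5 wins, 5 losses. A1's record: 6 wins, 4 losses.
    print(ls(5, 5, 6, 4))   # ~0.483  value of A2 before A1's loss
    print(ls(5, 5, 6, 5))   # ~0.493  value of A2 after A1 loses once (see-saw)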
  • 104. Symbolization of the whole and comparative valuation with multiple actions
[Figure: slot machines A1, A2, ..., An, plus a virtual machine Ag representing the whole.]
  • 106. Comparative valuation with a virtual action representing the whole
[Figure: each real machine A1, A2, ..., An is compared (">" or "<"?) against the virtual machine Ag that represents the whole.]
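One natural way to realize this picture in code (our assumption of the n-armed generalization, not a verbatim transcription of the talk's method): each arm is compared against the aggregate record of all the other arms, which plays the role of the virtual action Ag in the c/d cells.

    def ls(a, b, c, d, eps=1e-9):
        num = a + (b / (b + d + eps)) * d
        return num / (num + b + (a / (a + c + eps)) * c + eps)

    def ls_values(wins, losses):
        total_w, total_l = sum(wins), sum(losses)
        return [
            # "The rest of the arms" stands in for the virtual machine Ag.
            ls(w, l, total_w - w, total_l - l)
            for w, l in zip(wins, losses)
        ]

    print(ls_values([15, 3, 1], [5, 1, 3]))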
  • 119. Conclusion
The cognitive biases that look irrational are, when appropriately combined as in humans, actually rational for adapting to uncertain environments and for survival through evolution.
Applicable in engineering, machine learning, and robot control.
Implications for brain science (the brain as machine-learning equipment): modeling PFC and vmPFC.
Brain science and the three cognitive biases:
Satisficing (Kolling et al., Science, 2012)
Comparative valuation of state-action values (Daw et al., Nature, 2006)
Idiosyncratic risk evaluation (Boorman et al., Neuron, 2009)
  • 125. Applications of bandit problems
★ Game-tree search: Monte-Carlo tree search (Go AI)
★ Online advertisement, e.g., A/B testing
★ Design of medical treatment
★ Reinforcement learning
  • 126. Robotic motion learning
Learning a giant-swing motion with no prior knowledge, under coarse-grained states, through trial and error.
[Figure: real robot and simulator with a free 1st joint and an active 2nd joint; coarse-grained position, posture, and velocity states; three actions; reward defined on the swing angle. Plots of acquired reward per 1000 learning steps (typical case and average of 100 trials) compare LSQ with Q-learning.]
Uragami, D., Takahashi, T., Matsuo, Y.: Cognitively inspired reinforcement learning architecture and its application to giant-swing motion control. BioSystems, 116, 1–9 (2014).