What Can We Learn at the Interface of the Brain and AI?
Kenji Doya (doya@oist.jp)
Neural Computation Unit, groups.oist.jp/ncu
Okinawa Institute of Science and Technology
The 5th Whole Brain Architecture Symposium

The Brain and Artificial Intelligence
"To realize intelligence in electronic circuits, there is no need to be bound by how the brain works."
"The brain is an existing example of highly advanced intelligence, so there is no reason not to learn from it."
Last century's AI: programming the knowledge of experts.
Today's AI: statistical machine learning from big data.
The pursuit of performance led to a renewed recognition of the strengths of brain-inspired learning, such as deep learning.
Co-evolution of neuroscience and AI: vision
Neuroscience | Artificial intelligence
Multilayer supervised learning (Amari 1967)
Neocognitron (Fukushima 1980)
ConvNet (Krizhevsky, Sutskever, Hinton, 2012)
Google Brain (2012)
Place cells (O'Keefe 1976)
Face cells (Bruce, Desimone, Gross 1981) (Sugase et al. 1999)
[Excerpt, Sugase et al. 1999, Nature (face-responsive neuron; partially truncated in the original):]
... F2 (monkey expression) peaks before levelling off. F1 (monkey identity) and F4 (human expression) were encoded at a much lower level than the others. The response latency of this neuron was 53 ms, and the latencies for the information about each category were as follows: G, 45; F1, 157; F2, 93; F3, 125; and F4, 125 ms. Thus, the earliest transient discharge conveyed global information, and the later sustained discharge encoded one category of the fine information best.
Of the 86 face-responsive neurons, 11 neurons (13%) did not encode significant information in any category, 43 (50%) coded either global or fine category information, and 32 (37%) coded both categories. To clarify how single neurons encoded both global and ...
[Excerpt, O'Keefe 1976, "Hippocampal place units":]
Fig. 2. Place fields for all place units except 21342 and those from animal 217.
... distributed around the maze. The concentration of fields from the other animals in arm B may have reflected the fact that many of the rats spent their "free time" in this arm. The fact that the initial search for units was conducted there might also have introduced a bias towards units active in that area. In any case, it was clear that the majority of fields were not located in those places which contained the rewards or other ...
Fig. 3. Place fields for place units from animal 217.
Experience-dependent learning (Blakemore & Cooper 1970)
[Excerpt, Hubel & Wiesel 1959, "Receptive fields of single neurones in the cat's striate cortex", J Physiol:]
... found by changing the size, shape and orientation of the stimulus until a clear response was evoked. Often when a region with excitatory or inhibitory responses was established the neighbouring opposing areas in the receptive field could only be demonstrated indirectly. Such an indirect method is illustrated in Fig. 3B, where two flanking areas are indicated by using a short slit in various positions like the hand of a clock, always including the very centre of the field. The findings thus agree qualitatively with those obtained with a small spot (Fig. 2a).
Fig. 3. Same unit as in Fig. 2. A, responses to shining a rectangular light spot, 1° x 8°; centre of slit superimposed on centre of receptive field; successive stimuli rotated clockwise, as shown to left of figure. B, responses to a 1° x 5° slit oriented in various directions, with one end always covering the centre of the receptive field: note that this central region evoked responses when stimulated alone (Fig. 2a). Stimulus and background intensities as in Fig. 1; stimulus duration 1 sec.
Receptive fields having a central area and opposing flanks represented a common pattern, but several variations were seen. Some fields had long narrow central regions with extensive flanking areas (Figs. 1-3): others had a large central area and concentrated slit-shaped flanks (Figs. 6, 9, 10). In many fields the two flanking regions were asymmetrical, differing in size and shape; in these a given spot gave unequal responses in symmetrically corresponding ...
Feature-detecting cells (Hubel & Wiesel 1959)
Perceptron (Rosenblatt 1962)
Co-evolution of neuroscience and AI: reinforcement learning
Neuroscience | Artificial intelligence
[Figure: reward-delay conditions of 3 s, 6 s, and 9 s]
Classical conditioning (Pavlov 1903)
Operant conditioning (Thorndike 1898)
Deep Q Network (Mnih et al. 2015)
[Excerpt, Mnih et al. 2015, Nature:]
... difficult and engaging for human players. We used the same network architecture, hyperparameter values (see Extended Data Table 1) and learning procedure throughout, taking high-dimensional data (210 x 160 colour video at 60 Hz) as input, to demonstrate that our approach robustly learns successful policies over a variety of games based solely ...
We compared DQN with the best performing methods from the reinforcement learning literature on the 49 games where results were available. In addition to the learned agents, we also report scores for a professional human games tester playing under controlled conditions and a policy that selects actions uniformly at random (Extended Data ...).
Figure 1 | Schematic illustration of the convolutional neural network. The details of the architecture are explained in the Methods. The input to the neural network consists of an 84 x 84 x 4 image produced by the preprocessing map φ, followed by three convolutional layers (note: snaking blue line symbolizes sliding of each filter across input image) and two fully connected layers with a single output for each valid action. Each hidden layer is followed by a rectifier nonlinearity (that is, max(0, x)).
TD learning (Sutton et al. 1983)
Reward prediction error responses of dopamine neurons (Schultz et al. 1993, 1997)
Dopamine-dependent synaptic plasticity (Wickens et al. 2000)
[Excerpt, W. Schultz, review on the predictive reward signal of dopamine neurons:]
... fails to occur, even in the absence of an immediately preceding stimulus (Fig. 2, bottom). This is observed when animals fail to obtain reward because of erroneous behavior, when liquid flow is stopped by the experimenter despite correct behavior, or when a valve opens audibly without delivering liquid (Hollerman and Schultz 1996; Ljungberg et al. 1991; Schultz et al. 1993). When reward delivery is delayed for 0.5 or 1.0 s, a depression of neuronal activity occurs at the regular time of the reward, and an activation follows the reward at the new time (Hollerman and Schultz 1996). Both responses occur only during a few repetitions until the new time of reward delivery becomes predicted again. By contrast, delivering reward earlier than habitual results in an activation at the new time of reward but fails to induce a depression at the habitual time. This suggests that unusually early reward delivery cancels the reward prediction for the habitual time. Thus dopamine neurons monitor both the occurrence and the time of reward. In the absence of stimuli immediately preceding the omitted reward, the depressions do not constitute a simple neuronal response but reflect an expectation process based on an internal clock tracking the precise time of predicted reward.
Activation by conditioned, reward-predicting stimuli
About 55-70% of dopamine neurons are activated by conditioned visual and auditory stimuli in the various classically or instrumentally conditioned tasks described earlier (Fig. 2, middle and bottom) (Hollerman and Schultz 1996; Ljungberg et al. 1991, 1992; Mirenowicz and Schultz 1994; Schultz 1986; Schultz and Romo 1990; P. Waelti, J. Mirenowicz, and W. Schultz, unpublished data). The first dopamine responses to conditioned light were reported by Miller et al. (1981) in rats treated with haloperidol, which increased the incidence and spontaneous activity of dopamine neurons but resulted in more sustained responses than in undrugged animals. Although responses occur close to behavioral reactions (Nishino et al. 1987), they are unrelated to arm and eye movements themselves, as they occur also ipsilateral to the moving arm and in trials without arm or eye movements (Schultz and Romo 1990). Conditioned stimuli are somewhat less effective than primary rewards in terms of response ...
FIG. 2. Dopamine neurons report rewards according to an error in reward prediction. Top: drop of liquid occurs although no reward is predicted at this time. Occurrence of reward thus constitutes a positive error in the prediction of reward. Dopamine neuron is activated by the unpredicted ...
Action-value coding in the striatum (Samejima et al. 2005)
The basal ganglia TD-learning hypothesis (Barto et al. 1995; Montague et al. 1996)
V(s)   Q(s,a)
www.brain-ai.jp
What more should AI learn from the brain?
Energy efficiency
Data efficiency
● world models and mental simulation
● modular self-organization
● meta-learning
Autonomy and sociality
(Doya & Matsuo, Brain and Nerve, 2019; groups.oist.jp/ncu/publications)
Model-free vs. model-based behavior
Model-free
■ memorizes "action values"
■ intuitive, reflexive action selection
■ learns from direct trial-and-error experience
Simple processing / requires repeated experience
Model-based
■ internal model that predicts the consequences of actions
■ action selection by look-ahead
● "mental simulation"
■ thinks before acting
Flexible adaptation / complex processing
(The two strategies are contrasted in the sketch below.)
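A minimal sketch of the contrast in code, with every structure hypothetical: a cached Q-table stands in for the model-free route, and a learned transition/reward model for the model-based route.

```python
# Sketch: two routes to choosing an action in the same toy task.
import numpy as np

n_states, n_actions, gamma = 5, 2, 0.9
Q = np.zeros((n_states, n_actions))                 # model-free: cached action values
T = np.random.rand(n_states, n_actions, n_states)   # model: P(s'|s,a)
T /= T.sum(axis=2, keepdims=True)
R = np.random.rand(n_states, n_actions)             # model: expected reward r(s,a)
V = np.zeros(n_states)                              # state values used for look-ahead

def act_model_free(s):
    # reflexive choice from remembered values
    return int(np.argmax(Q[s]))

def act_model_based(s):
    # one-step "mental simulation": predict the outcome of each action
    return int(np.argmax(R[s] + gamma * T[s] @ V))

a_fast, a_planned = act_model_free(0), act_model_based(0)
```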
PILCO: model-based reinforcement learning (Paavo Parmas)
[Video frames: trial 1, trial 2, trial 8]
Mental simulation
Cognitive and behavioral functions that use a predictive model of how actions change the state of the body and the environment,
s' = f(s, a) or P(s'|s, a):
■ estimating the current state from past states and actions
● coping with noise, delays, and occlusion
■ predicting the outcomes of candidate actions from the current state
● model-based decision making, action planning, ...
■ inferring the outcomes or causes of hypothesized states
● thought, reasoning, language, science, ...
(A planning sketch using such a model follows.)
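As a toy illustration of "thinking before acting" with a predictive model s' = f(s, a), the sketch below scores candidate action sequences by imagined rollouts (random-shooting planning). The dynamics f, the reward, and all parameters are stand-ins, not the PILCO method shown above.

```python
# Sketch of mental simulation for planning: roll out a learned model
# s' = f(s, a) to evaluate candidate action sequences.
import numpy as np

rng = np.random.default_rng(0)

def f(s, a):                  # assumed learned dynamics: next 2-D position
    return s + np.array([np.cos(a), np.sin(a)]) * 0.1

def reward(s, goal):          # assumed reward: closeness to the goal
    return -np.linalg.norm(s - goal)

def plan(s0, goal, horizon=10, n_candidates=100, gamma=0.95):
    best_seq, best_ret = None, -np.inf
    for _ in range(n_candidates):
        seq = rng.uniform(-np.pi, np.pi, horizon)   # candidate action sequence
        s, ret = s0.copy(), 0.0
        for t, a in enumerate(seq):                 # imagined rollout
            s = f(s, a)
            ret += gamma**t * reward(s, goal)
        if ret > best_ret:
            best_seq, best_ret = seq, ret
    return best_seq[0]        # execute the first action, then replan (MPC style)

a0 = plan(np.zeros(2), goal=np.array([1.0, 0.0]))
```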
Model-based action planning involves cortico-cerebellar and basal ganglia networks
Alan S. R. Fermin, Takehiko Yoshida, Junichiro Yoshimoto, Makoto Ito, Saori C. Tanaka & Kenji Doya
[Abstract excerpt:] Humans can select actions by learning, planning, or retrieving motor memories. Reinforcement Learning (RL) associates these processes with three major classes of strategies for action selection: exploratory RL learns state-action values by exploration, model-based RL uses internal models to simulate future states reached by hypothetical actions, and motor-memory RL selects past successful state-action mapping. In order to investigate the neural substrates that implement these strategies, we conducted a functional magnetic resonance imaging (fMRI) experiment while humans performed a ... ventromedial prefrontal cortex and ventral striatum increased activity in the exploratory condition; the dorsolateral prefrontal cortex, dorsomedial striatum, and lateral cerebellum in the model-based condition; and the supplementary motor area, putamen, and anterior cerebellum in the motor-memory ... implements the model-based RL action selection strategy.
Using exploration and reward feedback, humans and other animals have a remarkable capacity to learn new motor behaviors without explicit teaching1. Throughout most of our lives, however, we depend on explicit or implicit knowledge, based upon past experiences, such as a map of the area or properties of the musculoskeletal system, to enable focused exploration and efficient learning2,3. After repeated practice, a motor behavior becomes stereotyped and can be executed with little mental load4. What brain mechanisms enable animals to employ different learning strategies and to select or integrate them in a given situation? In this paper, we take a new behavioral paradigm that captures different stages of motor learning during a single experimental session5, and using fMRI we explore brain structures that are specifically involved in implementing different learning strategies.
(Alan Fermin)
■ Mental simulation involves the cerebral cortex, basal ganglia, and cerebellum
[Figure: grid-sailing task; Conditions 1, 3A, 2, and 3B]
Specialization by learning algorithm (Doya, 1999)
■ Cerebellum: supervised learning guided by an error signal (target output vs. actual output): internal models
■ Basal ganglia: reinforcement learning guided by a reward signal: reward prediction
■ Cerebral cortex: unsupervised learning: state representation
[Diagram: cerebral cortex, basal ganglia, and cerebellum interconnected through the thalamus, with the substantia nigra providing the reward signal and the inferior olive the error signal]
■ Two-photon imaging of parietal cortex neuron activity
■ Acoustic virtual environment
● intermittent sensory cues
■ Probabilistic decoding of neural populations
● goal position predicted even without sensory input
Neural substrate of dynamic Bayesian inference in the cerebral cortex
Akihiro Funamizu, Bernd Kuhn & Kenji Doya
[Abstract:] Dynamic Bayesian inference allows a system to infer the environmental state under conditions of limited sensory observation. Using a goal-reaching task, we found that posterior parietal cortex (PPC) and adjacent posteromedial cortex (PM) implemented the two fundamental features of dynamic Bayesian inference: prediction of hidden states using an internal state transition model and updating the prediction with new sensory evidence. We optically imaged the activity of neurons in mouse PPC and PM layers 2, 3 and 5 in an acoustic virtual-reality system. As mice approached a reward site, anticipatory licking increased even when sound cues were intermittently presented; this was disturbed by PPC silencing. Probabilistic population decoding revealed that neurons in PPC and PM represented goal distances during sound omission (prediction), particularly in PPC layers 3 and 5, and prediction improved with the observation of cue sounds (updating). Our results illustrate how cerebral cortex realizes mental simulation using an action-dependent dynamic model.
[Introduction excerpt:] Animals have to act despite limited sensory information because of factors such as interfering background noise or occluded vision. Thus, the ability to estimate the current state of the outside world from a sequence of sensory observations and their own actions is essential. This process is optimally realized by dynamic Bayesian inference, such as a Kalman filter1, which predicts the state with an internal state transition model and updates the prediction with new sensory inputs. For example, when a mouse navigates in darkness, it must keep track of the location of its destination based on both its movement (prediction) and sensory signals such as calls from its nest mate (updating). We hypothesized that dynamic Bayesian inference is implemented in cerebral neocortex and investigated the plausibility of this idea using two-photon microscopy2. Most areas of the cerebral neocortex receive ascending sensory inputs (feedforward streams) and descending inputs (feedback ...) ... hypothesis that dynamic Bayesian inference is implemented in pyramidal neurons of layers 2, 3 and 5, with increasing action dependence in deeper layers.
To test this hypothesis, we trained mice to perform an auditory virtual navigation task and imaged neuronal activity in layers 2, 3 and 5 of the PPC and the PM located posterior to PPC11-13. The task required mice to approach a water reward site (goal) by estimating the distance on the basis of sound cues and their own locomotion. PPC is involved in spatial navigation by representing route maps, head directions, turning locations and locomotory accelerations with egocentric and allocentric representations14-17. PPC lesions disrupt navigation on the basis of self-motion information (path integration)18,19. PM also represents egocentric and allocentric reference frames20. Both PPC and PM receive inputs from auditory cortex and secondary motor cortex (M2)21,22, but PM receives fewer feed...
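The prediction/update cycle described in this excerpt can be made concrete with a one-dimensional Kalman filter. The sketch below is only illustrative (all numbers assumed): it tracks goal distance from self-motion and updates the belief only when a sound cue arrives, so uncertainty grows during cue omission and shrinks at each cue.

```python
# Sketch of dynamic Bayesian inference with intermittent observations.
import numpy as np

def kalman_step(mu, var, u, y=None, q=0.05, r=0.5):
    # predict: goal distance shrinks by the distance moved, u
    mu, var = mu - u, var + q          # state transition model + process noise
    # update: only when a sensory observation y is available
    if y is not None:
        k = var / (var + r)            # Kalman gain
        mu, var = mu + k * (y - mu), (1 - k) * var
    return mu, var

mu, var = 67.0, 1.0                    # initial belief of goal distance (cm)
for t in range(20):
    cue = 67.0 - 3.0 * (t + 1) + np.random.randn() * 0.5
    y = cue if t % 3 == 0 else None    # intermittent sound cues
    mu, var = kalman_step(mu, var, u=3.0, y=y)
    # var grows during omission (prediction) and shrinks at cues (update)
```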
[Figure: probabilistic population code in PPC. Panels show normalized mean ΔR/R, normalized likelihood, and posterior over goal distance (67, 33, 0 cm) in the continuous, intermittent-1, and intermittent-2 cue conditions for PPC layer 2; the sound zone is marked.]
Hypothetical cortical algorithm of model-based state prediction with Bayesian inference
Why can brain circuits be flexibly activated and connected?
Computational theories
■ learning of gating (Jacobs, Jordan, 1991; Singh 1992)
■ selection/integration by prediction errors
● MOSAIC architecture (Wolpert, Kawato, 1998; Sugimoto et al., 2012)
■ selection/integration by uncertainty
● Bayesian cue integration (Deneve et al., 2001; Daw et al., 2005)
Neural mechanisms
■ gating by the basal ganglia (Eliasmith et al., 2012)
■ thalamus / claustrum (Crick 1984; Crick, Koch 2005)
■ dendritic/axonal inhibition (Wang & Yang, 2018)
■ oscillation and spike synchrony (von der Malsburg; Singer)
[From "MOSAIC for multiple-reward environments" (Sugimoto et al.):]
Figure 1: Schematic diagram of MOSAIC-MR. Reward modules (top left) predict the immediate reward (r̂i), and forward modules (bottom left) predict the local dynamics of the environment (the predicted ẋj). Reward and forward modules decompose a complex environment into subenvironments on the basis of prediction errors (r - r̂i) and (ẋ - predicted ẋj), respectively. RL modules (displayed at right) output actions (uk) appropriate for each subenvironment. The learning and control of each RL module are weighted based on the TD error (δk), that is, on how well each RL controller predicts the expected discounted future rewards.
... selection of modules is performed on the basis of the prediction errors of the reward and the dynamics, respectively. The RL module, which receives the reward and dynamics predictions (r̂(t), predicted ẋ(t)) and the weights of each module in ...
[From Wang & Yang 2018, Current Opinion in Neurobiology, Figure 4: input pathways 1 and 2, output pathways 1 and 2, the basal ganglia and thalamus, and interneuron types labeled STC, DTC, and ITC, illustrating input gating, output gating, synchronous gating, recurrent gating, and striatal/thalamic gating.]
Various mechanisms for information gating in the brain. Input gating can be achieved by dendrite-targeting interneurons that selectively control inputs to pyramidal dendrites. In the synchronous gating mechanism, communication between two areas depends on the degree of temporal synchrony of neural activity between the source and target areas. The recurrent gating mechanism involves selective integration of inputs based on context-dependent dynamics of the network. Output gating is instantiated with perisoma-targeting interneurons that specifically inhibit pyramidal neurons projecting to one pathway but not others. Gating may also involve subcortical structures, especially the basal ganglia and thalamus.
Reinforcement learning
■ Reward prediction: value functions
● V(s) = E[ r(t) + γ r(t+1) + γ² r(t+2) + … | s(t) = s ]
● Q(s,a) = E[ r(t) + γ r(t+1) + γ² r(t+2) + … | s(t) = s, a(t) = a ]
■ Action selection
● greedy: a = argmax_a Q(s,a)
● Boltzmann: P(a|s) ∝ exp[ β Q(s,a) ]
■ Prediction update: TD error
● δ(t) = r(t) + γ V(s(t+1)) − V(s(t))
● ΔV(s(t)) = α δ(t)
● ΔQ(s(t), a(t)) = α δ(t)
What circuit mechanisms implement these steps?
What mechanisms control these parameters?
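The slide's equations translate almost line-for-line into code. A sketch of tabular learning with a Boltzmann policy follows; here the TD error uses the Q-learning form with a max over next actions, and the state/action sizes and environment loop are assumptions.

```python
# Sketch: tabular TD learning with the slide's meta-parameters alpha, beta, gamma.
import numpy as np

alpha, beta, gamma = 0.1, 2.0, 0.9
n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))

def boltzmann(s):
    p = np.exp(beta * Q[s]); p /= p.sum()     # P(a|s) ∝ exp[β Q(s,a)]
    return np.random.choice(len(p), p=p)

def td_update(s, a, r, s_next):
    delta = r + gamma * np.max(Q[s_next]) - Q[s, a]   # TD error δ(t), Q-learning form
    Q[s, a] += alpha * delta                          # ΔQ(s,a) = α δ(t)
    return delta
```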
Discount rate of future rewards, γ
■ Large γ
● goes to fetch even distant batteries
■ Small γ
● takes only nearby batteries
Time scale of reward prediction
V(t) = E[ r(t) + γ r(t+1) + γ² r(t+2) + … ]
Discount factor γ: how far into the future rewards are predicted
[Figure: two four-step reward sequences, (−20, −20, −20, +100) and (+50, 0, 0, −100), each evaluated with a large and a small discount factor; the values shown are consistent with γ = 0.9 and γ = 0.3 (see the worked check below).]
Large γ: V = 18.7 for (−20, −20, −20, +100): "worth doing despite the effort!"; V = −22.9 for (+50, 0, 0, −100): "won't do anything dangerous"
Small γ: V = −25.1 for (−20, −20, −20, +100): "better not to do it"; V = 47.3 for (+50, 0, 0, −100): "can't help doing it"
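The four values on the slide are reproduced exactly by the discounted sum with γ = 0.9 for "large" and γ = 0.3 for "small" (an inference from the numbers, not stated on the slide):

```latex
V=\sum_{t=1}^{4}\gamma^{\,t-1} r_t
\qquad
\begin{aligned}
(-20,-20,-20,+100):&\quad \gamma=0.9:\ -20-18-16.2+72.9=18.7\\
&\quad \gamma=0.3:\ -20-6-1.8+2.7=-25.1\\
(+50,\ 0,\ 0,-100):&\quad \gamma=0.9:\ 50-100(0.9)^3=-22.9\\
&\quad \gamma=0.3:\ 50-100(0.3)^3=47.3
\end{aligned}
```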
Depression?
Impulsivity?
Serotonin?
Neuromodulator systems and metalearning (Doya, 2002)
■ Broad projections from the brainstem to the cerebellum, basal ganglia, and cerebral cortex
● controlling global variables/parameters of learning?
Dopamine: reward prediction error δ
Acetylcholine: learning rate α
Noradrenaline: width of exploration (inverse temperature) β
Serotonin: time scale of reward prediction γ
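An illustrative reading of this mapping as knobs on the TD sketch above; the mapping itself is the slide's hypothesis, and the code only shows how one of the knobs, β (the noradrenaline-like parameter), reshapes the action distribution.

```python
# Illustrative only: beta sharpens or flattens the Boltzmann policy.
import numpy as np

q = np.array([1.0, 0.8, 0.2])                 # toy action values
for beta in (0.5, 2.0, 10.0):                 # low beta -> broad exploration
    p = np.exp(beta * q); p /= p.sum()
    print(beta, np.round(p, 2))
# By the same hypothesis: gamma (serotonin) sets how far ahead V looks,
# alpha (acetylcholine) sets the update step size, and delta itself is
# the dopamine-like teaching signal.
```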
Optogenetic stimulation of serotonin neurons (Miyazaki et al., 2014, Current Biology)
■ Reward-waiting task (3, 6, 9, ∞ s delays)
● yellow light (no stimulation): waited 12.1 s
● blue light (stimulation): waited 20.8 s
What more should AI learn from the brain?
Energy efficiency
Data efficiency
● world models and mental simulation
● modular self-organization
● meta-learning
Autonomy and sociality
Designing a robot's reward system
What is reward? ... it serves the functions most vital to a living system:
■ appetite, pain: self-preservation
■ affection, sexual drive: self-reproduction
Cyber Rodents
● capture battery packs to recharge
● copy software ("genes") via infrared
A robot colony that learns and evolves
■ survival
● battery capture
■ reproduction
● copying of "genes"
Evolution of how robots learn
■ mate and pass on genes within a lifetime of about 5 minutes
● mating success is proportional to battery charge level
(Elfwing et al. 2011)
[Framework diagram: a population of 15-25 robots and virtual agents; each genome encodes the weights of the top layer of the neural network (w1, w2, …, wn), reward-shaping weights (v1, v2, …, vn), and the meta-parameters (α, γ, λ, τ, k, τ0). A minimal sketch of this evolutionary loop follows.]
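The sketch below is a minimal rendering of the evolutionary loop the slides describe, with every detail assumed: genes are real-valued vectors, a lifetime returns an energy score, and the probability of passing on genes rises with that energy, as on the slide.

```python
# Sketch (assumptions throughout): evolution of gene vectors where
# mating odds scale with the energy gathered during a lifetime.
import numpy as np

rng = np.random.default_rng(1)
pop = rng.normal(size=(20, 8))                # 20 agents, 8 genes each

def lifetime_energy(genes):                   # stand-in for a ~5-minute life
    return float(-np.sum((genes - 0.5) ** 2) + rng.normal(0, 0.1))

for generation in range(100):
    energy = np.array([lifetime_energy(g) for g in pop])
    p = np.exp(energy); p /= p.sum()          # mating odds rise with energy
    parents = pop[rng.choice(len(pop), size=(len(pop), 2), p=p)]
    crossover = rng.random(pop.shape) < 0.5
    pop = np.where(crossover, parents[:, 0], parents[:, 1])
    pop += rng.normal(0, 0.02, size=pop.shape)   # mutation
```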
Visual rewards acquired through evolution
■ battery packs   ■ faces (tail-lamps) of other robots
Polymorphism of behavioral strategies (Elfwing et al. 2015)
■ foraging-biased ("gluttonous") and mating-biased ("amorous") robots
■ evolutionary stability
Figure 2. Correlation between the average mating learning performance (time steps per mating) and the average fitness in the final 20 generations in all experiments. The learning performance was estimated as the number of time steps the mating behavior was selected divided by the number of mating events. The seven types of markers indicate the number of energy sources in the environment for each simulation.
Figure 3. Example trajectories of the learned behaviors for the roamer strategy and the stayer strategy. (a) The roamer ignores the tail-lamp of the mating partner and executes the learned foraging behavior to capture the energy source. (b) The stayer executes the learned waiting behavior and adjusts its position according to the trajectory of the mating partner.
[Figure panels: (a) scatter of two genome weights (w1, w5; labeled on the slide as the bias for mating behavior and the weight on faces) separating roamers and stayers; (b) distribution of the battery level at which mating behavior is taken (Ēm), as percent of the population, for roamers and stayers; further panels show the proportions of the forager ("gluttonous") and tracker ("amorous") phenotypes.]
[Figure 5 plots, panels (a)-(d); see the caption below.]
Figure 5. Average number of mating events, average proportion of mating events with stayer mating partners, average energy level at the mating events, and average fitness, as functions of the stayer proportion in the population, for the roamer (green solid lines with circles) and stayer (red solid lines with circles) subpopulations. (a) The dotted lines show the best linear fit for the two subpopulations and the black line shows average values for the population as a whole. (b) The dotted lines show the best linear fit for the two subpopulations and the black line shows the average ratio of the number of roamer mating events to the number of stayer mating events. (c) The dotted lines show the constant approximations as the average values over all phenotype proportions. (d) The dotted lines show the estimated fitness values using Equations 6 and 8.
A threat from autonomous AI agents?
Creative AI agents
■ discover and realize their own goals
■ could give rise to new science, technology, culture, and industry
Assessing and managing the risks
■ malfunction
■ side effects
■ use by ambitious or hostile individuals and groups
futureoflife.org/ai-principles
Research Issues
1) Research Goal: The goal of AI research should be to create not undirected intelligence, but beneficial
intelligence.
2) Research Funding: Investments in AI should be accompanied by funding for research on ensuring its
beneficial use, including thorny questions in computer science, economics, law, ethics, and social studies,
such as:
How can we make future AI systems highly robust, so that they do what we want without malfunctioning or
getting hacked?
How can we grow our prosperity through automation while maintaining people’s resources and purpose?
How can we update our legal systems to be more fair and efficient, to keep pace with AI, and to manage
the risks associated with AI?
What set of values should AI be aligned with, and what legal and ethical status should it have?
3) Science-Policy Link: There should be constructive and healthy exchange between AI researchers and
policy-makers.
4) Research Culture: A culture of cooperation, trust, and transparency should be fostered among
researchers and developers of AI.
5) Race Avoidance: Teams developing AI systems should actively cooperate to avoid corner-cutting on
safety standards.
Ethics and Values
6) Safety: AI systems should be safe and secure throughout their operational lifetime, and verifiably so
where applicable and feasible.
7) Failure Transparency: If an AI system causes harm, it should be possible to ascertain why.
8) Judicial Transparency: Any involvement by an autonomous system in judicial decision-making should
provide a satisfactory explanation auditable by a competent human authority.
9) Responsibility: Designers and builders of advanced AI systems are stakeholders in the moral implications
of their use, misuse, and actions, with a responsibility and opportunity to shape those implications.
10) Value Alignment: Highly autonomous AI systems should be designed so that their goals and behaviors
can be assured to align with human values throughout their operation.
11) Human Values: AI systems should be designed and operated so as to be compatible with ideals of
human dignity, rights, freedoms, and cultural diversity.
12) Personal Privacy: People should have the right to access, manage and control the data they generate,
given AI systems’ power to analyze and utilize that data.
13) Liberty and Privacy: The application of AI to personal data must not unreasonably curtail people’s real or
perceived liberty.
14) Shared Benefit: AI technologies should benefit and empower as many people as possible.
15) Shared Prosperity: The economic prosperity created by AI should be shared broadly, to benefit all of
humanity.
16) Human Control: Humans should choose how and whether to delegate decisions to AI systems, to
accomplish human-chosen objectives.
17) Non-subversion: The power conferred by control of highly advanced AI systems should respect and
improve, rather than subvert, the social and civic processes on which the health of society depends.
18) AI Arms Race: An arms race in lethal autonomous weapons should be avoided.
Longer-term Issues
19) Capability Caution: There being no consensus, we should avoid strong assumptions regarding upper
limits on future AI capabilities.
20) Importance: Advanced AI could represent a profound change in the history of life on Earth, and should
be planned for and managed with commensurate care and resources.
21) Risks: Risks posed by AI systems, especially catastrophic or existential risks, must be subject to planning
and mitigation efforts commensurate with their expected impact.
22) Recursive Self-Improvement: AI systems designed to recursively self-improve or self-replicate in a
manner that could lead to rapidly increasing quality or quantity must be subject to strict safety and control
measures.
23) Common Good: Superintelligence should only be developed in the service of widely shared ethical ideals,
and for the benefit of all humanity rather than one state or organization.
Learning from human society
■ Humans are the most dangerous species on Earth
The wisdom of democracy: never entrust unlimited power to an individual or a particular group
■ Politics
● elected, fixed terms of office
● separation of powers
● local autonomy
■ Economy
● antitrust laws
● the right to strike
■ Science
● peer review
Mutual evaluation and cooperation among multiple open-source, explainable AI agents
Social value orientation and the amygdala/prefrontal cortex
Activity in the amygdala elicited by unfair divisions predicts social value orientation
Masahiko Haruno & Christopher D Frith
[Abstract:] 'Social value orientation' characterizes individual differences in anchoring attitudes toward the division of resources. Here, by contrasting people with prosocial and individualistic orientations using functional magnetic resonance imaging, we demonstrate that degree of inequity aversion in prosocials is predictable from amygdala activity and unaffected by cognitive load. This result suggests that automatic emotional processing in the amygdala lies at the core of prosocial value orientation.
[Excerpt:] Individual social value orientation1 determines behavior in economic games and many real-life situations2. Prosocials are defined as those who like to maximize the sum of resources for the self and the other, while simultaneously minimizing the difference between the two. By contrast, individualists like to maximize resources for the self, while competitors like to maximize the difference between the two. The question we address here is whether prosocial attitudes depend upon deliberate top-down control of selfish impulses3 or on automatic inequity aversion4, and what their neural substrates are. In the former case, we would expect prosocial attitudes to be associated with greater prefrontal activity3, whereas the insula5,6 or amygdala7,8 would be involved in the latter case.
We used a behavioral task to identify people with different social value orientations (... Methods). In the magnetic resonance imaging (MRI) scanner, the subjects were presented with a pair of rewards for the self and the other. They were asked to evaluate the desirability of the reward pair on four different levels (least preferable (1) to most preferable (4)) by a button press.
Each subject's evaluation was linearly regressed with the reward for the subject (Rs), the reward for the other (Ro) and the absolute value of their difference (Da) as implicated in the theory of social value orientation1, yielding coefficients WRs, WRo and WD, respectively (Supplementary Fig. 1). The two groups differed markedly in their response to the absolute reward difference, WD (t34 = 6.68, P < 0.00001; Fig. 1c). The prosocials disliked large absolute differences in distributions (inequity aversion), whereas the individualists were unaffected by such differences. In contrast, individualists preferred higher self rewards (WRs; t34 = 6.54, P < 0.00001; Fig. 1c), but were indifferent to other rewards (WRo; Fig. 1c).
To identify the neural substrates corresponding to these behavioral differences, we conducted a regression analysis of functional MRI (fMRI) data at the time of reward pair presentation with three explanatory variables: the reward for the subject (Rs), the reward for the other (Ro) and the absolute value of their difference (Da) using statistical parameter mapping9 (Supplementary Methods) and looked for brain structures whose activity distinguished between prosocials and individualists (see Supplementary Tables 1 (Da) and 2 (Rs and Ro)).
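The per-subject regression described in the excerpt is a plain least-squares fit. A sketch with fabricated stand-in data follows; the variable names Rs, Ro, Da and coefficients WRs, WRo, WD follow the paper's notation.

```python
# Sketch: regress each subject's ratings on self reward, other reward,
# and the absolute reward difference (inequity).
import numpy as np

rng = np.random.default_rng(2)
Rs = rng.uniform(20, 200, 36)                 # 36 trials, as in the task
Ro = rng.uniform(20, 200, 36)
Da = np.abs(Rs - Ro)
rating = 0.01 * Rs + 0.008 * Ro - 0.01 * Da + rng.normal(0, 0.2, 36)

X = np.column_stack([Rs, Ro, Da, np.ones_like(Rs)])   # with intercept
WRs, WRo, WD, b0 = np.linalg.lstsq(X, rating, rcond=None)[0]
# a prosocial subject would show a negative WD (inequity aversion)
```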
[Figure 1 | Experimental design and behavior. (a) Triple-dominance measure task: subjects chose one of three distributions of money between the self and an unknown other, e.g., prosocial (100, 100 yen), individualistic (110, 60 yen), or competitive (100, 20 yen); 90 yen ≈ $1. (b) Reward pair evaluation task: in each of 36 trials (about 20.0 ± 2.0 s), a beep was followed by presentation of a reward pair for self and other, a rating from 1 to 4, and feedback on the subject's total reward. (c) Comparison of the model coefficients (WD shown) between prosocials and individualists.]
[Results excerpt:] The dorsal amygdala10 was the only area where the groups differed in the correlation of activity with the absolute value of reward difference Da (Fig. 2a, P < 0.001; uncorrected). This activity was positively correlated with Da in prosocials but slightly negatively correlated in individualists. The amygdala activity increased only in prosocials when the absolute value of reward difference Da was large (Fig. 2b, significantly positive 5 s after the reward pair onset (P < 0.01); and Fig. 2c). The same effect was seen in the left amygdala with weaker significance (P < 0.005, uncorrected; Supplementary Fig. 2). In addition, the activity in the amygdala ...
Twenty-four prosocials and eight individualists were instructed to evaluate the reward pair and press a corresponding button as soon as possible. They were also required to memorize a five-digit number sequence before the presentation of each reward pair, and their memory was probed after each evaluation. A high-load, random number sequence was contrasted with a low-load, fixed number sequence (01234). There were no differences between the groups in their performance of the memory tasks (Supplementary Fig. 9). As in the imaging study, the prosocials showed inequity aversion, whereas the individualists did not (F1,30 = 12.98, P = 0.0006). However, there was no effect of cognitive load on inequity aversion for the prosocials (t23 = 0.08; P = 0.94). In contrast, the individualists became slightly more competitive under cognitive load, showing greater dislike of higher reward to the other (paired t-test; t7 = 1.94, P < 0.05; Supplementary Fig. 10). Evaluation times were also unaffected by cognitive load (Supplementary Fig. 10).
These results suggest that prosocial value orientation is driven by an intuitive aversion for the inequitable division of resources. Our findings highlight an important role for automatic intuitive processing in social interaction, in addition to the slow strategic processes that were the focus of previous studies5-7,15 using interactive games.
Figure 2 | Difference between prosocials and individualists in the correlation of brain activity with the absolute value of reward difference Da. (a) Coronal section of the right amygdala, where the two groups showed a significant difference in correlation with the absolute value of reward difference Da (Montreal Neurological Institute coordinates 20, -2, -12). (b) BOLD signal increase at the amygdala peak during 12 trials with the largest Da values (solid line) and 12 trials with the smallest Da values (dotted line) for prosocials and individualists. Time 0 specifies the presentation of the reward pair. Error bars, s.e. over subjects. (c) Left: beta values of each prosocial (gray) and individualist (black) in correlation with Da. Right: correlation of each subject's beta value and WD (the subject's dislike of Da) (gray, prosocials; black, individualists). t-test for a regression coefficient significantly different from 0. [Panel annotations: R = 0.12, P = 0.69; R = -0.45, P = 0.029.]
Representation of economic preferences in the structure and function of the amygdala and prefrontal cortex
Alan S. R. Fermin, Masamichi Sakagami, Toko Kiyonari, Yang Li, Yoshie Matsumoto & Toshio Yamagishi
[Abstract:] Social value orientations (SVOs) are economic preferences for the distribution of resources: prosocial individuals are more cooperative and egalitarian than are proselfs. Despite the social and economic implications of SVOs, no systematic studies have examined their neural correlates. We investigated the amygdala and dorsolateral prefrontal cortex (DLPFC) structures and functions in prosocials and proselfs by functional magnetic resonance imaging and evaluated cooperative behavior in the Prisoner's ...
[Introduction excerpt:] In everyday life, humans experience social dilemmas regarding whether to follow social norms and cooperate with others at some personal cost or behave selfishly and maximize their own welfare. Social and economic studies have demonstrated that economic decisions are considerably influenced by individual differences in social value orientation (SVO)1-6, a social preference where individuals are classified as either prosocials or proselfs based on weights they assign to the distribution of resources between oneself and others2-5. Prosocials prefer a distribution of resources in which they and their partners jointly earn the most. In contrast, proselfs prefer the distribution that gives themselves the highest earnings, regardless of the partner's payoff. SVO is consistently related to behavior in economic games3,6 and relates to self-sacrifice in real-life social relations7 as well as donation to charity8. Despite the strong implications of SVO on society, it has not yet been established whether these decisional dispositions have distinct structural and functional representations in the brain.
A wealth of behavioral evidence demonstrates that humans use distinct decision-making strategies for selfish and prosocial behaviors. Normative prosocial behaviors such as fairness, cooperation, spontaneous giving, and helping are increased by a number of factors that reduce deliberation, including the seriousness of social decisions9, cognitive load10, priming intuition11, and time pressure12,13. In addition, prosocial decisions occur significantly more quickly than selfish ones do12,13, while subjects make more selfish choices when a time delay is available for deliberation12,14. These findings suggest that humans may have an initial automatic impulse to behave prosocially that is sometimes overridden by deliberative processes necessary to implement selfish decisions. Neuroscience studies support the existence of distinct neural networks for automatic and deliberative decision strategies in humans and animals15,16. Of special interest are the dorsolateral prefrontal cortex (DLPFC) and the amygdala. The role of the DLPFC has been demonstrated in the control of deliberative behaviors such as strategic decision-making17, inference, and reasoning18,19. On the other hand, the amygdala has been implicated in the control of automatic behaviors such as the expression of innate responses20, the acquisition of conditioned reactions to biologically significant stimuli21, and has recently been implicated in automatic social decision processes22-24.
Figure 3. Correlation between amygdala and dorsolateral prefrontal cortex (DLPFC) gray matter volumes with social value orientation (SVO) and cooperative behavior in the Prisoner's Dilemma game. (a) Left amygdala volume was significantly larger in prosocials than it was in proselfs (positive correlation with SVO, 66 voxels, x = -17, y = -9, z = -12, t = 2.61, P < 0.05 family-wise error (FWE) corrected) and (b,c) positively ...
What more should AI learn from the brain?
Energy efficiency
Data efficiency
● world models and mental simulation
● modular self-organization
● meta-learning
Autonomy and sociality
(Doya & Matsuo, Brain and Nerve, 2019; groups.oist.jp/ncu/publications)
14
16 18 20 22 24 26 28
15
20
25
30
Average mating performance (ts/mating)
Averagefitness
4 b.
6 b.
8 b.
10 b.
12 b.
14 b.
16 b.
R = −0.84
p < 0.0001
Figure 2. The Correlation between the average mating learning performance and the
average fitness in the final 20 generations in all experiments. The learning performance was
estimated as the number of time steps the mating behavior was selected divided with number of mating
events. The seven types of markers indicate the number of energy sources in the environment for each
simulation.
(a) (b)
Figure 3. Example trajectories of the learned behaviors for the roamer strategy and the
stayer strategy. (a) The roamer ignores the tail-lamp of the mating partner and executes the learned
foraging behavior to capture the energy source. (b) The stayer executes the learned waiting behavior
and adjusts its position according to the trajectory of the mating partner.
[Excerpt, Nature Communications, DOI: 10.1038/s41467-018-04496-y:]
... respectively). The remainder of differences were not significant (Supplementary Table 2). Subsequently, we tested variability of the waiting time ratio among mice. We compared the obtained mixed model with the model including a fixed effect of reward-delay condition, but not a random effect of MI. To evaluate likelihood ...
Bayesian decision model of waiting. Can these effects of serotonin on waiting, depending on the RP and timing uncertainty, be explained in a coherent way? Here we consider the possibility that serotonin signals the prior probability of reward delivery in a Bayesian model of repeated decisions to wait or to quit. In this ...
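A deliberately simplified illustration of such a wait-or-quit decision, not the paper's actual formulation: the agent's belief that a reward will still arrive is updated each step it fails to appear, and a serotonin-like prior p0 prolongs waiting before quitting.

```python
# Toy Bayesian wait-or-quit model: higher prior belief -> longer waiting.
def waiting_time(p0, hazard=0.1, threshold=0.05, dt=1.0):
    """Steps waited until P(reward still coming) falls below threshold."""
    p = p0
    for t in range(1, 200):
        # no reward arrived this step: Bayes update shrinks the belief
        p = p * (1 - hazard) / (p * (1 - hazard) + (1 - p))
        if p < threshold:
            return t * dt
    return 200 * dt

print(waiting_time(p0=0.6), waiting_time(p0=0.9))  # higher prior waits longer
```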
[Figure 5 histograms: distributions of waiting times (s) in the Delay 6, Delay 4-6-8, Delay 2-6-10, and Delay 10 tests, comparing serotonin activation with no activation.]
Fig. 5 Optogenetic activation of DRN serotonin neurons enhances waiting for temporally uncertain rewards. a Distribution of waiting time during omission trials in the D6 test. b Distribution of waiting time during omission trials in the D4-6-8 test. c Distribution of waiting time during omission trials in the D2-6-10 test. d Distribution of waiting time during omission trials in the D10 test. Orange circles illustrate the timing and number of food pellets presented in rewarded trials. White circles denote omission trials. (Nature Communications, DOI: 10.1038/s41467-018-04496-y)
Acknowledgments
■ Reinforcement learning robots
● Paavo Parmas
● Jiexin Wang (ATR)
● Naoto Yoshida (U Tokyo)
● Stefan Elfwing (ATR)
● Eiji Uchibe (ATR)
● Jun Morimoto (ATR)
■ Action values / striatum
● Makoto Ito (Progress Technologies)
● Tomohiko Yoshizawa (Tamagawa U)
● Charles Gerfen (NIH)
● Kazuyuki Samejima (Tamagawa U)
● Minoru Kimura (Tamagawa U)
● Yasumasa Ueda (Kansai Medical U)
■ Two-photon imaging
● Akihiro Funamizu (CSHL)
● Bernd Kuhn
● Sergey Zobnin
● Yuzhe Li
■ Serotonin
● Katsuhiko Miyazaki
● Kayoko Miyazaki (JSPS)
● Taiyo Hamada
● Kenji Tanaka (Keio U)
● Akihiro Yamanaka (Nagoya U)
■ fMRI
● Alan Fermin (Tamagawa U)
● Takehiko Yoshida
● Saori Tanaka (ATR)
● Nicolas Schweighofer (USC)
● Junichiro Yoshimoto (NAIST)
● Yu Shimizu
● Tomoki Tokuda (NAIST)
● Shigeto Yamawaki (Hiroshima U)
● Mitsuo Kawato (ATR)
■ Basal ganglia-thalamus-cortex model
● Osamu Shouno (HRI)
● Jun Igarashi (RIKEN)
● Jan Moren
● Jean Leinard
● Carlos Gutierrez
● Benoit Girard, Daphne Heraiz (UPMC)
● Yoshihisa Tachibana (Kobe U)
● Atsushi Nambu (NIPS)
● Shu Takagi (U Tokyo)
■ Grant-in-Aid for Scientific Research on Innovative Areas "Prediction and Decision Making"
■ Grant-in-Aid for Scientific Research on Innovative Areas "Artificial Intelligence and Brain Science"
■ Post-K exploratory challenge "Whole-Brain Simulation and Artificial Intelligence"
■ Brain/MINDS (project for elucidating whole-brain functional networks by innovative technologies)
■ Strategic Research Program for Brain Sciences (SRPBS)
International Symposium on AI and Brain Science
n October 10–12, 2020, Online www.brain-ai.jp/symposium2020
Special Issue on
Artificial Intelligence and Brain Science
www.journals.elsevier.com/neural-networks/
n Timeline
l Manuscript submission due: January 10, 2021
l First review completed: March 31, 2021
l Revised manuscript due: May 31, 2021
l Final decisions to authors: June 30, 2021
n Guest Editors:
l Kenji Doya, Okinawa Institute of Science and Technology (ncus@oist.jp)
l Karl Friston, University College London
l Masashi Sugiyama, RIKEN AIP and The University of Tokyo IRCN
l Josh Tenenbaum, Massachusetts Institute of Technology
Neuro 2022
Japanese Neuroscience/Neurochemistry/Neural Network Societies
June 30–July 3, 2022 in Okinawa Convention Center
Biopesticide (2).pptx .This slides helps to know the different types of biop...
RohitNehra6
 

Último (20)

Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
Creating and Analyzing Definitive Screening Designs
Creating and Analyzing Definitive Screening DesignsCreating and Analyzing Definitive Screening Designs
Creating and Analyzing Definitive Screening Designs
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 

What can we learn from the interface between the brain and AI? @ The 5th WBA Symposium: Kenji Doya

  • 1. What can we learn from the interface between the brain and AI? Kenji Doya, doya@oist.jp; Neural Computation Unit, groups.oist.jp/ncu; Okinawa Institute of Science and Technology. The 5th Whole Brain Architecture Symposium.
  • 3. Co-evolution of neuroscience and artificial intelligence: vision
Neuroscience: feature-detecting cells (Hubel & Wiesel 1959); experience-dependent learning (Blakemore & Cooper 1970); place cells (O'Keefe 1976); face cells (Bruce, Desimone, Gross 1981; Sugase et al. 1999).
Artificial intelligence: perceptron (Rosenblatt 1962); multilayer supervised learning (Amari 1967); Neocognitron (Fukushima 1980); ConvNet (Krizhevsky, Sutskever, Hinton 2012); Google Brain (2012).
[Excerpts and figure residue from the cited papers (Sugase et al. time course of face-category information; O'Keefe hippocampal place fields; Hubel & Wiesel receptive fields in cat striate cortex) omitted.]
  • 4. Co-evolution of neuroscience and artificial intelligence: reinforcement learning
Neuroscience: classical conditioning (Pavlov 1903); operant conditioning (Thorndike 1898); reward-prediction-error responses of dopamine neurons (Schultz et al. 1993, 1997); dopamine-dependent synaptic plasticity (Wickens et al. 2000); action-value representation in the striatum (Samejima et al. 2005).
Artificial intelligence: TD learning (Sutton et al. 1983); the basal ganglia TD-learning hypothesis (Barto et al. 1995; Montague et al. 1996); Deep Q Network (Mnih et al. 2015). V(s), Q(s, a).
[Excerpts and figure residue from the cited papers (Mnih et al. convolutional network schematic; Schultz dopamine reward-prediction-error recordings) omitted.]
  • 6. [Figure-only slide; no text.]
  • 8. Model-free and model-based behavior
Model-free:
- memorizes "action values"
- intuitive, reflexive action selection
- learns from the experience of having acted
- simple processing / requires repeated experience
Model-based:
- internal model that predicts the consequences of actions
- action selection by lookahead ("mental simulation in the brain")
- thinks before acting
- flexible adaptation / complex processing
(A minimal sketch contrasting the two strategies follows below.)
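To make the contrast concrete, here is a minimal sketch in a hypothetical five-state chain world (the states, reward, actions, and value tables are assumptions for this example, not from the slides): a model-free agent acts greedily on a cached Q-table, while a model-based agent simulates each candidate action with an internal transition model and evaluates the predicted next state.

```python
import numpy as np

# Hypothetical chain world (an assumption for this sketch):
# states 0..4, reward 1 at state 4, actions move left or right.
N_STATES = 5
ACTIONS = (-1, +1)

def step(s, a):                      # internal transition model s' = f(s, a)
    return min(max(s + a, 0), N_STATES - 1)

def reward(s):
    return 1.0 if s == N_STATES - 1 else 0.0

# Model-free: act on a cached action-value table Q(s, a).
Q = np.zeros((N_STATES, len(ACTIONS)))   # would be filled in by prior learning

def model_free_action(s):
    return ACTIONS[int(np.argmax(Q[s]))]

# Model-based: simulate each action with the model and evaluate the outcome.
V = np.zeros(N_STATES)                   # state values from prior learning

def model_based_action(s, gamma=0.9):
    predicted = [reward(step(s, a)) + gamma * V[step(s, a)] for a in ACTIONS]
    return ACTIONS[int(np.argmax(predicted))]

print(model_free_action(2), model_based_action(2))
```

The design point matches the slide: the model-free path is a cheap table lookup but needs many trials to fill the table, while the model-based path spends computation on simulated lookahead and can adapt as soon as the model changes.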
  • 10. Mental simulation in the brain
Cognitive and behavioral mechanisms that use a predictive model of how actions change the body and the environment, s' = f(s, a) or P(s'|s, a):
- estimating the current state from past states and actions (coping with noise, delays, and occlusion)
- predicting the outcome of a hypothesized action from the current state (model-based decision making, action planning, ...)
- predicting the outcomes or causes of hypothesized states (thought, inference, language, science, ...)
  • 11. Model-based action planning involves cortico-cerebellar and basal ganglia networks (Alan S. R. Fermin, Takehiko Yoshida, Junichiro Yoshimoto, Makoto Ito, Saori C. Tanaka & Kenji Doya, Scientific Reports)
Mental simulation involves the cerebral cortex, basal ganglia, and cerebellum: in an fMRI experiment, the ventromedial prefrontal cortex and ventral striatum increased activity in the exploratory condition; the dorsolateral prefrontal cortex, dorsomedial striatum, and lateral cerebellum in the model-based condition; and the supplementary motor area, putamen, and anterior cerebellum in the motor-memory condition.
[Extended abstract excerpt and figure residue omitted.]
  • 13. Neural substrate of dynamic Bayesian inference in the cerebral cortex (Funamizu, Kuhn & Doya, Nature Neuroscience 2016)
- Two-photon imaging of parietal-cortex neuron activity
- Acoustic virtual space (intermittent sensory information)
- Probabilistic decoding of neuronal populations: goal distance is predicted even when sensory input is absent
As mice approached a reward site, posterior parietal cortex (PPC) and posteromedial cortex (PM) implemented the two features of dynamic Bayesian inference: prediction of hidden states with an internal state-transition model, and updating of the prediction with new sensory evidence.
[Abstract excerpt and decoding-figure residue omitted; a minimal predict-and-update sketch follows below.]
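The predict/update cycle the slide describes is, in its simplest linear-Gaussian form, a Kalman filter. Below is a minimal sketch (the task geometry, noise levels, and cue schedule are all assumptions for illustration, not the paper's parameters): the belief about goal distance is propagated with the animal's own step on every trial (prediction) and corrected only on steps when a sound cue arrives (updating).

```python
import numpy as np

# Hypothetical 1-D goal-approach task: distance to the goal shrinks by
# the animal's own step (action); noisy observations (sound cues) arrive
# only intermittently. Dynamic Bayesian inference = predict + update.

def kalman_step(mu, var, u, z=None, q=0.05, r=0.2):
    # Prediction with the internal transition model x' = x - u + noise(q)
    mu, var = mu - u, var + q
    # Update only when a sensory observation z is available
    if z is not None:
        k = var / (var + r)                      # Kalman gain
        mu, var = mu + k * (z - mu), (1 - k) * var
    return mu, var

rng = np.random.default_rng(0)
x, mu, var = 10.0, 10.0, 1.0                     # true distance, belief mean/variance
for t in range(20):
    u = 0.5                                      # step toward the goal
    x -= u
    z = x + rng.normal(0, 0.45) if t % 3 == 0 else None   # intermittent cue
    mu, var = kalman_step(mu, var, u, z)
    print(f"t={t:2d}  true={x:5.2f}  belief={mu:5.2f} +/- {var ** 0.5:.2f}")
```

Note how the belief variance grows during cue omission (prediction only) and shrinks whenever a cue is observed, mirroring the paper's finding that decoded goal distance persists without input and sharpens with it.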
  • 14. How can brain circuits be flexibly activated and connected?
Computational theories:
- learning of gating (Jacobs, Jordan, 1991; Singh 1992)
- selection/integration by prediction error: the MOSAIC architecture (Wolpert, Kawato, 1998; Sugimoto et al., 2012)
- selection/integration by uncertainty: Bayesian cue integration (Deneve et al., 2001; Daw et al., 2005)
Neural mechanisms:
- gating by the basal ganglia (Eliasmith et al., 2012)
- thalamus / claustrum (Crick 1984; Crick, Koch 2005)
- dendritic/axonal inhibition (Wang & Yang, 2018)
- oscillation/spike synchrony (von der Malsburg; Singer)
[Figure residue (MOSAIC-MR module diagram; schematic of cortical gating mechanisms) omitted; a sketch of prediction-error-based module selection follows below.]
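As an illustration of the MOSAIC idea of selection by prediction error, here is a minimal sketch (not the published implementation; the dynamics, noise scale, and module count are assumptions): each forward module predicts the next state, and a softmax over the negative squared prediction errors yields each module's "responsibility," which can then gate learning and control.

```python
import numpy as np

# MOSAIC-style module selection (illustrative sketch): responsibilities
# are a softmax of how well each forward model predicted the next state.

def responsibilities(x_next, predictions, sigma=0.5):
    errors = np.array([(x_next - p) ** 2 for p in predictions])
    logits = -errors / (2 * sigma ** 2)
    w = np.exp(logits - logits.max())        # numerically stable softmax
    return w / w.sum()

# Two competing forward models of a scalar dynamics x' = a * x
models = [lambda x: 0.5 * x, lambda x: 1.2 * x]
x, x_next = 2.0, 2.4                          # observed transition (true a = 1.2)
preds = [m(x) for m in models]
print(responsibilities(x_next, preds))        # second module takes responsibility
```

The same responsibility signal can weight both the learning updates of each module and the mixing of their controllers, which is how a single architecture decomposes a complex environment into sub-environments.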
  • 15. Reinforcement learning
- Reward prediction: value functions
  V(s) = E[r(t) + γ r(t+1) + γ² r(t+2) + … | s(t) = s]
  Q(s, a) = E[r(t) + γ r(t+1) + γ² r(t+2) + … | s(t) = s, a(t) = a]
- Action selection
  greedy: a = argmax_a Q(s, a)
  Boltzmann: P(a|s) ∝ exp[β Q(s, a)]
- Prediction update: TD error
  δ(t) = r(t) + γ V(s(t+1)) − V(s(t))
  ΔV(s(t)) = α δ(t)
  ΔQ(s(t), a(t)) = α δ(t)
What circuit mechanisms realize these steps? What mechanisms control these parameters? (A minimal sketch putting the pieces together follows below.)
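The three steps on the slide assemble directly into tabular Q-learning with Boltzmann exploration. Here is a minimal sketch on a hypothetical five-state chain task (the task itself and the parameter values are assumptions for illustration); note that the learning rate α, the inverse temperature β, and the discount factor γ appear exactly where the slide's equations put them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 5-state chain: start at state 0, reward +1 at state 4.
N, A = 5, 2                                # states; actions 0=left, 1=right
alpha, beta, gamma = 0.1, 2.0, 0.9         # learning rate, inverse temp., discount
Q = np.zeros((N, A))

def boltzmann(q):                          # P(a|s) proportional to exp(beta * Q(s,a))
    p = np.exp(beta * (q - q.max()))
    return p / p.sum()

for episode in range(200):
    s = 0
    while s != N - 1:
        a = rng.choice(A, p=boltzmann(Q[s]))
        s2 = max(s - 1, 0) if a == 0 else min(s + 1, N - 1)
        r = 1.0 if s2 == N - 1 else 0.0
        # TD error: delta = r + gamma * max_a' Q(s', a') - Q(s, a)
        delta = r + gamma * Q[s2].max() * (s2 != N - 1) - Q[s, a]
        Q[s, a] += alpha * delta           # Delta-Q(s, a) = alpha * delta
        s = s2

print(np.round(Q, 2))                      # values grow toward the rewarded end
```

In the metalearning reading of slide 18, α, β, and γ are exactly the knobs that the neuromodulator systems are hypothesized to tune.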
  • 16. Discount factor γ for future rewards
- Large γ: goes even for distant batteries
- Small γ: takes only nearby batteries
  • 17. Time scale of reward prediction
V(t) = E[r(t) + γ r(t+1) + γ² r(t+2) + …]
Discount factor γ: how far into the future rewards are predicted.
Example over four steps: an effortful sequence (−20, −20, −20, +100) versus a tempting sequence (+50, 0, 0, −100).
- Large γ (the slide's values match γ = 0.9): effortful V = 18.7 ("worth doing even if it is hard"); tempting V = −22.9 ("don't do the risky thing").
- Small γ (matching γ = 0.3): effortful V = −25.1 ("better not to bother"); tempting V = 47.3 ("can't help doing it").
Depression? Impulsivity? Serotonin? (The arithmetic is reproduced in the sketch below.)
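The slide's four numbers follow from the definition V = Σ_t γ^t r(t); this short sketch reproduces them (the γ values 0.9 and 0.3 are inferred from the slide's printed V values, so treat them as a reconstruction):

```python
# Discounted value of a reward sequence: V = sum_t gamma^t * r(t).
def discounted_value(rewards, gamma):
    return sum(r * gamma ** t for t, r in enumerate(rewards))

effortful = [-20, -20, -20, +100]   # cost now, payoff later
tempting  = [+50, 0, 0, -100]       # payoff now, cost later

for gamma in (0.9, 0.3):
    print(gamma,
          round(discounted_value(effortful, gamma), 1),   # 18.7 / -25.1
          round(discounted_value(tempting, gamma), 1))    # -22.9 / 47.3
```

The sign flip of both values between γ = 0.9 and γ = 0.3 is the slide's point: a single parameter changes which of the two action sequences looks worthwhile.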
  • 18. Neuromodulator systems and metalearning (Doya, 2002)
- Project broadly from the brainstem to the cerebellum, basal ganglia, and cerebral cortex: controlling global variables/parameters of learning?
- Dopamine: reward prediction error δ
- Acetylcholine: learning rate α
- Noradrenaline: sharpness of exploration β
- Serotonin: time scale of prediction γ
  • 19. Optogenetic stimulation of serotonin neurons (Miyazaki et al., 2014, Current Biology)
- Reward-waiting task (delays of 3, 6, 9, or ∞ seconds)
- Yellow light (no stimulation): waited 12.1 s
- Blue light (stimulation): waited 20.8 s
  • 21. Designing a robot's reward system
What is reward? It serves the functions most vital to a living organism:
- appetite, pain: self-preservation
- affection, sexual drive: self-replication
Cyber Rodents:
- capturing battery packs and charging
- copying software via infrared
  • 23. Evolving how robots learn
- Robots mate and pass on their genes within a lifetime of about five minutes
- Mating success is proportional to the battery charge level (Elfwing et al. 2011)
- Population: 15-25 robots plus virtual agents
- Genes encode: weights of the top layer of the neural network, reward-shaping weights, and meta-parameters v1, v2, …, vn (the symbol run in the source reads α, γ, λ, τk, τ0)
(A minimal sketch of meta-parameter evolution follows below.)
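To show the shape of such an experiment, here is a hypothetical sketch of evolving learning meta-parameters (the genome layout, fitness stand-in, and operator settings are all assumptions for illustration, not the published model): each genome carries (α, γ, τ), fitness stands in for lifetime charging/mating success, and selection plus mutation drives the population toward better-performing settings.

```python
import numpy as np

rng = np.random.default_rng(1)

def lifetime_fitness(genome):
    # Toy stand-in for lifetime charging/mating success: fitness peaks
    # at one particular (alpha, gamma, tau) setting (pure illustration).
    target = np.array([0.1, 0.9, 0.2])
    return -np.sum((genome - target) ** 2)

# Population of 20 genomes: (alpha, gamma, tau) within plausible ranges.
pop = rng.uniform([0.01, 0.5, 0.05], [0.5, 0.999, 1.0], size=(20, 3))

for generation in range(50):
    fit = np.array([lifetime_fitness(g) for g in pop])
    parents = pop[np.argsort(fit)[-10:]]                   # fitness-based selection
    children = parents[rng.integers(0, 10, 20)]            # reproduction
    pop = children + rng.normal(0, 0.02, children.shape)   # mutation

print(pop.mean(axis=0))   # converges near the fittest meta-parameters
```

In the embodied version on the slide, selection is not an explicit sort: mating opportunities themselves depend on charge level, so the fitness function is implicit in the robots' lives.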
  • 25. Polymorphism of behavioral strategies (Elfwing et al. 2015)
- "Forager" (roamer) and "tracker" (stayer) phenotypes
- Evolutionary stability
Foragers ignore the tail lamp of a mating partner and execute the learned foraging behavior to capture energy sources; trackers execute a learned waiting behavior and adjust their position to the partner's trajectory. The phenotypes differ in the weight given to faces, the bias toward mating behavior, and the charge level at which mating is chosen, and the population maintains a stable mixture of the two.
[Figure residue (fitness vs. mating-performance correlation; example trajectories; phenotype-proportion analyses) omitted.]
  • 27. futureoflife.org/ai-principles
Research Issues
1) Research Goal: The goal of AI research should be to create not undirected intelligence, but beneficial intelligence.
2) Research Funding: Investments in AI should be accompanied by funding for research on ensuring its beneficial use, including thorny questions in computer science, economics, law, ethics, and social studies, such as: How can we make future AI systems highly robust, so that they do what we want without malfunctioning or getting hacked? How can we grow our prosperity through automation while maintaining people's resources and purpose? How can we update our legal systems to be more fair and efficient, to keep pace with AI, and to manage the risks associated with AI? What set of values should AI be aligned with, and what legal and ethical status should it have?
3) Science-Policy Link: There should be constructive and healthy exchange between AI researchers and policy-makers.
4) Research Culture: A culture of cooperation, trust, and transparency should be fostered among researchers and developers of AI.
5) Race Avoidance: Teams developing AI systems should actively cooperate to avoid corner-cutting on safety standards.
Ethics and Values
6) Safety: AI systems should be safe and secure throughout their operational lifetime, and verifiably so where applicable and feasible.
7) Failure Transparency: If an AI system causes harm, it should be possible to ascertain why.
8) Judicial Transparency: Any involvement by an autonomous system in judicial decision-making should provide a satisfactory explanation auditable by a competent human authority.
9) Responsibility: Designers and builders of advanced AI systems are stakeholders in the moral implications of their use, misuse, and actions, with a responsibility and opportunity to shape those implications.
10) Value Alignment: Highly autonomous AI systems should be designed so that their goals and behaviors can be assured to align with human values throughout their operation.
11) Human Values: AI systems should be designed and operated so as to be compatible with ideals of human dignity, rights, freedoms, and cultural diversity.
12) Personal Privacy: People should have the right to access, manage and control the data they generate, given AI systems' power to analyze and utilize that data.
13) Liberty and Privacy: The application of AI to personal data must not unreasonably curtail people's real or perceived liberty.
14) Shared Benefit: AI technologies should benefit and empower as many people as possible.
15) Shared Prosperity: The economic prosperity created by AI should be shared broadly, to benefit all of humanity.
16) Human Control: Humans should choose how and whether to delegate decisions to AI systems, to accomplish human-chosen objectives.
17) Non-subversion: The power conferred by control of highly advanced AI systems should respect and improve, rather than subvert, the social and civic processes on which the health of society depends.
18) AI Arms Race: An arms race in lethal autonomous weapons should be avoided.
Longer-term Issues
19) Capability Caution: There being no consensus, we should avoid strong assumptions regarding upper limits on future AI capabilities.
20) Importance: Advanced AI could represent a profound change in the history of life on Earth, and should be planned for and managed with commensurate care and resources.
21) Risks: Risks posed by AI systems, especially catastrophic or existential risks, must be subject to planning and mitigation efforts commensurate with their expected impact.
22) Recursive Self-Improvement: AI systems designed to recursively self-improve or self-replicate in a manner that could lead to rapidly increasing quality or quantity must be subject to strict safety and control measures.
23) Common Good: Superintelligence should only be developed in the service of widely shared ethical ideals, and for the benefit of all humanity rather than one state or organization.
  • 28. Learning from human society
- Humans are the most dangerous species on Earth
The wisdom of democracy: do not entrust unlimited power to any individual or particular group.
- Politics: elected offices with fixed terms; separation of the three branches of government; local autonomy
- Economy: antitrust law; the right to strike
- Science: mutual evaluation (peer review)
→ Mutual evaluation and cooperation among multiple open-source, explainable AI agents
  • 29. Social value orientation and the amygdala / prefrontal cortex
Haruno & Frith, "Activity in the amygdala elicited by unfair divisions predicts social value orientation" (Nature Neuroscience): contrasting prosocial and individualistic subjects with fMRI showed that the degree of inequity aversion in prosocials is predictable from amygdala activity and unaffected by cognitive load, suggesting that automatic emotional processing in the amygdala lies at the core of prosocial value orientation.
Fermin, Sakagami, Kiyonari, Li, Matsumoto & Yamagishi, "Representation of economic preferences in the structure and function of the amygdala and prefrontal cortex" (Scientific Reports): prosocials and proselfs differ in amygdala and dorsolateral prefrontal cortex (DLPFC) structure and function; for example, left amygdala gray matter volume was significantly larger in prosocials and correlated positively with SVO and cooperative behavior in the Prisoner's Dilemma game.
[Extended excerpts and figure residue from the two papers omitted.]
  • 30. What more should artificial intelligence learn from the brain?
- Energy efficiency
- Data efficiency: world models and mental simulation; self-organization of modules; metalearning
- Autonomy and sociality
(Doya & Matsuo, Brain and Nerve, 2019; groups.oist.jp/ncu/publications)
[Figure residue recapping earlier slides omitted.]
  • 31. Acknowledgments
- Reinforcement learning robots: Paavo Parmas; Jiexin Wang (ATR); 吉田尚人 (U Tokyo); Stefan Elfwing (ATR); 内部英治 (ATR); 森本淳 (ATR)
- Action values / striatum: 伊藤真 (Progress Technologies); 吉澤知彦 (Tamagawa Univ.); Charles Gerfen (NIH); 鮫島和行 (Tamagawa Univ.); 木村實 (Tamagawa Univ.); 上田康雅 (Kansai Medical Univ.)
- Two-photon imaging: 船水章大 (CSHL); Bernd Kuhn; Sergey Zobnin; Yuzhe Li
- Serotonin: 宮崎勝彦; 宮崎佳代子 (JSPS); 濱田太陽; 田中謙二 (Keio Univ.); 山中章宏 (Nagoya Univ.)
- fMRI: Alan Fermin (Tamagawa Univ.); 吉田岳彦; 田中沙織 (ATR); Nicolas Schweighofer (USC); 吉本潤一郎 (NAIST); 清水優; 徳田智磯 (NAIST); 山脇成人 (Hiroshima Univ.); 川人光男 (ATR)
- Basal ganglia–thalamus–cortex model: 庄野修 (HRI); 五十嵐潤 (RIKEN); Jan Moren; Jean Leinard; Carlos Gutierrez; Benoit Girard, Daphne Heraiz (UPMC); 橘吉寿 (Kobe Univ.); 南部篤 (NIPS); 高木周 (Univ. of Tokyo)
- Grant-in-Aid for Scientific Research on Innovative Areas "Prediction and Decision Making"
- Grant-in-Aid for Scientific Research on Innovative Areas "Artificial Intelligence and Brain Science"
- Post-K Exploratory Challenge "Whole-Brain Simulation and Artificial Intelligence"
- Project for elucidating the whole picture of brain-function networks by innovative technologies
- Strategic Research Program for Brain Sciences
  • 32. International Symposium on AI and Brain Science: October 10–12, 2020, online. www.brain-ai.jp/symposium2020
  • 33. Special Issue on Artificial Intelligence and Brain Science, www.journals.elsevier.com/neural-networks/
Timeline: manuscript submission due January 10, 2021; first review completed March 31, 2021; revised manuscripts due May 31, 2021; final decisions to authors June 30, 2021.
Guest editors: Kenji Doya, Okinawa Institute of Science and Technology (ncus@oist.jp); Karl Friston, University College London; Masashi Sugiyama, RIKEN AIP and The University of Tokyo IRCN; Josh Tenenbaum, Massachusetts Institute of Technology.
  • 34. Neuro 2022: Japanese Neuroscience / Neurochemistry / Neural Network Societies, June 30–July 3, 2022, Okinawa Convention Center.