What Can We Learn at the Interface of the Brain and AI?
Kenji Doya (doya@oist.jp)
Neural Computation Unit, groups.oist.jp/ncu
Okinawa Institute of Science and Technology
The 5th Whole Brain Architecture Symposium

The Brain and Artificial Intelligence
"To realize intelligence in electronic circuits, there is no need to be bound by how the brain works."
"The brain is an existing example of highly advanced intelligence, so there is no reason not to learn from it."
Last century's AI: programming the knowledge of experts.
Today's AI: statistical machine learning from big data.
The pursuit of performance led to a renewed recognition of the strengths of brain-inspired learning, such as deep learning.
Co-evolution of neuroscience and AI: vision
Neuroscience | Artificial intelligence
Multilayer supervised learning (Amari 1967)
Neocognitron (Fukushima 1980)
ConvNet (Krizhevsky, Sutskever, Hinton, 2012)
Google Brain (2012)
Place cells (O'Keefe 1976)
Face cells (Bruce, Desimone, Gross 1981) (Sugase et al. 1999)
[Excerpt, Sugase et al. 1999, Nature (face-responsive neuron; partially truncated in the original):]
... F2 (monkey expression) peaks before levelling off. F1 (monkey identity) and F4 (human expression) were encoded at a much lower level than the others. The response latency of this neuron was 53 ms, and the latencies for the information about each category were as follows: G, 45; F1, 157; F2, 93; F3, 125; and F4, 125 ms. Thus, the earliest transient discharge conveyed global information, and the later sustained discharge encoded one category of the fine information best.
Of the 86 face-responsive neurons, 11 neurons (13%) did not encode significant information in any category, 43 (50%) coded either global or fine category information, and 32 (37%) coded both categories. To clarify how single neurons encoded both global and ...
[Excerpt, O'Keefe 1976, "Hippocampal place units":]
Fig. 2. Place fields for all place units except 21342 and those from animal 217.
... distributed around the maze. The concentration of fields from the other animals in arm B may have reflected the fact that many of the rats spent their "free time" in this arm. The fact that the initial search for units was conducted there might also have introduced a bias towards units active in that area. In any case, it was clear that the majority of fields were not located in those places which contained the rewards or other ...
Fig. 3. Place fields for place units from animal 217.
Experience-dependent learning (Blakemore & Cooper 1970)
[Excerpt, Hubel & Wiesel 1959, "Receptive fields of single neurones in the cat's striate cortex", J Physiol:]
... found by changing the size, shape and orientation of the stimulus until a clear response was evoked. Often when a region with excitatory or inhibitory responses was established the neighbouring opposing areas in the receptive field could only be demonstrated indirectly. Such an indirect method is illustrated in Fig. 3B, where two flanking areas are indicated by using a short slit in various positions like the hand of a clock, always including the very centre of the field. The findings thus agree qualitatively with those obtained with a small spot (Fig. 2a).
Fig. 3. Same unit as in Fig. 2. A, responses to shining a rectangular light spot, 1° x 8°; centre of slit superimposed on centre of receptive field; successive stimuli rotated clockwise, as shown to left of figure. B, responses to a 1° x 5° slit oriented in various directions, with one end always covering the centre of the receptive field: note that this central region evoked responses when stimulated alone (Fig. 2a). Stimulus and background intensities as in Fig. 1; stimulus duration 1 sec.
Receptive fields having a central area and opposing flanks represented a common pattern, but several variations were seen. Some fields had long narrow central regions with extensive flanking areas (Figs. 1-3): others had a large central area and concentrated slit-shaped flanks (Figs. 6, 9, 10). In many fields the two flanking regions were asymmetrical, differing in size and shape; in these a given spot gave unequal responses in symmetrically corresponding ...
Feature-detecting cells (Hubel & Wiesel 1959)
Perceptron (Rosenblatt 1962)
Co-evolution of neuroscience and AI: reinforcement learning
Neuroscience | Artificial intelligence
[Figure: reward-delay conditions of 3 s, 6 s, and 9 s]
Classical conditioning (Pavlov 1903)
Operant conditioning (Thorndike 1898)
Deep Q Network (Mnih et al. 2015)
[Excerpt, Mnih et al. 2015, Nature:]
... difficult and engaging for human players. We used the same network architecture, hyperparameter values (see Extended Data Table 1) and learning procedure throughout, taking high-dimensional data (210 x 160 colour video at 60 Hz) as input, to demonstrate that our approach robustly learns successful policies over a variety of games based solely ...
We compared DQN with the best performing methods from the reinforcement learning literature on the 49 games where results were available. In addition to the learned agents, we also report scores for a professional human games tester playing under controlled conditions and a policy that selects actions uniformly at random (Extended Data ...).
Figure 1 | Schematic illustration of the convolutional neural network. The details of the architecture are explained in the Methods. The input to the neural network consists of an 84 x 84 x 4 image produced by the preprocessing map φ, followed by three convolutional layers (note: snaking blue line symbolizes sliding of each filter across input image) and two fully connected layers with a single output for each valid action. Each hidden layer is followed by a rectifier nonlinearity (that is, max(0, x)).
TD learning (Sutton et al. 1983)
Reward prediction error responses of dopamine neurons (Schultz et al. 1993, 1997)
Dopamine-dependent synaptic plasticity (Wickens et al. 2000)
[Excerpt, W. Schultz, review on the predictive reward signal of dopamine neurons:]
... fails to occur, even in the absence of an immediately preceding stimulus (Fig. 2, bottom). This is observed when animals fail to obtain reward because of erroneous behavior, when liquid flow is stopped by the experimenter despite correct behavior, or when a valve opens audibly without delivering liquid (Hollerman and Schultz 1996; Ljungberg et al. 1991; Schultz et al. 1993). When reward delivery is delayed for 0.5 or 1.0 s, a depression of neuronal activity occurs at the regular time of the reward, and an activation follows the reward at the new time (Hollerman and Schultz 1996). Both responses occur only during a few repetitions until the new time of reward delivery becomes predicted again. By contrast, delivering reward earlier than habitual results in an activation at the new time of reward but fails to induce a depression at the habitual time. This suggests that unusually early reward delivery cancels the reward prediction for the habitual time. Thus dopamine neurons monitor both the occurrence and the time of reward. In the absence of stimuli immediately preceding the omitted reward, the depressions do not constitute a simple neuronal response but reflect an expectation process based on an internal clock tracking the precise time of predicted reward.
Activation by conditioned, reward-predicting stimuli
About 55-70% of dopamine neurons are activated by conditioned visual and auditory stimuli in the various classically or instrumentally conditioned tasks described earlier (Fig. 2, middle and bottom) (Hollerman and Schultz 1996; Ljungberg et al. 1991, 1992; Mirenowicz and Schultz 1994; Schultz 1986; Schultz and Romo 1990; P. Waelti, J. Mirenowicz, and W. Schultz, unpublished data). The first dopamine responses to conditioned light were reported by Miller et al. (1981) in rats treated with haloperidol, which increased the incidence and spontaneous activity of dopamine neurons but resulted in more sustained responses than in undrugged animals. Although responses occur close to behavioral reactions (Nishino et al. 1987), they are unrelated to arm and eye movements themselves, as they occur also ipsilateral to the moving arm and in trials without arm or eye movements (Schultz and Romo 1990). Conditioned stimuli are somewhat less effective than primary rewards in terms of response ...
FIG. 2. Dopamine neurons report rewards according to an error in reward prediction. Top: drop of liquid occurs although no reward is predicted at this time. Occurrence of reward thus constitutes a positive error in the prediction of reward. Dopamine neuron is activated by the unpredicted ...
Action-value coding in the striatum (Samejima et al. 2005)
The basal ganglia TD-learning hypothesis (Barto et al. 1995; Montague et al. 1996)
V(s)   Q(s,a)
www.brain-ai.jp
What more should AI learn from the brain?
Energy efficiency
Data efficiency
● world models and mental simulation
● modular self-organization
● meta-learning
Autonomy and sociality
(Doya & Matsuo, Brain and Nerve, 2019; groups.oist.jp/ncu/publications)
Model-free vs. model-based behavior
Model-free
■ memorizes "action values"
■ intuitive, reflexive action selection
■ learns from direct trial-and-error experience
Simple processing / requires repeated experience
Model-based
■ internal model that predicts the consequences of actions
■ action selection by look-ahead
● "mental simulation"
■ thinks before acting
Flexible adaptation / complex processing
(The two strategies are contrasted in the sketch below.)
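A minimal sketch of the contrast in code, with every structure hypothetical: a cached Q-table stands in for the model-free route, and a learned transition/reward model for the model-based route.

```python
# Sketch: two routes to choosing an action in the same toy task.
import numpy as np

n_states, n_actions, gamma = 5, 2, 0.9
Q = np.zeros((n_states, n_actions))                 # model-free: cached action values
T = np.random.rand(n_states, n_actions, n_states)   # model: P(s'|s,a)
T /= T.sum(axis=2, keepdims=True)
R = np.random.rand(n_states, n_actions)             # model: expected reward r(s,a)
V = np.zeros(n_states)                              # state values used for look-ahead

def act_model_free(s):
    # reflexive choice from remembered values
    return int(np.argmax(Q[s]))

def act_model_based(s):
    # one-step "mental simulation": predict the outcome of each action
    return int(np.argmax(R[s] + gamma * T[s] @ V))

a_fast, a_planned = act_model_free(0), act_model_based(0)
```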
PILCO: model-based reinforcement learning (Paavo Parmas)
[Video frames: trial 1, trial 2, trial 8]
Mental simulation
Cognitive and behavioral functions that use a predictive model of how actions change the state of the body and the environment,
s' = f(s, a) or P(s'|s, a):
■ estimating the current state from past states and actions
● coping with noise, delays, and occlusion
■ predicting the outcomes of candidate actions from the current state
● model-based decision making, action planning, ...
■ inferring the outcomes or causes of hypothesized states
● thought, reasoning, language, science, ...
(A planning sketch using such a model follows.)
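As a toy illustration of "thinking before acting" with a predictive model s' = f(s, a), the sketch below scores candidate action sequences by imagined rollouts (random-shooting planning). The dynamics f, the reward, and all parameters are stand-ins, not the PILCO method shown above.

```python
# Sketch of mental simulation for planning: roll out a learned model
# s' = f(s, a) to evaluate candidate action sequences.
import numpy as np

rng = np.random.default_rng(0)

def f(s, a):                  # assumed learned dynamics: next 2-D position
    return s + np.array([np.cos(a), np.sin(a)]) * 0.1

def reward(s, goal):          # assumed reward: closeness to the goal
    return -np.linalg.norm(s - goal)

def plan(s0, goal, horizon=10, n_candidates=100, gamma=0.95):
    best_seq, best_ret = None, -np.inf
    for _ in range(n_candidates):
        seq = rng.uniform(-np.pi, np.pi, horizon)   # candidate action sequence
        s, ret = s0.copy(), 0.0
        for t, a in enumerate(seq):                 # imagined rollout
            s = f(s, a)
            ret += gamma**t * reward(s, goal)
        if ret > best_ret:
            best_seq, best_ret = seq, ret
    return best_seq[0]        # execute the first action, then replan (MPC style)

a0 = plan(np.zeros(2), goal=np.array([1.0, 0.0]))
```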
Model-based action planning involves cortico-cerebellar and basal ganglia networks
Alan S. R. Fermin, Takehiko Yoshida, Junichiro Yoshimoto, Makoto Ito, Saori C. Tanaka & Kenji Doya
[Abstract excerpt:] Humans can select actions by learning, planning, or retrieving motor memories. Reinforcement Learning (RL) associates these processes with three major classes of strategies for action selection: exploratory RL learns state-action values by exploration, model-based RL uses internal models to simulate future states reached by hypothetical actions, and motor-memory RL selects past successful state-action mapping. In order to investigate the neural substrates that implement these strategies, we conducted a functional magnetic resonance imaging (fMRI) experiment while humans performed a ... ventromedial prefrontal cortex and ventral striatum increased activity in the exploratory condition; the dorsolateral prefrontal cortex, dorsomedial striatum, and lateral cerebellum in the model-based condition; and the supplementary motor area, putamen, and anterior cerebellum in the motor-memory ... implements the model-based RL action selection strategy.
Using exploration and reward feedback, humans and other animals have a remarkable capacity to learn new motor behaviors without explicit teaching1. Throughout most of our lives, however, we depend on explicit or implicit knowledge, based upon past experiences, such as a map of the area or properties of the musculoskeletal system, to enable focused exploration and efficient learning2,3. After repeated practice, a motor behavior becomes stereotyped and can be executed with little mental load4. What brain mechanisms enable animals to employ different learning strategies and to select or integrate them in a given situation? In this paper, we take a new behavioral paradigm that captures different stages of motor learning during a single experimental session5, and using fMRI we explore brain structures that are specifically involved in implementing different learning strategies.
(Alan Fermin)
■ Mental simulation involves the cerebral cortex, basal ganglia, and cerebellum
[Figure: grid-sailing task; Conditions 1, 3A, 2, and 3B]
Specialization by learning algorithm (Doya, 1999)
■ Cerebellum: supervised learning guided by an error signal (target output vs. actual output): internal models
■ Basal ganglia: reinforcement learning guided by a reward signal: reward prediction
■ Cerebral cortex: unsupervised learning: state representation
[Diagram: cerebral cortex, basal ganglia, and cerebellum interconnected through the thalamus, with the substantia nigra providing the reward signal and the inferior olive the error signal]
■ Two-photon imaging of parietal cortex neuron activity
■ Acoustic virtual environment
● intermittent sensory cues
■ Probabilistic decoding of neural populations
● goal position predicted even without sensory input
Neural substrate of dynamic Bayesian inference in the cerebral cortex
Akihiro Funamizu, Bernd Kuhn & Kenji Doya
[Abstract:] Dynamic Bayesian inference allows a system to infer the environmental state under conditions of limited sensory observation. Using a goal-reaching task, we found that posterior parietal cortex (PPC) and adjacent posteromedial cortex (PM) implemented the two fundamental features of dynamic Bayesian inference: prediction of hidden states using an internal state transition model and updating the prediction with new sensory evidence. We optically imaged the activity of neurons in mouse PPC and PM layers 2, 3 and 5 in an acoustic virtual-reality system. As mice approached a reward site, anticipatory licking increased even when sound cues were intermittently presented; this was disturbed by PPC silencing. Probabilistic population decoding revealed that neurons in PPC and PM represented goal distances during sound omission (prediction), particularly in PPC layers 3 and 5, and prediction improved with the observation of cue sounds (updating). Our results illustrate how cerebral cortex realizes mental simulation using an action-dependent dynamic model.
[Introduction excerpt:] Animals have to act despite limited sensory information because of factors such as interfering background noise or occluded vision. Thus, the ability to estimate the current state of the outside world from a sequence of sensory observations and their own actions is essential. This process is optimally realized by dynamic Bayesian inference, such as a Kalman filter1, which predicts the state with an internal state transition model and updates the prediction with new sensory inputs. For example, when a mouse navigates in darkness, it must keep track of the location of its destination based on both its movement (prediction) and sensory signals such as calls from its nest mate (updating). We hypothesized that dynamic Bayesian inference is implemented in cerebral neocortex and investigated the plausibility of this idea using two-photon microscopy2. Most areas of the cerebral neocortex receive ascending sensory inputs (feedforward streams) and descending inputs (feedback ...) ... hypothesis that dynamic Bayesian inference is implemented in pyramidal neurons of layers 2, 3 and 5, with increasing action dependence in deeper layers.
To test this hypothesis, we trained mice to perform an auditory virtual navigation task and imaged neuronal activity in layers 2, 3 and 5 of the PPC and the PM located posterior to PPC11-13. The task required mice to approach a water reward site (goal) by estimating the distance on the basis of sound cues and their own locomotion. PPC is involved in spatial navigation by representing route maps, head directions, turning locations and locomotory accelerations with egocentric and allocentric representations14-17. PPC lesions disrupt navigation on the basis of self-motion information (path integration)18,19. PM also represents egocentric and allocentric reference frames20. Both PPC and PM receive inputs from auditory cortex and secondary motor cortex (M2)21,22, but PM receives fewer feed...
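The prediction/update cycle described in this excerpt can be made concrete with a one-dimensional Kalman filter. The sketch below is only illustrative (all numbers assumed): it tracks goal distance from self-motion and updates the belief only when a sound cue arrives, so uncertainty grows during cue omission and shrinks at each cue.

```python
# Sketch of dynamic Bayesian inference with intermittent observations.
import numpy as np

def kalman_step(mu, var, u, y=None, q=0.05, r=0.5):
    # predict: goal distance shrinks by the distance moved, u
    mu, var = mu - u, var + q          # state transition model + process noise
    # update: only when a sensory observation y is available
    if y is not None:
        k = var / (var + r)            # Kalman gain
        mu, var = mu + k * (y - mu), (1 - k) * var
    return mu, var

mu, var = 67.0, 1.0                    # initial belief of goal distance (cm)
for t in range(20):
    cue = 67.0 - 3.0 * (t + 1) + np.random.randn() * 0.5
    y = cue if t % 3 == 0 else None    # intermittent sound cues
    mu, var = kalman_step(mu, var, u=3.0, y=y)
    # var grows during omission (prediction) and shrinks at cues (update)
```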
[Figure: probabilistic population code in PPC. Panels show normalized mean ΔR/R, normalized likelihood, and posterior over goal distance (67, 33, 0 cm) in the continuous, intermittent-1, and intermittent-2 cue conditions for PPC layer 2; the sound zone is marked.]
Hypothetical cortical algorithm of model-based state prediction with Bayesian inference
Why can brain circuits be flexibly activated and connected?
Computational theories
■ learning of gating (Jacobs, Jordan, 1991; Singh 1992)
■ selection/integration by prediction errors
● MOSAIC architecture (Wolpert, Kawato, 1998; Sugimoto et al., 2012)
■ selection/integration by uncertainty
● Bayesian cue integration (Deneve et al., 2001; Daw et al., 2005)
Neural mechanisms
■ gating by the basal ganglia (Eliasmith et al., 2012)
■ thalamus / claustrum (Crick 1984; Crick, Koch 2005)
■ dendritic/axonal inhibition (Wang & Yang, 2018)
■ oscillation and spike synchrony (von der Malsburg; Singer)
[From "MOSAIC for multiple-reward environments" (Sugimoto et al.):]
Figure 1: Schematic diagram of MOSAIC-MR. Reward modules (top left) predict the immediate reward (r̂i), and forward modules (bottom left) predict the local dynamics of the environment (the predicted ẋj). Reward and forward modules decompose a complex environment into subenvironments on the basis of prediction errors (r - r̂i) and (ẋ - predicted ẋj), respectively. RL modules (displayed at right) output actions (uk) appropriate for each subenvironment. The learning and control of each RL module are weighted based on the TD error (δk), that is, on how well each RL controller predicts the expected discounted future rewards.
... selection of modules is performed on the basis of the prediction errors of the reward and the dynamics, respectively. The RL module, which receives the reward and dynamics predictions (r̂(t), predicted ẋ(t)) and the weights of each module in ...
[From Wang & Yang 2018, Current Opinion in Neurobiology, Figure 4: input pathways 1 and 2, output pathways 1 and 2, the basal ganglia and thalamus, and interneuron types labeled STC, DTC, and ITC, illustrating input gating, output gating, synchronous gating, recurrent gating, and striatal/thalamic gating.]
Various mechanisms for information gating in the brain. Input gating can be achieved by dendrite-targeting interneurons that selectively control inputs to pyramidal dendrites. In the synchronous gating mechanism, communication between two areas depends on the degree of temporal synchrony of neural activity between the source and target areas. The recurrent gating mechanism involves selective integration of inputs based on context-dependent dynamics of the network. Output gating is instantiated with perisoma-targeting interneurons that specifically inhibit pyramidal neurons projecting to one pathway but not others. Gating may also involve subcortical structures, especially the basal ganglia and thalamus.
Reinforcement learning
■ Reward prediction: value functions
● V(s) = E[ r(t) + γ r(t+1) + γ² r(t+2) + … | s(t) = s ]
● Q(s,a) = E[ r(t) + γ r(t+1) + γ² r(t+2) + … | s(t) = s, a(t) = a ]
■ Action selection
● greedy: a = argmax_a Q(s,a)
● Boltzmann: P(a|s) ∝ exp[ β Q(s,a) ]
■ Prediction update: TD error
● δ(t) = r(t) + γ V(s(t+1)) − V(s(t))
● ΔV(s(t)) = α δ(t)
● ΔQ(s(t), a(t)) = α δ(t)
What circuit mechanisms implement these steps?
What mechanisms control these parameters?
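The slide's equations translate almost line-for-line into code. A sketch of tabular learning with a Boltzmann policy follows; here the TD error uses the Q-learning form with a max over next actions, and the state/action sizes and environment loop are assumptions.

```python
# Sketch: tabular TD learning with the slide's meta-parameters alpha, beta, gamma.
import numpy as np

alpha, beta, gamma = 0.1, 2.0, 0.9
n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))

def boltzmann(s):
    p = np.exp(beta * Q[s]); p /= p.sum()     # P(a|s) ∝ exp[β Q(s,a)]
    return np.random.choice(len(p), p=p)

def td_update(s, a, r, s_next):
    delta = r + gamma * np.max(Q[s_next]) - Q[s, a]   # TD error δ(t), Q-learning form
    Q[s, a] += alpha * delta                          # ΔQ(s,a) = α δ(t)
    return delta
```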
Discount rate of future rewards, γ
■ Large γ
● goes to fetch even distant batteries
■ Small γ
● takes only nearby batteries
Time scale of reward prediction
V(t) = E[ r(t) + γ r(t+1) + γ² r(t+2) + … ]
Discount factor γ: how far into the future rewards are predicted
[Figure: two four-step reward sequences, (−20, −20, −20, +100) and (+50, 0, 0, −100), each evaluated with a large and a small discount factor; the values shown are consistent with γ = 0.9 and γ = 0.3 (see the worked check below).]
Large γ: V = 18.7 for (−20, −20, −20, +100): "worth doing despite the effort!"; V = −22.9 for (+50, 0, 0, −100): "won't do anything dangerous"
Small γ: V = −25.1 for (−20, −20, −20, +100): "better not to do it"; V = 47.3 for (+50, 0, 0, −100): "can't help doing it"
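The four values on the slide are reproduced exactly by the discounted sum with γ = 0.9 for "large" and γ = 0.3 for "small" (an inference from the numbers, not stated on the slide):

```latex
V=\sum_{t=1}^{4}\gamma^{\,t-1} r_t
\qquad
\begin{aligned}
(-20,-20,-20,+100):&\quad \gamma=0.9:\ -20-18-16.2+72.9=18.7\\
&\quad \gamma=0.3:\ -20-6-1.8+2.7=-25.1\\
(+50,\ 0,\ 0,-100):&\quad \gamma=0.9:\ 50-100(0.9)^3=-22.9\\
&\quad \gamma=0.3:\ 50-100(0.3)^3=47.3
\end{aligned}
```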
Depression?
Impulsivity?
Serotonin?
Neuromodulator systems and metalearning (Doya, 2002)
■ Broad projections from the brainstem to the cerebellum, basal ganglia, and cerebral cortex
● controlling global variables/parameters of learning?
Dopamine: reward prediction error δ
Acetylcholine: learning rate α
Noradrenaline: width of exploration (inverse temperature) β
Serotonin: time scale of reward prediction γ
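An illustrative reading of this mapping as knobs on the TD sketch above; the mapping itself is the slide's hypothesis, and the code only shows how one of the knobs, β (the noradrenaline-like parameter), reshapes the action distribution.

```python
# Illustrative only: beta sharpens or flattens the Boltzmann policy.
import numpy as np

q = np.array([1.0, 0.8, 0.2])                 # toy action values
for beta in (0.5, 2.0, 10.0):                 # low beta -> broad exploration
    p = np.exp(beta * q); p /= p.sum()
    print(beta, np.round(p, 2))
# By the same hypothesis: gamma (serotonin) sets how far ahead V looks,
# alpha (acetylcholine) sets the update step size, and delta itself is
# the dopamine-like teaching signal.
```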
Optogenetic stimulation of serotonin neurons (Miyazaki et al., 2014, Current Biology)
■ Reward-waiting task (3, 6, 9, ∞ s delays)
● yellow light (no stimulation): waited 12.1 s
● blue light (stimulation): waited 20.8 s
What more should AI learn from the brain?
Energy efficiency
Data efficiency
● world models and mental simulation
● modular self-organization
● meta-learning
Autonomy and sociality
Designing a robot's reward system
What is reward? ... it serves the functions most vital to a living system:
■ appetite, pain: self-preservation
■ affection, sexual drive: self-reproduction
Cyber Rodents
● capture battery packs to recharge
● copy software ("genes") via infrared
A robot colony that learns and evolves
■ survival
● battery capture
■ reproduction
● copying of "genes"
Evolution of how robots learn
■ mate and pass on genes within a lifetime of about 5 minutes
● mating success is proportional to battery charge level
(Elfwing et al. 2011)
[Framework diagram: a population of 15-25 robots and virtual agents; each genome encodes the weights of the top layer of the neural network (w1, w2, …, wn), reward-shaping weights (v1, v2, …, vn), and the meta-parameters (α, γ, λ, τ, k, τ0). A minimal sketch of this evolutionary loop follows.]
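The sketch below is a minimal rendering of the evolutionary loop the slides describe, with every detail assumed: genes are real-valued vectors, a lifetime returns an energy score, and the probability of passing on genes rises with that energy, as on the slide.

```python
# Sketch (assumptions throughout): evolution of gene vectors where
# mating odds scale with the energy gathered during a lifetime.
import numpy as np

rng = np.random.default_rng(1)
pop = rng.normal(size=(20, 8))                # 20 agents, 8 genes each

def lifetime_energy(genes):                   # stand-in for a ~5-minute life
    return float(-np.sum((genes - 0.5) ** 2) + rng.normal(0, 0.1))

for generation in range(100):
    energy = np.array([lifetime_energy(g) for g in pop])
    p = np.exp(energy); p /= p.sum()          # mating odds rise with energy
    parents = pop[rng.choice(len(pop), size=(len(pop), 2), p=p)]
    crossover = rng.random(pop.shape) < 0.5
    pop = np.where(crossover, parents[:, 0], parents[:, 1])
    pop += rng.normal(0, 0.02, size=pop.shape)   # mutation
```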
Visual rewards acquired through evolution
■ battery packs   ■ faces (tail-lamps) of other robots
Polymorphism of behavioral strategies (Elfwing et al. 2015)
■ foraging-biased ("gluttonous") and mating-biased ("amorous") robots
■ evolutionary stability
Figure 2. Correlation between the average mating learning performance (time steps per mating) and the average fitness in the final 20 generations in all experiments. The learning performance was estimated as the number of time steps the mating behavior was selected divided by the number of mating events. The seven types of markers indicate the number of energy sources in the environment for each simulation.
Figure 3. Example trajectories of the learned behaviors for the roamer strategy and the stayer strategy. (a) The roamer ignores the tail-lamp of the mating partner and executes the learned foraging behavior to capture the energy source. (b) The stayer executes the learned waiting behavior and adjusts its position according to the trajectory of the mating partner.
[Figure panels: (a) scatter of two genome weights (w1, w5; labeled on the slide as the bias for mating behavior and the weight on faces) separating roamers and stayers; (b) distribution of the battery level at which mating behavior is taken (Ēm), as percent of the population, for roamers and stayers; further panels show the proportions of the forager ("gluttonous") and tracker ("amorous") phenotypes.]
[Figure 5 plots, panels (a)-(d); see the caption below.]
Figure 5. Average number of mating events, average proportion of mating events with stayer mating partners, average energy level at the mating events, and average fitness, as functions of the stayer proportion in the population, for the roamer (green solid lines with circles) and stayer (red solid lines with circles) subpopulations. (a) The dotted lines show the best linear fit for the two subpopulations and the black line shows average values for the population as a whole. (b) The dotted lines show the best linear fit for the two subpopulations and the black line shows the average ratio of the number of roamer mating events to the number of stayer mating events. (c) The dotted lines show the constant approximations as the average values over all phenotype proportions. (d) The dotted lines show the estimated fitness values using Equations 6 and 8.
A threat from autonomous AI agents?
Creative AI agents
■ discover and realize their own goals
■ could give rise to new science, technology, culture, and industry
Assessing and managing the risks
■ malfunction
■ side effects
■ use by ambitious or hostile individuals and groups
futureoflife.org/ai-principles
Research Issues
1) Research Goal: The goal of AI research should be to create not undirected intelligence, but beneficial
intelligence.
2) Research Funding: Investments in AI should be accompanied by funding for research on ensuring its
beneficial use, including thorny questions in computer science, economics, law, ethics, and social studies,
such as:
How can we make future AI systems highly robust, so that they do what we want without malfunctioning or
getting hacked?
How can we grow our prosperity through automation while maintaining people’s resources and purpose?
How can we update our legal systems to be more fair and efficient, to keep pace with AI, and to manage
the risks associated with AI?
What set of values should AI be aligned with, and what legal and ethical status should it have?
3) Science-Policy Link: There should be constructive and healthy exchange between AI researchers and
policy-makers.
4) Research Culture: A culture of cooperation, trust, and transparency should be fostered among
researchers and developers of AI.
5) Race Avoidance: Teams developing AI systems should actively cooperate to avoid corner-cutting on
safety standards.
Ethics and Values
6) Safety: AI systems should be safe and secure throughout their operational lifetime, and verifiably so
where applicable and feasible.
7) Failure Transparency: If an AI system causes harm, it should be possible to ascertain why.
8) Judicial Transparency: Any involvement by an autonomous system in judicial decision-making should
provide a satisfactory explanation auditable by a competent human authority.
9) Responsibility: Designers and builders of advanced AI systems are stakeholders in the moral implications
of their use, misuse, and actions, with a responsibility and opportunity to shape those implications.
10) Value Alignment: Highly autonomous AI systems should be designed so that their goals and behaviors
can be assured to align with human values throughout their operation.
11) Human Values: AI systems should be designed and operated so as to be compatible with ideals of
human dignity, rights, freedoms, and cultural diversity.
12) Personal Privacy: People should have the right to access, manage and control the data they generate,
given AI systems’ power to analyze and utilize that data.
13) Liberty and Privacy: The application of AI to personal data must not unreasonably curtail people’s real or
perceived liberty.
14) Shared Benefit: AI technologies should benefit and empower as many people as possible.
15) Shared Prosperity: The economic prosperity created by AI should be shared broadly, to benefit all of
humanity.
16) Human Control: Humans should choose how and whether to delegate decisions to AI systems, to
accomplish human-chosen objectives.
17) Non-subversion: The power conferred by control of highly advanced AI systems should respect and
improve, rather than subvert, the social and civic processes on which the health of society depends.
18) AI Arms Race: An arms race in lethal autonomous weapons should be avoided.
Longer-term Issues
19) Capability Caution: There being no consensus, we should avoid strong assumptions regarding upper
limits on future AI capabilities.
20) Importance: Advanced AI could represent a profound change in the history of life on Earth, and should
be planned for and managed with commensurate care and resources.
21) Risks: Risks posed by AI systems, especially catastrophic or existential risks, must be subject to planning
and mitigation efforts commensurate with their expected impact.
22) Recursive Self-Improvement: AI systems designed to recursively self-improve or self-replicate in a
manner that could lead to rapidly increasing quality or quantity must be subject to strict safety and control
measures.
23) Common Good: Superintelligence should only be developed in the service of widely shared ethical ideals,
and for the benefit of all humanity rather than one state or organization.
Learning from human society
■ Humans are the most dangerous species on Earth
The wisdom of democracy: never entrust unlimited power to an individual or a particular group
■ Politics
● elected, fixed terms of office
● separation of powers
● local autonomy
■ Economy
● antitrust laws
● the right to strike
■ Science
● peer review
Mutual evaluation and cooperation among multiple open-source, explainable AI agents
Social value orientation and the amygdala/prefrontal cortex
Activity in the amygdala elicited by unfair divisions predicts social value orientation
Masahiko Haruno & Christopher D Frith
[Abstract:] 'Social value orientation' characterizes individual differences in anchoring attitudes toward the division of resources. Here, by contrasting people with prosocial and individualistic orientations using functional magnetic resonance imaging, we demonstrate that degree of inequity aversion in prosocials is predictable from amygdala activity and unaffected by cognitive load. This result suggests that automatic emotional processing in the amygdala lies at the core of prosocial value orientation.
[Excerpt:] Individual social value orientation1 determines behavior in economic games and many real-life situations2. Prosocials are defined as those who like to maximize the sum of resources for the self and the other, while simultaneously minimizing the difference between the two. By contrast, individualists like to maximize resources for the self, while competitors like to maximize the difference between the two. The question we address here is whether prosocial attitudes depend upon deliberate top-down control of selfish impulses3 or on automatic inequity aversion4, and what their neural substrates are. In the former case, we would expect prosocial attitudes to be associated with greater prefrontal activity3, whereas the insula5,6 or amygdala7,8 would be involved in the latter case.
We used a behavioral task to identify people with different social value orientations (... Methods). In the magnetic resonance imaging (MRI) scanner, the subjects were presented with a pair of rewards for the self and the other. They were asked to evaluate the desirability of the reward pair on four different levels (least preferable (1) to most preferable (4)) by a button press.
Each subject's evaluation was linearly regressed with the reward for the subject (Rs), the reward for the other (Ro) and the absolute value of their difference (Da) as implicated in the theory of social value orientation1, yielding coefficients WRs, WRo and WD, respectively (Supplementary Fig. 1). The two groups differed markedly in their response to the absolute reward difference, WD (t34 = 6.68, P < 0.00001; Fig. 1c). The prosocials disliked large absolute differences in distributions (inequity aversion), whereas the individualists were unaffected by such differences. In contrast, individualists preferred higher self rewards (WRs; t34 = 6.54, P < 0.00001; Fig. 1c), but were indifferent to other rewards (WRo; Fig. 1c).
To identify the neural substrates corresponding to these behavioral differences, we conducted a regression analysis of functional MRI (fMRI) data at the time of reward pair presentation with three explanatory variables: the reward for the subject (Rs), the reward for the other (Ro) and the absolute value of their difference (Da) using statistical parameter mapping9 (Supplementary Methods) and looked for brain structures whose activity distinguished between prosocials and individualists (see Supplementary Tables 1 (Da) and 2 (Rs and Ro)).
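The per-subject regression described in the excerpt is a plain least-squares fit. A sketch with fabricated stand-in data follows; the variable names Rs, Ro, Da and coefficients WRs, WRo, WD follow the paper's notation.

```python
# Sketch: regress each subject's ratings on self reward, other reward,
# and the absolute reward difference (inequity).
import numpy as np

rng = np.random.default_rng(2)
Rs = rng.uniform(20, 200, 36)                 # 36 trials, as in the task
Ro = rng.uniform(20, 200, 36)
Da = np.abs(Rs - Ro)
rating = 0.01 * Rs + 0.008 * Ro - 0.01 * Da + rng.normal(0, 0.2, 36)

X = np.column_stack([Rs, Ro, Da, np.ones_like(Rs)])   # with intercept
WRs, WRo, WD, b0 = np.linalg.lstsq(X, rating, rcond=None)[0]
# a prosocial subject would show a negative WD (inequity aversion)
```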
[Figure 1 | Experimental design and behavior. (a) Triple-dominance measure task: subjects chose one of three distributions of money between the self and an unknown other, e.g., prosocial (100, 100 yen), individualistic (110, 60 yen), or competitive (100, 20 yen); 90 yen ≈ $1. (b) Reward pair evaluation task: in each of 36 trials (about 20.0 ± 2.0 s), a beep was followed by presentation of a reward pair for self and other, a rating from 1 to 4, and feedback on the subject's total reward. (c) Comparison of the model coefficients (WD shown) between prosocials and individualists.]
[Results excerpt:] The dorsal amygdala10 was the only area where the groups differed in the correlation of activity with the absolute value of reward difference Da (Fig. 2a, P < 0.001; uncorrected). This activity was positively correlated with Da in prosocials but slightly negatively correlated in individualists. The amygdala activity increased only in prosocials when the absolute value of reward difference Da was large (Fig. 2b, significantly positive 5 s after the reward pair onset (P < 0.01); and Fig. 2c). The same effect was seen in the left amygdala with weaker significance (P < 0.005, uncorrected; Supplementary Fig. 2). In addition, the activity in the amygdala ...
Twenty-four prosocials and eight individualists were instructed to evaluate the reward pair and press a corresponding button as soon as possible. They were also required to memorize a five-digit number sequence before the presentation of each reward pair, and their memory was probed after each evaluation. A high-load, random number sequence was contrasted with a low-load, fixed number sequence (01234). There were no differences between the groups in their performance of the memory tasks (Supplementary Fig. 9). As in the imaging study, the prosocials showed inequity aversion, whereas the individualists did not (F1,30 = 12.98, P = 0.0006). However, there was no effect of cognitive load on inequity aversion for the prosocials (t23 = 0.08; P = 0.94). In contrast, the individualists became slightly more competitive under cognitive load, showing greater dislike of higher reward to the other (paired t-test; t7 = 1.94, P < 0.05; Supplementary Fig. 10). Evaluation times were also unaffected by cognitive load (Supplementary Fig. 10).
These results suggest that prosocial value orientation is driven by an intuitive aversion for the inequitable division of resources. Our findings highlight an important role for automatic intuitive processing in social interaction, in addition to the slow strategic processes that were the focus of previous studies5-7,15 using interactive games.
Figure 2 | Difference between prosocials and individualists in the correlation of brain activity with the absolute value of reward difference Da. (a) Coronal section of the right amygdala, where the two groups showed a significant difference in correlation with the absolute value of reward difference Da (Montreal Neurological Institute coordinates 20, -2, -12). (b) BOLD signal increase at the amygdala peak during 12 trials with the largest Da values (solid line) and 12 trials with the smallest Da values (dotted line) for prosocials and individualists. Time 0 specifies the presentation of the reward pair. Error bars, s.e. over subjects. (c) Left: beta values of each prosocial (gray) and individualist (black) in correlation with Da. Right: correlation of each subject's beta value and WD (the subject's dislike of Da) (gray, prosocials; black, individualists). t-test for a regression coefficient significantly different from 0. [Panel annotations: R = 0.12, P = 0.69; R = -0.45, P = 0.029.]
Representation of economic preferences in the structure and function of the amygdala and prefrontal cortex
Alan S. R. Fermin, Masamichi Sakagami, Toko Kiyonari, Yang Li, Yoshie Matsumoto & Toshio Yamagishi
[Abstract:] Social value orientations (SVOs) are economic preferences for the distribution of resources: prosocial individuals are more cooperative and egalitarian than are proselfs. Despite the social and economic implications of SVOs, no systematic studies have examined their neural correlates. We investigated the amygdala and dorsolateral prefrontal cortex (DLPFC) structures and functions in prosocials and proselfs by functional magnetic resonance imaging and evaluated cooperative behavior in the Prisoner's ...
[Introduction excerpt:] In everyday life, humans experience social dilemmas regarding whether to follow social norms and cooperate with others at some personal cost or behave selfishly and maximize their own welfare. Social and economic studies have demonstrated that economic decisions are considerably influenced by individual differences in social value orientation (SVO)1-6, a social preference where individuals are classified as either prosocials or proselfs based on weights they assign to the distribution of resources between oneself and others2-5. Prosocials prefer a distribution of resources in which they and their partners jointly earn the most. In contrast, proselfs prefer the distribution that gives themselves the highest earnings, regardless of the partner's payoff. SVO is consistently related to behavior in economic games3,6 and relates to self-sacrifice in real-life social relations7 as well as donation to charity8. Despite the strong implications of SVO on society, it has not yet been established whether these decisional dispositions have distinct structural and functional representations in the brain.
A wealth of behavioral evidence demonstrates that humans use distinct decision-making strategies for selfish and prosocial behaviors. Normative prosocial behaviors such as fairness, cooperation, spontaneous giving, and helping are increased by a number of factors that reduce deliberation, including the seriousness of social decisions9, cognitive load10, priming intuition11, and time pressure12,13. In addition, prosocial decisions occur significantly more quickly than selfish ones do12,13, while subjects make more selfish choices when a time delay is available for deliberation12,14. These findings suggest that humans may have an initial automatic impulse to behave prosocially that is sometimes overridden by deliberative processes necessary to implement selfish decisions. Neuroscience studies support the existence of distinct neural networks for automatic and deliberative decision strategies in humans and animals15,16. Of special interest are the dorsolateral prefrontal cortex (DLPFC) and the amygdala. The role of the DLPFC has been demonstrated in the control of deliberative behaviors such as strategic decision-making17, inference, and reasoning18,19. On the other hand, the amygdala has been implicated in the control of automatic behaviors such as the expression of innate responses20, the acquisition of conditioned reactions to biologically significant stimuli21, and has recently been implicated in automatic social decision processes22-24.
Figure 3. Correlation between amygdala and dorsolateral prefrontal cortex (DLPFC) gray matter volumes with social value orientation (SVO) and cooperative behavior in the Prisoner's Dilemma game. (a) Left amygdala volume was significantly larger in prosocials than it was in proselfs (positive correlation with SVO, 66 voxels, x = -17, y = -9, z = -12, t = 2.61, P < 0.05 family-wise error (FWE) corrected) and (b,c) positively ...
What more should AI learn from the brain?
Energy efficiency
Data efficiency
● world models and mental simulation
● modular self-organization
● meta-learning
Autonomy and sociality
(Doya & Matsuo, Brain and Nerve, 2019; groups.oist.jp/ncu/publications)
14
16 18 20 22 24 26 28
15
20
25
30
Average mating performance (ts/mating)
Averagefitness
4 b.
6 b.
8 b.
10 b.
12 b.
14 b.
16 b.
R = −0.84
p < 0.0001
Figure 2. The Correlation between the average mating learning performance and the
average fitness in the final 20 generations in all experiments. The learning performance was
estimated as the number of time steps the mating behavior was selected divided with number of mating
events. The seven types of markers indicate the number of energy sources in the environment for each
simulation.
(a) (b)
Figure 3. Example trajectories of the learned behaviors for the roamer strategy and the
stayer strategy. (a) The roamer ignores the tail-lamp of the mating partner and executes the learned
foraging behavior to capture the energy source. (b) The stayer executes the learned waiting behavior
and adjusts its position according to the trajectory of the mating partner.
[Excerpt, Nature Communications, DOI: 10.1038/s41467-018-04496-y:]
... respectively). The remainder of differences were not significant (Supplementary Table 2). Subsequently, we tested variability of the waiting time ratio among mice. We compared the obtained mixed model with the model including a fixed effect of reward-delay condition, but not a random effect of MI. To evaluate likelihood ...
Bayesian decision model of waiting. Can these effects of serotonin on waiting, depending on the RP and timing uncertainty, be explained in a coherent way? Here we consider the possibility that serotonin signals the prior probability of reward delivery in a Bayesian model of repeated decisions to wait or to quit. In this ...
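A deliberately simplified illustration of such a wait-or-quit decision, not the paper's actual formulation: the agent's belief that a reward will still arrive is updated each step it fails to appear, and a serotonin-like prior p0 prolongs waiting before quitting.

```python
# Toy Bayesian wait-or-quit model: higher prior belief -> longer waiting.
def waiting_time(p0, hazard=0.1, threshold=0.05, dt=1.0):
    """Steps waited until P(reward still coming) falls below threshold."""
    p = p0
    for t in range(1, 200):
        # no reward arrived this step: Bayes update shrinks the belief
        p = p * (1 - hazard) / (p * (1 - hazard) + (1 - p))
        if p < threshold:
            return t * dt
    return 200 * dt

print(waiting_time(p0=0.6), waiting_time(p0=0.9))  # higher prior waits longer
```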
[Figure 5 histograms: distributions of waiting times (s) in the Delay 6, Delay 4-6-8, Delay 2-6-10, and Delay 10 tests, comparing serotonin activation with no activation.]
Fig. 5 Optogenetic activation of DRN serotonin neurons enhances waiting for temporally uncertain rewards. a Distribution of waiting time during omission trials in the D6 test. b Distribution of waiting time during omission trials in the D4-6-8 test. c Distribution of waiting time during omission trials in the D2-6-10 test. d Distribution of waiting time during omission trials in the D10 test. Orange circles illustrate the timing and number of food pellets presented in rewarded trials. White circles denote omission trials. (Nature Communications, DOI: 10.1038/s41467-018-04496-y)
Acknowledgments
■ Reinforcement learning robots
● Paavo Parmas
● Jiexin Wang (ATR)
● Naoto Yoshida (U Tokyo)
● Stefan Elfwing (ATR)
● Eiji Uchibe (ATR)
● Jun Morimoto (ATR)
■ Action values / striatum
● Makoto Ito (Progress Technologies)
● Tomohiko Yoshizawa (Tamagawa U)
● Charles Gerfen (NIH)
● Kazuyuki Samejima (Tamagawa U)
● Minoru Kimura (Tamagawa U)
● Yasumasa Ueda (Kansai Medical U)
■ Two-photon imaging
● Akihiro Funamizu (CSHL)
● Bernd Kuhn
● Sergey Zobnin
● Yuzhe Li
■ Serotonin
● Katsuhiko Miyazaki
● Kayoko Miyazaki (JSPS)
● Taiyo Hamada
● Kenji Tanaka (Keio U)
● Akihiro Yamanaka (Nagoya U)
■ fMRI
● Alan Fermin (Tamagawa U)
● Takehiko Yoshida
● Saori Tanaka (ATR)
● Nicolas Schweighofer (USC)
● Junichiro Yoshimoto (NAIST)
● Yu Shimizu
● Tomoki Tokuda (NAIST)
● Shigeto Yamawaki (Hiroshima U)
● Mitsuo Kawato (ATR)
■ Basal ganglia-thalamus-cortex model
● Osamu Shouno (HRI)
● Jun Igarashi (RIKEN)
● Jan Moren
● Jean Leinard
● Carlos Gutierrez
● Benoit Girard, Daphne Heraiz (UPMC)
● Yoshihisa Tachibana (Kobe U)
● Atsushi Nambu (NIPS)
● Shu Takagi (U Tokyo)
■ Grant-in-Aid for Scientific Research on Innovative Areas "Prediction and Decision Making"
■ Grant-in-Aid for Scientific Research on Innovative Areas "Artificial Intelligence and Brain Science"
■ Post-K exploratory challenge "Whole-Brain Simulation and Artificial Intelligence"
■ Brain/MINDS (project for elucidating whole-brain functional networks by innovative technologies)
■ Strategic Research Program for Brain Sciences (SRPBS)
International Symposium on AI and Brain Science
n October 10–12, 2020, Online www.brain-ai.jp/symposium2020
Special Issue on
Artificial Intelligence and Brain Science
www.journals.elsevier.com/neural-networks/
n Timeline
l Manuscript submission due: January 10, 2021
l First review completed: March 31, 2021
l Revised manuscript due: May 31, 2021
l Final decisions to authors: June 30, 2021
n Guest Editors:
l Kenji Doya, Okinawa Institute of Science and Technology (ncus@oist.jp)
l Karl Friston, University College London
l Masashi Sugiyama, RIKEN AIP and The University of Tokyo IRCN
l Josh Tenenbaum, Massachusetts Institute of Technology
Neuro 2022
Japanese Neuroscience/Neurochemistry/Neural Network Societies
June 30–July 3, 2022 in Okinawa Convention Center
Biopesticide (2).pptx .This slides helps to know the different types of biop...
RohitNehra6
 

Último (20)

Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
Creating and Analyzing Definitive Screening Designs
Creating and Analyzing Definitive Screening DesignsCreating and Analyzing Definitive Screening Designs
Creating and Analyzing Definitive Screening Designs
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 

What can we learn from the interface between the brain and AI? @ The 5th WBA Symposium: Kenji Doya

  • 1. What can we learn from the interface between the brain and AI? Kenji Doya, doya@oist.jp; Neural Computation Unit, groups.oist.jp/ncu; Okinawa Institute of Science and Technology. The 5th Whole Brain Architecture Symposium.
  • 3. Co-evolution of neuroscience and artificial intelligence: vision
Neuroscience: feature-detecting cells (Hubel & Wiesel 1959); experience-dependent learning (Blakemore & Cooper 1970); place cells (O'Keefe 1976); face cells (Bruce, Desimone, Gross 1981; Sugase et al. 1999).
Artificial intelligence: perceptron (Rosenblatt 1962); multilayer supervised learning (Amari 1967); Neocognitron (Fukushima 1980); ConvNet (Krizhevsky, Sutskever, Hinton 2012); Google Brain (2012).
[Excerpts and figure residue from the cited papers (Sugase et al. time course of face-category information; O'Keefe hippocampal place fields; Hubel & Wiesel receptive fields in cat striate cortex) omitted.]
  • 4. Co-evolution of neuroscience and artificial intelligence: reinforcement learning
Neuroscience: classical conditioning (Pavlov 1903); operant conditioning (Thorndike 1898); reward-prediction-error responses of dopamine neurons (Schultz et al. 1993, 1997); dopamine-dependent synaptic plasticity (Wickens et al. 2000); action-value representation in the striatum (Samejima et al. 2005).
Artificial intelligence: TD learning (Sutton et al. 1983); the basal ganglia TD-learning hypothesis (Barto et al. 1995; Montague et al. 1996); Deep Q Network (Mnih et al. 2015). V(s), Q(s, a).
[Excerpts and figure residue from the cited papers (Mnih et al. convolutional network schematic; Schultz dopamine reward-prediction-error recordings) omitted.]
  • 6. [Figure-only slide; no text.]
  • 8. Model-free and model-based behavior
Model-free:
- memorizes "action values"
- intuitive, reflexive action selection
- learns from the experience of having acted
- simple processing / requires repeated experience
Model-based:
- internal model that predicts the consequences of actions
- action selection by lookahead ("mental simulation in the brain")
- thinks before acting
- flexible adaptation / complex processing
(A minimal sketch contrasting the two strategies follows below.)
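To make the contrast concrete, here is a minimal sketch in a hypothetical five-state chain world (the states, reward, actions, and value tables are assumptions for this example, not from the slides): a model-free agent acts greedily on a cached Q-table, while a model-based agent simulates each candidate action with an internal transition model and evaluates the predicted next state.

```python
import numpy as np

# Hypothetical chain world (an assumption for this sketch):
# states 0..4, reward 1 at state 4, actions move left or right.
N_STATES = 5
ACTIONS = (-1, +1)

def step(s, a):                      # internal transition model s' = f(s, a)
    return min(max(s + a, 0), N_STATES - 1)

def reward(s):
    return 1.0 if s == N_STATES - 1 else 0.0

# Model-free: act on a cached action-value table Q(s, a).
Q = np.zeros((N_STATES, len(ACTIONS)))   # would be filled in by prior learning

def model_free_action(s):
    return ACTIONS[int(np.argmax(Q[s]))]

# Model-based: simulate each action with the model and evaluate the outcome.
V = np.zeros(N_STATES)                   # state values from prior learning

def model_based_action(s, gamma=0.9):
    predicted = [reward(step(s, a)) + gamma * V[step(s, a)] for a in ACTIONS]
    return ACTIONS[int(np.argmax(predicted))]

print(model_free_action(2), model_based_action(2))
```

The design point matches the slide: the model-free path is a cheap table lookup but needs many trials to fill the table, while the model-based path spends computation on simulated lookahead and can adapt as soon as the model changes.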
  • 10. Mental simulation in the brain
Cognitive and behavioral mechanisms that use a predictive model of how actions change the body and the environment, s' = f(s, a) or P(s'|s, a):
- estimating the current state from past states and actions (coping with noise, delays, and occlusion)
- predicting the outcome of a hypothesized action from the current state (model-based decision making, action planning, ...)
- predicting the outcomes or causes of hypothesized states (thought, inference, language, science, ...)
  • 11. Model-based action planning involves cortico-cerebellar and basal ganglia networks (Alan S. R. Fermin, Takehiko Yoshida, Junichiro Yoshimoto, Makoto Ito, Saori C. Tanaka & Kenji Doya, Scientific Reports)
Mental simulation involves the cerebral cortex, basal ganglia, and cerebellum: in an fMRI experiment, the ventromedial prefrontal cortex and ventral striatum increased activity in the exploratory condition; the dorsolateral prefrontal cortex, dorsomedial striatum, and lateral cerebellum in the model-based condition; and the supplementary motor area, putamen, and anterior cerebellum in the motor-memory condition.
[Extended abstract excerpt and figure residue omitted.]
  • 13. Neural substrate of dynamic Bayesian inference in the cerebral cortex (Funamizu, Kuhn & Doya, Nature Neuroscience 2016)
- Two-photon imaging of parietal-cortex neuron activity
- Acoustic virtual space (intermittent sensory information)
- Probabilistic decoding of neuronal populations: goal distance is predicted even when sensory input is absent
As mice approached a reward site, posterior parietal cortex (PPC) and posteromedial cortex (PM) implemented the two features of dynamic Bayesian inference: prediction of hidden states with an internal state-transition model, and updating of the prediction with new sensory evidence.
[Abstract excerpt and decoding-figure residue omitted; a minimal predict-and-update sketch follows below.]
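The predict/update cycle the slide describes is, in its simplest linear-Gaussian form, a Kalman filter. Below is a minimal sketch (the task geometry, noise levels, and cue schedule are all assumptions for illustration, not the paper's parameters): the belief about goal distance is propagated with the animal's own step on every trial (prediction) and corrected only on steps when a sound cue arrives (updating).

```python
import numpy as np

# Hypothetical 1-D goal-approach task: distance to the goal shrinks by
# the animal's own step (action); noisy observations (sound cues) arrive
# only intermittently. Dynamic Bayesian inference = predict + update.

def kalman_step(mu, var, u, z=None, q=0.05, r=0.2):
    # Prediction with the internal transition model x' = x - u + noise(q)
    mu, var = mu - u, var + q
    # Update only when a sensory observation z is available
    if z is not None:
        k = var / (var + r)                      # Kalman gain
        mu, var = mu + k * (z - mu), (1 - k) * var
    return mu, var

rng = np.random.default_rng(0)
x, mu, var = 10.0, 10.0, 1.0                     # true distance, belief mean/variance
for t in range(20):
    u = 0.5                                      # step toward the goal
    x -= u
    z = x + rng.normal(0, 0.45) if t % 3 == 0 else None   # intermittent cue
    mu, var = kalman_step(mu, var, u, z)
    print(f"t={t:2d}  true={x:5.2f}  belief={mu:5.2f} +/- {var ** 0.5:.2f}")
```

Note how the belief variance grows during cue omission (prediction only) and shrinks whenever a cue is observed, mirroring the paper's finding that decoded goal distance persists without input and sharpens with it.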
  • 14. How can brain circuits be flexibly activated and connected?
Computational theories:
- learning of gating (Jacobs, Jordan, 1991; Singh 1992)
- selection/integration by prediction error: the MOSAIC architecture (Wolpert, Kawato, 1998; Sugimoto et al., 2012)
- selection/integration by uncertainty: Bayesian cue integration (Deneve et al., 2001; Daw et al., 2005)
Neural mechanisms:
- gating by the basal ganglia (Eliasmith et al., 2012)
- thalamus / claustrum (Crick 1984; Crick, Koch 2005)
- dendritic/axonal inhibition (Wang & Yang, 2018)
- oscillation/spike synchrony (von der Malsburg; Singer)
[Figure residue (MOSAIC-MR module diagram; schematic of cortical gating mechanisms) omitted; a sketch of prediction-error-based module selection follows below.]
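As an illustration of the MOSAIC idea of selection by prediction error, here is a minimal sketch (not the published implementation; the dynamics, noise scale, and module count are assumptions): each forward module predicts the next state, and a softmax over the negative squared prediction errors yields each module's "responsibility," which can then gate learning and control.

```python
import numpy as np

# MOSAIC-style module selection (illustrative sketch): responsibilities
# are a softmax of how well each forward model predicted the next state.

def responsibilities(x_next, predictions, sigma=0.5):
    errors = np.array([(x_next - p) ** 2 for p in predictions])
    logits = -errors / (2 * sigma ** 2)
    w = np.exp(logits - logits.max())        # numerically stable softmax
    return w / w.sum()

# Two competing forward models of a scalar dynamics x' = a * x
models = [lambda x: 0.5 * x, lambda x: 1.2 * x]
x, x_next = 2.0, 2.4                          # observed transition (true a = 1.2)
preds = [m(x) for m in models]
print(responsibilities(x_next, preds))        # second module takes responsibility
```

The same responsibility signal can weight both the learning updates of each module and the mixing of their controllers, which is how a single architecture decomposes a complex environment into sub-environments.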
  • 15. Reinforcement learning
- Reward prediction: value functions
  V(s) = E[r(t) + γ r(t+1) + γ² r(t+2) + … | s(t) = s]
  Q(s, a) = E[r(t) + γ r(t+1) + γ² r(t+2) + … | s(t) = s, a(t) = a]
- Action selection
  greedy: a = argmax_a Q(s, a)
  Boltzmann: P(a|s) ∝ exp[β Q(s, a)]
- Prediction update: TD error
  δ(t) = r(t) + γ V(s(t+1)) − V(s(t))
  ΔV(s(t)) = α δ(t)
  ΔQ(s(t), a(t)) = α δ(t)
What circuit mechanisms realize these steps? What mechanisms control these parameters? (A minimal sketch putting the pieces together follows below.)
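The three steps on the slide assemble directly into tabular Q-learning with Boltzmann exploration. Here is a minimal sketch on a hypothetical five-state chain task (the task itself and the parameter values are assumptions for illustration); note that the learning rate α, the inverse temperature β, and the discount factor γ appear exactly where the slide's equations put them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 5-state chain: start at state 0, reward +1 at state 4.
N, A = 5, 2                                # states; actions 0=left, 1=right
alpha, beta, gamma = 0.1, 2.0, 0.9         # learning rate, inverse temp., discount
Q = np.zeros((N, A))

def boltzmann(q):                          # P(a|s) proportional to exp(beta * Q(s,a))
    p = np.exp(beta * (q - q.max()))
    return p / p.sum()

for episode in range(200):
    s = 0
    while s != N - 1:
        a = rng.choice(A, p=boltzmann(Q[s]))
        s2 = max(s - 1, 0) if a == 0 else min(s + 1, N - 1)
        r = 1.0 if s2 == N - 1 else 0.0
        # TD error: delta = r + gamma * max_a' Q(s', a') - Q(s, a)
        delta = r + gamma * Q[s2].max() * (s2 != N - 1) - Q[s, a]
        Q[s, a] += alpha * delta           # Delta-Q(s, a) = alpha * delta
        s = s2

print(np.round(Q, 2))                      # values grow toward the rewarded end
```

In the metalearning reading of slide 18, α, β, and γ are exactly the knobs that the neuromodulator systems are hypothesized to tune.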
  • 16. Discount factor γ for future rewards
- Large γ: goes even for distant batteries
- Small γ: takes only nearby batteries
  • 17. Time scale of reward prediction
V(t) = E[r(t) + γ r(t+1) + γ² r(t+2) + …]
Discount factor γ: how far into the future rewards are predicted.
Example over four steps: an effortful sequence (−20, −20, −20, +100) versus a tempting sequence (+50, 0, 0, −100).
- Large γ (the slide's values match γ = 0.9): effortful V = 18.7 ("worth doing even if it is hard"); tempting V = −22.9 ("don't do the risky thing").
- Small γ (matching γ = 0.3): effortful V = −25.1 ("better not to bother"); tempting V = 47.3 ("can't help doing it").
Depression? Impulsivity? Serotonin? (The arithmetic is reproduced in the sketch below.)
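The slide's four numbers follow from the definition V = Σ_t γ^t r(t); this short sketch reproduces them (the γ values 0.9 and 0.3 are inferred from the slide's printed V values, so treat them as a reconstruction):

```python
# Discounted value of a reward sequence: V = sum_t gamma^t * r(t).
def discounted_value(rewards, gamma):
    return sum(r * gamma ** t for t, r in enumerate(rewards))

effortful = [-20, -20, -20, +100]   # cost now, payoff later
tempting  = [+50, 0, 0, -100]       # payoff now, cost later

for gamma in (0.9, 0.3):
    print(gamma,
          round(discounted_value(effortful, gamma), 1),   # 18.7 / -25.1
          round(discounted_value(tempting, gamma), 1))    # -22.9 / 47.3
```

The sign flip of both values between γ = 0.9 and γ = 0.3 is the slide's point: a single parameter changes which of the two action sequences looks worthwhile.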
  • 18. Neuromodulator systems and metalearning (Doya, 2002)
- Project broadly from the brainstem to the cerebellum, basal ganglia, and cerebral cortex: controlling global variables/parameters of learning?
- Dopamine: reward prediction error δ
- Acetylcholine: learning rate α
- Noradrenaline: sharpness of exploration β
- Serotonin: time scale of prediction γ
  • 19. Optogenetic stimulation of serotonin neurons (Miyazaki et al., 2014, Current Biology)
- Reward-waiting task (delays of 3, 6, 9, or ∞ seconds)
- Yellow light (no stimulation): waited 12.1 s
- Blue light (stimulation): waited 20.8 s
  • 21. Designing a robot's reward system
What is reward? It serves the functions most vital to a living organism:
- appetite, pain: self-preservation
- affection, sexual drive: self-replication
Cyber Rodents:
- capturing battery packs and charging
- copying software via infrared
  • 23. Evolving how robots learn
- Robots mate and pass on their genes within a lifetime of about five minutes
- Mating success is proportional to the battery charge level (Elfwing et al. 2011)
- Population: 15-25 robots plus virtual agents
- Genes encode: weights of the top layer of the neural network, reward-shaping weights, and meta-parameters v1, v2, …, vn (the symbol run in the source reads α, γ, λ, τk, τ0)
(A minimal sketch of meta-parameter evolution follows below.)
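To show the shape of such an experiment, here is a hypothetical sketch of evolving learning meta-parameters (the genome layout, fitness stand-in, and operator settings are all assumptions for illustration, not the published model): each genome carries (α, γ, τ), fitness stands in for lifetime charging/mating success, and selection plus mutation drives the population toward better-performing settings.

```python
import numpy as np

rng = np.random.default_rng(1)

def lifetime_fitness(genome):
    # Toy stand-in for lifetime charging/mating success: fitness peaks
    # at one particular (alpha, gamma, tau) setting (pure illustration).
    target = np.array([0.1, 0.9, 0.2])
    return -np.sum((genome - target) ** 2)

# Population of 20 genomes: (alpha, gamma, tau) within plausible ranges.
pop = rng.uniform([0.01, 0.5, 0.05], [0.5, 0.999, 1.0], size=(20, 3))

for generation in range(50):
    fit = np.array([lifetime_fitness(g) for g in pop])
    parents = pop[np.argsort(fit)[-10:]]                   # fitness-based selection
    children = parents[rng.integers(0, 10, 20)]            # reproduction
    pop = children + rng.normal(0, 0.02, children.shape)   # mutation

print(pop.mean(axis=0))   # converges near the fittest meta-parameters
```

In the embodied version on the slide, selection is not an explicit sort: mating opportunities themselves depend on charge level, so the fitness function is implicit in the robots' lives.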
  • 25. Polymorphism of behavioral strategies (Elfwing et al. 2015)
- "Forager" (roamer) and "tracker" (stayer) phenotypes
- Evolutionary stability
Foragers ignore the tail lamp of a mating partner and execute the learned foraging behavior to capture energy sources; trackers execute a learned waiting behavior and adjust their position to the partner's trajectory. The phenotypes differ in the weight given to faces, the bias toward mating behavior, and the charge level at which mating is chosen, and the population maintains a stable mixture of the two.
[Figure residue (fitness vs. mating-performance correlation; example trajectories; phenotype-proportion analyses) omitted.]
  • 27. futureoflife.org/ai-principles
Research Issues
1) Research Goal: The goal of AI research should be to create not undirected intelligence, but beneficial intelligence.
2) Research Funding: Investments in AI should be accompanied by funding for research on ensuring its beneficial use, including thorny questions in computer science, economics, law, ethics, and social studies, such as: How can we make future AI systems highly robust, so that they do what we want without malfunctioning or getting hacked? How can we grow our prosperity through automation while maintaining people's resources and purpose? How can we update our legal systems to be more fair and efficient, to keep pace with AI, and to manage the risks associated with AI? What set of values should AI be aligned with, and what legal and ethical status should it have?
3) Science-Policy Link: There should be constructive and healthy exchange between AI researchers and policy-makers.
4) Research Culture: A culture of cooperation, trust, and transparency should be fostered among researchers and developers of AI.
5) Race Avoidance: Teams developing AI systems should actively cooperate to avoid corner-cutting on safety standards.
Ethics and Values
6) Safety: AI systems should be safe and secure throughout their operational lifetime, and verifiably so where applicable and feasible.
7) Failure Transparency: If an AI system causes harm, it should be possible to ascertain why.
8) Judicial Transparency: Any involvement by an autonomous system in judicial decision-making should provide a satisfactory explanation auditable by a competent human authority.
9) Responsibility: Designers and builders of advanced AI systems are stakeholders in the moral implications of their use, misuse, and actions, with a responsibility and opportunity to shape those implications.
10) Value Alignment: Highly autonomous AI systems should be designed so that their goals and behaviors can be assured to align with human values throughout their operation.
11) Human Values: AI systems should be designed and operated so as to be compatible with ideals of human dignity, rights, freedoms, and cultural diversity.
12) Personal Privacy: People should have the right to access, manage and control the data they generate, given AI systems' power to analyze and utilize that data.
13) Liberty and Privacy: The application of AI to personal data must not unreasonably curtail people's real or perceived liberty.
14) Shared Benefit: AI technologies should benefit and empower as many people as possible.
15) Shared Prosperity: The economic prosperity created by AI should be shared broadly, to benefit all of humanity.
16) Human Control: Humans should choose how and whether to delegate decisions to AI systems, to accomplish human-chosen objectives.
17) Non-subversion: The power conferred by control of highly advanced AI systems should respect and improve, rather than subvert, the social and civic processes on which the health of society depends.
18) AI Arms Race: An arms race in lethal autonomous weapons should be avoided.
Longer-term Issues
19) Capability Caution: There being no consensus, we should avoid strong assumptions regarding upper limits on future AI capabilities.
20) Importance: Advanced AI could represent a profound change in the history of life on Earth, and should be planned for and managed with commensurate care and resources.
21) Risks: Risks posed by AI systems, especially catastrophic or existential risks, must be subject to planning and mitigation efforts commensurate with their expected impact.
22) Recursive Self-Improvement: AI systems designed to recursively self-improve or self-replicate in a manner that could lead to rapidly increasing quality or quantity must be subject to strict safety and control measures.
23) Common Good: Superintelligence should only be developed in the service of widely shared ethical ideals, and for the benefit of all humanity rather than one state or organization.
  • 28. Learning from human society
- Humans are the most dangerous species on Earth
The wisdom of democracy: do not entrust unlimited power to any individual or particular group.
- Politics: elected offices with fixed terms; separation of the three branches of government; local autonomy
- Economy: antitrust law; the right to strike
- Science: mutual evaluation (peer review)
→ Mutual evaluation and cooperation among multiple open-source, explainable AI agents
  • 29. Social value orientation and the amygdala / prefrontal cortex
Haruno & Frith, "Activity in the amygdala elicited by unfair divisions predicts social value orientation" (Nature Neuroscience): contrasting prosocial and individualistic subjects with fMRI showed that the degree of inequity aversion in prosocials is predictable from amygdala activity and unaffected by cognitive load, suggesting that automatic emotional processing in the amygdala lies at the core of prosocial value orientation.
Fermin, Sakagami, Kiyonari, Li, Matsumoto & Yamagishi, "Representation of economic preferences in the structure and function of the amygdala and prefrontal cortex" (Scientific Reports): prosocials and proselfs differ in amygdala and dorsolateral prefrontal cortex (DLPFC) structure and function; for example, left amygdala gray matter volume was significantly larger in prosocials and correlated positively with SVO and cooperative behavior in the Prisoner's Dilemma game.
[Extended excerpts and figure residue from the two papers omitted.]
  • 30. What more should artificial intelligence learn from the brain?
- Energy efficiency
- Data efficiency: world models and mental simulation; self-organization of modules; metalearning
- Autonomy and sociality
(Doya & Matsuo, Brain and Nerve, 2019; groups.oist.jp/ncu/publications)
[Figure residue recapping earlier slides omitted.]
  • 31. Acknowledgments
- Reinforcement learning robots: Paavo Parmas; Jiexin Wang (ATR); 吉田尚人 (U Tokyo); Stefan Elfwing (ATR); 内部英治 (ATR); 森本淳 (ATR)
- Action values / striatum: 伊藤真 (Progress Technologies); 吉澤知彦 (Tamagawa Univ.); Charles Gerfen (NIH); 鮫島和行 (Tamagawa Univ.); 木村實 (Tamagawa Univ.); 上田康雅 (Kansai Medical Univ.)
- Two-photon imaging: 船水章大 (CSHL); Bernd Kuhn; Sergey Zobnin; Yuzhe Li
- Serotonin: 宮崎勝彦; 宮崎佳代子 (JSPS); 濱田太陽; 田中謙二 (Keio Univ.); 山中章宏 (Nagoya Univ.)
- fMRI: Alan Fermin (Tamagawa Univ.); 吉田岳彦; 田中沙織 (ATR); Nicolas Schweighofer (USC); 吉本潤一郎 (NAIST); 清水優; 徳田智磯 (NAIST); 山脇成人 (Hiroshima Univ.); 川人光男 (ATR)
- Basal ganglia–thalamus–cortex model: 庄野修 (HRI); 五十嵐潤 (RIKEN); Jan Moren; Jean Leinard; Carlos Gutierrez; Benoit Girard, Daphne Heraiz (UPMC); 橘吉寿 (Kobe Univ.); 南部篤 (NIPS); 高木周 (Univ. of Tokyo)
- Grant-in-Aid for Scientific Research on Innovative Areas "Prediction and Decision Making"
- Grant-in-Aid for Scientific Research on Innovative Areas "Artificial Intelligence and Brain Science"
- Post-K Exploratory Challenge "Whole-Brain Simulation and Artificial Intelligence"
- Project for elucidating the whole picture of brain-function networks by innovative technologies
- Strategic Research Program for Brain Sciences
  • 32. International Symposium on AI and Brain Science: October 10–12, 2020, online. www.brain-ai.jp/symposium2020
  • 33. Special Issue on Artificial Intelligence and Brain Science, www.journals.elsevier.com/neural-networks/
Timeline: manuscript submission due January 10, 2021; first review completed March 31, 2021; revised manuscripts due May 31, 2021; final decisions to authors June 30, 2021.
Guest editors: Kenji Doya, Okinawa Institute of Science and Technology (ncus@oist.jp); Karl Friston, University College London; Masashi Sugiyama, RIKEN AIP and The University of Tokyo IRCN; Josh Tenenbaum, Massachusetts Institute of Technology.
  • 34. Neuro 2022: Japanese Neuroscience / Neurochemistry / Neural Network Societies, June 30–July 3, 2022, Okinawa Convention Center.