5. Background
• In drug discovery and materials chemistry, physical properties are key information when searching for optimal molecules
– Approximations such as DFT (Density Functional Theory) are commonly used
– Their computational cost is very high, which has limited how thoroughly the molecular space can be explored
• A machine learning model that predicts properties quickly and accurately would therefore be valuable
– Using DFT results as training data, property prediction with machine learning has become an active research topic in recent years
[Figure 1 from Gilmer+, ICML2017 ("Neural Message Passing for Quantum Chemistry"): a Message Passing Neural Network predicts quantum properties (targets E, ...) of an organic molecule by modeling a computationally expensive DFT calculation; DFT takes on the order of 10³ seconds, the MPNN on the order of 10⁻² seconds.]
Figure from Gilmer+, ICML2017
8. Related Work
• Message Passing Neural Networks (MPNN) [Gilmer+, ICML2017]
– Gilmer et al. generalized feature-extraction methods that work well on graphs with irregular node degrees
– In each layer, the feature vector assigned to each node is updated using the feature vectors of its neighboring nodes and edges
– Repeating this for L layers makes each node's feature vector reflect information from the nodes and edges within its L-hop neighborhood
[Figure 1 from Gilmer+, ICML2017, as above: an MPNN predicts quantum properties of an organic molecule by modeling a computationally expensive DFT calculation.]
9. Related Work
• Message Passing Neural Networks (MPNN) [Gilmer+, ICML2017]
– Message passing phase
• Message function: $M_t(h_v^t, h_w^t, e_{vw})$
– Builds the information that each node propagates to its neighboring nodes
• Update function: $U_t(h_v^t, m_v^{t+1})$
– Each node receives information from its neighbors and updates its own state
(a minimal code sketch follows the excerpt below)
[Diagram: node v with neighbors u1, u2 and initial states $h_v^{(0)}, h_{u_1}^{(0)}, h_{u_2}^{(0)}$; the messages $M_t(h_v^t, h_{u_1}^t, e_{vu_1})$ and $M_t(h_v^t, h_{u_2}^t, e_{vu_2})$ are summed (Σ) and fed to the update function $U_t(h_v^t, m_v^{t+1})$.]
[Excerpt from "Neural Message Passing for Quantum Chemistry" (Gilmer+, ICML2017): the message passing phase runs for T time steps and is defined in terms of message functions $M_t$ and vertex update functions $U_t$. During this phase, hidden states $h_v^t$ at each node in the graph are updated based on messages $m_v^{t+1}$ according to
$m_v^{t+1} = \sum_{w \in N(v)} M_t(h_v^t, h_w^t, e_{vw})$   (1)
$h_v^{t+1} = U_t(h_v^t, m_v^{t+1})$   (2)
where $N(v)$ denotes the neighbors of $v$ in graph $G$. The readout phase computes a feature vector for the whole graph using some readout function $R$ according to
$\hat{y} = R(\{h_v^T \mid v \in G\})$   (3)
The message functions $M_t$, vertex update functions $U_t$, and readout function $R$ are all learned differentiable functions. $R$ operates on the set of node states and must be invariant to permutations of the node states in order for the MPNN to be invariant to graph isomorphism. One could also learn edge features in an MPNN by introducing hidden states $h_{e_{vw}}^t$ for all edges and updating them analogously to equations 1 and 2; of the existing MPNNs, only Kearnes et al. (2016) has used this idea.]
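To make the two phases concrete, here is a minimal Python sketch of one message-passing step (eqs. 1–2 above). It is an illustration of the framework, not code from the paper: the graph is a plain adjacency list, and `message_fn` / `update_fn` stand in for whatever concrete $M_t$ and $U_t$ a given model defines.

```python
import numpy as np

def mpnn_step(h, edge_feat, neighbors, message_fn, update_fn):
    """One message-passing step: eq. (1) then eq. (2) for every node.

    h         : dict node -> current hidden state h_v^t (np.ndarray)
    edge_feat : dict (v, w) -> edge feature vector e_vw
    neighbors : dict node -> list of neighbor nodes N(v)
    """
    h_next = {}
    for v, h_v in h.items():
        # eq. (1): aggregate messages from all neighbors w in N(v)
        m_v = sum(message_fn(h_v, h[w], edge_feat[(v, w)]) for w in neighbors[v])
        # eq. (2): update v's hidden state with the aggregated message
        h_next[v] = update_fn(h_v, m_v)
    return h_next
```

The models on the following slides are obtained by plugging different `message_fn` / `update_fn` pairs into this loop.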
11. Related Work
• CNN for Learning Molecular Fingerprints [Duvenaud+, NIPS2015]
– Message passing phase
• Message function: $M_t(h_v^t, h_w^t, e_{vw}) = \mathrm{concat}(h_w^t, e_{vw})$
• Update function: $U_t(h_v^t, m_v^{t+1}) = \sigma(H_t^{\deg(v)} m_v^{t+1})$
– $H_t^{\deg(v)}$: a weight matrix prepared for each step $t$ and each node degree $\deg(v)$ (see the sketch below)
[Diagram and excerpt as on slide 9.]
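As a hedged illustration of the two functions on this slide, they might look as follows in NumPy; `H_deg`, a dict mapping node degree to a weight matrix, is an assumed name, not from the paper. These plug directly into `mpnn_step` from the earlier sketch.

```python
import numpy as np

def duvenaud_message(h_v, h_w, e_vw):
    # M_t(h_v^t, h_w^t, e_vw) = concat(h_w^t, e_vw); note that h_v itself is unused
    return np.concatenate([h_w, e_vw])

def duvenaud_update(h_v, m_v, degree, H_deg):
    # U_t(h_v^t, m_v^{t+1}) = sigma(H_t^{deg(v)} m_v^{t+1}),
    # with a separate weight matrix H_deg[degree] per node degree
    return 1.0 / (1.0 + np.exp(-(H_deg[degree] @ m_v)))  # logistic sigmoid
```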
13. Related Work
• Gated Graph Neural Networks (GG-NN) [Li+, ICLR2016]
– Message passing phase
– Message function: $M_t(h_v^t, h_w^t, e_{vw}) = A_{e_{vw}} h_w^t$
» $A_{e_{vw}}$: a weight matrix defined per edge type (single bond, double bond, etc.)
– Update function: $U_t(h_v^t, m_v^{t+1}) = \mathrm{GRU}(h_v^t, m_v^{t+1})$
(see the sketch below)
[Diagram and excerpt as on slide 9.]
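A minimal PyTorch sketch of the GG-NN message/update pair, under the assumption that bond types are discrete indices (e.g., 0–3 for single/double/triple/aromatic); the dimensions and names are illustrative, not from the paper.

```python
import torch

d, num_edge_types = 32, 4           # hidden size; number of bond types (assumed)
A = torch.randn(num_edge_types, d, d)            # one learned matrix per edge type
gru = torch.nn.GRUCell(input_size=d, hidden_size=d)

def ggnn_message(h_w, edge_type):
    # M_t(h_v^t, h_w^t, e_vw) = A_{e_vw} h_w^t: select the matrix for this bond type
    return A[edge_type] @ h_w

def ggnn_update(h_v, m_v):
    # U_t = GRU(h_v^t, m_v^{t+1}): the aggregated message is the GRU "input",
    # the current node state is the GRU hidden state
    return gru(m_v.unsqueeze(0), h_v.unsqueeze(0)).squeeze(0)
```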
15. Related Work
• Deep Tensor Neural Networks (DTNN) [Schütt+, Nature2017]
– Message passing phase
• Message function: $M_t(h_v^t, h_w^t, e_{vw}) = \tanh\!\left(W^{fc}\left((W^{cf} h_w^t + b_1) \odot (W^{df} e_{vw} + b_2)\right)\right)$
– $W^{fc}, W^{cf}, W^{df}$: shared weight matrices; $b_1, b_2$: bias terms
• Update function: $U_t(h_v^t, m_v^{t+1}) = h_v^t + m_v^{t+1}$
(see the sketch below)
[Diagram and excerpt as on slide 9.]
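A short NumPy sketch of the DTNN pair above, with the weight matrices and biases passed in explicitly; this is an illustration under the formulas on the slide, not the authors' code.

```python
import numpy as np

def dtnn_message(h_w, e_vw, W_fc, W_cf, W_df, b1, b2):
    # M_t = tanh( W^{fc} ( (W^{cf} h_w^t + b1) ⊙ (W^{df} e_vw + b2) ) )
    return np.tanh(W_fc @ ((W_cf @ h_w + b1) * (W_df @ e_vw + b2)))

def dtnn_update(h_v, m_v):
    # U_t(h_v^t, m_v^{t+1}) = h_v^t + m_v^{t+1}: a simple residual update
    return h_v + m_v
```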
17. Related Work
• Edge Network + Set2Set (enn-s2s) [Gilmer+, ICML2017]
– Message passing phase
• Message function: $M_t(h_v^t, h_w^t, e_{vw}) = A(e_{vw})\, h_w^t$
– $A(e_{vw})$: a neural network that transforms the edge vector $e_{vw}$ (into a matrix)
• Update function: $U_t(h_v^t, m_v^{t+1}) = \mathrm{GRU}(h_v^t, m_v^{t+1})$
– Same as GG-NN [Li+, ICLR2016] (see the sketch below)
[Diagram and excerpt as on slide 9.]
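A hedged PyTorch sketch of the edge network: a small NN maps the edge feature vector to a $d \times d$ matrix, which is then applied to the neighbor's state. The layer sizes and the name `edge_net` are illustrative assumptions.

```python
import torch

d, d_edge = 32, 8                  # node and edge feature sizes (illustrative)
# A(e_vw): a small NN that maps the edge feature vector to a d x d matrix
edge_net = torch.nn.Sequential(
    torch.nn.Linear(d_edge, 64), torch.nn.ReLU(), torch.nn.Linear(64, d * d))

def enn_message(h_w, e_vw):
    # M_t(h_v^t, h_w^t, e_vw) = A(e_vw) h_w^t, with A produced by edge_net
    A = edge_net(e_vw).view(d, d)
    return A @ h_w

# the update function is the same GRU as in the GG-NN sketch above
```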
18. Related Work
• Edge Network + Set2Set (enn-s2s) [Gilmer+, ICML2017]
– Readout phase
• Readout function: $R(\{h_v^T \mid v \in G\}) = \mathrm{set2set}(\{h_v^T \mid v \in G\})$
– the vector $q_t^*$ produced by set2set [Vinyals+, ICLR2016] is fed into a subsequent NN (see the sketch below)
– the model also includes further tricks, e.g., in how the input features are constructed
[Excerpt from "Order Matters: Sequence to Sequence for Sets" (Vinyals+, ICLR2016), Sec. 4.2: the process block preserves a memory that grows with the size of the set and is order invariant, and can be seen as a special case of a Memory Network or Neural Turing Machine. Since an associative memory is desired, it uses content-based attention, so that the vector retrieved from memory does not change if the memory is shuffled — crucial for proper treatment of the input set. The block computes
$q_t = \mathrm{LSTM}(q_{t-1}^*)$   (3)
$e_{i,t} = f(m_i, q_t)$   (4)
$a_{i,t} = \exp(e_{i,t}) / \sum_j \exp(e_{j,t})$   (5)
$r_t = \sum_i a_{i,t}\, m_i$   (6)
$q_t^* = [q_t\ r_t]$   (7)
where $i$ indexes the memory vectors $m_i$ (typically one per element of the set), $q_t$ is a query vector used to read $r_t$ from the memories, $f$ computes a single scalar from $m_i$ and $q_t$ (e.g., a dot product), and the LSTM computes a recurrent state but takes no inputs; $q^*$ is the state this LSTM evolves. Figure 1 of the paper depicts this Read-Process-and-Write model.]
Figure from Vinyals+, ICLR2016
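A compact PyTorch sketch of the process block above, using a dot product for $f$; sizes and the fixed number of process steps `T` are illustrative assumptions.

```python
import torch

d, T = 32, 3                                   # node-state size, process steps
lstm = torch.nn.LSTMCell(input_size=2 * d, hidden_size=d)   # reads q* = [q; r]

def set2set(memories):
    """memories: (n, d) final node states {h_v^T}; returns q*_T of size 2d."""
    q_star = torch.zeros(2 * d)
    state = (torch.zeros(1, d), torch.zeros(1, d))
    for _ in range(T):
        q, c = lstm(q_star.unsqueeze(0), state)     # eq. (3): q_t = LSTM(q*_{t-1})
        state = (q, c)
        e = memories @ q.squeeze(0)                 # eq. (4): f as a dot product
        a = torch.softmax(e, dim=0)                 # eq. (5): attention weights
        r = a @ memories                            # eq. (6): r_t = sum_i a_{i,t} m_i
        q_star = torch.cat([q.squeeze(0), r])       # eq. (7): q*_t = [q_t; r_t]
    return q_star   # order invariant: shuffling the memories leaves q*_T unchanged
```

In enn-s2s, the returned $q_T^*$ is then fed into a further NN to produce the graph-level prediction.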
21. Proposed method: Continuous-filter convolution (cfconv)
• A filter that weights contributions by the distance between nodes
– The "distances to emphasize" are learned from the data
[Figure 2 from Schütt+, NIPS2017: SchNet overview (left), interaction block (middle), and continuous-filter convolution with filter-generating network (right). Figure 3: 10x10 Å cuts through all 64 radial, three-dimensional filters in each interaction block of SchNet trained on molecular dynamics of ethanol; negative values blue, positive values red.]
[Excerpt, "Filter-generating networks": to satisfy the requirements for modeling molecular energies, the cfconv filters are restricted to be rotationally invariant by using the interatomic distances
$d_{ij} = \lVert \mathbf{r}_i - \mathbf{r}_j \rVert$
as input to the filter network. Without further processing the filters would be highly correlated, since a freshly initialized neural network is close to linear; this leads to a plateau at the beginning of training that is hard to overcome. It is avoided by expanding the distance with radial basis functions
$e_k(\mathbf{r}_i - \mathbf{r}_j) = \exp(-\gamma \lVert d_{ij} - \mu_k \rVert^2)$
located at centers $0\,\text{Å} \leq \mu_k \leq 30\,\text{Å}$ every 0.1 Å, with $\gamma = 10\,\text{Å}$, chosen so that all distances occurring in the data sets are covered by the filters. Choosing fewer centers corresponds to reducing the filter resolution, and restricting the range of the centers corresponds to the filter size of a usual convolutional layer. The expanded distances are fed into two dense layers with softplus activations to compute the filter weight $W(\mathbf{r}_i - \mathbf{r}_j)$. Each learned filter emphasizes certain ranges of interatomic distances, enabling the interaction blocks to build highly complex many-body representations in the spirit of DTNNs while keeping rotational invariance due to the radial filters.]
22. Proposed method: Continuous-filter convolution (cfconv)
• A filter that weights contributions by the distance between nodes
– The "distances to emphasize" are learned from the data
[Figure 2/3 and excerpt from Schütt+, NIPS2017, as on slide 21.]
Filter-generating Networks
Prepare 300 RBF kernels with $\mu_1 = 0.1\,\text{Å}, \mu_2 = 0.2\,\text{Å}, \dots, \mu_{300} = 30\,\text{Å}$ and $\gamma = 10\,\text{Å}$
⇓
The kernel whose $\mu_k$ is closest to $d_{ij}$ approaches 1, and kernels farther away approach 0
(yielding a soft one-hot representation of the distance; see the sketch below)
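A minimal NumPy sketch of this distance expansion, using the centers and width stated above:

```python
import numpy as np

mu = np.arange(1, 301) * 0.1    # 300 centers: 0.1 Å, 0.2 Å, ..., 30 Å
gamma = 10.0                    # width parameter from the paper

def expand_distance(d_ij):
    # e_k(r_i - r_j) = exp(-gamma * (d_ij - mu_k)^2): a soft one-hot over distance
    return np.exp(-gamma * (d_ij - mu) ** 2)

# e.g. expand_distance(1.5) peaks at the kernel with mu_k = 1.5 Å and decays around it
```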
23. Proposed method: Continuous-filter convolution (cfconv)
• A filter that weights contributions by the distance between nodes
– The "distances to emphasize" are learned from the data
[Figure 2/3 and excerpt from Schütt+, NIPS2017, as on slide 21.]
Filter-generating Networks
Take the element-wise product of the network's output vector and the node's embedding vector
⇓
Each unit's activation filters the corresponding component of the node's embedding vector (see the sketch below)
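Putting the pieces together, here is a hedged PyTorch sketch of a cfconv layer over all atoms at once: the RBF expansion, the two dense layers with shifted softplus, and the element-wise filtering followed by the sum over neighbors. Tensor shapes and layer sizes are illustrative assumptions.

```python
import torch

d, K, gamma = 64, 300, 10.0
mu = torch.linspace(0.1, 30.0, K)                       # RBF centers, ~0.1 Å apart
lin1, lin2 = torch.nn.Linear(K, d), torch.nn.Linear(d, d)

def ssp(x):
    # shifted softplus ssp(x) = ln(0.5 e^x + 0.5) = softplus(x) - ln 2
    return torch.nn.functional.softplus(x) - torch.log(torch.tensor(2.0))

def cfconv(x, dist):
    """x: (n, d) atom-wise features; dist: (n, n) interatomic distances d_ij."""
    rbf = torch.exp(-gamma * (dist.unsqueeze(-1) - mu) ** 2)   # (n, n, K) soft one-hot
    W = ssp(lin2(ssp(lin1(rbf))))                              # (n, n, d) filter values
    # element-wise product of each neighbor's features with its filter, then sum:
    # x_i' = sum_j x_j ⊙ W(r_i - r_j)
    return (x.unsqueeze(0) * W).sum(dim=1)
```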
24. Proposed method: Continuous-filter convolution (cfconv)
• A filter that weights contributions by the distance between nodes
– The "distances to emphasize" are learned from the data
[Figure 2/3 and excerpt from Schütt+, NIPS2017, as on slide 21.]
25. Proposed method: Interaction block
• A message passing layer containing the cfconv layer
– The cfconv layer updates each node's feature vector while accounting for interactions between nodes
– Interactions can be expressed without any restriction on node distance (a difference from DTNN and similar models)
[Figure 2/3 and excerpt from Schütt+, NIPS2017, as on slide 21.]
26. Proposed method: SchNet
• Alternating interaction and atom-wise layers finally output a one-dimensional scalar per atom
• The per-atom scalars are summed over all atoms to obtain the prediction for the whole molecule (see the sketch below)
[Figure 2 from Schütt+, NIPS2017: illustration of SchNet with an architectural overview (left), the interaction block (middle), and the continuous-filter convolution with filter-generating network (right). The shifted softplus is defined as $\mathrm{ssp}(x) = \ln(0.5 e^x + 0.5)$.]
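A compact sketch of the overall flow, reusing `cfconv` and `ssp` from the slide-23 sketch. This simplifies the real architecture: the actual interaction block wraps cfconv in additional atom-wise layers, which are omitted here for brevity.

```python
import torch

class SchNetSketch(torch.nn.Module):
    def __init__(self, max_z=100, d=64, n_blocks=3):
        super().__init__()
        self.embed = torch.nn.Embedding(max_z, d)   # nuclear charge Z -> embedding
        self.n_blocks = n_blocks
        self.out1 = torch.nn.Linear(d, 32)          # atom-wise output layers
        self.out2 = torch.nn.Linear(32, 1)          # one scalar per atom

    def forward(self, Z, dist):
        x = self.embed(Z)                           # (n, d) initial atom features
        for _ in range(self.n_blocks):
            x = x + cfconv(x, dist)                 # residual interaction blocks
        e_atom = self.out2(ssp(self.out1(x)))       # (n, 1) per-atom contributions
        return e_atom.sum()                         # molecular energy Ê = sum_i e_i
```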
27. Proposed method: Loss
• The loss function is the sum of two terms:
– the squared error of the energy prediction
– the squared error of the interatomic-force prediction, computed per atom and summed
• $\rho$: a hyperparameter controlling how strongly the interatomic forces are weighted
• The predicted interatomic forces are computed as in eq. (4) below [Chmiela+, 2017]
[Excerpt from Schütt+, NIPS2017: the total energy E as well as the forces F_i are included in the training loss to train a neural network that performs well on both properties:
$\ell(\hat{E}, (E, F_1, \dots, F_n)) = \lVert E - \hat{E} \rVert^2 + \frac{\rho}{n} \sum_{i=0}^{n} \left\lVert F_i - \left(-\frac{\partial \hat{E}}{\partial R_i}\right) \right\rVert^2$   (5)
This kind of loss had been used before for fitting restricted potential energy surfaces with MLPs. The experiments use $\rho = 0$ in eq. 5 for pure energy-based training and $\rho = 100$ for combined energy and force training; the value of $\rho$ was optimized empirically to account for the different scales of energies and forces. Because a full forward and backward pass on the energy model is needed to obtain the forces, the resulting force model is twice as deep and requires about twice the computation time. GDML captures this relationship between energies and forces, but it is explicitly optimized to predict the force field, with the energy prediction a by-product. Models such as circular fingerprints, molecular graph convolutions, or message-passing neural networks are only concerned with equilibrium molecules, i.e., the special case where the forces vanish; they cannot be trained with forces in a similar manner, as discrete binning or one-hot encoded bond-type information introduces discontinuities in their predicted potential energy surface.]
[Excerpt from Schütt+, NIPS2017, Sec. 4.2 "Training with energies and forces": since the interatomic forces are related to the molecular energy, an energy-conserving force model is obtained by differentiating the energy model w.r.t. the atom positions:
$\hat{F}_i(Z_1, \dots, Z_n, \mathbf{r}_1, \dots, \mathbf{r}_n) = -\frac{\partial \hat{E}}{\partial \mathbf{r}_i}(Z_1, \dots, Z_n, \mathbf{r}_1, \dots, \mathbf{r}_n)$   (4)
Chmiela et al. pointed out that this leads to an energy-conserving force field by construction. As SchNet yields rotationally invariant energy predictions, the force predictions are rotationally equivariant by construction. The model has to be at least twice differentiable to allow gradient descent on the force loss, hence the shifted softplus $\mathrm{ssp}(x) = \ln(0.5 e^x + 0.5)$ as the non-linearity.]
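To show how eqs. (4) and (5) fit together in practice, here is a hedged PyTorch sketch of the combined loss, obtaining the forces via autograd; `model` is assumed to map atom types and pairwise distances to an energy (e.g., the SchNetSketch above).

```python
import torch

def schnet_loss(model, Z, R, E_true, F_true, rho=100.0):
    """Eq. (5): squared energy error + per-atom squared force error.
    Forces come from autograd as F̂_i = -∂Ê/∂r_i (eq. 4)."""
    R = R.clone().requires_grad_(True)              # (n, 3) atom positions
    diff = R.unsqueeze(0) - R.unsqueeze(1)          # (n, n, 3) pairwise differences
    dist = (diff.pow(2).sum(-1) + 1e-12).sqrt()     # eps avoids NaN grad at d_ii = 0
    E_pred = model(Z, dist)
    # create_graph=True keeps the graph so the force loss itself is differentiable
    # (the model must be at least twice differentiable, hence the shifted softplus)
    F_pred = -torch.autograd.grad(E_pred, R, create_graph=True)[0]
    n = R.shape[0]
    return (E_true - E_pred) ** 2 + (rho / n) * ((F_true - F_pred) ** 2).sum()
```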
29. Experiments: QM9
• A dataset containing 17 molecular properties computed with DFT
– Only one of them is predicted here: U0, the total energy of the molecule at absolute zero
– The molecules are in equilibrium, where the interatomic forces are zero, so forces need not be predicted
• Baselines
– DTNN [Schütt+, Nature2017], enn-s2s [Gilmer+, ICML2017], and enn-s2s-ens5 (an ensemble of enn-s2s models)
• Results
– SchNet consistently achieved state-of-the-art accuracy
– With 110k training examples, the mean absolute error was 0.31 kcal/mol
Table 1: Mean absolute errors for energy predictions in kcal/mol on the QM9 data set with given training set size N. Best model per row: SchNet.

N       | SchNet | DTNN [18] | enn-s2s [19] | enn-s2s-ens5 [19]
50,000  | 0.59   | 0.94      | –            | –
100,000 | 0.34   | 0.84      | –            | –
110,462 | 0.31   | –         | 0.45         | 0.33
30. Experiments: MD17
• A dataset of molecular dynamics (MD) simulation trajectories
– Trajectory data for individual molecules (benzene, etc.)
• Data are collected for 8 molecules, each treated as a separate learning task
• Even within one molecule, positions, energies, and interatomic forces differ between samples
– Both the total energy and the interatomic forces are predicted and evaluated by mean absolute error
• Baselines
– DTNN [Schütt+, Nature2017], GDML [Chmiela+, 2017]
• Results
– N = 1,000
• GDML did better on most tasks
• GDML is a kernel-regression-based model whose computation grows quadratically with the number of samples and the number of atoms per molecule, so it could not be trained at N = 50,000
– N = 50,000
• SchNet outperformed DTNN on most tasks
• SchNet scales far better (than GDML), and its accuracy improved as the amount of data grew
Table 2: Mean absolute errors for energy and force predictions in kcal/mol and kcal/mol/Å, respectively. GDML and SchNet test errors for training with 1,000 and 50,000 examples of molecular dynamics simulations of small, organic molecules are shown. SchNets were trained only on energies as well as energies and forces combined.

                       |          N = 1,000         |       N = 50,000
                       | GDML [17] |     SchNet     | DTNN [18] |    SchNet
                       |  forces   | energy |  both |  energy   | energy | both
Benzene        energy  |   0.07    |  1.19  |  0.08 |   0.04    |  0.08  | 0.07
               forces  |   0.23    | 14.12  |  0.31 |    –      |  1.23  | 0.17
Toluene        energy  |   0.12    |  2.95  |  0.12 |   0.18    |  0.16  | 0.09
               forces  |   0.24    | 22.31  |  0.57 |    –      |  1.79  | 0.09
Malonaldehyde  energy  |   0.16    |  2.03  |  0.13 |   0.19    |  0.13  | 0.08
               forces  |   0.80    | 20.41  |  0.66 |    –      |  1.51  | 0.08
Salicylic acid energy  |   0.12    |  3.27  |  0.20 |   0.41    |  0.25  | 0.10
               forces  |   0.28    | 23.21  |  0.85 |    –      |  3.72  | 0.19
Aspirin        energy  |   0.27    |  4.20  |  0.37 |    –      |  0.25  | 0.12
               forces  |   0.99    | 23.54  |  1.35 |    –      |  7.36  | 0.33
Ethanol        energy  |   0.15    |  0.93  |  0.08 |    –      |  0.07  | 0.05
               forces  |   0.79    |  6.56  |  0.39 |    –      |  0.76  | 0.05
Uracil         energy  |   0.11    |  2.26  |  0.14 |    –      |  0.13  | 0.10
               forces  |   0.24    | 20.08  |  0.56 |    –      |  3.28  | 0.11
Naphtalene     energy  |   0.12    |  3.58  |  0.16 |    –      |  0.20  | 0.11
               forces  |   0.23    | 25.36  |  0.58 |    –      |  2.58  | 0.11
31. Experiments: ISO17
• A dataset of molecular dynamics (MD) simulation trajectories
– Trajectory data for 129 isomers of C7O2H10
• Unlike MD17, data from different molecules are included in the same task
– Two tasks are prepared
• known molecules / unknown conformation:
– test data use known molecules in unseen conformations
• unknown molecules / unknown conformation:
– test data use unseen molecules in unseen conformations
– Baseline
• mean predictor (presumably the per-molecule mean over the training data)
• Results
– known molecules / unknown conformation
• energy+forces training reaches accuracy comparable to that on QM9
– unknown molecules / unknown conformation
• energy+forces training outperformed energy-only training
– adding interatomic forces to training did not merely fit a single molecule; it generalized across the chemical compound space
– there is still a clear accuracy gap to the known-molecules setting, so further improvement is needed
Table 3: Mean absolute errors on C7O2H10 isomers in kcal/mol.

                                                  | mean predictor | SchNet (energy) | SchNet (energy+forces)
known molecules / unknown conformation    energy  |     14.89      |      0.52       |         0.36
                                          forces  |     19.56      |      4.13       |         1.00
unknown molecules / unknown conformation  energy  |     15.54      |      3.11       |         2.40
                                          forces  |     19.15      |      5.71       |         2.18
33. References
• SchNet
– Schütt, Kristof, et al. "SchNet: A continuous-filter convolutional neural network for modeling quantum interactions." Advances in Neural Information Processing Systems, 2017.
• MPNN variants
– Gilmer, Justin, et al. "Neural message passing for quantum chemistry." Proceedings of the 34th International Conference on Machine Learning, pages 1263–1272, 2017.
– Duvenaud, David K., et al. "Convolutional networks on graphs for learning molecular fingerprints." Advances in Neural Information Processing Systems, 2015.
– Li, Yujia, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. "Gated graph sequence neural networks." ICLR, 2016.
– Schütt, Kristof T., et al. "Quantum-chemical insights from deep tensor neural networks." Nature Communications 8 (2017): 13890.
• Others
– Vinyals, Oriol, Samy Bengio, and Manjunath Kudlur. "Order matters: Sequence to sequence for sets." ICLR, 2016.
– Chmiela, S., Tkatchenko, A., Sauceda, H. E., Poltavsky, I., Schütt, K. T., and Müller, K. R. "Machine learning of accurate energy-conserving molecular force fields." Science Advances 3(5), e1603015, 2017.