1. Seminar: Deep networks for learning molecular representation
PRESENTER: HAI NGUYEN
INSTITUTE OF INFORMATION TECHNOLOGY, VIETNAM ACADEMY OF SCIENCE AND TECHNOLOGY, VIETNAM
01/08/2017
3. Research topic
- Goal: predicting properties of molecules with deep networks, e.g., predicting the toxicity of a molecule.
- Conventional approach: hand-crafted fingerprints, e.g., the Morgan fingerprint (aka ECFP). A fingerprint is a feature vector that encodes which substructures are present in the molecule.
- Is end-to-end learning better? E.g., molecular convolutional neural networks: data-driven features are more interpretable and efficient, turning fixed feature extraction into an end-to-end learnable fingerprint.
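As a concrete example of the conventional approach, here is a minimal sketch that computes a Morgan/ECFP bit vector with RDKit (assuming RDKit is installed; the SMILES string, aspirin, is just an illustrative input):

# Fixed Morgan (ECFP) fingerprint: each bit flags whether a substructure
# up to the given radius is present in the molecule.
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin (illustrative)
fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=2048)  # ~ECFP4
print(fp.GetNumOnBits(), "substructure bits set out of", fp.GetNumBits())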
4. Related works
- D. Duvenaud et al., CNNs on Graphs for Learning Molecular Fingerprints (NIPS 2015)
- M. Defferrard et al., CNNs on Graphs with Fast Localized Spectral Filtering (NIPS 2016)
- T. Kipf et al., Semi-Supervised Classification with Graph Convolutional Networks (ICLR 2017)
- J. Gilmer et al., Neural Message Passing for Quantum Chemistry (ICML 2017)
6. [CNNs on Graphs for Learning Molecular Fingerprints (NIPS 2015)] Contribution
- Provides an end-to-end framework for learning fingerprints with better predictive performance; the inputs are graphs of arbitrary size and shape.
- Efficient computation:
  1. A fixed fingerprint must be large to encode all possible substructures.
  2. A neural fingerprint can be learned to encode only the features relevant for classification -> reduces the size.
- The neural fingerprint is more interpretable -> meaningful.
14. Proposed improvement
- Disadvantages of NFP:
  - It treats different bond types to different neighboring atoms equally, e.g., C-O vs. C=O, etc.
  - The softmax outputs of all substructures are averaged. (Assumption: the properties of a molecule are determined by very few subgraphs.)
[Figure: proposed model. The per-substructure outputs r_1, ..., r_N are combined through a learned weighting (a small NN with parameters W) rather than a uniform average before producing the final output f.]
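A minimal numpy sketch of such a weighted readout (my reading of the figure, not the author's exact model): per-substructure outputs are pooled with learned attention weights so a few informative subgraphs can dominate, instead of the uniform average used by NFP:

import numpy as np

def attention_readout(R, w):
    """R: (N, d) per-substructure outputs r_1..r_N; w: (d,) scoring params."""
    scores = R @ w                        # one scalar score per substructure
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()           # attention weights over substructures
    return alpha @ R                      # weighted pooling instead of a mean

R = np.random.default_rng(5).random((5, 100))   # toy: 5 substructures, d=100
print(attention_readout(R, np.random.default_rng(6).random(100)).shape)  # (100,)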
16. Experiments and Results
- Data sets: Tox21 (train: 10K, test: 296); Solubility in log(mol/L) (# samples: 1100)
- Goal: comparison of ANFP with the original NFP
- Configuration: the same for both methods (NFP: 100x100), MLP (100x100)
- Implementation: Chainer; optimizer: Adam

Method     Tox21 Acc (%)   Solubility RMSE (log mol/L)
NFP        91.58           0.64 ± 0.05
Proposed   92.35           0.53 ± 0.06
17. [Message Passing Neural Networks for Quantum Chemistry]
- Accepted at ICML 2017.
- A general framework for supervised learning on graph-structured data.
- It abstracts the commonalities between existing neural models for graphs.
- Makes it easy to understand the general ideas behind the different proposed models and to come up with new variations suited to a specific data type.
19. [Message Passing Neural Networks for Quantum Chemistry]
The forward pass consists of two phases:
- Message passing, repeated T times; at step t:
  1. Message function: $m_v^{t+1} = \sum_{w \in N(v)} M_t(h_v^t, h_w^t, e_{vw})$
  2. Update function: $h_v^{t+1} = U_t(h_v^t, m_v^{t+1})$
- Readout: $y = R(\{h_v^T \mid v \in G\})$
Note: all these functions are learnable and differentiable, so the model can be trained by backpropagation. A sketch of the forward pass follows below.
[Figure: message passing at step t. Node v with state $h_v^t$ receives messages $M_t(h_v^t, h_w^t, e_{vw})$ from each neighbor w.]
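The following is a minimal numpy sketch of the two phases (my own illustration of the equations above, not code from the paper); $M_t$, $U_t$, and $R$ are stand-in choices (a linear message, a tanh update, a sum readout), and edge features $e_{vw}$ are omitted:

import numpy as np

def mpnn_forward(h, adj, W_msg, W_upd, T=3):
    """h: (n, d) node states; adj: (n, n) adjacency matrix."""
    for t in range(T):
        m = adj @ (h @ W_msg)          # message phase: sum over neighbors w
        h = np.tanh(h @ W_upd + m)     # update phase: h_v <- U_t(h_v, m_v)
    return h.sum(axis=0)               # readout: a plain sum stands in for R

n, d = 4, 8
rng = np.random.default_rng(0)
adj = np.array([[0,1,0,0],[1,0,1,1],[0,1,0,0],[0,1,0,0]], float)  # a small tree
y = mpnn_forward(rng.normal(size=(n, d)), adj,
                 rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1)
print(y.shape)  # (8,)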
20. [Message Passing Neural Networks for Quantum Chemistry]
[CNNs for Learning Molecular Fingerprints] is a specific case of MPNN:
- Message passing:
  1. Message function: $M_t(h_v^t, h_w^t, e_{vw}) = h_w^t$
  2. Update function: $U_t(h_v^t, m_v^{t+1}) = \sigma(H_t^{\deg(v)} m_v^{t+1})$
- Readout: $y = f\left(\sum_{v,t} \mathrm{softmax}(W_t h_v^t)\right)$
A sketch of this instantiation follows below.
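A sketch of this instantiation under the same conventions (illustrative only: H_deg holds one update matrix per node degree, W one readout matrix per step, tanh stands in for $\sigma$, and the final dense layer $f$ is omitted):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def nfp_forward(h, adj, H_deg, W, T=2):
    """h: (n, d) atom states; H_deg[k]: (d, d) update matrix for degree k;
    W: T+1 readout matrices of shape (d, f), one per step t."""
    deg = adj.sum(axis=1).astype(int)
    y = softmax(h @ W[0]).sum(axis=0)            # readout contribution at t=0
    for t in range(T):
        m = adj @ h                              # message: sum of neighbor h_w
        h = np.tanh(np.stack([m[v] @ H_deg[deg[v]] for v in range(len(h))]))
        y = y + softmax(h @ W[t + 1]).sum(axis=0)
    return y

rng = np.random.default_rng(0)
adj = np.array([[0,1,0,0],[1,0,1,1],[0,1,0,0],[0,1,0,0]], float)
H_deg = {k: rng.normal(size=(8, 8)) * 0.1 for k in (1, 2, 3)}
W = [rng.normal(size=(8, 6)) for _ in range(3)]
print(nfp_forward(rng.normal(size=(4, 8)), adj, H_deg, W).shape)  # (6,)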
21. Next work: [Distributed representation learning for molecules (Mol2vec)]
- Motivation:
  - Learning molecule representations without labels, or with a limited number of labeled samples.
  - Useful for many tasks, e.g., kernel learning for graph-structured data.
- Proposed idea: based on word2vec (used in NLP) and Neural Message Passing (NMP).
- Skip-gram models for word2vec and doc2vec:
  word2vec: $\max \sum_t \sum_{j=-c,\, j \neq 0}^{c} \log \Pr(w_{t+j} \mid w_t)$ (predict the context $w_{t-c}, \ldots, w_{t+c}$ from the word $w_t$)
  doc2vec: $\max \sum_{doc} \sum_{w_j \in doc} \log \Pr(w_j \mid doc)$
- Correspondence:
  - Docs <-> molecules
  - Atoms <-> words
  - How about substructures?
22. [Distributed representation learning for molecules]
Objective function:
$\theta = \arg\max_\theta \sum_{s=1}^{|\text{dataset}|} \sum_{sub_j \in s} \log \Pr(sub_j \mid mol)$
$= \arg\max_\theta \sum_{s=1}^{|\text{dataset}|} \sum_{sub_j \in s} \log \frac{\exp(v_{mol}^\top v_{sub_j})}{\sum_{\text{all } k} \exp(v_{mol}^\top v_{sub_k})}$
where $\theta$ are the parameters to be trained (the vector representations of molecules, atoms, and substructures as well).

Example (molecule fragment N-C-C=O):
- Level 1: [N, C, C, O]
- Level 2: [N-C, C-C, C=O]
- Level 3: [N-C-C, C-C=O], etc.

Model for learning molecule representations. Two questions:
1. How to represent substructures, atoms, and molecules in vector form?
2. How to maximize the above objective function? (i.e., how to calculate the denominator over the huge number of possible subgraphs)
23. [Distributed representation learning for molecules]
1. How to represent substructures, atoms, and molecules in vector form? Using a message function and an update function:
- Level 1: atoms' representations
- Level 2: substructures' representations
- Level 3: substructures' representations with larger coverage
- Level 4: covers the whole graph
[Figure: starting from a given atom, each message-passing level enlarges the neighborhood covered by its representation.]
24. [Distributed representation learning for molecules]
Given the representation vectors of the atoms, how can we represent substructures?
[Figure: an atom r with state $h_r^t$ surrounded by neighbors a, b, c with states $h_a^t$, $h_b^t$, $h_c^t$.]
25. [Distributed representation learning for molecules]
Given the representation vectors of the atoms, how can we represent substructures?
Message and update:
$h_r^{t+1} = f(h_r^t + h_a^t W_{ra} + h_b^t W_{rb} + h_c^t W_{rc})$
where $f$ is a non-linear function.
[Figure: the same neighborhood, with edge weight matrices $W_{ra}$, $W_{rb}$, $W_{rc}$.]
26. [Distributed representation learning for molecules]
Given the representation vectors of the atoms, how can we represent substructures?
Message and update:
$h_r^{t+1} = f(h_r^t + h_a^t W_{ra} + h_b^t W_{rb} + h_c^t W_{rc})$
At this step, the representation at atom r represents the subgraph (r, a, b, c) rooted at r. By doing so, we can represent any substructure based on the atoms' representations, as in the sketch below.
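A minimal numpy sketch of this message-and-update step (illustrative shapes; tanh stands in for the non-linear function $f$, and the per-edge weight matrices are random stand-ins):

import numpy as np

d = 16
rng = np.random.default_rng(1)
h_r, h_a, h_b, h_c = rng.normal(size=(4, d))         # atom states at step t
W_ra, W_rb, W_rc = rng.normal(size=(3, d, d)) * 0.1  # per-neighbor weights

# h_r^{t+1} = f(h_r^t + h_a^t W_ra + h_b^t W_rb + h_c^t W_rc)
h_r_next = np.tanh(h_r + h_a @ W_ra + h_b @ W_rb + h_c @ W_rc)
print(h_r_next.shape)  # (16,): now encodes the rooted subgraph (r, a, b, c)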
27. [Distributed representation learning for molecules]
2. How to maximize the above objective function? (i.e., how to calculate the denominator over the huge number of possible subgraphs)
Objective function:
$\theta = \arg\max_\theta \sum_{s=1}^{|\text{dataset}|} \sum_{sub_j \in s} \log \frac{\exp(v_{mol}^\top v_{sub_j})}{\sum_{\text{all } k} \exp(v_{mol}^\top v_{sub_k})}$
Consider:
$\sum_{sub_j \in s} \log \frac{\exp(v_{mol}^\top v_{sub_j})}{\sum_{\text{all } k} \exp(v_{mol}^\top v_{sub_k})} = \sum_{sub_j \in s} \left( v_{mol}^\top v_{sub_j} - \log \sum_{\text{all } k} \exp(v_{mol}^\top v_{sub_k}) \right)$
The log-sum over all possible subgraphs is computationally expensive.
28. [Distributed representation learning for molecules]
2. How to maximize the above objective function? (i.e., how to calculate the denominator over the huge number of possible subgraphs)
Objective function (as above):
$\theta = \arg\max_\theta \sum_{s=1}^{|\text{dataset}|} \sum_{sub_j \in s} \log \frac{\exp(v_{mol}^\top v_{sub_j})}{\sum_{\text{all } k} \exp(v_{mol}^\top v_{sub_k})}$
Solutions:
a) Compute the gradient and then use MCMC to obtain approximate gradients, e.g., (adaptive) importance sampling -> but it is not trivial to define a proposal distribution over subgraphs.
b) Use negative sampling -> likely a good fit, because it is easy to sample incorrect subgraphs (ones not present in a molecule) by comparing vector representations -> there is no need to compare graphs directly, which is an NP-hard problem. A sketch follows below.
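A minimal sketch of option (b) in the style of word2vec's negative sampling (illustrative: v_mol is the molecule vector, v_pos_subs are vectors of substructures present in it, v_neg_subs are sampled incorrect substructures; all names are mine):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(v_mol, v_pos_subs, v_neg_subs):
    """Replace the full log-softmax over all subgraphs with a few binary
    terms: pull true substructures toward the molecule, push fakes away."""
    pos = np.log(sigmoid(v_pos_subs @ v_mol)).sum()
    neg = np.log(sigmoid(-(v_neg_subs @ v_mol))).sum()
    return -(pos + neg)

rng = np.random.default_rng(2)
loss = neg_sampling_loss(rng.normal(size=32),
                         rng.normal(size=(4, 32)),   # 4 true substructures
                         rng.normal(size=(8, 32)))   # 8 negative samples
print(loss)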
29. Conclusion
- What I have done:
  - Covered some problems and solutions in the application of CNNs to molecules.
  - Proposed simple ideas to improve the NFP model.
  - Proposed supervised variational models for predicting molecules' properties; however, they lacked theoretical correctness -> given up.
  - Proposed a simple unsupervised learning model for learning vector representations of molecules.
- What next?
  - Implement the proposed model and experiment on some data sets.
31. [CNNs with fast localized spectral filters]
- Convolution on graphs:
  1. Graph Fourier transform: $x \to \hat{x} = U^\top x$, where $U = [u_1, \ldots, u_n]$ is the set of eigenvectors of the Laplacian $L$ (i.e., $L = U \Lambda U^\top$)
  2. Convolution with filter $\theta$: $\hat{x} \to \theta \odot \hat{x}$
  3. Inverse graph Fourier transform: $\theta \odot \hat{x} \to U(\theta \odot \hat{x})$
In short, convolution with a filter $\theta$ can be summarized as:
$x \to U(\theta \odot U^\top x) = U \, \mathrm{diag}(\theta) \, U^\top x$
A sketch follows below.
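A minimal numpy sketch of these three steps on a toy graph (illustrative: combinatorial Laplacian L = D - A, random signal and filter):

import numpy as np

adj = np.array([[0,1,0,0],[1,0,1,1],[0,1,0,0],[0,1,0,0]], float)
L = np.diag(adj.sum(axis=1)) - adj      # graph Laplacian L = D - A
lam, U = np.linalg.eigh(L)              # L = U diag(lam) U^T

x = np.random.default_rng(3).normal(size=4)      # a signal on the 4 nodes
theta = np.random.default_rng(4).normal(size=4)  # one coefficient per frequency

x_hat = U.T @ x                         # 1. graph Fourier transform
y = U @ (theta * x_hat)                 # 2. filter, 3. inverse transform
# equivalently: y = U @ np.diag(theta) @ U.T @ x
print(y)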
32. [CNNs with fast localized spectral filters (2)]
- Convolution with filter $\theta$: replacing $\mathrm{diag}(\theta)$ with the eigenvalues of $L = U \Lambda U^\top$ gives:
$x \to U \Lambda U^\top x = L x$
- In general,
$x \to g_\theta(L)\, x = \left( \sum_{k=0}^{K-1} \theta_k L^k \right) x$
where $\theta$ are the parameters to be learned.
33. [CNNs with fast localized spectral filters (3)]
- Given a signal $x$, the filtered signal $y$ is determined by
$y = g_\theta(L)\, x = \left( \sum_{k=0}^{K-1} \theta_k L^k \right) x = \bar{x}\, \theta$
where $\bar{x} = [x, Lx, \ldots, L^{K-1}x]$ and $\theta = [\theta_0, \ldots, \theta_{K-1}]$.
- $\theta$ can be learned by applying the chain rule (backpropagation). A sketch of the polynomial filtering follows below.
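A minimal sketch of this polynomial filtering (illustrative): y is accumulated from repeated matrix-vector products, so no eigendecomposition of L is needed, which is the point of the fast localized filters:

import numpy as np

def poly_filter(L, x, theta):
    """Compute g_theta(L) x = (sum_k theta_k L^k) x, theta = [theta_0..theta_{K-1}]."""
    y = np.zeros_like(x)
    Lkx = x.copy()                 # L^0 x
    for t in theta:
        y = y + t * Lkx
        Lkx = L @ Lkx              # next power: L^{k+1} x
    return y

adj = np.array([[0,1,0],[1,0,1],[0,1,0]], float)
L = np.diag(adj.sum(axis=1)) - adj
print(poly_filter(L, np.array([1.0, 0.0, -1.0]), [0.5, 0.2, 0.1]))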
34. [Message Passing Neural Networks]
[CNNs on graphs with fast localized spectral filtering] is a specific case of MPNN:
- Message passing:
  1. Message function: $M_t(h_v^t, h_w^t, e_{vw}) = C_{vw}^t h_w^t$, where the matrices $C_{vw}^t$ are parameterized by the eigenvectors of the graph Laplacian $L$
  2. Update function: $U_t(h_v^t, m_v^{t+1}) = \sigma(m_v^{t+1})$