1. Seminar: Deep networks for learning molecular representation
PRESENTER: HAI NGUYEN
INSTITUTE OF INFORMATION TECHNOLOGY, VIETNAM ACADEMY OF SCIENCE AND TECHNOLOGY, VIETNAM
01/08/2017
3. Research topic
- Goal: predicting properties of molecules with deep networks, e.g., predicting the toxicity of a molecule.
- Conventional approach: hand-crafted fingerprints, e.g., the Morgan fingerprint (aka ECFP). A fingerprint is a feature vector that encodes which substructures are present in the molecule.
- Is end-to-end learning better? E.g., molecular convolutional neural networks: data-driven features are more interpretable and efficient, turning fixed feature extraction into an end-to-end learnable fingerprint.
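As a concrete example of the conventional approach, here is a minimal sketch that computes a Morgan/ECFP bit vector with RDKit (assuming RDKit is installed; the SMILES string, aspirin, is just an illustrative input):

# Fixed Morgan (ECFP) fingerprint: each bit flags whether a substructure
# up to the given radius is present in the molecule.
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin (illustrative)
fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=2048)  # ~ECFP4
print(fp.GetNumOnBits(), "substructure bits set out of", fp.GetNumBits())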
4. Related works
- D. Duvenaud et al., CNNs on Graphs for Learning Molecular Fingerprints (NIPS 2015)
- M. Defferrard et al., CNNs on Graphs with Fast Localized Spectral Filtering (NIPS 2016)
- T. Kipf et al., Semi-Supervised Classification with Graph Convolutional Networks (ICLR 2017)
- J. Gilmer et al., Neural Message Passing for Quantum Chemistry (ICML 2017)
6. [CNNs on Graphs for Learning Molecular Fingerprints (NIPS 2015)] Contribution
- Provides an end-to-end framework for learning fingerprints with better predictive performance; the inputs are graphs of arbitrary size and shape.
- Efficient computation:
  1. A fixed fingerprint must be large to encode all possible substructures.
  2. A neural fingerprint can be learned to encode only the features relevant for classification -> reduces the size.
- The neural fingerprint is more interpretable -> meaningful.
14. Proposed improvement
- Disadvantages of NFP:
  - It treats different bond types to different neighboring atoms equally, e.g., C-O vs. C=O, etc.
  - The softmax outputs of all substructures are averaged. (Assumption: the properties of a molecule are determined by very few subgraphs.)
[Figure: proposed model. The per-substructure outputs r_1, ..., r_N are combined through a learned weighting (a small NN with parameters W) rather than a uniform average before producing the final output f.]
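A minimal numpy sketch of such a weighted readout (my reading of the figure, not the author's exact model): per-substructure outputs are pooled with learned attention weights so a few informative subgraphs can dominate, instead of the uniform average used by NFP:

import numpy as np

def attention_readout(R, w):
    """R: (N, d) per-substructure outputs r_1..r_N; w: (d,) scoring params."""
    scores = R @ w                        # one scalar score per substructure
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()           # attention weights over substructures
    return alpha @ R                      # weighted pooling instead of a mean

R = np.random.default_rng(5).random((5, 100))   # toy: 5 substructures, d=100
print(attention_readout(R, np.random.default_rng(6).random(100)).shape)  # (100,)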
16. Experiments and Results
- Data sets: Tox21 (train: 10K, test: 296); Solubility in log(mol/L) (# samples: 1100)
- Goal: comparison of ANFP with the original NFP
- Configuration: the same for both methods (NFP: 100x100), MLP (100x100)
- Implementation: Chainer; optimizer: Adam

Method     Tox21 Acc (%)   Solubility RMSE (log mol/L)
NFP        91.58           0.64 ± 0.05
Proposed   92.35           0.53 ± 0.06
17. [Message Passing Neural Networks for Quantum Chemistry]
- Accepted at ICML 2017.
- A general framework for supervised learning on graph-structured data.
- It abstracts the commonalities between existing neural models for graphs.
- Makes it easy to understand the general ideas behind the different proposed models and to come up with new variations suited to a specific data type.
19. [Message Passing Neural Networks for Quantum Chemistry]
The forward pass consists of two phases:
- Message passing, repeated T times; at step t:
  1. Message function: $m_v^{t+1} = \sum_{w \in N(v)} M_t(h_v^t, h_w^t, e_{vw})$
  2. Update function: $h_v^{t+1} = U_t(h_v^t, m_v^{t+1})$
- Readout: $y = R(\{h_v^T \mid v \in G\})$
Note: all these functions are learnable and differentiable, so the model can be trained by backpropagation. A sketch of the forward pass follows below.
[Figure: message passing at step t. Node v with state $h_v^t$ receives messages $M_t(h_v^t, h_w^t, e_{vw})$ from each neighbor w.]
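The following is a minimal numpy sketch of the two phases (my own illustration of the equations above, not code from the paper); $M_t$, $U_t$, and $R$ are stand-in choices (a linear message, a tanh update, a sum readout), and edge features $e_{vw}$ are omitted:

import numpy as np

def mpnn_forward(h, adj, W_msg, W_upd, T=3):
    """h: (n, d) node states; adj: (n, n) adjacency matrix."""
    for t in range(T):
        m = adj @ (h @ W_msg)          # message phase: sum over neighbors w
        h = np.tanh(h @ W_upd + m)     # update phase: h_v <- U_t(h_v, m_v)
    return h.sum(axis=0)               # readout: a plain sum stands in for R

n, d = 4, 8
rng = np.random.default_rng(0)
adj = np.array([[0,1,0,0],[1,0,1,1],[0,1,0,0],[0,1,0,0]], float)  # a small tree
y = mpnn_forward(rng.normal(size=(n, d)), adj,
                 rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1)
print(y.shape)  # (8,)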
20. [Message Passing Neural Networks for Quantum Chemistry]
[CNNs for Learning Molecular Fingerprints] is a specific case of MPNN:
- Message passing:
  1. Message function: $M_t(h_v^t, h_w^t, e_{vw}) = h_w^t$
  2. Update function: $U_t(h_v^t, m_v^{t+1}) = \sigma(H_t^{\deg(v)} m_v^{t+1})$
- Readout: $y = f\left(\sum_{v,t} \mathrm{softmax}(W_t h_v^t)\right)$
A sketch of this instantiation follows below.
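A sketch of this instantiation under the same conventions (illustrative only: H_deg holds one update matrix per node degree, W one readout matrix per step, tanh stands in for $\sigma$, and the final dense layer $f$ is omitted):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def nfp_forward(h, adj, H_deg, W, T=2):
    """h: (n, d) atom states; H_deg[k]: (d, d) update matrix for degree k;
    W: T+1 readout matrices of shape (d, f), one per step t."""
    deg = adj.sum(axis=1).astype(int)
    y = softmax(h @ W[0]).sum(axis=0)            # readout contribution at t=0
    for t in range(T):
        m = adj @ h                              # message: sum of neighbor h_w
        h = np.tanh(np.stack([m[v] @ H_deg[deg[v]] for v in range(len(h))]))
        y = y + softmax(h @ W[t + 1]).sum(axis=0)
    return y

rng = np.random.default_rng(0)
adj = np.array([[0,1,0,0],[1,0,1,1],[0,1,0,0],[0,1,0,0]], float)
H_deg = {k: rng.normal(size=(8, 8)) * 0.1 for k in (1, 2, 3)}
W = [rng.normal(size=(8, 6)) for _ in range(3)]
print(nfp_forward(rng.normal(size=(4, 8)), adj, H_deg, W).shape)  # (6,)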
21. Next work: [Distributed representation learning for molecules (Mol2vec)]
- Motivation:
  - Learning molecule representations without labels, or with a limited number of labeled samples.
  - Useful for many tasks, e.g., kernel learning for graph-structured data.
- Proposed idea: based on word2vec (used in NLP) and Neural Message Passing (NMP).
- Skip-gram models for word2vec and doc2vec:
  word2vec: $\max \sum_t \sum_{j=-c,\, j \neq 0}^{c} \log \Pr(w_{t+j} \mid w_t)$ (predict the context $w_{t-c}, \ldots, w_{t+c}$ from the word $w_t$)
  doc2vec: $\max \sum_{doc} \sum_{w_j \in doc} \log \Pr(w_j \mid doc)$
- Correspondence:
  - Docs <-> molecules
  - Atoms <-> words
  - How about substructures?
22. [Distributed representation learning for molecules]
Objective function:
$\theta = \arg\max_\theta \sum_{s=1}^{|\text{dataset}|} \sum_{sub_j \in s} \log \Pr(sub_j \mid mol)$
$= \arg\max_\theta \sum_{s=1}^{|\text{dataset}|} \sum_{sub_j \in s} \log \frac{\exp(v_{mol}^\top v_{sub_j})}{\sum_{\text{all } k} \exp(v_{mol}^\top v_{sub_k})}$
where $\theta$ are the parameters to be trained (the vector representations of molecules, atoms, and substructures as well).

Example (molecule fragment N-C-C=O):
- Level 1: [N, C, C, O]
- Level 2: [N-C, C-C, C=O]
- Level 3: [N-C-C, C-C=O], etc.

Model for learning molecule representations. Two questions:
1. How to represent substructures, atoms, and molecules in vector form?
2. How to maximize the above objective function? (i.e., how to calculate the denominator over the huge number of possible subgraphs)
23. [Distributed representation learning for molecules]
1. How to represent substructures, atoms, and molecules in vector form? Using a message function and an update function:
- Level 1: atoms' representations
- Level 2: substructures' representations
- Level 3: substructures' representations with larger coverage
- Level 4: covers the whole graph
[Figure: starting from a given atom, each message-passing level enlarges the neighborhood covered by its representation.]
24. [Distributed representation learning for molecules]
Given the representation vectors of the atoms, how can we represent substructures?
[Figure: an atom r with state $h_r^t$ surrounded by neighbors a, b, c with states $h_a^t$, $h_b^t$, $h_c^t$.]
25. [Distributed representation learning for molecules]
Given the representation vectors of the atoms, how can we represent substructures?
Message and update:
$h_r^{t+1} = f(h_r^t + h_a^t W_{ra} + h_b^t W_{rb} + h_c^t W_{rc})$
where $f$ is a non-linear function.
[Figure: the same neighborhood, with edge weight matrices $W_{ra}$, $W_{rb}$, $W_{rc}$.]
26. [Distributed representation learning for molecules]
Given the representation vectors of the atoms, how can we represent substructures?
Message and update:
$h_r^{t+1} = f(h_r^t + h_a^t W_{ra} + h_b^t W_{rb} + h_c^t W_{rc})$
At this step, the representation at atom r represents the subgraph (r, a, b, c) rooted at r. By doing so, we can represent any substructure based on the atoms' representations, as in the sketch below.
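A minimal numpy sketch of this message-and-update step (illustrative shapes; tanh stands in for the non-linear function $f$, and the per-edge weight matrices are random stand-ins):

import numpy as np

d = 16
rng = np.random.default_rng(1)
h_r, h_a, h_b, h_c = rng.normal(size=(4, d))         # atom states at step t
W_ra, W_rb, W_rc = rng.normal(size=(3, d, d)) * 0.1  # per-neighbor weights

# h_r^{t+1} = f(h_r^t + h_a^t W_ra + h_b^t W_rb + h_c^t W_rc)
h_r_next = np.tanh(h_r + h_a @ W_ra + h_b @ W_rb + h_c @ W_rc)
print(h_r_next.shape)  # (16,): now encodes the rooted subgraph (r, a, b, c)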
27. [Distributed representation learning for molecules]
2. How to maximize the above objective function? (i.e., how to calculate the denominator over the huge number of possible subgraphs)
Objective function:
$\theta = \arg\max_\theta \sum_{s=1}^{|\text{dataset}|} \sum_{sub_j \in s} \log \frac{\exp(v_{mol}^\top v_{sub_j})}{\sum_{\text{all } k} \exp(v_{mol}^\top v_{sub_k})}$
Consider:
$\sum_{sub_j \in s} \log \frac{\exp(v_{mol}^\top v_{sub_j})}{\sum_{\text{all } k} \exp(v_{mol}^\top v_{sub_k})} = \sum_{sub_j \in s} \left( v_{mol}^\top v_{sub_j} - \log \sum_{\text{all } k} \exp(v_{mol}^\top v_{sub_k}) \right)$
The log-sum over all possible subgraphs is computationally expensive.
28. [Distributed representation learning for molecules]
2. How to maximize the above objective function? (i.e., how to calculate the denominator over the huge number of possible subgraphs)
Objective function (as above):
$\theta = \arg\max_\theta \sum_{s=1}^{|\text{dataset}|} \sum_{sub_j \in s} \log \frac{\exp(v_{mol}^\top v_{sub_j})}{\sum_{\text{all } k} \exp(v_{mol}^\top v_{sub_k})}$
Solutions:
a) Compute the gradient and then use MCMC to obtain approximate gradients, e.g., (adaptive) importance sampling -> but it is not trivial to define a proposal distribution over subgraphs.
b) Use negative sampling -> likely a good fit, because it is easy to sample incorrect subgraphs (ones not present in a molecule) by comparing vector representations -> there is no need to compare graphs directly, which is an NP-hard problem. A sketch follows below.
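A minimal sketch of option (b) in the style of word2vec's negative sampling (illustrative: v_mol is the molecule vector, v_pos_subs are vectors of substructures present in it, v_neg_subs are sampled incorrect substructures; all names are mine):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(v_mol, v_pos_subs, v_neg_subs):
    """Replace the full log-softmax over all subgraphs with a few binary
    terms: pull true substructures toward the molecule, push fakes away."""
    pos = np.log(sigmoid(v_pos_subs @ v_mol)).sum()
    neg = np.log(sigmoid(-(v_neg_subs @ v_mol))).sum()
    return -(pos + neg)

rng = np.random.default_rng(2)
loss = neg_sampling_loss(rng.normal(size=32),
                         rng.normal(size=(4, 32)),   # 4 true substructures
                         rng.normal(size=(8, 32)))   # 8 negative samples
print(loss)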
29. Conclusion
- What I have done:
  - Covered some problems and solutions in the application of CNNs to molecules.
  - Proposed simple ideas to improve the NFP model.
  - Proposed supervised variational models for predicting molecules' properties; however, they lacked theoretical correctness -> given up.
  - Proposed a simple unsupervised learning model for learning vector representations of molecules.
- What next?
  - Implement the proposed model and experiment on some data sets.
31. [CNNs with fast localized spectral filters]
- Convolution on graphs:
  1. Graph Fourier transform: $x \to \hat{x} = U^\top x$, where $U = [u_1, \ldots, u_n]$ is the set of eigenvectors of the Laplacian $L$ (i.e., $L = U \Lambda U^\top$)
  2. Convolution with filter $\theta$: $\hat{x} \to \theta \odot \hat{x}$
  3. Inverse graph Fourier transform: $\theta \odot \hat{x} \to U(\theta \odot \hat{x})$
In short, convolution with a filter $\theta$ can be summarized as:
$x \to U(\theta \odot U^\top x) = U \, \mathrm{diag}(\theta) \, U^\top x$
A sketch follows below.
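A minimal numpy sketch of these three steps on a toy graph (illustrative: combinatorial Laplacian L = D - A, random signal and filter):

import numpy as np

adj = np.array([[0,1,0,0],[1,0,1,1],[0,1,0,0],[0,1,0,0]], float)
L = np.diag(adj.sum(axis=1)) - adj      # graph Laplacian L = D - A
lam, U = np.linalg.eigh(L)              # L = U diag(lam) U^T

x = np.random.default_rng(3).normal(size=4)      # a signal on the 4 nodes
theta = np.random.default_rng(4).normal(size=4)  # one coefficient per frequency

x_hat = U.T @ x                         # 1. graph Fourier transform
y = U @ (theta * x_hat)                 # 2. filter, 3. inverse transform
# equivalently: y = U @ np.diag(theta) @ U.T @ x
print(y)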
32. [CNNs with fast localized spectral filters (2)]
- Convolution with filter $\theta$: replacing $\mathrm{diag}(\theta)$ with the eigenvalues of $L = U \Lambda U^\top$ gives:
$x \to U \Lambda U^\top x = L x$
- In general,
$x \to g_\theta(L)\, x = \left( \sum_{k=0}^{K-1} \theta_k L^k \right) x$
where $\theta$ are the parameters to be learned.
33. [CNNs with fast localized spectral filters (3)]
- Given a signal $x$, the filtered signal $y$ is determined by
$y = g_\theta(L)\, x = \left( \sum_{k=0}^{K-1} \theta_k L^k \right) x = \bar{x}\, \theta$
where $\bar{x} = [x, Lx, \ldots, L^{K-1}x]$ and $\theta = [\theta_0, \ldots, \theta_{K-1}]$.
- $\theta$ can be learned by applying the chain rule (backpropagation). A sketch of the polynomial filtering follows below.
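A minimal sketch of this polynomial filtering (illustrative): y is accumulated from repeated matrix-vector products, so no eigendecomposition of L is needed, which is the point of the fast localized filters:

import numpy as np

def poly_filter(L, x, theta):
    """Compute g_theta(L) x = (sum_k theta_k L^k) x, theta = [theta_0..theta_{K-1}]."""
    y = np.zeros_like(x)
    Lkx = x.copy()                 # L^0 x
    for t in theta:
        y = y + t * Lkx
        Lkx = L @ Lkx              # next power: L^{k+1} x
    return y

adj = np.array([[0,1,0],[1,0,1],[0,1,0]], float)
L = np.diag(adj.sum(axis=1)) - adj
print(poly_filter(L, np.array([1.0, 0.0, -1.0]), [0.5, 0.2, 0.1]))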
34. [Message Passing Neural Networks]
[CNNs on graphs with fast localized spectral filtering] is a specific case of MPNN:
- Message passing:
  1. Message function: $M_t(h_v^t, h_w^t, e_{vw}) = C_{vw}^t h_w^t$, where the matrices $C_{vw}^t$ are parameterized by the eigenvectors of the graph Laplacian $L$
  2. Update function: $U_t(h_v^t, m_v^{t+1}) = \sigma(m_v^{t+1})$