[Figure: a molecular structure $x$ mapped to a predicted property $\hat{y}$, with $\hat{y} = f_\theta(x)$]
[Figure: collage of example molecular structure diagrams]


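Learning a map $x \mapsto y$ from $n$ examples $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$: fit the parameters $\theta$ of a model $f_\theta$ by

$$\min_\theta \sum_{i=1}^{n} \mathrm{error}(y_i, \hat{y}_i), \qquad \hat{y}_i = f_\theta(x_i).$$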
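The objective above can be made concrete with a toy sketch. Everything here is an assumption for illustration only: the line model `f`, the squared error, the three data points, and the small list of candidate parameter vectors. None of these appear in the slides.

```python
def f(theta, x):
    """Hypothetical model f_theta: a line with intercept theta[0] and slope theta[1]."""
    return theta[0] + theta[1] * x


def squared_error(y, y_hat):
    """One possible choice for error(y, y_hat)."""
    return (y - y_hat) ** 2


def total_error(theta, data):
    """Sum of error(y_i, f_theta(x_i)) over all training pairs."""
    return sum(squared_error(y, f(theta, x)) for x, y in data)


# Toy training pairs generated by y = 1 + 2x (illustrative data only).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]

# Instead of a real optimizer, compare a handful of candidate parameter vectors.
candidates = [(0.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
best = min(candidates, key=lambda theta: total_error(theta, data))
```

Searching a fixed candidate list only illustrates what "min over theta" means; a practical fit would use an optimizer such as gradient descent.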




SVM, logistic regression (LogReg), Gaussian process regression (GPR), random forest (RF), etc.




SciTegic12231509382D
13 13 0 0 0 0 999 V2000
-2.5458 -9.4750 0.0000 C 0 0
-3.3708 -9.4750 0.0000 C 0 0
-2.2875 -8.6917 0.0000 C 0 0
-3.6208 -8.6917 0.0000 C 0 0 2 0 0 0
-2.9583 -8.2042 0.0000 O 0 0
-4.3583 -8.3125 0.0000 C 0 0 1 0 0 0
-1.5000 -8.4375 0.0000 O 0 0
-2.0583 -10.1417 0.0000 O 0 0
-3.8500 -10.1417 0.0000 O 0 0
-5.0500 -8.7542 0.0000 O 0 0
-3.6958 -7.0417 0.0000 O 0 0
-4.3958 -7.4875 0.0000 C 0 0
-4.2083 -9.2667 0.0000 H 0 0
2 1 2 0
3 1 1 0
4 2 1 0
5 3 1 0
6 4 1 0
7 3 2 0
8 1 1 0
9 2 1 0
6 10 1 1
11 12 1 0
12 6 1 0
4 13 1 6
5 4 1 0
M END
SMILES: OC[C@H](O)[C@H]1OC(=O)C(=C1O)O
InChI: InChI=1S/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-10H,1H2/t2-,5+/m0/s1
InChIKey: CIWBSHSKHKDKBQ-JLAZNSOCSA-N


Mrv0541 04051115152D
10 10 0 0 0 0 999 V2000
-0.9408 0.3707 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.6552 -0.0418 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
-0.9408 1.1957 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
-0.2263 -0.0418 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.4882 0.3707 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.2027 -0.0418 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.4882 1.1957 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
1.2027 -0.8668 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.4882 -1.2793 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.2263 -0.8668 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 0 0 0 0
1 3 1 0 0 0 0
1 4 1 0 0 0 0
4 5 2 0 0 0 0
4 10 1 0 0 0 0
5 6 1 0 0 0 0
5 7 1 0 0 0 0
6 8 2 0 0 0 0
8 9 1 0 0 0 0
9 10 2 0 0 0 0
M END
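As a rough sketch of how such a MOL block turns into a graph, the snippet below reads the counts line, the atom block, and the bond block. It assumes whitespace-separated tokens, as in the listing above; real V2000 files use fixed-width columns, so this shortcut is only safe for small molecules. The function name `parse_molblock` is hypothetical.

```python
MOLBLOCK = """\
Mrv0541 04051115152D
10 10 0 0 0 0 999 V2000
-0.9408 0.3707 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.6552 -0.0418 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
-0.9408 1.1957 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
-0.2263 -0.0418 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.4882 0.3707 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.2027 -0.0418 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.4882 1.1957 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
1.2027 -0.8668 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.4882 -1.2793 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.2263 -0.8668 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 0 0 0 0
1 3 1 0 0 0 0
1 4 1 0 0 0 0
4 5 2 0 0 0 0
4 10 1 0 0 0 0
5 6 1 0 0 0 0
5 7 1 0 0 0 0
6 8 2 0 0 0 0
8 9 1 0 0 0 0
9 10 2 0 0 0 0
M END
"""


def parse_molblock(text):
    """Return (atoms, bonds) from a whitespace-tokenized V2000 MOL block."""
    lines = text.splitlines()
    # The counts line is the one that ends with the version tag.
    start = next(i for i, line in enumerate(lines) if line.rstrip().endswith("V2000"))
    n_atoms, n_bonds = (int(tok) for tok in lines[start].split()[:2])

    atoms = []
    for line in lines[start + 1 : start + 1 + n_atoms]:
        x, y, z, symbol = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))

    bonds = []
    first_bond = start + 1 + n_atoms
    for line in lines[first_bond : first_bond + n_bonds]:
        a, b, order = (int(tok) for tok in line.split()[:3])
        bonds.append((a, b, order))  # atom indices are 1-based, as in the file
    return atoms, bonds


atoms, bonds = parse_molblock(MOLBLOCK)
```

The result is an atom-labeled, bond-labeled graph: `atoms` holds the node labels and 2D coordinates, and `bonds` holds the edges with their bond orders.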
[Figure: the MOL file drawn as a labeled graph, with atoms numbered 1 to 10, element labels C, O, N, and edges labeled with bond orders 1 and 2]
@<TRIPOS>MOLECULE
*****
10 10 0 0 0
SMALL
GASTEIGER
@<TRIPOS>ATOM
1 C -0.9408 0.3707 0.0000 C.2 1 UNL1 0.3891
2 O -1.6552 -0.0418 0.0000 O.co2 1 UNL1 -0.2405
3 O -0.9408 1.1957 0.0000 O.co2 1 UNL1 -0.2405
4 C -0.2263 -0.0418 0.0000 C.ar 1 UNL1 0.0965
5 C 0.4882 0.3707 0.0000 C.ar 1 UNL1 0.0954
6 C 1.2027 -0.0418 0.0000 C.ar 1 UNL1 0.0183
7 N 0.4882 1.1957 0.0000 N.pl3 1 UNL1 -0.1278
8 C 1.2027 -0.8668 0.0000 C.ar 1 UNL1 0.0014
9 C 0.4882 -1.2793 0.0000 C.ar 1 UNL1 0.0003
10 C -0.2263 -0.8668 0.0000 C.ar 1 UNL1 0.0079
@<TRIPOS>BOND
1 1 2 ar
2 1 3 ar
3 1 4 1
4 4 5 ar
5 4 10 ar
6 5 6 ar
7 5 7 1
8 6 8 ar
9 8 9 ar
10 9 10 ar
[Figure: the same graph with atoms numbered 1 to 10 and labeled with the MOL2 atom types (C.2, O.co2, C.ar, N.pl3) and bond types (ar, 1)]
[Figure: alternative graph representations of one molecule: structure diagram; skeletal topology; atom/bond labeled graph; KEGG atom labeled graph (KCF); pharmacophore type labeled graph (ChemAxon Screen); reduced graph]
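Input $x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$, output $y$.

Activation $\sigma(z)$, e.g. $\tanh(z) \in (-1, 1)$.

[Figure: plot of $\sigma(z)$ against $z$, ranging between $-1$ and $+1$; network diagram with inputs $x_1, x_2$ (index $k$), first hidden layer $h_1, h_2, h_3$ (index $j$), second hidden layer $h'_1, h'_2$ (index $i$), and output $y$, with weights $w_{kj}$, $w'_{ji}$, $w''_i$ and biases $b'_j$, $b_i$]

$\theta = (w_{kj}, w'_{ji}, w''_i, b'_j, b_i)$

$$y = \sum_{i=1}^{2} w''_i \, \sigma(h'_i - b_i) = \sum_{i=1}^{2} w''_i \, \sigma\Big( \sum_{j=1}^{3} w'_{ji} \, \sigma(h_j - b'_j) - b_i \Big) = \sum_{i=1}^{2} w''_i \, \sigma\Big( \sum_{j=1}^{3} w'_{ji} \, \sigma\Big( \sum_{k=1}^{2} w_{kj} x_k - b'_j \Big) - b_i \Big)$$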
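The nested sum for this 2-3-2-1 network can be spelled out as a short forward pass. This is only a sketch: tanh is assumed as the activation, the offset terms are called `b1` and `b2` here (names chosen for this example), and the numeric weights are arbitrary.

```python
import math


def forward(x, w, b1, w1, b2, w2):
    """Forward pass: 2 inputs -> 3 hidden units -> 2 hidden units -> 1 output.

    w[k][j]  : weight from input k to first-layer unit j
    w1[j][i] : weight from first-layer unit j to second-layer unit i
    w2[i]    : weight from second-layer unit i to the output
    b1[j], b2[i] : offsets subtracted before each activation
    """
    # h_j = sum_k w_kj * x_k
    h = [sum(w[k][j] * x[k] for k in range(2)) for j in range(3)]
    # h'_i = sum_j w'_ji * tanh(h_j - b1_j)
    h_prime = [
        sum(w1[j][i] * math.tanh(h[j] - b1[j]) for j in range(3))
        for i in range(2)
    ]
    # y = sum_i w''_i * tanh(h'_i - b2_i)
    return sum(w2[i] * math.tanh(h_prime[i] - b2[i]) for i in range(2))


# Arbitrary illustrative parameters.
w = [[0.5, -0.3, 0.8], [0.1, 0.7, -0.6]]
b1 = [0.0, 0.1, -0.1]
w1 = [[0.4, -0.2], [0.3, 0.9], [-0.5, 0.6]]
b2 = [0.2, -0.2]
w2 = [1.0, -1.5]

y = forward([1.0, 2.0], w, b1, w1, b2, w2)
```

Because every activation lies in (-1, 1), the output is bounded in magnitude by the sum of the absolute output weights, here 2.5.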
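$$\min_\theta L(\theta), \qquad L(\theta) = \sum_{i=1}^{n} \mathrm{error}(y_i, f_\theta(x_i))$$

$$\nabla_\theta L(\theta_t) = \begin{bmatrix} \partial L(\theta) / \partial \theta_1 \,|_{\theta = \theta_t} \\ \partial L(\theta) / \partial \theta_2 \,|_{\theta = \theta_t} \\ \vdots \end{bmatrix}$$

$$\theta_{t+1} \leftarrow \theta_t - \eta \cdot \nabla_\theta L(\theta_t)$$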
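A minimal sketch of this update rule follows. The assumptions are a squared-error loss on a line model, three toy data points, a central-difference estimate of each partial derivative, and a hand-picked step size `eta`; none of these choices come from the slides.

```python
def loss(theta, data):
    """L(theta) = sum_i error(y_i, f_theta(x_i)) for a line model and squared error."""
    return sum((y - (theta[0] + theta[1] * x)) ** 2 for x, y in data)


def gradient(fn, theta, h=1e-6):
    """Estimate each partial derivative of fn at theta by central differences."""
    grad = []
    for j in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[j] += h
        down[j] -= h
        grad.append((fn(up) - fn(down)) / (2 * h))
    return grad


def gradient_descent(fn, theta0, eta=0.05, steps=500):
    """Repeat theta <- theta - eta * grad L(theta) for a fixed number of steps."""
    theta = list(theta0)
    for _ in range(steps):
        g = gradient(fn, theta)
        theta = [t - eta * gj for t, gj in zip(theta, g)]
    return theta


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # toy data on the line y = 1 + 2x
theta_hat = gradient_descent(lambda th: loss(th, data), [0.0, 0.0])
```

With this data the iterates approach intercept 1 and slope 2. The step size matters: for this particular loss, values of `eta` much above roughly 0.13 make the iteration diverge.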




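Gradient descent on the full loss $L(\theta) = \sum_{i=1}^{n} \mathrm{error}(y_i, f_\theta(x_i))$:

$$\theta_{t+1} \leftarrow \theta_t - \eta \cdot \nabla_\theta L(\theta_t)$$

Stochastic gradient descent on a single example, $L_i(\theta) = \mathrm{error}(y_i, f_\theta(x_i))$:

$$\theta_{t+1} \leftarrow \theta_t - \eta \cdot \nabla_\theta L_i(\theta_t)$$

Minibatch variant, $L_i^m(\theta) = \sum_{k=i}^{i+m} \mathrm{error}(y_k, f_\theta(x_k))$:

$$\theta_{t+1} \leftarrow \theta_t - \eta \cdot \nabla_\theta L_i^m(\theta_t)$$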
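The three update rules differ only in how many examples enter each gradient. The sketch below assumes a line model with squared error, an analytic gradient, consecutive batches of `m` examples without shuffling, and toy data. Setting `m = 1` gives the single-example update and `m = len(data)` the full-batch one.

```python
def batch_gradient(theta, batch):
    """Gradient of sum_k (f_theta(x_k) - y_k)^2 over one batch, for a line model."""
    g0 = g1 = 0.0
    for x, y in batch:
        r = (theta[0] + theta[1] * x) - y  # residual of this example
        g0 += 2.0 * r
        g1 += 2.0 * r * x
    return g0, g1


def minibatch_sgd(data, m, eta=0.02, epochs=2000):
    """Sweep the data in consecutive batches of size m, updating after each batch."""
    theta = [0.0, 0.0]
    for _ in range(epochs):
        for start in range(0, len(data), m):
            g0, g1 = batch_gradient(theta, data[start:start + m])
            theta = [theta[0] - eta * g0, theta[1] - eta * g1]
    return theta


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # toy data, y = 1 + 2x
results = {m: minibatch_sgd(data, m) for m in (1, 2, len(data))}
```

On this noise-free toy problem all three settings reach the same parameters. With noisy data and a fixed step size, the small-batch variants would keep fluctuating around the minimizer instead of settling exactly.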
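$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}, \qquad x \mapsto y, \quad y = f_\theta(x)$$

[Figure: network diagram from inputs $x_1, \ldots, x_5$ to outputs $y_1, y_2, y_3$, highlighting one weight $w_{ij}$]

Perturbing a weight, $w_{ij} \to w_{ij} + \Delta w$, changes an output, $y_k \to y_k + \Delta y$:

$$\frac{\partial f_\theta(x)}{\partial w_{ij}} = \frac{\partial y_k}{\partial w_{ij}}$$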
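[Figure: computational graph with inputs $a$, $b$ and the constant 1, two add nodes producing $c$ and $d$, and a mult node producing $e$]

$$c = a + b, \qquad d = b + 1, \qquad e = c * d$$

Forward evaluation with $a = 2$, $b = 1$: $c = 3$, $d = 2$, $e = 6$.

Local derivatives on the edges:

$$\frac{\partial c}{\partial a} = 1, \quad \frac{\partial c}{\partial b} = 1, \quad \frac{\partial d}{\partial b} = 1, \quad \frac{\partial e}{\partial c} = 2, \quad \frac{\partial e}{\partial d} = 3$$

Forward mode (derivatives of every node with respect to $b$):

$$\frac{\partial a}{\partial b} = 0, \quad \frac{\partial b}{\partial b} = 1, \quad \frac{\partial c}{\partial b} = 1, \quad \frac{\partial d}{\partial b} = 1, \quad \frac{\partial e}{\partial b} = 5$$

Reverse mode (derivatives of $e$ with respect to every node):

$$\frac{\partial e}{\partial e} = 1, \quad \frac{\partial e}{\partial c} = 2, \quad \frac{\partial e}{\partial d} = 3, \quad \frac{\partial e}{\partial b} = 5, \quad \frac{\partial e}{\partial a} = 2$$

Chain rule over both paths from $b$ to $e$:

$$\frac{\partial e}{\partial b} = \frac{\partial e}{\partial c} \frac{\partial c}{\partial b} + \frac{\partial e}{\partial d} \frac{\partial d}{\partial b} = 2 \cdot 1 + 3 \cdot 1 = 5$$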
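The same numbers can be reproduced with a hand-written sketch of both differentiation modes on this graph. The function names are hypothetical and the code is specific to $c = a + b$, $d = b + 1$, $e = c * d$; it is not a general automatic differentiation library.

```python
def evaluate(a, b):
    """Forward evaluation of the graph: returns the intermediate and final values."""
    c = a + b
    d = b + 1
    e = c * d
    return c, d, e


def forward_mode(a, b, da, db):
    """Propagate derivatives with respect to one chosen input from inputs to output.

    (da, db) is the seed: (0, 1) differentiates with respect to b, (1, 0) with respect to a.
    """
    c, d, e = evaluate(a, b)
    dc = da + db          # c = a + b
    dd = db               # d = b + 1
    de = dc * d + c * dd  # e = c * d (product rule)
    return e, de


def reverse_mode(a, b):
    """Propagate derivatives of e from the output back to every input in one pass."""
    c, d, e = evaluate(a, b)
    de_de = 1.0
    de_dc = de_de * d  # e = c * d
    de_dd = de_de * c
    de_da = de_dc * 1.0                # a reaches e only through c
    de_db = de_dc * 1.0 + de_dd * 1.0  # b reaches e through both c and d
    return e, de_da, de_db


e_fwd, de_db_fwd = forward_mode(2.0, 1.0, 0.0, 1.0)
e_rev, de_da_rev, de_db_rev = reverse_mode(2.0, 1.0)
```

Forward mode needs one pass per input, while reverse mode returns the derivatives with respect to all inputs in a single backward pass, which is why it suits models with many parameters and a scalar loss.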
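$$x^{(1)}, x^{(2)}, \ldots, x^{(t)} \mapsto y^{(1)}, y^{(2)}, \ldots, y^{(t)}$$

[Figure: recurrent network with input $x^{(t)}$, hidden state $h^{(t)}$, and output $y^{(t)}$, unrolled over time as $h^{(0)} \to h^{(1)} \to h^{(2)}$ with inputs $x^{(1)}, x^{(2)}$ and outputs $y^{(1)}, y^{(2)}$]

[Figure: recurrent cell diagrams that take $x^{(t)}$ and $h^{(t-1)}$ and produce $h^{(t)}$ and $y^{(t)}$, built from $\sigma(\cdot)$ and $\tanh(\cdot)$ units combined with $\times$ and $+$ operations]

[Figure: units $i$ and $j$ connected through weights $w_1, w_2, w_3, w_4$]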



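$$f : \mathbb{R}^n \to \mathbb{R}^m$$

[Figure: $f_\theta$ drawn as a block that maps $x$ to $y$ and is controlled by the parameters $\theta$]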
Vector Annotations for Atoms (RDKit defaults)
[Figure: circular fingerprint construction on a molecule with atoms numbered 0 to 9. Each atom receives an integer identifier at Layer-0 (diameter 0), Layer-1 (diameter 2), and Layer-2 (diameter 4); the identifiers are then folded into a fixed-length bit vector.]

Bit vector: 00010000101000100000001000010010010100001001000101001000

Layer-0 identifiers: 847957139, 3217380708, 3218693969, 3218693969, 3218693969, 3218693969, 864942730, 2246699815, 864662311, 3217380708
Layer-1 identifiers: 1510328189, 2784506312, 1533864325, 4158944142, 2309124039, 951226070, 951226070, 98513984, 98513984, 1083852209
Layer-2 identifiers: 2784506312, 132611095, 2784506312, 916604632, 3450167988, 2987120039, 1171638766, 3999906991, 3999906991, 4158944142
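Folding can be sketched as hashing each identifier into one of a fixed number of bit positions. The version below takes the identifier modulo the vector length, which is one common way to do it and is an assumption here, not a statement about what the slide or any particular toolkit does internally. The length of 64 bits and the function name `fold` are also illustrative.

```python
# Layer-0 identifiers as listed on the slide (duplicates included).
LAYER0_IDS = [
    847957139, 3217380708, 3218693969, 3218693969, 3218693969,
    3218693969, 864942730, 2246699815, 864662311, 3217380708,
]


def fold(identifiers, n_bits=64):
    """Set bit (identifier mod n_bits) for every identifier; collisions share a bit."""
    bits = [0] * n_bits
    for ident in identifiers:
        bits[ident % n_bits] = 1
    return bits


bits = fold(LAYER0_IDS)
bit_string = "".join(str(b) for b in bits)
```

Duplicate identifiers and hash collisions both land on an already-set bit, so the number of set bits never exceeds the number of distinct identifiers. A longer vector reduces collisions at the cost of a larger feature space.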




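[Figure: a molecule with atoms $a_1, \ldots, a_5$ described by atom features $A$ (one per atom $a_i$) and pair features $P$ (one per pair $(a_i, a_j)$, e.g. $(a_1, a_2)$, $(a_1, a_3)$, $(a_4, a_5)$). The features pass through the maps $(A \to A)_0$, $(A \to P)_0$, $(P \to P)_0$, $(P \to A)_0$, then $(A \to A)_1$ and $(P \to P)_1$, and are combined by $f_A$ and $f_P$ into new $A$ and $P$.]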
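[Figure: recurrent update of a node state: node $v$ with state $h_v^{(t-1)}$ receives the aggregated input $a_v^{(t)}$ and produces $h_v^{(t)}$]

$$\tanh\Big( \sum_{v \in V} \sigma(y_v) \odot \tanh(z_v) \Big)$$

[Figure: readout over all nodes, with labels $h_v^{T}$, $x_v$, $y_v$, $z_v$]

[Figure: message passing around node $v$: messages $m$ gathered from the neighbors $N(v)$ by $M$, node states $h_v$ updated by $U$, and a readout $R$]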
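$$G = (V, E, W), \qquad L = \mathrm{diag}(d) - W, \qquad d_i = \sum_j W_{ij}$$

$$x \in \mathbb{R}^{|V|}, \qquad \hat{x} = U^\top x, \qquad L = U \Lambda U^\top$$

$$g_\theta * x = U g_\theta(\Lambda) U^\top x$$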
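When $g_\theta$ is a polynomial, $g_\theta(\Lambda) = \sum_k \theta_k \Lambda^k$, the filter $U g_\theta(\Lambda) U^\top x$ equals $\sum_k \theta_k L^k x$, so it can be applied without computing the eigendecomposition. The sketch below assumes that polynomial form (the slide does not specify $g_\theta$) and uses a three-node path graph as a toy example. The function names are hypothetical.

```python
def laplacian(W):
    """Combinatorial Laplacian L = diag(d) - W with degrees d_i = sum_j W_ij."""
    n = len(W)
    d = [sum(row) for row in W]
    return [[(d[i] if i == j else 0) - W[i][j] for j in range(n)] for i in range(n)]


def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def poly_filter(L, x, theta):
    """Apply sum_k theta[k] * L^k to the graph signal x."""
    out = [0.0] * len(x)
    power = list(x)  # holds L^k x, starting with k = 0
    for coeff in theta:
        out = [o + coeff * p for o, p in zip(out, power)]
        power = matvec(L, power)
    return out


# Path graph 0 - 1 - 2 with unit edge weights (illustrative).
W = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
L = laplacian(W)
filtered = poly_filter(L, [1, 0, 0], [1.0, 1.0, 1.0])
```

A degree-K polynomial only mixes signal values from nodes at most K edges apart, so the filter is localized on the graph. A constant signal has $Lx = 0$ and is therefore only scaled by $\theta_0$.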
Frontiers of data-driven property prediction: molecular machine learning