PMML for QSAR Model Exchange
1. PMML for QSAR Model Exchange
Rajarshi Guha, Ph.D.
NIH Center for Advancing Translational Sciences
guhar@mail.nih.gov / http://rguha.net
2. Background
• Cheminformatics
– QSAR, diversity analysis, virtual screening, fragments, polypharmacology, networks
• RNAi screening, high-content imaging
• Extensive use of machine learning
• All tied together with software development (GUIs, libraries)
• Contributed pmml.lm to the PMML package
4. Why is QSAR Useful?
• Lets us predict whether a chemical is likely to be toxic, avoiding animal testing
• Prioritize molecules from a high-throughput screen of 300K molecules
• Predict whether a molecule will be (sufficiently) soluble in water
• Identify molecules with anti-malarial properties
• Accurate, predictive models can save significant time and money (and cute bunnies)
5. Lots and Lots of Models
• Hundreds of such models published in the literature
– Usually in the form of tables of regression coefficients (if we're lucky)
– If the paper describes an SVM model, no chance of reproducing the results
• How can we exchange QSAR models?
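When a paper does report a table of regression coefficients, reproducing the model's predictions is a weighted sum: the intercept plus each coefficient times the corresponding descriptor value. A minimal Python sketch follows; the descriptor names, coefficients and intercept are hypothetical values for illustration, not taken from any published model.

```python
# Hypothetical coefficient table for a linear QSAR model, as a paper might
# report it. Descriptor names and numbers are illustrative, not real results.
COEFFICIENTS = {"XLogP": 0.42, "TPSA": -0.013, "nRotB": 0.05}
INTERCEPT = -1.2


def predict(descriptors: dict[str, float]) -> float:
    """Return intercept + sum(coefficient * descriptor value)."""
    missing = COEFFICIENTS.keys() - descriptors.keys()
    if missing:
        # Without every descriptor, the published model cannot be applied.
        raise KeyError(f"missing descriptors: {sorted(missing)}")
    return INTERCEPT + sum(
        coef * descriptors[name] for name, coef in COEFFICIENTS.items()
    )


print(predict({"XLogP": 2.0, "TPSA": 40.0, "nRotB": 3.0}))
```

The missing-descriptor check is the real constraint: a coefficient table is only usable if the same descriptors, computed the same way, can be obtained for new molecules.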
6. QSAR Model Exchange
• Build models in …
• Save them in PMML
• Distribute
• …
• Profit?
– Not always
The bottleneck is evaluating descriptors for the new observations to supply to the model
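Saving a model in PMML means writing its parameters into a standard XML vocabulary. The sketch below, using only Python's standard library, serializes a linear model given as an intercept and a table of coefficients into a PMML 4.1 RegressionModel. The element names (DataDictionary, MiningSchema, RegressionTable, NumericPredictor) follow the PMML RegressionModel schema; the function name `regression_pmml`, the default target field `activity` and the descriptor names are hypothetical. Note what the document does not contain: any record of how each descriptor is computed, which is exactly the bottleneck.

```python
import xml.etree.ElementTree as ET

NS = "http://www.dmg.org/PMML-4_1"


def _q(tag: str) -> str:
    """Qualify a tag name with the PMML 4.1 namespace."""
    return f"{{{NS}}}{tag}"


def regression_pmml(intercept: float, coefficients: dict[str, float],
                    target: str = "activity") -> str:
    """Build a minimal PMML 4.1 RegressionModel for a linear model (sketch)."""
    ET.register_namespace("", NS)  # serialize PMML as the default namespace
    pmml = ET.Element(_q("PMML"), version="4.1")
    ET.SubElement(pmml, _q("Header"), description="Linear QSAR model (illustrative)")
    # Every input descriptor and the predicted field must be declared.
    fields = list(coefficients) + [target]
    dd = ET.SubElement(pmml, _q("DataDictionary"), numberOfFields=str(len(fields)))
    for name in fields:
        ET.SubElement(dd, _q("DataField"), name=name,
                      optype="continuous", dataType="double")
    model = ET.SubElement(pmml, _q("RegressionModel"), functionName="regression")
    schema = ET.SubElement(model, _q("MiningSchema"))
    for name in coefficients:
        ET.SubElement(schema, _q("MiningField"), name=name)
    ET.SubElement(schema, _q("MiningField"), name=target, usageType="predicted")
    # The coefficient table itself.
    table = ET.SubElement(model, _q("RegressionTable"), intercept=str(intercept))
    for name, coef in coefficients.items():
        ET.SubElement(table, _q("NumericPredictor"), name=name, coefficient=str(coef))
    return ET.tostring(pmml, encoding="unicode")


print(regression_pmml(-1.2, {"XLogP": 0.42, "TPSA": -0.013}))
```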
7. Cheminformatics in R
• rcdk provides cheminformatics support in R
– Load and parse molecular file formats
– Evaluate numerical descriptors from chemical structures
[Figure: rcdk shown with CDK, Jmol, rpubchem, rJava, fingerprint and XML, on top of the R Programming Environment]
9. R, rcdk, PMML
• rcdk provides the means to take in molecules and output a PMML-encoded model
• One could record the appropriate functions/classes in the document and use that information to evaluate descriptors for new observations
• Since rcdk is based on the Java CDK library, one could also use jpmml, a Java API for PMML documents
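One way to picture "recording functions/classes in the document" is a scorer that reads, for each input field, which descriptor routine produced it, computes those descriptors for a new molecule, and then applies the regression table. This Python sketch is an assumption throughout, not rcdk's or jpmml's API: it stores a descriptor identifier in a PMML `Extension` element on each DataField (Extension is PMML's general extension mechanism, but the `name="descriptor"` convention is hypothetical), and it resolves identifiers through a hypothetical registry of deliberately crude SMILES-counting functions standing in for real descriptor classes such as those in the CDK.

```python
import re
import xml.etree.ElementTree as ET

NS = {"p": "http://www.dmg.org/PMML-4_1"}

# Hypothetical registry mapping descriptor identifiers to functions of a SMILES
# string. Toy stand-ins only: they count aliphatic C (not Cl) and O characters.
DESCRIPTORS = {
    "carbon_count": lambda smi: len(re.findall(r"C(?!l)", smi)),
    "oxygen_count": lambda smi: smi.count("O"),
}

# Illustrative PMML document; model, fields and coefficients are made up.
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <Header/>
  <DataDictionary numberOfFields="3">
    <DataField name="nC" optype="continuous" dataType="double">
      <Extension name="descriptor" value="carbon_count"/>
    </DataField>
    <DataField name="nO" optype="continuous" dataType="double">
      <Extension name="descriptor" value="oxygen_count"/>
    </DataField>
    <DataField name="activity" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="nC"/>
      <MiningField name="nO"/>
      <MiningField name="activity" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="nC" coefficient="0.25"/>
      <NumericPredictor name="nO" coefficient="-1.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def score(pmml_text: str, smiles: str) -> float:
    """Compute recorded descriptors for a new molecule, then apply the model."""
    root = ET.fromstring(pmml_text)
    # Look up the descriptor routine recorded for each input field.
    recipes = {}
    for field in root.iterfind("p:DataDictionary/p:DataField", NS):
        ext = field.find("p:Extension[@name='descriptor']", NS)
        if ext is not None:
            recipes[field.get("name")] = DESCRIPTORS[ext.get("value")]
    values = {name: fn(smiles) for name, fn in recipes.items()}
    # Evaluate the linear model: intercept + sum(coefficient * value).
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    total = float(table.get("intercept"))
    for pred in table.iterfind("p:NumericPredictor", NS):
        total += float(pred.get("coefficient")) * values[pred.get("name")]
    return total


print(score(PMML_DOC, "CCO"))
```

The design choice is that the PMML document carries the recipe for its inputs, so a consumer only needs a matching descriptor implementation (in R via rcdk, or in Java via the CDK alongside jpmml) rather than a separate description of how the descriptors were computed.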