SlideShare uma empresa Scribd logo
1 de 9
Baixar para ler offline
PMML	
  for	
  QSAR	
  Model	
  Exchange	
  	
  

            Rajarshi	
  Guha,	
  Ph.D.	
  	
  
          NIH	
  Center	
  for	
  Advancing	
  
           TranslaEonal	
  Sciences	
  


          guhar@mail.nih.gov	
  /	
  h0p://rguha.net	
  
Background	
  
•  CheminformaEcs	
  	
  
   –  QSAR,	
  diversity	
  analysis,	
  virtual	
  screening,	
  	
  
      fragments,	
  polypharmacology,	
  networks	
  
•  RNAi	
  screening,	
  high	
  content	
  imaging	
  
•  Extensive	
  use	
  of	
  machine	
  learning	
  
•  All	
  Eed	
  together	
  with	
  soLware	
  	
  
   development	
  (GUI’s,	
  libraries)	
  
•  Contributed	
  pmml.lm	
  to	
  the	
  PMML	
  
   package	
  
QuanEtaEve	
  Structure	
  AcEvity	
  
       RelaEonships	
  
Why	
  is	
  QSAR	
  Useful?	
  
•  Lets	
  us	
  predict	
  whether	
  a	
  chemical	
  is	
  likely	
  to	
  
   be	
  toxic,	
  avoiding	
  animal	
  tesEng	
  
•  PrioriEze	
  molecules	
  from	
  a	
  high	
  throughput	
  
   screen	
  of	
  300K	
  molecules	
  
•  Predict	
  whether	
  a	
  molecule	
  will	
  be	
  (sufficiently)	
  
   soluble	
  in	
  water	
  
•  IdenEfy	
  molecules	
  with	
  anE-­‐malarial	
  properEes	
  
•  Accurate,	
  predic-ve	
  models	
  can	
  save	
  
   significant	
  -me	
  and	
  money	
  (and	
  cute	
  bunnies)	
  
Lots	
  and	
  Lots	
  of	
  Models	
  
•  Hundreds	
  of	
  such	
  models	
  published	
  in	
  the	
  
   literature	
  
    –  Usually	
  in	
  the	
  form	
  of	
  tables	
  of	
  regression	
  
       coefficients	
  (if	
  we’re	
  lucky)	
  
    –  If	
  the	
  paper	
  describes	
  an	
  SVM	
  model,	
  no	
  chance	
  
       of	
  reproducing	
  the	
  results	
  
•  How	
  can	
  we	
  exchange	
  QSAR	
  models?	
  
QSAR	
  Model	
  Exchange	
  
•    Build	
  models	
  in	
  ….,	
  	
  
•    Save	
  them	
  in	
  PMML	
  
•    Distribute	
  
•    …	
  
•    Profit?	
  
      –  Not	
  always	
  
	
  
The	
  bo0leneck	
  is	
  evalua:ng	
  descriptors	
  for	
  the	
  
new	
  observa:ons	
  to	
  supply	
  to	
  the	
  model	
  
CheminformaEcs	
  in	
  R	
  
•  rcdk	
  provides	
  cheminformaEcs	
  support	
  in	
  R	
  
   –  Load	
  and	
  parse	
  molecular	
  file	
  formats	
  
   –  Evaluate	
  numerical	
  descriptors	
  from	
  chemical	
  
      structures	
  

      rcdk

CDK           Jmol                                      rpubchem

      rJava                  fingerprint                   XML

                R Programming Environment
CheminformaEcs	
  in	
  R	
  

library(pmml)!
library(rcdk)!
data(bpdata)!
mols <- parse.smiles(bpdata[, 1])!
descNames <- unique(unlist(sapply('topological',      !
                                   get.desc.names)))!
descs <- eval.desc(mols, descNames)!
model <- lm(BP ~ khs.sCH3 + khs.sF + TopoPSA + VABC,
data.frame(bpdata,descs))!
pmml(model)!
R,	
  rcdk,	
  PMML	
  
•  rcdk	
  provides	
  the	
  means	
  to	
  take	
  in	
  molecules	
  
   and	
  output	
  a	
  PMML	
  encoded	
  model	
  
•  One	
  could	
  record	
  appropriate	
  funcEons/classes	
  
   in	
  the	
  document	
  and	
  use	
  that	
  info	
  to	
  evaluate	
  
   descriptor	
  for	
  new	
  observaEons	
  
•  Since	
  rcdk	
  is	
  based	
  on	
  the	
  Java	
  CDK	
  library,	
  
   could	
  also	
  use	
  jpmml,	
  a	
  Java	
  API	
  for	
  PMML	
  
   documents	
  

Mais conteúdo relacionado

Semelhante a PMML for QSAR Model Exchange

Crunching Molecules and Numbers in R
Crunching Molecules and Numbers in RCrunching Molecules and Numbers in R
Crunching Molecules and Numbers in RRajarshi Guha
 
Lecture 9 molecular descriptors
Lecture 9  molecular descriptorsLecture 9  molecular descriptors
Lecture 9 molecular descriptorsRAJAN ROLTA
 
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...Databricks
 
Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics on ...
Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics on ...Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics on ...
Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics on ...Databricks
 
Scaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMScaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMfnothaft
 
Scalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAMScalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAMfnothaft
 
Richard Cramer 2014 euro QSAR presentation
Richard Cramer 2014 euro QSAR presentationRichard Cramer 2014 euro QSAR presentation
Richard Cramer 2014 euro QSAR presentationCertara
 
An Overview to Protein bioinformatics
An Overview to Protein bioinformaticsAn Overview to Protein bioinformatics
An Overview to Protein bioinformaticsJoel Ricci-López
 
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...Geoffrey Fox
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqEnis Afgan
 
Building Large Scale Machine Learning Applications with Pipelines-(Evan Spark...
Building Large Scale Machine Learning Applications with Pipelines-(Evan Spark...Building Large Scale Machine Learning Applications with Pipelines-(Evan Spark...
Building Large Scale Machine Learning Applications with Pipelines-(Evan Spark...Spark Summit
 
exFrame: a Semantic Web Platform for Genomics Experiments
exFrame: a Semantic Web Platform for Genomics ExperimentsexFrame: a Semantic Web Platform for Genomics Experiments
exFrame: a Semantic Web Platform for Genomics ExperimentsTim Clark
 
eXframe: A Semantic Web Platform for Genomic Experiments
eXframe: A Semantic Web Platform for Genomic ExperimentseXframe: A Semantic Web Platform for Genomic Experiments
eXframe: A Semantic Web Platform for Genomic ExperimentsTim Clark
 
Python for Chemistry
Python for ChemistryPython for Chemistry
Python for Chemistryguest5929fa7
 
Python for Chemistry
Python for ChemistryPython for Chemistry
Python for Chemistrybaoilleach
 
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016Prof. Wim Van Criekinge
 
RNA sequencing analysis tutorial with NGS
RNA sequencing analysis tutorial with NGSRNA sequencing analysis tutorial with NGS
RNA sequencing analysis tutorial with NGSHAMNAHAMNA8
 

Semelhante a PMML for QSAR Model Exchange (20)

Crunching Molecules and Numbers in R
Crunching Molecules and Numbers in RCrunching Molecules and Numbers in R
Crunching Molecules and Numbers in R
 
Lecture 9 molecular descriptors
Lecture 9  molecular descriptorsLecture 9  molecular descriptors
Lecture 9 molecular descriptors
 
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
 
Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics on ...
Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics on ...Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics on ...
Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics on ...
 
Scaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMScaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAM
 
Scalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAMScalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAM
 
Richard Cramer 2014 euro QSAR presentation
Richard Cramer 2014 euro QSAR presentationRichard Cramer 2014 euro QSAR presentation
Richard Cramer 2014 euro QSAR presentation
 
An Overview to Protein bioinformatics
An Overview to Protein bioinformaticsAn Overview to Protein bioinformatics
An Overview to Protein bioinformatics
 
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
 
Hcls sci disc-isa2rdf
Hcls sci disc-isa2rdfHcls sci disc-isa2rdf
Hcls sci disc-isa2rdf
 
The importance of standards for data exchange and interchange on the Royal So...
The importance of standards for data exchange and interchange on the Royal So...The importance of standards for data exchange and interchange on the Royal So...
The importance of standards for data exchange and interchange on the Royal So...
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-Seq
 
Building Large Scale Machine Learning Applications with Pipelines-(Evan Spark...
Building Large Scale Machine Learning Applications with Pipelines-(Evan Spark...Building Large Scale Machine Learning Applications with Pipelines-(Evan Spark...
Building Large Scale Machine Learning Applications with Pipelines-(Evan Spark...
 
exFrame: a Semantic Web Platform for Genomics Experiments
exFrame: a Semantic Web Platform for Genomics ExperimentsexFrame: a Semantic Web Platform for Genomics Experiments
exFrame: a Semantic Web Platform for Genomics Experiments
 
eXframe: A Semantic Web Platform for Genomic Experiments
eXframe: A Semantic Web Platform for Genomic ExperimentseXframe: A Semantic Web Platform for Genomic Experiments
eXframe: A Semantic Web Platform for Genomic Experiments
 
Python for Chemistry
Python for ChemistryPython for Chemistry
Python for Chemistry
 
Python for Chemistry
Python for ChemistryPython for Chemistry
Python for Chemistry
 
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
 
ADMET.pptx
ADMET.pptxADMET.pptx
ADMET.pptx
 
RNA sequencing analysis tutorial with NGS
RNA sequencing analysis tutorial with NGSRNA sequencing analysis tutorial with NGS
RNA sequencing analysis tutorial with NGS
 

Mais de Rajarshi Guha

Pharos: A Torch to Use in Your Journey in the Dark Genome
Pharos: A Torch to Use in Your Journey in the Dark GenomePharos: A Torch to Use in Your Journey in the Dark Genome
Pharos: A Torch to Use in Your Journey in the Dark GenomeRajarshi Guha
 
Pharos: Putting targets in context
Pharos: Putting targets in contextPharos: Putting targets in context
Pharos: Putting targets in contextRajarshi Guha
 
Pharos – A Torch to Use in Your Journey In the Dark Genome
Pharos – A Torch to Use in Your Journey In the Dark GenomePharos – A Torch to Use in Your Journey In the Dark Genome
Pharos – A Torch to Use in Your Journey In the Dark GenomeRajarshi Guha
 
Pharos - Face of the KMC
Pharos - Face of the KMCPharos - Face of the KMC
Pharos - Face of the KMCRajarshi Guha
 
Enhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
Enhancing Prioritization & Discovery of Novel Combinations using an HTS PlatformEnhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
Enhancing Prioritization & Discovery of Novel Combinations using an HTS PlatformRajarshi Guha
 
What can your library do for you?
What can your library do for you?What can your library do for you?
What can your library do for you?Rajarshi Guha
 
So I have an SD File … What do I do next?
So I have an SD File … What do I do next?So I have an SD File … What do I do next?
So I have an SD File … What do I do next?Rajarshi Guha
 
Characterization of Chemical Libraries Using Scaffolds and Network Models
Characterization of Chemical Libraries Using Scaffolds and Network ModelsCharacterization of Chemical Libraries Using Scaffolds and Network Models
Characterization of Chemical Libraries Using Scaffolds and Network ModelsRajarshi Guha
 
From Data to Action : Bridging Chemistry and Biology with Informatics at NCATS
From Data to Action: Bridging Chemistry and Biology with Informatics at NCATSFrom Data to Action: Bridging Chemistry and Biology with Informatics at NCATS
From Data to Action : Bridging Chemistry and Biology with Informatics at NCATSRajarshi Guha
 
Robots, Small Molecules & R
Robots, Small Molecules & RRobots, Small Molecules & R
Robots, Small Molecules & RRajarshi Guha
 
Fingerprinting Chemical Structures
Fingerprinting Chemical StructuresFingerprinting Chemical Structures
Fingerprinting Chemical StructuresRajarshi Guha
 
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...Rajarshi Guha
 
When the whole is better than the parts
When the whole is better than the partsWhen the whole is better than the parts
When the whole is better than the partsRajarshi Guha
 
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...Rajarshi Guha
 
Characterization and visualization of compound combination responses in a hig...
Characterization and visualization of compound combination responses in a hig...Characterization and visualization of compound combination responses in a hig...
Characterization and visualization of compound combination responses in a hig...Rajarshi Guha
 
The BioAssay Research Database
The BioAssay Research DatabaseThe BioAssay Research Database
The BioAssay Research DatabaseRajarshi Guha
 
Chemical Data Mining: Open Source & Reproducible
Chemical Data Mining: Open Source & ReproducibleChemical Data Mining: Open Source & Reproducible
Chemical Data Mining: Open Source & ReproducibleRajarshi Guha
 
Chemogenomics in the cloud: Is the sky the limit?
Chemogenomics in the cloud: Is the sky the limit?Chemogenomics in the cloud: Is the sky the limit?
Chemogenomics in the cloud: Is the sky the limit?Rajarshi Guha
 
Quantifying Text Sentiment in R
Quantifying Text Sentiment in RQuantifying Text Sentiment in R
Quantifying Text Sentiment in RRajarshi Guha
 

Mais de Rajarshi Guha (20)

Pharos: A Torch to Use in Your Journey in the Dark Genome
Pharos: A Torch to Use in Your Journey in the Dark GenomePharos: A Torch to Use in Your Journey in the Dark Genome
Pharos: A Torch to Use in Your Journey in the Dark Genome
 
Pharos: Putting targets in context
Pharos: Putting targets in contextPharos: Putting targets in context
Pharos: Putting targets in context
 
Pharos – A Torch to Use in Your Journey In the Dark Genome
Pharos – A Torch to Use in Your Journey In the Dark GenomePharos – A Torch to Use in Your Journey In the Dark Genome
Pharos – A Torch to Use in Your Journey In the Dark Genome
 
Pharos - Face of the KMC
Pharos - Face of the KMCPharos - Face of the KMC
Pharos - Face of the KMC
 
Enhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
Enhancing Prioritization & Discovery of Novel Combinations using an HTS PlatformEnhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
Enhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
 
What can your library do for you?
What can your library do for you?What can your library do for you?
What can your library do for you?
 
So I have an SD File … What do I do next?
So I have an SD File … What do I do next?So I have an SD File … What do I do next?
So I have an SD File … What do I do next?
 
Characterization of Chemical Libraries Using Scaffolds and Network Models
Characterization of Chemical Libraries Using Scaffolds and Network ModelsCharacterization of Chemical Libraries Using Scaffolds and Network Models
Characterization of Chemical Libraries Using Scaffolds and Network Models
 
From Data to Action : Bridging Chemistry and Biology with Informatics at NCATS
From Data to Action: Bridging Chemistry and Biology with Informatics at NCATSFrom Data to Action: Bridging Chemistry and Biology with Informatics at NCATS
From Data to Action : Bridging Chemistry and Biology with Informatics at NCATS
 
Robots, Small Molecules & R
Robots, Small Molecules & RRobots, Small Molecules & R
Robots, Small Molecules & R
 
Fingerprinting Chemical Structures
Fingerprinting Chemical StructuresFingerprinting Chemical Structures
Fingerprinting Chemical Structures
 
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...
 
When the whole is better than the parts
When the whole is better than the partsWhen the whole is better than the parts
When the whole is better than the parts
 
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...
 
Characterization and visualization of compound combination responses in a hig...
Characterization and visualization of compound combination responses in a hig...Characterization and visualization of compound combination responses in a hig...
Characterization and visualization of compound combination responses in a hig...
 
The BioAssay Research Database
The BioAssay Research DatabaseThe BioAssay Research Database
The BioAssay Research Database
 
Chemical Data Mining: Open Source & Reproducible
Chemical Data Mining: Open Source & ReproducibleChemical Data Mining: Open Source & Reproducible
Chemical Data Mining: Open Source & Reproducible
 
Chemogenomics in the cloud: Is the sky the limit?
Chemogenomics in the cloud: Is the sky the limit?Chemogenomics in the cloud: Is the sky the limit?
Chemogenomics in the cloud: Is the sky the limit?
 
Quantifying Text Sentiment in R
Quantifying Text Sentiment in RQuantifying Text Sentiment in R
Quantifying Text Sentiment in R
 
Smashing Molecules
Smashing MoleculesSmashing Molecules
Smashing Molecules
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 

Último (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

PMML for QSAR Model Exchange

  • 1. PMML  for  QSAR  Model  Exchange     Rajarshi  Guha,  Ph.D.     NIH  Center  for  Advancing   TranslaEonal  Sciences   guhar@mail.nih.gov  /  h0p://rguha.net  
  • 2. Background   •  CheminformaEcs     –  QSAR,  diversity  analysis,  virtual  screening,     fragments,  polypharmacology,  networks   •  RNAi  screening,  high  content  imaging   •  Extensive  use  of  machine  learning   •  All  Eed  together  with  soLware     development  (GUI’s,  libraries)   •  Contributed  pmml.lm  to  the  PMML   package  
  • 3. QuanEtaEve  Structure  AcEvity   RelaEonships  
  • 4. Why  is  QSAR  Useful?   •  Lets  us  predict  whether  a  chemical  is  likely  to   be  toxic,  avoiding  animal  tesEng   •  PrioriEze  molecules  from  a  high  throughput   screen  of  300K  molecules   •  Predict  whether  a  molecule  will  be  (sufficiently)   soluble  in  water   •  IdenEfy  molecules  with  anE-­‐malarial  properEes   •  Accurate,  predic-ve  models  can  save   significant  -me  and  money  (and  cute  bunnies)  
  • 5. Lots  and  Lots  of  Models   •  Hundreds  of  such  models  published  in  the   literature   –  Usually  in  the  form  of  tables  of  regression   coefficients  (if  we’re  lucky)   –  If  the  paper  describes  an  SVM  model,  no  chance   of  reproducing  the  results   •  How  can  we  exchange  QSAR  models?  
  • 6. QSAR  Model  Exchange   •  Build  models  in  ….,     •  Save  them  in  PMML   •  Distribute   •  …   •  Profit?   –  Not  always     The  bo0leneck  is  evalua:ng  descriptors  for  the   new  observa:ons  to  supply  to  the  model  
  • 7. CheminformaEcs  in  R   •  rcdk  provides  cheminformaEcs  support  in  R   –  Load  and  parse  molecular  file  formats   –  Evaluate  numerical  descriptors  from  chemical   structures   rcdk CDK Jmol rpubchem rJava fingerprint XML R Programming Environment
  • 8. CheminformaEcs  in  R   library(pmml)! library(rcdk)! data(bpdata)! mols <- parse.smiles(bpdata[, 1])! descNames <- unique(unlist(sapply('topological', ! get.desc.names)))! descs <- eval.desc(mols, descNames)! model <- lm(BP ~ khs.sCH3 + khs.sF + TopoPSA + VABC, data.frame(bpdata,descs))! pmml(model)!
  • 9. R,  rcdk,  PMML   •  rcdk  provides  the  means  to  take  in  molecules   and  output  a  PMML  encoded  model   •  One  could  record  appropriate  funcEons/classes   in  the  document  and  use  that  info  to  evaluate   descriptor  for  new  observaEons   •  Since  rcdk  is  based  on  the  Java  CDK  library,   could  also  use  jpmml,  a  Java  API  for  PMML   documents