Open-source from/in the enterprise: the RDKit
Gregory Landrum
NIBR Informatics
Novartis Institutes for BioMedical Research, Basel, Switzerland
Outline
§  What is the RDKit?
§  RDKit integration with other open-source projects
•  Knime
•  PostgreSQL
•  IPython
•  Pandas
•  Lucene
§  RDKit in NIBR, some case studies
RDKit: What is it?
§  Open-source C++ toolkit for cheminformatics
§  Wrappers for Python (2.x), Java, C#
§  Functionality:
•  2D and 3D molecular operations
•  Descriptor generation for machine learning
•  PostgreSQL database cartridge for substructure and similarity searching
•  Knime nodes
•  IPython integration
•  Lucene integration (experimental)
•  Supports Mac/Windows/Linux
§  Releases every 6 months
§  business-friendly BSD license
§  Code: https://github.com/rdkit
§  http://www.rdkit.org
The community
§  Mailing lists hosted at sourceforge: https://sourceforge.net/p/rdkit/mailman/
§  Active participants from academia, small and large pharma, software
companies, and service providers
§  30+ attendees at each of the two user group meetings
Some features
§  Input/Output: SMILES/SMARTS, SDF, TDT, PDB,
SLN [1], Corina mol2 [1]
§  “Cheminformatics”:
•  Substructure searching
•  Canonical SMILES
•  Chirality support (i.e. R/S or E/Z labeling)
•  Chemical transformations (e.g. remove matching
substructures)
•  Chemical reactions
§  2D depiction, including constrained depiction
§  2D->3D conversion/conformational analysis via
distance geometry
§  UFF and MMFF94 implementation for cleaning up
structures
§  Fingerprinting: Daylight-like, atom pairs, topological
torsions, Morgan algorithm, “MACCS keys”, etc.
§  Similarity/diversity picking
§  2D pharmacophores [1]
§  Gasteiger-Marsili charges
§  Hierarchical subgraph/fragment analysis
§  Bemis and Murcko scaffold determination
§  RECAP and BRICS implementations
§  Multi-molecule maximum common substructure
§  Feature maps
§  Shape-based similarity
§  Fraggle similarity (from GSK)
§  Molecule-molecule alignment
§  Open3DAlign implementation
§  Integration with PyMOL for 3D visualization
§  Functional group filtering
§  Salt stripping
§  Molecular descriptor library:
Topological (κ3, Balaban J, etc.), Compositional (Number
of Rings, Number of Aromatic Heterocycles, etc.),
EState, SlogP/SMR (Wildman and Crippen approach),
“MOE like” VSA descriptors, Feature-map vectors
§  Machine Learning:
•  Clustering (hierarchical)
•  Information theory (Shannon entropy, information
gain, etc.)
§  Tight integration with the IPython notebook and
pandas
§  Integration with the InChI library
[1] These implementations are functional but are not necessarily
the best, fastest, or most complete.
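The information-theory measures named under Machine Learning (Shannon entropy, information gain) can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the RDKit implementation: the function names, the toy activity labels, and the "fingerprint bit" feature are all assumptions made for the example.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Entropy, in bits, of a sequence of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(labels, feature_values):
    """Entropy reduction obtained by splitting `labels` on a discrete feature."""
    total = len(labels)
    groups = {}
    for label, value in zip(labels, feature_values):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the groups produced by the split
    remainder = sum(len(g) / total * shannon_entropy(g) for g in groups.values())
    return shannon_entropy(labels) - remainder


# Toy data (hypothetical): activity labels and whether some fingerprint bit is set
activity = ["active", "active", "inactive", "inactive"]
bit_set = [1, 1, 0, 0]

print(shannon_entropy(activity))            # balanced two-class set
print(information_gain(activity, bit_set))  # the bit separates the classes perfectly
```

A feature that splits the labels into pure groups recovers the full entropy of the label set as gain; a feature unrelated to the labels yields a gain of zero.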
The contrib dir
§  LEF (Anna Vulpetti, NIBR): Local Environment of Fluorine
§  PBF (Nicholas Firth, ICR): Plane of best fit descriptor
§  SA_Score (Peter Ertl, NIBR): synthetic-accessibility score
§  fraggle (Jameed Hussain, GSK): fragment-based similarity
§  mmpa (Jameed Hussain, GSK): molecular matched pairs
§  pzc (Paul Czodrowski, Merck KGaA): tools for building and validating
classifiers
§  ConformerParser (Sereina Riniker, NIBR): parser for Amber trajectory
files
What is this all about?
[Diagram: the C++ core data structures and algorithms are exposed to PostgreSQL, to Java and C# (via SWIG), to Python (via Boost.Python), and to Knime; the endpoints shown are scripts, interactive use, and apps]
Exact same algorithms/implementations accessible from many different endpoints
Knime integration
§  Open-source RDKit-based nodes for Knime providing cheminformatics
functionality
§  Trusted nodes distributed from
knime community site
§  Work in progress: more nodes being
added (new wizard makes it easy)
What’s there?
RDKit Interactive Table
§  KNIME interactive table with molecules as column headers
Functionality for working with 3D molecules
§  Example: flexible molecule-molecule alignment
PostgreSQL integration
§  PostgreSQL (http://www.postgresql.org): a robust, flexible, and
extensible relational open-source database. Rich collection of
extensions available
§  RDKit “cartridge”:
•  Fast substructure and similarity search
•  Fingerprints (count-based and bit-vector):
Morgan (ECFP-like), FeatMorgan (FCFP-like), RDKit (Daylight like), atom pair,
topological torsion, MACCS
•  Standard molecule properties and descriptors
§  Basis for myChEMBL (http://chembl.blogspot.co.uk/2013/10/chembl-virtual-machine-aka-mychembl.html)
Ochoa, R., Davies, M., Papadatos, G., Atkinson, F., & Overington, J. P. (2014). myChEMBL: a virtual machine implementation of open data and cheminformatics tools. Bioinformatics, 30(2), 298–300.
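The cartridge slide distinguishes count-based from bit-vector fingerprints. One common way to compare count-based fingerprints is a Tanimoto coefficient generalized to counts (sum of minima over sum of maxima). The sketch below illustrates that definition in plain Python; it is not the cartridge's code, and the feature IDs and counts are made up for the example.

```python
from collections import Counter


def count_tanimoto(a, b):
    """Tanimoto similarity of two count fingerprints (feature id -> count)."""
    keys = a.keys() | b.keys()
    shared = sum(min(a[k], b[k]) for k in keys)  # Counter returns 0 for missing keys
    total = sum(max(a[k], b[k]) for k in keys)
    return shared / total if total else 0.0


# Hypothetical count fingerprints: feature id -> number of occurrences
fp1 = Counter({10: 2, 42: 1})
fp2 = Counter({10: 1, 42: 1, 77: 3})

print(count_tanimoto(fp1, fp2))
```

When every count is 0 or 1 this reduces to the usual bit-vector Tanimoto, so the counts only matter when a feature occurs more than once in a molecule.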
PostgreSQL integration
Substructure search
chembl_17=# select molregno,m from rdk.mols where m@>'c1ccc2c(c1)C(=NN(C2=O)Cc3nc4cc(ccc4s3)C)CC(=O)O';
 molregno |                               m
----------+---------------------------------------------------------------
     7502 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12
    23364 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)cc(C(F)(F)F)c3s2)c(=O)c2ccccc12
    23439 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)cc(Cl)c3s2)c(=O)c2ccccc12
    23462 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)cc(F)c3s2)c(=O)c2ccccc12
    24192 | Cc1cc2nc(Cn3nc(CC(=O)O)c4ccccc4c3=O)sc2c(C)c1
    24190 | COc1cc2sc(Cn3nc(CC(=O)O)c4ccccc4c3=O)nc2cc1C(F)(F)F
    24194 | Cc1ccc2sc(Cn3nc(CC(=O)O)c4ccccc4c3=O)nc2c1
    24237 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)c(O)cc3s2)c(=O)c2ccccc12
    24331 | CC(c1nc2cc(C(F)(F)F)ccc2s1)n1nc(CC(=O)O)c2ccccc2c1=O
(9 rows)

Time: 112.325 ms
PostgreSQL integration
Similarity search
chembl_17=# select * from get_mfp2_neighbors('O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12') limit 5;
 molregno |                          m                           |    similarity
----------+------------------------------------------------------+-------------------
     7502 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12  |                 1
    24184 | O=C(O)Cc1nn(Cc2nc3ccc(C(F)(F)F)cc3s2)c(=O)c2ccccc12  | 0.859649122807018
    24153 | O=C(O)Cc1nn(CCc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12 | 0.830508474576271
    24152 | O=C(O)Cc1nn(Cc2nc3ccccc3s2)c(=O)c2cc(C(F)(F)F)ccc12  | 0.813559322033898
    24150 | O=C(O)Cc1nn(Cc2nc3ccccc3s2)c(=O)c2ccc(C(F)(F)F)cc12  | 0.813559322033898
(5 rows)

Time: 1222.426 ms

Notice that results come back in sorted order
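The ranking shown above can be illustrated with a minimal, self-contained sketch of a Tanimoto nearest-neighbor search over bit fingerprints. It assumes fingerprints are represented as Python sets of on-bit indices; the function names, molecule IDs, and bit sets are hypothetical and are not the cartridge's `get_mfp2_neighbors` implementation.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nearest_neighbors(query, library, limit=5):
    """Return up to `limit` (id, similarity) pairs, most similar first."""
    scored = [(mol_id, tanimoto(query, fp)) for mol_id, fp in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Hypothetical fingerprints: molecule id -> set of on-bit indices
library = {
    "mol_a": {1, 4, 7, 9},
    "mol_b": {1, 4, 7},
    "mol_c": {2, 3},
}

hits = nearest_neighbors({1, 4, 7, 9}, library, limit=2)
print(hits)  # [('mol_a', 1.0), ('mol_b', 0.75)]
```

As in the query output, the query molecule itself comes back first with a similarity of 1, followed by neighbors in decreasing order of similarity.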
PostgreSQL integration
Other functionality
chembl_17=# select mol_formula('O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12');
  mol_formula
---------------
 C19H12F3N3O3S
(1 row)

chembl_17=# select mol_logp('O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12');
 mol_logp
----------
   3.7004
(1 row)

chembl_17=# select mol_inchi('O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12');
 mol_inchi
-----------------------------------------------------------------------------------------------------------------------------------------
 InChI=1S/C19H12F3N3O3S/c20-19(21,22)10-5-6-15-14(7-10)23-16(29-15)9-25-18(28)12-4-2-1-3-11(12)13(24-25)8-17(26)27/h1-7H,8-9H2,(H,26,27)
(1 row)

PostgreSQL integration
Other functionality
chembl_17=# select mol_to_ctab('CC'::mol);
                              mol_to_ctab
-----------------------------------------------------------------------
                                                                      +
      RDKit          2D                                               +
                                                                      +
   2  1  0  0  0  0  0  0  0  0999 V2000                              +
     0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0+
     1.2990    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0+
   1  2  1  0                                                         +
 M  END                                                               +

(1 row)

IPython notebook integration
§  IPython: a very powerful interactive shell for python
http://www.ipython.org
§  IPython notebook: IPython in the browser, with graphics
•  combines code and output in one place
•  great tool for reproducible research
•  Example notebook with graphics.
§  RDKit integration:
•  Display molecules, substructure matches, reactions, graphics from PyMOL
IPython notebook integration:
Molecule tables
http://rdkit.blogspot.ch/2014/02/more-on-datasets-ii.html
IPython notebook integration:
Similarity Maps
Riniker, S. & Landrum, G. A. J Cheminf (2013). http://www.jcheminf.com/content/5/1/43
IPython notebook integration:
PyMOL
http://rdkit.blogspot.ch/2013/12/using-allchemconstrainedembed.html
Pandas integration
§  Pandas: library for working with data tables in Python. Integrates well
with matplotlib and ipython
http://pandas.pydata.org/
§  RDKit integration:
•  Load smiles tables or SD files into Pandas data tables
•  Adds molecule columns to existing tables with smiles/SD columns
•  Enables substructure filters on tables
•  Integration with IPython notebook to render molecules
Pandas integration
http://nbviewer.ipython.org/github/rdkit/UGM_2013/blob/master/Tutorials/pandastools/Pandas_RDKit_UGM.ipynb
Substructure filters
Molecules in tables
Lucene integration
§  Still in the experimental stage
§  Adds substructure search functionality with fingerprint screenout to
Lucene
§  Includes demo app for testing
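The "fingerprint screenout" idea can be sketched as follows: a cheap bitwise test discards targets that cannot possibly contain the query, so the expensive exact substructure match runs only on the survivors. This is a generic illustration, not the Lucene integration's code. Fingerprints are modeled as Python integers used as bit masks, and `exact_match` is a stand-in (a simple string test) for a real graph-based substructure match.

```python
def may_contain(query_fp, target_fp):
    """Screen: every bit set in the query must also be set in the target."""
    return (query_fp & target_fp) == query_fp


def substructure_search(query_fp, targets, exact_match):
    """Run the expensive `exact_match` only on targets that pass the screen."""
    hits = []
    for mol_id, (fp, mol) in targets.items():
        if not may_contain(query_fp, fp):
            continue  # screened out without an exact match
        if exact_match(mol):
            hits.append(mol_id)
    return hits


# Hypothetical data: id -> (fingerprint bit mask, molecule placeholder)
targets = {
    "t1": (0b1110, "CCO"),
    "t2": (0b0110, "CCN"),
    "t3": (0b1011, "OCC"),
}
query_fp = 0b1010

calls = []  # records which molecules reached the expensive step


def exact_match(mol):
    calls.append(mol)
    return "O" in mol  # stand-in for a real substructure match


hits = substructure_search(query_fp, targets, exact_match)
print(hits, calls)
```

The screen can produce false positives (a target passes but the exact match fails) but, for a well-behaved substructure fingerprint, no false negatives, which is what makes it safe as a prefilter.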
RDKit in NIBR
§  Extensive use by CADD, informaticians, and IT
§  Lots of convenience code/wrappers for accessing internal data sources
and tools
§  Combined with the Avalon toolkit (another NIBR-supported open-
source project), provides the underpinning for many of our global
chemistry-based applications
The Avalon toolkit
§  C/Java cheminformatics toolkit
§  Primary author: Bernd Rohde (NIBRIT Basel)
§  http://sourceforge.net/projects/avalontoolkit/
§  Functionality:
•  Canonical SMILES
•  Avalon fingerprint (highly optimized substructure fingerprint)
•  Molecular standardization (STRUCHK)
•  2D Coordinate generation
•  Tomcat webapp for 2D rendering
§  The RDKit has (optional) Python bindings for much of the functionality
RDKit in NIBR
Case study 1: CIx Framework
§  “Service bus” for cheminformatics/CADD services
§  Handles format conversions for input/output automatically
i.e. callers can provide SMILES input to a service/model that wants CTABs with 3D coordinates
§  Supports versioning of models/services
§  Tight integration with scientific tools (e.g. Tibco Spotfire, Knime, Instant
JChem, etc.)
§  Enables trivial addition of “chemical intelligence” to web apps
§  Makes it easy to globally deploy models: once a new model/service (or
new version of a model/service) is registered with the Framework, it is
instantly globally accessible
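The versioning and registration behavior described above can be pictured with a small sketch of a versioned model registry. This is purely illustrative and assumes nothing about how the CIx Framework is actually built: the class, the method names, and the toy models are all hypothetical.

```python
class ModelRegistry:
    """Minimal versioned registry: name -> {version: callable}."""

    def __init__(self):
        self._models = {}

    def register(self, name, version, func):
        """Make a model (or a new version of one) available to all callers."""
        self._models.setdefault(name, {})[version] = func

    def get(self, name, version=None):
        """Return the requested version, or the latest one if none is given."""
        versions = self._models[name]
        if version is None:
            version = max(versions)
        return versions[version]


registry = ModelRegistry()
# Toy placeholder "models" that take a SMILES string (not real predictors)
registry.register("toy_model", 1, lambda smiles: len(smiles))
registry.register("toy_model", 2, lambda smiles: smiles.count("C"))

print(registry.get("toy_model")("CCO"))     # latest version
print(registry.get("toy_model", 1)("CCO"))  # pinned older version
```

Callers that do not pin a version pick up a newly registered one immediately, while callers that need reproducible results can keep requesting an explicit version.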
CIx Framework architecture
[Architecture diagram; components recoverable from the slide:]
§  Client: web app, KNIME, Spotfire, IJC, Python
§  CIx Tools Framework:
•  CIx Tools Web Service (SOAP, REST)
•  Model registration and request service, with a web Model Registration Portal front end
•  Database to store model information (the engine gets the model info from the database)
•  CIx Tools Engine, which drives the individual model scripts and models
•  Translation service: molecule format conversion, name lookup
§  XML file exchange between the engine and the models, and between the engine and the translation service
§  Data in one of the following formats: TSV/CSV file, SMILES/CPD_NO, SD file, DART query
§  Technologies: Java/Tomcat, Python/Django
§  Geographically diverse servers
§  Most models are Python/Django
RDKit in NIBR
Case study 2: Small-Molecule Registration
§  Internally developed web application for compound registration
§  C#-based web services writing to Oracle
§  RDKit + Avalon toolkit for structure standardization
§  RDKit + InChI used for structure-key calculation
§  Calls out to CIx Framework for standard computed properties
§  Independent (but validated) Python implementation of standardization
and structure-key calculation for standalone use
RDKit in NIBR
Case study 3: QSAR Toolkit
§  Descriptor calculator providing access to all available internal
descriptors
§  Tools for pulling assay data from our data warehouse
§  Standardized model-building
§  Standardized reporting for evaluation and peer review
§  Packaging for deployment via CIx Framework
§  Model Watchdog:
Pulls most recent data, generates predictions, creates report showing evolution
of model accuracy over time
RDKit in NIBR
Case study 4: Similarity Server
§  Central PostgreSQL database with easily available compounds
•  in-house available
•  available from reliable vendors
§  Kept up-to-date
§  Substructure search
§  Similarity search with various fingerprints:
•  Avalon
•  Morgan2, Morgan3, FeatMorgan2
•  Atom Pairs, Topological Torsions
§  Web services interface
§  Available to chemists via one of their standard desktop tools
NIBR Open Source
Something new
Acknowledgements
§  General:
•  Remy Evard (NIBR/Informatics)
•  Richard Lewis (NIBR/GDC)
•  Tom Digby (NIBR/Legal)
•  Peter Gedeck (NIBR/GDC)
•  Nik Stiefl (NIBR/GDC)
§  RDKit Community
•  Roger Sayle (NextMove): PDB Parser
•  Andrew Dalke (Dalke Scientific): FMCS
•  Paolo Tosco (University of Turin):
MMFF94, Open3DAlign
•  Jameed Hussain (GSK): Fraggle,
mmpa
§  Pandas, scikit-learn:
•  Sereina Riniker (NIBR/Informatics)
•  Nikolas Fechner (NIBR/Informatics)
http://www.rdkit.org
§  Knime:
•  Manuel Schwarze (NIBR/Informatics)
•  Thorsten Meinl (knime.com)
•  Bernd Wiswedel (knime.com)
§  SMR
•  Thomas Mueller (NIBR/Informatics)
•  Thomas Veith (NIBR/Informatics)
•  Dave Cotter (NIBR/Informatics)
§  QSAR Toolkit:
•  Peter Gedeck (NIBR/GDC)
•  Nikolas Fechner (NIBR/Informatics)
§  CIx Framework
•  Sandra Mueller (NIBR/Informatics)
•  Joerg Muehlbacher (NIBR/CPC)
•  Riccardo Vianello (NIBR/Informatics)
§  NIBR Open Source
•  Ken Robbins (NIBR/Informatics)
•  Dennis Jen (NIBR/Informatics)
•  Mark Schreiber (NIBR/Informatics)
Advertising
3rd RDKit User Group Meeting
22-24 October 2014
Merck KGaA, Darmstadt, Germany
Talks, “talktorials”, lightning talks, social activities, and a hackathon on
the 24th.
Registration: http://goo.gl/z6QzwD
Full announcement: http://goo.gl/ZUm2wm
We’re looking for speakers. Please contact greg.landrum@gmail.com

Mais conteúdo relacionado

Mais procurados

Molecular Docking Using Autodock Tools
Molecular Docking Using Autodock ToolsMolecular Docking Using Autodock Tools
Molecular Docking Using Autodock ToolsVikram Aditya
 
Lecture 9 molecular descriptors
Lecture 9  molecular descriptorsLecture 9  molecular descriptors
Lecture 9 molecular descriptorsRAJAN ROLTA
 
Molecular similarity searching methods, seminar
Molecular similarity searching methods, seminarMolecular similarity searching methods, seminar
Molecular similarity searching methods, seminarHaitham Hijazi
 
Cheminformatics
CheminformaticsCheminformatics
CheminformaticsVin Anto
 
Cheminformatics
CheminformaticsCheminformatics
Cheminformaticsbaoilleach
 
Drug Discovery Today: Fighting TB with Technology
Drug Discovery Today: Fighting TB with TechnologyDrug Discovery Today: Fighting TB with Technology
Drug Discovery Today: Fighting TB with Technologyrendevilla
 
Computer Aided Molecular Modeling
Computer Aided Molecular ModelingComputer Aided Molecular Modeling
Computer Aided Molecular Modelingpkchoudhury
 
Ligbuilder V2: overview and tutorial.
Ligbuilder V2: overview and tutorial.Ligbuilder V2: overview and tutorial.
Ligbuilder V2: overview and tutorial.Ashish Pratim Mahanta
 
Protein-ligand docking
Protein-ligand dockingProtein-ligand docking
Protein-ligand dockingbaoilleach
 
Cheminformatics by kk sahu
Cheminformatics by kk sahuCheminformatics by kk sahu
Cheminformatics by kk sahuKAUSHAL SAHU
 
Molecular Docking Using Autodock
Molecular Docking Using AutodockMolecular Docking Using Autodock
Molecular Docking Using AutodockSapan Shah
 
Programming languages in bioinformatics by dr. jayarama reddy
Programming languages in bioinformatics by dr. jayarama reddyProgramming languages in bioinformatics by dr. jayarama reddy
Programming languages in bioinformatics by dr. jayarama reddyDr. Jayarama Reddy
 
chemoinformatics ppt 2.pptx
chemoinformatics ppt 2.pptxchemoinformatics ppt 2.pptx
chemoinformatics ppt 2.pptxwadhava gurumeet
 
Molecular docking and_virtual_screening
Molecular docking and_virtual_screeningMolecular docking and_virtual_screening
Molecular docking and_virtual_screeningFlorent Barbault
 
An Introduction to Chemoinformatics for the postgraduate students of Agriculture
An Introduction to Chemoinformatics for the postgraduate students of AgricultureAn Introduction to Chemoinformatics for the postgraduate students of Agriculture
An Introduction to Chemoinformatics for the postgraduate students of AgricultureDevakumar Jain
 
Structure Based Drug Design
Structure Based Drug DesignStructure Based Drug Design
Structure Based Drug Designnmicaelo
 

Mais procurados (20)

Molecular Docking Using Autodock Tools
Molecular Docking Using Autodock ToolsMolecular Docking Using Autodock Tools
Molecular Docking Using Autodock Tools
 
Lecture 9 molecular descriptors
Lecture 9  molecular descriptorsLecture 9  molecular descriptors
Lecture 9 molecular descriptors
 
Molecular similarity searching methods, seminar
Molecular similarity searching methods, seminarMolecular similarity searching methods, seminar
Molecular similarity searching methods, seminar
 
Cheminformatics
CheminformaticsCheminformatics
Cheminformatics
 
Cheminformatics
CheminformaticsCheminformatics
Cheminformatics
 
Drug Discovery Today: Fighting TB with Technology
Drug Discovery Today: Fighting TB with TechnologyDrug Discovery Today: Fighting TB with Technology
Drug Discovery Today: Fighting TB with Technology
 
Computer Aided Molecular Modeling
Computer Aided Molecular ModelingComputer Aided Molecular Modeling
Computer Aided Molecular Modeling
 
Organic electrochemistry applications
Organic electrochemistry applicationsOrganic electrochemistry applications
Organic electrochemistry applications
 
Ligbuilder V2: overview and tutorial.
Ligbuilder V2: overview and tutorial.Ligbuilder V2: overview and tutorial.
Ligbuilder V2: overview and tutorial.
 
Protein-ligand docking
Protein-ligand dockingProtein-ligand docking
Protein-ligand docking
 
Spirocompounds
SpirocompoundsSpirocompounds
Spirocompounds
 
Cheminformatics by kk sahu
Cheminformatics by kk sahuCheminformatics by kk sahu
Cheminformatics by kk sahu
 
Molecular Docking Using Autodock
Molecular Docking Using AutodockMolecular Docking Using Autodock
Molecular Docking Using Autodock
 
Programming languages in bioinformatics by dr. jayarama reddy
Programming languages in bioinformatics by dr. jayarama reddyProgramming languages in bioinformatics by dr. jayarama reddy
Programming languages in bioinformatics by dr. jayarama reddy
 
chemoinformatics ppt 2.pptx
chemoinformatics ppt 2.pptxchemoinformatics ppt 2.pptx
chemoinformatics ppt 2.pptx
 
Molecular modelling
Molecular modelling Molecular modelling
Molecular modelling
 
Molecular docking and_virtual_screening
Molecular docking and_virtual_screeningMolecular docking and_virtual_screening
Molecular docking and_virtual_screening
 
An Introduction to Chemoinformatics for the postgraduate students of Agriculture
An Introduction to Chemoinformatics for the postgraduate students of AgricultureAn Introduction to Chemoinformatics for the postgraduate students of Agriculture
An Introduction to Chemoinformatics for the postgraduate students of Agriculture
 
Cheminformatics-1.ppt
Cheminformatics-1.pptCheminformatics-1.ppt
Cheminformatics-1.ppt
 
Structure Based Drug Design
Structure Based Drug DesignStructure Based Drug Design
Structure Based Drug Design
 

Destaque

1st Zone Asian Photo Circuit 2016
1st Zone Asian Photo Circuit 20161st Zone Asian Photo Circuit 2016
1st Zone Asian Photo Circuit 2016maditabalnco
 
The Do's of Onboarding: How to Improve Employee Retention
The Do's of Onboarding: How to Improve Employee RetentionThe Do's of Onboarding: How to Improve Employee Retention
The Do's of Onboarding: How to Improve Employee RetentionCGS
 
Automotive SEO - Don't Risk Your Business
Automotive SEO - Don't Risk Your BusinessAutomotive SEO - Don't Risk Your Business
Automotive SEO - Don't Risk Your BusinessGreg Gifford
 
Quand lecture rime avec plaisir
Quand lecture rime avec plaisirQuand lecture rime avec plaisir
Quand lecture rime avec plaisirSoumia EL Yaacoubi
 
Strategy Instruction in writing
Strategy Instruction in writingStrategy Instruction in writing
Strategy Instruction in writingmystiquemel
 
Ppt eng y4
Ppt eng y4Ppt eng y4
Ppt eng y4azura272
 
Zentangle Animals
Zentangle AnimalsZentangle Animals
Zentangle Animalsquicarroll
 
Musicas cifradas mpb 5
Musicas cifradas mpb 5Musicas cifradas mpb 5
Musicas cifradas mpb 5Nome Sobrenome
 
(Nunca) perder la esperanza.
(Nunca) perder la esperanza.(Nunca) perder la esperanza.
(Nunca) perder la esperanza.José María
 
Reproducibility in cheminformatics and computational chemistry research: cert...
Reproducibility in cheminformatics and computational chemistry research: cert...Reproducibility in cheminformatics and computational chemistry research: cert...
Reproducibility in cheminformatics and computational chemistry research: cert...Greg Landrum
 
Rpp matematika SMA (lingkaran)
Rpp matematika SMA (lingkaran)Rpp matematika SMA (lingkaran)
Rpp matematika SMA (lingkaran)Heriyanto Asep
 

Destaque (16)

BEDP II
BEDP IIBEDP II
BEDP II
 
1st Zone Asian Photo Circuit 2016
1st Zone Asian Photo Circuit 20161st Zone Asian Photo Circuit 2016
1st Zone Asian Photo Circuit 2016
 
The Do's of Onboarding: How to Improve Employee Retention
The Do's of Onboarding: How to Improve Employee RetentionThe Do's of Onboarding: How to Improve Employee Retention
The Do's of Onboarding: How to Improve Employee Retention
 
Automotive SEO - Don't Risk Your Business
Automotive SEO - Don't Risk Your BusinessAutomotive SEO - Don't Risk Your Business
Automotive SEO - Don't Risk Your Business
 
Quand lecture rime avec plaisir
Quand lecture rime avec plaisirQuand lecture rime avec plaisir
Quand lecture rime avec plaisir
 
Strategy Instruction in writing
Strategy Instruction in writingStrategy Instruction in writing
Strategy Instruction in writing
 
Ppt eng y4
Ppt eng y4Ppt eng y4
Ppt eng y4
 
P7 e2 josemariabarrio
P7 e2 josemariabarrio P7 e2 josemariabarrio
P7 e2 josemariabarrio
 
538df1cdf0b7f
538df1cdf0b7f538df1cdf0b7f
538df1cdf0b7f
 
Zentangle Animals
Zentangle AnimalsZentangle Animals
Zentangle Animals
 
Musicas cifradas mpb 5
Musicas cifradas mpb 5Musicas cifradas mpb 5
Musicas cifradas mpb 5
 
(Nunca) perder la esperanza.
(Nunca) perder la esperanza.(Nunca) perder la esperanza.
(Nunca) perder la esperanza.
 
Reproducibility in cheminformatics and computational chemistry research: cert...
Reproducibility in cheminformatics and computational chemistry research: cert...Reproducibility in cheminformatics and computational chemistry research: cert...
Reproducibility in cheminformatics and computational chemistry research: cert...
 
Rpp matematika SMA (lingkaran)
Rpp matematika SMA (lingkaran)Rpp matematika SMA (lingkaran)
Rpp matematika SMA (lingkaran)
 
SPA: Key Questions
SPA: Key QuestionsSPA: Key Questions
SPA: Key Questions
 
New IBM Mainframe 2016 - Z13
New IBM Mainframe 2016 - Z13 New IBM Mainframe 2016 - Z13
New IBM Mainframe 2016 - Z13
 

Semelhante a Open-source from/in the enterprise: the RDKit

A Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptA Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptSanket Shikhar
 
Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...Anubhav Jain
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningAnubhav Jain
 
Reproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter NotebookReproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter NotebookKeiichiro Ono
 
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsGaignard Alban
 
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...Rothamsted Research, UK
 
Handling data and workflows in computational materials science: the AiiDA ini...
Handling data and workflows in computational materials science: the AiiDA ini...Handling data and workflows in computational materials science: the AiiDA ini...
Handling data and workflows in computational materials science: the AiiDA ini...Research Data Alliance
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupNed Shawa
 
Jump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with DatabricksJump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with DatabricksAnyscale
 
OpenDiscovery
OpenDiscoveryOpenDiscovery
OpenDiscoverygwprice
 
GEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC ProgramsGEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC ProgramsTanu Malik
 
IBM Strategy for Spark
IBM Strategy for SparkIBM Strategy for Spark
IBM Strategy for SparkMark Kerzner
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqEnis Afgan
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioAlluxio, Inc.
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingPaco Nathan
 
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...Keiichiro Ono
 
Provenance for Data Munging Environments
Provenance for Data Munging EnvironmentsProvenance for Data Munging Environments
Provenance for Data Munging EnvironmentsPaul Groth
 

Semelhante a Open-source from/in the enterprise: the RDKit (20)

A Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptA Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.ppt
 
Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data mining
 
Reproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter NotebookReproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter Notebook
 
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reports
 
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
 
Handling data and workflows in computational materials science: the AiiDA ini...
Handling data and workflows in computational materials science: the AiiDA ini...Handling data and workflows in computational materials science: the AiiDA ini...
Handling data and workflows in computational materials science: the AiiDA ini...
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetup
 
Jump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with DatabricksJump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with Databricks
 
Distributed Deep Learning + others for Spark Meetup
Distributed Deep Learning + others for Spark MeetupDistributed Deep Learning + others for Spark Meetup
Distributed Deep Learning + others for Spark Meetup
 
From Laboratory to e-Laboratory
From Laboratory to e-LaboratoryFrom Laboratory to e-Laboratory
From Laboratory to e-Laboratory
 
OpenDiscovery
OpenDiscoveryOpenDiscovery
OpenDiscovery
 
GEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC ProgramsGEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC Programs
 
IBM Strategy for Spark
IBM Strategy for SparkIBM Strategy for Spark
IBM Strategy for Spark
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-Seq
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
 
Bosco r users2013
Bosco r users2013Bosco r users2013
Bosco r users2013
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
 
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...
 
Provenance for Data Munging Environments
Provenance for Data Munging EnvironmentsProvenance for Data Munging Environments
Provenance for Data Munging Environments
 

Open-source from/in the enterprise: the RDKit

  • 1. Open-source from/in the enterprise: the RDKit Gregory Landrum NIBR Informatics Novartis Institutes for BioMedical Research, Basel, Switzerland
  • 2. Outline §  What is the RDKit? §  RDKit integration with other open-source projects •  Knime •  PostgreSQL •  IPython •  Pandas •  Lucene §  RDKit in NIBR, some case studies
  • 3. RDKit: What is it? §  Open-source C++ toolkit for cheminformatics §  Wrappers for Python (2.x), Java, C# §  Functionality: •  2D and 3D molecular operations •  Descriptor generation for machine learning •  PostgreSQL database cartridge for substructure and similarity searching •  Knime nodes •  IPython integration •  Lucene integration (experimental) •  Supports Mac/Windows/Linux §  Releases every 6 months §  business-friendly BSD license §  Code: https://github.com/rdkit §  http://www.rdkit.org
  • 4. The community §  Mailing lists hosted at SourceForge: https://sourceforge.net/p/rdkit/mailman/ §  Active participants from academia, small and large pharma, software companies, and service providers §  30+ attendees at each of the two user group meetings
  • 5. Some features §  Input/Output: SMILES/SMARTS, SDF, TDT, PDB, SLN [1], Corina mol2 [1] §  “Cheminformatics”: •  Substructure searching •  Canonical SMILES •  Chirality support (i.e. R/S or E/Z labeling) •  Chemical transformations (e.g. remove matching substructures) •  Chemical reactions §  2D depiction, including constrained depiction §  2D->3D conversion/conformational analysis via distance geometry §  UFF and MMFF94 implementation for cleaning up structures §  Fingerprinting: Daylight-like, atom pairs, topological torsions, Morgan algorithm, “MACCS keys”, etc. §  Similarity/diversity picking §  2D pharmacophores [1] §  Gasteiger-Marsili charges §  Hierarchical subgraph/fragment analysis §  Bemis and Murcko scaffold determination §  RECAP and BRICS implementations §  Multi-molecule maximum common substructure §  Feature maps §  Shape-based similarity §  Fraggle similarity (from GSK) §  Molecule-molecule alignment §  Open3DAlign implementation §  Integration with PyMOL for 3D visualization §  Functional group filtering §  Salt stripping §  Molecular descriptor library: Topological (κ3, Balaban J, etc.), Compositional (Number of Rings, Number of Aromatic Heterocycles, etc.), EState, SlogP/SMR (Wildman and Crippen approach), “MOE like” VSA descriptors, Feature-map vectors §  Machine Learning: •  Clustering (hierarchical) •  Information theory (Shannon entropy, information gain, etc.) §  Tight integration with the IPython notebook and pandas §  Integration with the InChI library [1] These implementations are functional but are not necessarily the best, fastest, or most complete.
  • 6. The contrib dir §  LEF (Anna Vulpetti, NIBR): Local Environment of Fluorine §  PBF (Nicholas Firth, ICR): Plane of best fit descriptor §  SA_Score (Peter Ertl, NIBR): synthetic-accessibility score §  fraggle (Jameed Hussain, GSK): fragment-based similarity §  mmpa (Jameed Hussain, GSK): molecular matched pairs §  pzc (Paul Czodrowski, Merck KGaA): tools for building and validating classifiers §  ConformerParser (Sereina Riniker, NIBR): parser for Amber trajectory files
  • 7. What is this all about? Exact same algorithms/implementations accessible from many different endpoints. [Diagram labels: C++ (core data structures and algorithms); Python via Boost.Python; Java and C# via SWIG; PostgreSQL; Knime; script; interactive; app]
  • 8. Knime integration §  Open-source RDKit-based nodes for Knime providing cheminformatics functionality §  Trusted nodes distributed from the Knime community site §  Work in progress: more nodes being added (new wizard makes it easy)
  • 10. RDKit Interactive Table §  KNIME interactive table with molecules as column headers
  • 11. Functionality for working with 3D molecules §  Example: flexible molecule-molecule alignment
  • 12. PostgreSQL integration §  PostgreSQL (http://www.postgresql.org): a robust, flexible, and extensible open-source relational database with a rich collection of extensions §  RDKit “cartridge”: •  Fast substructure and similarity search •  Fingerprints (count-based and bit-vector): Morgan (ECFP-like), FeatMorgan (FCFP-like), RDKit (Daylight-like), atom pair, topological torsion, MACCS •  Standard molecule properties and descriptors §  Basis for myChEMBL (http://chembl.blogspot.co.uk/2013/10/chembl-virtual-machine-aka-mychembl.html) Ochoa, R., Davies, M., Papadatos, G., Atkinson, F., & Overington, J. P. (2014). myChEMBL: a virtual machine implementation of open data and cheminformatics tools. Bioinformatics, 30(2), 298–300.
  • 13. PostgreSQL integration: Substructure search
    chembl_17=# select molregno,m from rdk.mols where m@>'c1ccc2c(c1)C(=NN(C2=O)Cc3nc4cc(ccc4s3)C)CC(=O)O';
     molregno |                               m
    ----------+---------------------------------------------------------------
         7502 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12
        23364 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)cc(C(F)(F)F)c3s2)c(=O)c2ccccc12
        23439 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)cc(Cl)c3s2)c(=O)c2ccccc12
        23462 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)cc(F)c3s2)c(=O)c2ccccc12
        24192 | Cc1cc2nc(Cn3nc(CC(=O)O)c4ccccc4c3=O)sc2c(C)c1
        24190 | COc1cc2sc(Cn3nc(CC(=O)O)c4ccccc4c3=O)nc2cc1C(F)(F)F
        24194 | Cc1ccc2sc(Cn3nc(CC(=O)O)c4ccccc4c3=O)nc2c1
        24237 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)c(O)cc3s2)c(=O)c2ccccc12
        24331 | CC(c1nc2cc(C(F)(F)F)ccc2s1)n1nc(CC(=O)O)c2ccccc2c1=O
    (9 rows)

    Time: 112.325 ms
  • 14. PostgreSQL integration: Similarity search
    chembl_17=# select * from get_mfp2_neighbors('O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12') limit 5;
     molregno |                          m                           |    similarity
    ----------+------------------------------------------------------+-------------------
         7502 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12  |                 1
        24184 | O=C(O)Cc1nn(Cc2nc3ccc(C(F)(F)F)cc3s2)c(=O)c2ccccc12  | 0.859649122807018
        24153 | O=C(O)Cc1nn(CCc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12 | 0.830508474576271
        24152 | O=C(O)Cc1nn(Cc2nc3ccccc3s2)c(=O)c2cc(C(F)(F)F)ccc12  | 0.813559322033898
        24150 | O=C(O)Cc1nn(Cc2nc3ccccc3s2)c(=O)c2ccc(C(F)(F)F)cc12  | 0.813559322033898
    (5 rows)

    Time: 1222.426 ms

    Notice that results come back in sorted order.
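The ranking behaviour of the similarity query on slide 14 can be sketched in plain Python. This is only an illustration under stated assumptions: it assumes Tanimoto similarity over fingerprints represented as sets of on-bit indices, the bit sets are invented toy data rather than real Morgan fingerprints, and `neighbors` is a hypothetical helper, not the cartridge's `get_mfp2_neighbors`.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention chosen for this sketch: two empty fingerprints match
    return len(a & b) / len(a | b)


def neighbors(query_fp, database, threshold=0.5):
    """Return (id, similarity) pairs at or above threshold, most similar first."""
    hits = [(mol_id, tanimoto(query_fp, fp)) for mol_id, fp in database.items()]
    hits = [(mol_id, sim) for mol_id, sim in hits if sim >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy data: the ids and bit sets are invented for illustration only.
toy_db = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {1, 2, 3, 5},
    "mol_c": {1, 9},
    "mol_d": {7, 8},
}
query = {1, 2, 3, 4}
ranked = neighbors(query, toy_db)
```

Here `ranked` holds the identical fingerprint first (similarity 1.0) and the partial match second; the two dissimilar entries fall below the threshold and are dropped.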
  • 15. PostgreSQL integration: Other functionality
    chembl_17=# select mol_formula('O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12');
      mol_formula
    ---------------
     C19H12F3N3O3S
    (1 row)

    chembl_17=# select mol_logp('O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12');
     mol_logp
    ----------
       3.7004
    (1 row)

    chembl_17=# select mol_inchi('O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12');
                                                                  mol_inchi
    -----------------------------------------------------------------------------------------------------------------------------------------
     InChI=1S/C19H12F3N3O3S/c20-19(21,22)10-5-6-15-14(7-10)23-16(29-15)9-25-18(28)12-4-2-1-3-11(12)13(24-25)8-17(26)27/h1-7H,8-9H2,(H,26,27)
    (1 row)
  • 16. PostgreSQL integration: Other functionality
    chembl_17=# select mol_to_ctab('CC'::mol);
                                  mol_to_ctab
    -----------------------------------------------------------------------
                                                                          +
          RDKit          2D                                               +
                                                                          +
       2  1  0  0  0  0  0  0  0  0999 V2000                              +
         0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0+
         1.2990    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0+
       1  2  1  0                                                         +
     M  END                                                               +

    (1 row)
  • 17. IPython notebook integration §  IPython: a very powerful interactive shell for Python http://www.ipython.org §  IPython notebook: IPython in the browser, with graphics •  combines code and output in one place •  great tool for reproducible research •  Example notebook with graphics. §  RDKit integration: •  Display molecules, substructure matches, reactions, graphics from PyMOL
  • 18. IPython notebook integration: Molecule tables http://rdkit.blogspot.ch/2014/02/more-on-datasets-ii.html
  • 19. IPython notebook integration: Similarity Maps Riniker, S. & Landrum, G. A. J Cheminf (2013). http://www.jcheminf.com/content/5/1/43
  • 21. Pandas integration §  Pandas: library for working with data tables in Python. Integrates well with matplotlib and IPython http://pandas.pydata.org/ §  RDKit integration: •  Load SMILES tables or SD files into Pandas data tables •  Adds molecule columns to existing tables with SMILES/SD columns •  Enables substructure filters on tables •  Integration with IPython notebook to render molecules
  • 23. Lucene integration §  Still in the experimental stage §  Adds substructure search functionality with fingerprint screenout to Lucene §  Includes demo app for testing
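The "fingerprint screenout" idea on slide 23 can be illustrated with a toy that uses strings instead of molecules. This is a sketch under loud assumptions, not the Lucene or RDKit implementation: the "fingerprint" is the set of adjacent character pairs, and the "exact substructure test" is a plain substring check. The principle carries over: a cheap subset test on fingerprints discards most candidates, and only the survivors pay for the expensive exact match.

```python
def fingerprint(text):
    """Toy fingerprint: the set of adjacent character pairs in a string."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def substructure_search(query, database):
    """Screen with fingerprints first, then confirm with the exact (slow) test."""
    query_fp = fingerprint(query)
    confirmed, screened_out = [], 0
    for entry in database:
        # Screenout: every feature of the query must be present in the candidate.
        # A real index would store precomputed fingerprints instead of recomputing.
        if not query_fp <= fingerprint(entry):
            screened_out += 1
            continue
        # Survivors can still be false positives, so the exact test is required.
        if query in entry:
            confirmed.append(entry)
    return confirmed, screened_out


toy_db = ["abcdef", "bcab", "xyz", "cdab"]
hits, skipped = substructure_search("abc", toy_db)
```

In this toy run, two entries are rejected by the screen alone, and "bcab" passes the screen (it contains both "ab" and "bc") but fails the exact test, which is why the screen can only ever be a prefilter.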
  • 24. RDKit in NIBR §  Extensive use by CADD, informaticians, and IT §  Lots of convenience code/wrappers for accessing internal data sources and tools §  Combined with the Avalon toolkit (another NIBR-supported open-source project), provides the underpinning for many of our global chemistry-based applications
  • 25. The Avalon toolkit §  C/Java cheminformatics toolkit §  Primary author: Bernd Rohde (NIBRIT Basel) §  http://sourceforge.net/projects/avalontoolkit/ §  Functionality: •  Canonical SMILES •  Avalon fingerprint (highly optimized substructure fingerprint) •  Molecular standardization (STRUCHK) •  2D Coordinate generation •  Tomcat webapp for 2D rendering §  The RDKit has (optional) Python bindings for much of the functionality
  • 26. RDKit in NIBR Case study 1: CIx Framework §  “Service bus” for cheminformatics/CADD services §  Handles format conversions for input/output automatically, i.e. callers can provide SMILES input to a service/model that wants CTABs with 3D coordinates §  Supports versioning of models/services §  Tight integration with scientific tools (e.g. Tibco Spotfire, Knime, Instant JChem, etc.) §  Enables trivial addition of “chemical intelligence” to web apps §  Makes it easy to globally deploy models: once a new model/service (or new version of a model/service) is registered with the Framework, it is instantly globally accessible
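The "service bus" pattern described on slide 26 (versioned services plus automatic input conversion) can be sketched as follows. Everything here is hypothetical and for illustration only: the `ServiceBus` class, its method names, and the format labels are invented, and the converter is a placeholder rather than a real SMILES-to-3D conversion. Nothing is implied about how the CIx Framework itself is implemented.

```python
class ServiceBus:
    """Hypothetical sketch: services declare an input format, the bus converts."""

    def __init__(self):
        self._converters = {}  # (source_format, target_format) -> function
        self._services = {}    # (name, version) -> (input_format, function)

    def add_converter(self, source, target, func):
        self._converters[(source, target)] = func

    def register(self, name, version, input_format, func):
        # Registering under (name, version) keeps older versions callable.
        self._services[(name, version)] = (input_format, func)

    def call(self, name, version, data, data_format):
        wanted, func = self._services[(name, version)]
        if data_format != wanted:
            # Raises KeyError if no direct converter has been registered.
            data = self._converters[(data_format, wanted)](data)
        return func(data)


bus = ServiceBus()
# Placeholder conversion: a real one would parse SMILES and generate 3D coordinates.
bus.add_converter("smiles", "ctab3d", lambda smiles: {"source": smiles, "format": "ctab3d"})
# Placeholder model: it only reports the format it received.
bus.register("toy_model", 1, "ctab3d", lambda mol: "got " + mol["format"])
result = bus.call("toy_model", 1, "CCO", "smiles")
```

The caller passes SMILES and never sees the conversion step; the service only ever receives the format it declared at registration.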
  • 27. CIx Framework architecture [Diagram labels] Client: web app, KNIME, Spotfire, IJC, Python. Data in one of the following formats: TSV/CSV file, SMILES/CPD_NO, SD file, DART query. Front end: CIx Tools Web Service (SOAP, REST). CIx Tools Engine. Translation service: molecule format conversion, name lookup; XML file exchange between the engine and the translation service. Model registration and request service: web model registration portal; database to store model information; the engine gets the model info from the database. Models and model scripts: XML file exchange between the engine and the models. Java/Tomcat; Python/Django; geographically diverse servers; most models are Python/Django.
  • 28. RDKit in NIBR Case study 2: Small-Molecule Registration §  Internally developed web application for compound registration §  C#-based web services writing to Oracle §  RDKit + Avalon toolkit for structure standardization §  RDKit + InChI used for structure-key calculation §  Calls out to CIx Framework for standard computed properties §  Independent (but validated) Python implementation of standardization and structure-key calculation for standalone use
  • 29. RDKit in NIBR Case study 3: QSAR Toolkit §  Descriptor calculator providing access to all available internal descriptors §  Tools for pulling assay data from our data warehouse §  Standardized model-building §  Standardized reporting for evaluation and peer review §  Packaging for deployment via CIx Framework §  Model Watchdog: Pulls most recent data, generates predictions, creates report showing evolution of model accuracy over time
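The "Model Watchdog" report on slide 29 tracks model accuracy over time. A minimal sketch of that bookkeeping, assuming a classification model and monthly buckets (both are assumptions, as are the function name and the toy records; the slides do not describe the actual report):

```python
from collections import defaultdict


def accuracy_by_month(records):
    """records: iterable of (iso_date, predicted, observed) -> {YYYY-MM: accuracy}."""
    counts = defaultdict(lambda: [0, 0])  # month -> [correct, total]
    for iso_date, predicted, observed in records:
        month = iso_date[:7]  # "2014-01-10" -> "2014-01"
        counts[month][0] += int(predicted == observed)
        counts[month][1] += 1
    return {month: correct / total for month, (correct, total) in sorted(counts.items())}


# Invented toy records: (measurement date, model prediction, assay result).
toy_records = [
    ("2014-01-10", "active", "active"),
    ("2014-01-20", "active", "inactive"),
    ("2014-02-05", "inactive", "inactive"),
    ("2014-02-15", "active", "active"),
]
report = accuracy_by_month(toy_records)
```

Rerunning this as new measurements arrive gives one accuracy value per month, which is the kind of series a drift report could plot.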
  • 30. RDKit in NIBR Case study 4: Similarity Server §  Central PostgreSQL database with easily available compounds •  in-house available •  available from reliable vendors §  Kept up-to-date §  Substructure search §  Similarity search with various fingerprints: •  Avalon •  Morgan2, Morgan3, FeatMorgan2 •  Atom Pairs, Topological Torsions §  Web services interface §  Available to chemists via one of their standard desktop tools
  • 32. Acknowledgements §  General: •  Remy Evard (NIBR/Informatics) •  Richard Lewis (NIBR/GDC) •  Tom Digby (NIBR/Legal) •  Peter Gedeck (NIBR/GDC) •  Nik Stiefl (NIBR/GDC) §  RDKit Community •  Roger Sayle (NextMove): PDB Parser •  Andrew Dalke (Dalke Scientific): FMCS •  Paolo Tosco (University of Turin): MMFF94, Open3DAlign •  Jameed Hussain (GSK): Fraggle, mmpa §  Pandas, scikit-learn: •  Sereina Riniker (NIBR/Informatics) •  Nikolas Fechner (NIBR/Informatics) http://www.rdkit.org §  Knime: •  Manuel Schwarze (NIBR/Informatics) •  Thorsten Meinl (knime.com) •  Bernd Wiswedel (knime.com) §  SMR •  Thomas Mueller (NIBR/Informatics) •  Thomas Veith (NIBR/Informatics) •  Dave Cotter (NIBR/Informatics) §  QSAR Toolkit: •  Peter Gedeck (NIBR/GDC) •  Nikolas Fechner (NIBR/Informatics) §  CIx Framework •  Sandra Mueller (NIBR/Informatics) •  Joerg Muehlbacher (NIBR/CPC) •  Riccardo Vianello (NIBR/Informatics) §  NIBR Open Source •  Ken Robbins (NIBR/Informatics) •  Dennis Jen (NIBR/Informatics) •  Mark Schreiber (NIBR/Informatics)
  • 33. Advertising: 3rd RDKit User Group Meeting, 22-24 October 2014, Merck KGaA, Darmstadt, Germany. Talks, “talktorials”, lightning talks, social activities, and a hackathon on the 24th. Registration: http://goo.gl/z6QzwD Full announcement: http://goo.gl/ZUm2wm We’re looking for speakers. Please contact greg.landrum@gmail.com