SlideShare uma empresa Scribd logo
1 de 41
Baixar para ler offline
Extraction, Analysis, Atom Mapping,
Classification and Naming of
Reactions from Pharmaceutical ELNs
Roger Sayle, Daniel Lowe, Noel O’Boyle
NextMove Software, Cambridge, UK
Michael Kappler, Hoffmann-La Roche, Nutley, NJ, USA
Nick Tomkinson, AstraZeneca, Alderley Park, UK
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
introduction
• Pharmaceutical ELNs contain a wealth of
synthetic chemistry knowledge, particularly on
failed reactions, often not described in the
literature or available from public sources.
• This presentation describes several of the
technical informatics challenges encountered
during the process of exploiting reaction
information from ELNs.
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
motivation #1: economics
• The primary motivation for reaction informatics is
reducing cost of goods, through higher yields, fewer
failed reactions and more direct synthetic routes.
Yuta Fujiwara et al., “Practical and innate carbon-hydrogen functionalization of heterocycles”,
Nature Vol. 492, pp. 95-99, 6th December 2012.
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
motivation #2: yield prediction
• Nadine et al.[1] hypothesize that low LogP is a major
cause of synthesis failure in parallel synthesis of
combinatorial libraries.
• GSK data set from Pickett et al.[2] of 2500 Suzuki
couplings in a 50x50 library of MMP-12 inhibitors.
– 1704 compounds measured, mean logP = 3.56 (1.44)
– 566 compounds not made, mean logP = 2.83 (1.52)
– Student’s t-test for different distributions, p<2x10-22.
1. Nadine, Hattotuwagama and Churcher ,“Lead-Oriented Synthesis: A New Opportunity for
Synthetic Chemistry”, Angew. Chem. Int. Ed, 51:1114 2012.
2. Pickett et al., ACS Med. Chem. Lett. 2(1):28, 2011
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
motivation #2: yield prediction
The clear trend between Suzuki coupling success rate
and predicted octanol-water partition co-efficient.
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
97 232
340
525
474 202
63
55 119 127 141 80 36 8
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
<1.0 1.0-2.0 2.0-3.0 3.0-4.0 4.0-5.0 5.0-6.0 >6.0
Reaction Product Predicted cLogP
Sucessful Reactions Failed Reactions
motivation #3: virtual libraries
• Statistics of ELN reactions may be used to redefine
RECAP-style rules for retro synthesis [1,2]
• For example, reduction of nitro groups to amines is
one of the top 5 most common ELN reactions, but
nitration reactions are amongst the rarest.
• This implies that nitro containing compounds are
purchased as reagents rather made in-house.
1. Xiao Qing Lewell, Duncan B. Judd, Stephen P. Watson and Michael M. Hann, “RECAP –
Retrosynthetic Combinatorial Analysis Procedure”, JCICS 38(3):511-522, 1998.
2. Vainio, Kogej and Raubacher, “Automated Recycling of Chemistry for Virtual Screening and
Library Design”, J. Chem. Inf. Model. 52(7):1777-1786, 2012.
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
The challenges
1. Export of the data from the ELN.
2. High fidelity conversion to other file formats.
3. Reaction normalization/standardization.
4. Reaction identity (canonicalization).
5. Reaction naming and classification.
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
typical eln experiment
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
Database export
• Typically, ELNs are implemented as complex schemas
within relational databases (Oracle), supporting
transactions, auditing and security privileges.
• Not uncommonly the vendor provided functionality
or APIs for data export are slow and/or buggy.
• In addition to reactions and structures, there is often
a requirement to export all associated data, including
textual and numeric data, tables, even LCMS and
NMR spectra.
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
typical eln fields in RD format
$DTYPE REACTION:REACTION.CONDITIONS:TEMPERATURE:VALUE
$DATUM 120 &#xB0;C
$DTYPE REACTION:REACTION.CONDITIONS:TEMPERATURE:MINVAL
$DATUM 393.15
$DTYPE REACTION:REACTION.CONDITIONS:TEMPERATURE:MAXVAL
$DATUM 393.15
$DTYPE REACTION:REACTION.CONDITIONS:PRESSURE:VALUE
$DATUM 5 bar
$DTYPE REACTION:REACTANTS(1):NAME
$DATUM nicotinoyl chloride
$DTYPE REACTION:REACTANTS(1):CHEMICAL.STRUCTURE
$DATUM c1cc(cnc1)C(=O)Cl
$DTYPE REACTION:REACTANTS(1):FORMULA.MASS:VALUE
$DATUM 141.56
$DTYPE REACTION:REACTANTS(2):NAME
$DATUM (2R,3S)-3-methylpentan-2-ol
$DTYPE REACTION:REACTANTS(2):CHEMICAL.STRUCTURE
$DATUM CC[C@@H](C)[C@H](C)O
$DTYPE REACTION:REACTANTS(2):FORMULA.MASS:VALUE
$DATUM 102.17
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
$DTYPE REACTION:PRODUCTS(1):CHEMICAL.STRUCTURE
$DATUM CC[C@@H](C)[C@H](C)OC(=O)c1cccnc1
$DTYPE REACTION:PRODUCTS(1):NAME
$DATUM (2R,3S)-3-methylpentan-2-yl nicotinate
$DTYPE REACTION:PRODUCTS(1):MOLECULAR.WEIGHT:VALUE
$DATUM 207.27
$DTYPE REACTION:PRODUCTS(1):MOLECULAR.FORMULA
$DATUM C12H17NO2
$DTYPE REACTION:PRODUCTS(1):ACTUAL.MOLES:VALUE
$DATUM 2.16 mmol
$DTYPE REACTION:SOLVENTS(1):NAME
$DATUM 1-pentanol
$DTYPE REACTION:SOLVENTS(1):VOLUME:VALUE
$DATUM 5 mL
$DTYPE REACTION:SOLVENTS(1):R.VALUES
$DATUM R10,R20
File format conversion
• The source format for many reactions is typically a
sketch, in either CDX, CDXML, ISIS Sketch or Marvin
file format.
• For data processing reactions are much easier to
handle as reaction SMILES, MDL RXN or RD files and
possibly even variants of MOL and SD file formats.
• Alas handling of reaction file formats is generally
poorly handled by many cheminformatics tools.
• Additionally, reaction file formats can rarely encode
all of the same information.
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
decrypting cdx & cDXML files
• CambridgeSoft are to be congratulated for publicly
documenting their CDX and CDXML file formats.
• Unfortunately this online ChemDraw developer
resource is no longer being kept up to date.
• New tags: object 0x802b encodes “annotation”.
• Mistakes: “arrow” is encoded by object 0x8021.
• Proprietary property tags: USPTO’s “PageDefinition”.
• Support for reading and writing isotopic information
in CDXML files has been contributed to Open Babel.
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
agents and catalysts
How to express reaction role, i.e. molecules drawn
above and below the reaction arrow.
CSc1ccccc1(F)>OO>CS(=O)c1ccc(cc1)F
Although standard MDL RXN files only capture reactants and
products, a useful ChemAxon extension is to add a third count
for agents. Downstream tools can optionally remove or
reclassify these agents to produce strictly compliant MDL files.
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
Valence issues
A tricky problem is working around the different MDL
valence interpretations used by cheminformatics tools.
For example, it is not uncommon for the alchemical reaction
[Pb]>>[Au] to become reinterpreted as [PbH2]>>[Au] after
writing and re-reading from MDL formats. Such errors lose the
distinction between sodium metal and sodium hydride, resulting
in incorrect molecular formulae, or to distinguish metals from
radicals, causing problems for substructure searching.
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
rich text format
• Exporting formatted text from electronic lab note
books, such as experimental/preparation write-ups,
requires converting Microsoft Rich Text Format (RTF).
• This can be translated into HTML or ASCII, and then
long lines wrapped for inclusion in SD/RD data fields.
• Special code is required for handling non-U.S.
character sets, and whilst Western European,
Russian, Chinese and Japanese were expected,
finding Arabic, Thai and Vietnamese text in major
pharmaceutical ELNs came as a surprise.
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
salt/component grouping
• Keeping track of the intended number and formulae
of reactants, products and agents in a reaction,
requires preserving salt form associations.
• This is implemented by honoring the “group”
information from the sketch as single disconnected
components in MDL RXN and RD file output.
• These associations are traditionally lost in SMILES...
...>CC(=O)[O-].[Cl-].[Cl-].[Fe].[K+].[Pd+2]...>...
but can be retained via ChemAxon/GGA extensions.
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
three types of superatom/Label
1. Recognized superatoms are expanded to their
explicit all atom representations.
2. Unrecognized superatoms/labels that are bonded to
a molecule are encoded as dummy asterisk atoms
where the text is preserved as an MDL atom alias.
– Support for writing MDL aliases contributed to RDKit.
3. Disconnected unrecognized labels are preserved as
supplementary data fields in SD and RD file formats.
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
superatom expansion
becomes
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
text label preservation
is converted to an MDL SD file as
NextMove07171309292D
5 4 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 * 0 0 0 0 0 0 0 0 0 0 0 0
1.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.5000 -0.8660 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
1.5000 0.8660 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
2.5000 0.8660 0.0000 * 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
2 3 2 0 0 0 0
2 4 1 0 0 0 0
4 5 1 0 0 0 0
A 1
Alexa555
A 5
LH-RH
M END
> <LABELS>
NMS-123456
$$$$
chemdraw superatom correction
Chemist drew Chemist Intended Chemist got
DMF
K2CO3
HBTU
mCPBA
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
Reaction standardization
Chemists may draw the reaction components in
multiple ways (e.g. tautomers and protonation states)
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
Reaction standardization
• An often overlooked aspect of ELNs is the need to
enforce “business rules” to consistently represent a
reaction, in the same way that normalized molecules
are stored in registration systems.
• Pharmaceutical ELNs contain structures where nitros
are represented arbitrarily, and even cases where
azide representations differ on each side of an arrow.
• Unfortunately, the rules used for molecules (such as
InChI) may be inappropriate for reactions, where
metal co-ordination and radicals play a major role.
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
Reaction identity
ELNs frequently contain repeated reactions, duplicates.
We need define when two reactions are the same.
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
Reaction identity
• When translating from the experiment-centric view
of an ELN to the reaction-centric view of a reaction
database one asks when are two reactions the same.
• A pragmatic/operational definition might be that two
experiments with identical sets of reactants and
products, but differing quantities, conditions,
catalysts and solvents are variations of each other.
• Whether a component is a reactant, catalyst, solvent
or reagent may be consistently defined by atom-
mapping; reactants contribute atoms to the product.
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
Reaction classification
• For searching and analysis it is often convenient to
algorithmically assign each reaction to a type, often a
named reaction such as Negishi coupling, Diels-Alder
cycloaddition, nitro reduction or chiral separation.
• This is implemented using a database of SMIRKS-like
transforms that may be pre-compiled for efficient
matching and portability across informatics toolkits.
• This approach provides both classification under the
RSC’s RXNO ontology and reaction atom mapping for
component role assignment.
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
simple classifications
• Examples of simple reaction classifications
A.B>>C Regular reaction
A.B>> Failed reaction
>>C Compound purchase
A>>A Purification
A.B>>A Separation (and chiral separation)
A.B>>C.A Catalysts or unreacted reagents
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
categorization of ELN reactions
1. J. Carey, D. Laffan, C. Thomson, M. Williams, Org. Biomol. Chem. 2337, 2006.
2. S. Roughley and A. Jordan, J. Med. Chem. 54:3451-3479, 2011.
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
34%
17%
5%
2%
3%
6%
10%
1%
15%
2%
5% Heteroatom alkylation and arylation
Acylation and related processes
C-C bond formations
Heterocycle formation
Protections
Deprotections
Reductions
Oxidations
Functional group conversion
Functional group addition
Resolution
reaction ontology
• Reactions are classified into a common subset of the
Carey et al. classes and the RSC’s RXNO ontology.
• There are 12 super-classes
– e.g. 3 C-C bond formation (RXNO:0000002).
• These contain 84 class/categories.
– e.g. 3.5 Pd-catalyzed C-C bond formation (RXNO:0000316)
• These contain ~300 named reactions/types.
– e.g. 3.5.3 Negishi coupling (RXNO:0000088)
• These require >400 SMIRKS-like transformations.
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
example smirks-like transform
• Reactions are specified as SMIRKS transformations:
[BrD1h0+0:1][#6:2].[#7X3v3+0:3][H]>>[#6:2][#7:3]
1.6.2 BROMO_N_ALKYLATION
• As demonstrated in the example above, these
patterns may operate on explicit hydrogen atoms for
brevity, but these are “compiled” via more efficient
SMARTS-like patterns for matching during naming.
• The nitrogen match becomes “[#7X3v3h>0+0:3]”.
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
eTl* summary statistics
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
ELN Experiment Category Fraction
A NULL CLOBs 0.30%
B Empty sketches 0.24%
C No reaction (molecule?) 8.52%
D No reactants 0.63%
E No products 0.06%
F Regular Reactions 88.93%
G Markush Reactions 1.32%
Total 100.0%
Export success for a typical pharmaceutical ELN is
currently about 90.94% (see D+E+F+G above).
* Extract-Transform-Load (ETL), A data warehousing term.
analysis of eln reaction yields
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
Data courtesy of Nick Tomkinson, AstraZeneca RDI, Alderley Park, UK.
analysis of eln reaction yields
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
Data courtesy of Nick Tomkinson, AstraZeneca RDI, Alderley Park, UK.
reaction yield distributions
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
conclusions
• In an attempt to better understand and hopefully
improve the productivity of synthetic chemists,
new computational methods have been developed
to process “real world” organic reaction data.
• The fruits of this work now enable medicinal
chemists and informaticians to make greater use of
the wealth of information in their in-house ELNs.
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
acknowledgements
• Anna Paola Pellicioli and Greg Landrum, Novartis,CH.
• Ethan Hoff and Manli Zheng, AbbVie, IL, USA.
• Colin Batchelor, RSC, Cambridge, UK.
• David Drake, AstraZeneca, Alderley Park, UK.
• Plamen Petrov, AstraZeneca, Molndal, SE.
• Daniel Stoffler, Hoffman-La Roche, Basel, CH.
• Pat Walters, Vertex Pharmaceuticals, MA, USA.
• Andrew Wooster, GSK, RTP, NC, USA.
• Thank you for your time.
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
co-ordinate handling
• Several approaches for handling 2D co-ordinates
– Preserve original co-ordinates as drawn by chemist
– Center and/or rescale for downstream depiction tools
– Regenerate all 2D co-ordinates algorithmically
– Clean-up long short/bonds introduced by superatom
expansion, attempting to preserve original orientation.
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
nested reactions
• Some ELN configurations allow more than one
reaction or experiment per lab notebook page, i.e.
multiple reaction sketches in different “tabs”.
• Two solutions to this breaking of the 1-to-1 mapping
between COLLECTION_ID and reaction include:
1. Nested Reactions: Using the MDL RD file’s ability to
embed each reaction step as data fields in a single
record.
2. Splitting Reactions: Where each reaction has its
own record, possibly duplicating shared data.
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
meta data fields
• In addition to the data explicitly recorded by the
chemist and captured by the ELN, it is also often
useful to export meta-data from an ELN schema.
• This includes data fields such as experiment creation
data, creation modification date, experiment status
(open/closed), chemist name, chemist user id, etc.
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
incremental updates
• An important feature for a production data
warehouse for ELN reaction data is the ability to keep
its contents valid/fresh with live data.
• The can be implemented by supporting incremental
updates that export only those creations that have
been modified or created since a given date or in a
range of dates.
• One subtlety is the handling of “closed” experiments
(i.e. those signed off by a supervisor), whose status
change date need not match the last modified date.
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
reaction role assignment
• Normalized reaction roles may be assigned via
reaction atom mapping algorithms.
• Reaction components that contribute atoms to the
product are defined to be reactants, and the
remaining components as catalysts and solvents.
• Hence, c1cccc1[N+](=O)[O-].[Ni]>>c1ccccc1N
may be canonicalized as
c1ccccc1[N+](=O)[O-]>[Ni]>c1ccccc1N
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
future work
• Preserving superatom definitions as S groups in
V2000 and V3000 format Mol/RXN files.
• Enhanced stereochemistry (non-tetrahedral and axial
chiralities for catalyst optimization).
• Improved support for Marvin and ISIS sketches.
• Support for IDBS e-Workbook ELN.
6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013

Mais conteúdo relacionado

Mais procurados

Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...NextMove Software
 
CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...NextMove Software
 
RDKit UGM 2016: Higher Quality Chemical Depictions
RDKit UGM 2016: Higher Quality Chemical DepictionsRDKit UGM 2016: Higher Quality Chemical Depictions
RDKit UGM 2016: Higher Quality Chemical DepictionsNextMove Software
 
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...NextMove Software
 
Pharmaceutical industry best practices in lessons learned: ELN implementation...
Pharmaceutical industry best practices in lessons learned: ELN implementation...Pharmaceutical industry best practices in lessons learned: ELN implementation...
Pharmaceutical industry best practices in lessons learned: ELN implementation...NextMove Software
 
Building on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfilesBuilding on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfilesNextMove Software
 
Acs 2013 indianapolis_cvsp
Acs 2013 indianapolis_cvspAcs 2013 indianapolis_cvsp
Acs 2013 indianapolis_cvspKen Karapetyan
 
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...Standardization and Generation of Parents for Open PHACTS Chemical Registry S...
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...Ken Karapetyan
 
IB Chemistry on ICT, 3D software, Avogadro, AngusLab, Swiss PDB Viewer for In...
IB Chemistry on ICT, 3D software, Avogadro, AngusLab, Swiss PDB Viewer for In...IB Chemistry on ICT, 3D software, Avogadro, AngusLab, Swiss PDB Viewer for In...
IB Chemistry on ICT, 3D software, Avogadro, AngusLab, Swiss PDB Viewer for In...Lawrence kok
 
IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...
IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...
IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...Lawrence kok
 
IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...
IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...
IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...Lawrence kok
 
IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...
IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...
IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...Lawrence kok
 
IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...
IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...
IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...Lawrence kok
 
Preparation of pyrimido[4,5 b][1,6]naphthyridin-4(1 h)-one derivatives
Preparation of pyrimido[4,5 b][1,6]naphthyridin-4(1 h)-one derivativesPreparation of pyrimido[4,5 b][1,6]naphthyridin-4(1 h)-one derivatives
Preparation of pyrimido[4,5 b][1,6]naphthyridin-4(1 h)-one derivativeselshimaa eid
 
Collaboration for Innovation: Enriching the Knowledge Pool
Collaboration for Innovation: Enriching the Knowledge Pool Collaboration for Innovation: Enriching the Knowledge Pool
Collaboration for Innovation: Enriching the Knowledge Pool BIOVIA
 
Exploring SAR between Patents and PubChem
Exploring SAR between Patents and PubChemExploring SAR between Patents and PubChem
Exploring SAR between Patents and PubChemChris Southan
 
One pot synthesis of cu(ii) 2,2′ bipyridyl complexes of 5-hydroxy-hydurilic acid
One pot synthesis of cu(ii) 2,2′ bipyridyl complexes of 5-hydroxy-hydurilic acidOne pot synthesis of cu(ii) 2,2′ bipyridyl complexes of 5-hydroxy-hydurilic acid
One pot synthesis of cu(ii) 2,2′ bipyridyl complexes of 5-hydroxy-hydurilic acidrkkoiri
 

Mais procurados (20)

Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...
 
CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...
 
RDKit UGM 2016: Higher Quality Chemical Depictions
RDKit UGM 2016: Higher Quality Chemical DepictionsRDKit UGM 2016: Higher Quality Chemical Depictions
RDKit UGM 2016: Higher Quality Chemical Depictions
 
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...
 
Pharmaceutical industry best practices in lessons learned: ELN implementation...
Pharmaceutical industry best practices in lessons learned: ELN implementation...Pharmaceutical industry best practices in lessons learned: ELN implementation...
Pharmaceutical industry best practices in lessons learned: ELN implementation...
 
Building on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfilesBuilding on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfiles
 
Acs 2013 indianapolis_cvsp
Acs 2013 indianapolis_cvspAcs 2013 indianapolis_cvsp
Acs 2013 indianapolis_cvsp
 
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...Standardization and Generation of Parents for Open PHACTS Chemical Registry S...
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...
 
Ranjini ppr
Ranjini pprRanjini ppr
Ranjini ppr
 
IB Chemistry on ICT, 3D software, Avogadro, AngusLab, Swiss PDB Viewer for In...
IB Chemistry on ICT, 3D software, Avogadro, AngusLab, Swiss PDB Viewer for In...IB Chemistry on ICT, 3D software, Avogadro, AngusLab, Swiss PDB Viewer for In...
IB Chemistry on ICT, 3D software, Avogadro, AngusLab, Swiss PDB Viewer for In...
 
IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...
IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...
IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...
 
IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...
IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...
IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...
 
IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...
IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...
IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...
 
IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...
IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...
IB Chemistry on ICT, 3D software, Avogadro, Jmol, Swiss PDB, Pymol for Intern...
 
Preparation of pyrimido[4,5 b][1,6]naphthyridin-4(1 h)-one derivatives
Preparation of pyrimido[4,5 b][1,6]naphthyridin-4(1 h)-one derivativesPreparation of pyrimido[4,5 b][1,6]naphthyridin-4(1 h)-one derivatives
Preparation of pyrimido[4,5 b][1,6]naphthyridin-4(1 h)-one derivatives
 
UQ364438_OA
UQ364438_OAUQ364438_OA
UQ364438_OA
 
Collaboration for Innovation: Enriching the Knowledge Pool
Collaboration for Innovation: Enriching the Knowledge Pool Collaboration for Innovation: Enriching the Knowledge Pool
Collaboration for Innovation: Enriching the Knowledge Pool
 
Auto dock tutorial
Auto dock tutorialAuto dock tutorial
Auto dock tutorial
 
Exploring SAR between Patents and PubChem
Exploring SAR between Patents and PubChemExploring SAR between Patents and PubChem
Exploring SAR between Patents and PubChem
 
One pot synthesis of cu(ii) 2,2′ bipyridyl complexes of 5-hydroxy-hydurilic acid
One pot synthesis of cu(ii) 2,2′ bipyridyl complexes of 5-hydroxy-hydurilic acidOne pot synthesis of cu(ii) 2,2′ bipyridyl complexes of 5-hydroxy-hydurilic acid
One pot synthesis of cu(ii) 2,2′ bipyridyl complexes of 5-hydroxy-hydurilic acid
 

Destaque

ХӨС семинар 10
ХӨС семинар 10ХӨС семинар 10
ХӨС семинар 10Usukhuu Galaa
 
Why FPGA
Why FPGAWhy FPGA
Why FPGAProFAX
 
How to start a blog step-by-step guide
How to start a blog   step-by-step guideHow to start a blog   step-by-step guide
How to start a blog step-by-step guideKaran Labra
 
Habitual behaviour(unit 1)
Habitual behaviour(unit 1)Habitual behaviour(unit 1)
Habitual behaviour(unit 1)Hamed Hashemian
 
Anatomy of a methodology
Anatomy of a methodologyAnatomy of a methodology
Anatomy of a methodologySaumya Ganguly
 
Press Briefing Slides on the budget and economic outlook 2015 to 2025
Press Briefing Slides on the budget and economic outlook 2015 to 2025Press Briefing Slides on the budget and economic outlook 2015 to 2025
Press Briefing Slides on the budget and economic outlook 2015 to 2025Congressional Budget Office
 
совладельцы. сособственники
совладельцы. сособственникисовладельцы. сособственники
совладельцы. сособственникиSychev.n
 
10 Celebrities You Didn't Know Studied Accounting
10 Celebrities You Didn't Know Studied Accounting10 Celebrities You Didn't Know Studied Accounting
10 Celebrities You Didn't Know Studied AccountingAccounting4Free
 
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...DataStax
 
Neural Network in Knowledge Bases
Neural Network in Knowledge BasesNeural Network in Knowledge Bases
Neural Network in Knowledge BasesKushal Arora
 
Миний вэб сайт хийдэг аргачлал
Миний вэб сайт хийдэг аргачлалМиний вэб сайт хийдэг аргачлал
Миний вэб сайт хийдэг аргачлалJavkhlan Rentsendorj
 

Destaque (20)

Catalog
CatalogCatalog
Catalog
 
ХӨС семинар 10
ХӨС семинар 10ХӨС семинар 10
ХӨС семинар 10
 
Why FPGA
Why FPGAWhy FPGA
Why FPGA
 
How to start a blog step-by-step guide
How to start a blog   step-by-step guideHow to start a blog   step-by-step guide
How to start a blog step-by-step guide
 
Section 3
Section 3Section 3
Section 3
 
Informal letter
Informal letterInformal letter
Informal letter
 
Habitual behaviour(unit 1)
Habitual behaviour(unit 1)Habitual behaviour(unit 1)
Habitual behaviour(unit 1)
 
Anatomy of a methodology
Anatomy of a methodologyAnatomy of a methodology
Anatomy of a methodology
 
Letter of application
Letter of applicationLetter of application
Letter of application
 
Press Briefing Slides on the budget and economic outlook 2015 to 2025
Press Briefing Slides on the budget and economic outlook 2015 to 2025Press Briefing Slides on the budget and economic outlook 2015 to 2025
Press Briefing Slides on the budget and economic outlook 2015 to 2025
 
совладельцы. сособственники
совладельцы. сособственникисовладельцы. сособственники
совладельцы. сособственники
 
10 Celebrities You Didn't Know Studied Accounting
10 Celebrities You Didn't Know Studied Accounting10 Celebrities You Didn't Know Studied Accounting
10 Celebrities You Didn't Know Studied Accounting
 
Phonology
PhonologyPhonology
Phonology
 
Formal letter
Formal letterFormal letter
Formal letter
 
STEM CELL THERAPY
STEM CELL THERAPYSTEM CELL THERAPY
STEM CELL THERAPY
 
Speaking
SpeakingSpeaking
Speaking
 
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
 
Exceptional Naming
Exceptional NamingExceptional Naming
Exceptional Naming
 
Neural Network in Knowledge Bases
Neural Network in Knowledge BasesNeural Network in Knowledge Bases
Neural Network in Knowledge Bases
 
Миний вэб сайт хийдэг аргачлал
Миний вэб сайт хийдэг аргачлалМиний вэб сайт хийдэг аргачлал
Миний вэб сайт хийдэг аргачлал
 

Semelhante a Extraction, Analysis, Atom Mapping, Classification and Naming of Reactions from Pharmaceutical ELNs

Efficient Searching and Similarity of Unmapped Reactions: Application to ELN ...
Efficient Searching and Similarity of Unmapped Reactions: Application to ELN ...Efficient Searching and Similarity of Unmapped Reactions: Application to ELN ...
Efficient Searching and Similarity of Unmapped Reactions: Application to ELN ...NextMove Software
 
Cheminformatics II
Cheminformatics IICheminformatics II
Cheminformatics IIbaoilleach
 
Open-source tools for querying and organizing large reaction databases
Open-source tools for querying and organizing large reaction databasesOpen-source tools for querying and organizing large reaction databases
Open-source tools for querying and organizing large reaction databasesGreg Landrum
 
Chemical Text Mining for Current Awareness of Pharmaceutical Patents
Chemical Text Mining for Current Awareness of Pharmaceutical PatentsChemical Text Mining for Current Awareness of Pharmaceutical Patents
Chemical Text Mining for Current Awareness of Pharmaceutical Patentsdan2097
 
nano catalysis as a prospectus of green chemistry
nano catalysis as a prospectus of green chemistry nano catalysis as a prospectus of green chemistry
nano catalysis as a prospectus of green chemistry Ankit Grover
 
Thermodynamics for medicinal chemistry design
Thermodynamics for medicinal chemistry designThermodynamics for medicinal chemistry design
Thermodynamics for medicinal chemistry designPeter Kenny
 
EcoEngines Chemical Kinetics
EcoEngines Chemical KineticsEcoEngines Chemical Kinetics
EcoEngines Chemical KineticsEdward Blurock
 
Molecular design: One step back and two paths forward
Molecular design:  One step back and two paths forwardMolecular design:  One step back and two paths forward
Molecular design: One step back and two paths forwardPeter Kenny
 
Some new directions for pharmaceutical molecular design
Some new directions for pharmaceutical molecular designSome new directions for pharmaceutical molecular design
Some new directions for pharmaceutical molecular designPeter Kenny
 
Stimuli-Responsive Hydrogels
Stimuli-Responsive HydrogelsStimuli-Responsive Hydrogels
Stimuli-Responsive HydrogelsReeve D'Souza
 
Aspects of pharmaceutical molecular design (Belgrade version)
Aspects of pharmaceutical molecular design (Belgrade version)Aspects of pharmaceutical molecular design (Belgrade version)
Aspects of pharmaceutical molecular design (Belgrade version)Peter Kenny
 
Design of fragment screening libraries (Feb 2010 version)
Design of fragment screening libraries (Feb 2010 version)Design of fragment screening libraries (Feb 2010 version)
Design of fragment screening libraries (Feb 2010 version)Peter Kenny
 
Molecular design: How to and how not to?
Molecular design:  How to and how not to?Molecular design:  How to and how not to?
Molecular design: How to and how not to?Peter Kenny
 
Aspects of pharmaceutical molecular design (Fidelta version)
Aspects of pharmaceutical molecular design (Fidelta version)Aspects of pharmaceutical molecular design (Fidelta version)
Aspects of pharmaceutical molecular design (Fidelta version)Peter Kenny
 
Micellar Effect On Dephosphorylation Of Bis-4-Chloro-3,5-Dimethylphenylphosph...
Micellar Effect On Dephosphorylation Of Bis-4-Chloro-3,5-Dimethylphenylphosph...Micellar Effect On Dephosphorylation Of Bis-4-Chloro-3,5-Dimethylphenylphosph...
Micellar Effect On Dephosphorylation Of Bis-4-Chloro-3,5-Dimethylphenylphosph...IOSR Journals
 

Semelhante a Extraction, Analysis, Atom Mapping, Classification and Naming of Reactions from Pharmaceutical ELNs (20)

Efficient Searching and Similarity of Unmapped Reactions: Application to ELN ...
Efficient Searching and Similarity of Unmapped Reactions: Application to ELN ...Efficient Searching and Similarity of Unmapped Reactions: Application to ELN ...
Efficient Searching and Similarity of Unmapped Reactions: Application to ELN ...
 
Cheminformatics II
Cheminformatics IICheminformatics II
Cheminformatics II
 
Open-source tools for querying and organizing large reaction databases
Open-source tools for querying and organizing large reaction databasesOpen-source tools for querying and organizing large reaction databases
Open-source tools for querying and organizing large reaction databases
 
Chemical Text Mining for Current Awareness of Pharmaceutical Patents
Chemical Text Mining for Current Awareness of Pharmaceutical PatentsChemical Text Mining for Current Awareness of Pharmaceutical Patents
Chemical Text Mining for Current Awareness of Pharmaceutical Patents
 
nano catalysis as a prospectus of green chemistry
nano catalysis as a prospectus of green chemistry nano catalysis as a prospectus of green chemistry
nano catalysis as a prospectus of green chemistry
 
Thermodynamics for medicinal chemistry design
Thermodynamics for medicinal chemistry designThermodynamics for medicinal chemistry design
Thermodynamics for medicinal chemistry design
 
EcoEngines Chemical Kinetics
EcoEngines Chemical KineticsEcoEngines Chemical Kinetics
EcoEngines Chemical Kinetics
 
Molecular design: One step back and two paths forward
Molecular design:  One step back and two paths forwardMolecular design:  One step back and two paths forward
Molecular design: One step back and two paths forward
 
Computational Chemistry Robots
Computational Chemistry RobotsComputational Chemistry Robots
Computational Chemistry Robots
 
Combined Draft 4
Combined Draft 4 Combined Draft 4
Combined Draft 4
 
Some new directions for pharmaceutical molecular design
Some new directions for pharmaceutical molecular designSome new directions for pharmaceutical molecular design
Some new directions for pharmaceutical molecular design
 
week 9
week 9week 9
week 9
 
Stimuli-Responsive Hydrogels
Stimuli-Responsive HydrogelsStimuli-Responsive Hydrogels
Stimuli-Responsive Hydrogels
 
Aspects of pharmaceutical molecular design (Belgrade version)
Aspects of pharmaceutical molecular design (Belgrade version)Aspects of pharmaceutical molecular design (Belgrade version)
Aspects of pharmaceutical molecular design (Belgrade version)
 
Design of fragment screening libraries (Feb 2010 version)
Design of fragment screening libraries (Feb 2010 version)Design of fragment screening libraries (Feb 2010 version)
Design of fragment screening libraries (Feb 2010 version)
 
CRE-!-Lec.pptx
CRE-!-Lec.pptxCRE-!-Lec.pptx
CRE-!-Lec.pptx
 
Molecular design: How to and how not to?
Molecular design:  How to and how not to?Molecular design:  How to and how not to?
Molecular design: How to and how not to?
 
Celis 2015, KpCld
Celis 2015, KpCldCelis 2015, KpCld
Celis 2015, KpCld
 
Aspects of pharmaceutical molecular design (Fidelta version)
Aspects of pharmaceutical molecular design (Fidelta version)Aspects of pharmaceutical molecular design (Fidelta version)
Aspects of pharmaceutical molecular design (Fidelta version)
 
Micellar Effect On Dephosphorylation Of Bis-4-Chloro-3,5-Dimethylphenylphosph...
Micellar Effect On Dephosphorylation Of Bis-4-Chloro-3,5-Dimethylphenylphosph...Micellar Effect On Dephosphorylation Of Bis-4-Chloro-3,5-Dimethylphenylphosph...
Micellar Effect On Dephosphorylation Of Bis-4-Chloro-3,5-Dimethylphenylphosph...
 

Mais de NextMove Software

Building a bridge between human-readable and machine-readable representations...
Building a bridge between human-readable and machine-readable representations...Building a bridge between human-readable and machine-readable representations...
Building a bridge between human-readable and machine-readable representations...NextMove Software
 
A de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILESA de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILESNextMove Software
 
Recent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs RevolutionRecent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs RevolutionNextMove Software
 
Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...NextMove Software
 
Comparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule ImplementationsComparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule ImplementationsNextMove Software
 
Recent improvements to the RDKit
Recent improvements to the RDKitRecent improvements to the RDKit
Recent improvements to the RDKitNextMove Software
 
Digital Chemical Representations
Digital Chemical RepresentationsDigital Chemical Representations
Digital Chemical RepresentationsNextMove Software
 
Challenges and successes in machine interpretation of Markush descriptions
Challenges and successes in machine interpretation of Markush descriptionsChallenges and successes in machine interpretation of Markush descriptions
Challenges and successes in machine interpretation of Markush descriptionsNextMove Software
 
PubChem as a Biologics Database
PubChem as a Biologics DatabasePubChem as a Biologics Database
PubChem as a Biologics DatabaseNextMove Software
 
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...NextMove Software
 
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...NextMove Software
 
Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)NextMove Software
 
Challenges in Chemical Information Exchange
Challenges in Chemical Information ExchangeChallenges in Chemical Information Exchange
Challenges in Chemical Information ExchangeNextMove Software
 
Automatic extraction of bioactivity data from patents
Automatic extraction of bioactivity data from patentsAutomatic extraction of bioactivity data from patents
Automatic extraction of bioactivity data from patentsNextMove Software
 
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]NextMove Software
 
Chemical structure representation in PubChem
Chemical structure representation in PubChemChemical structure representation in PubChem
Chemical structure representation in PubChemNextMove Software
 
GHS and NFPA diamonds: where they come from and how they can be useful
GHS and NFPA diamonds: where they come from and how they can be usefulGHS and NFPA diamonds: where they come from and how they can be useful
GHS and NFPA diamonds: where they come from and how they can be usefulNextMove Software
 
Line notations for nucleic acids (both natural and therapeutic)
Line notations for nucleic acids (both natural and therapeutic)Line notations for nucleic acids (both natural and therapeutic)
Line notations for nucleic acids (both natural and therapeutic)NextMove Software
 
Sketchy sketches hiding chemistry in plain sight
Sketchy sketches hiding chemistry in plain sightSketchy sketches hiding chemistry in plain sight
Sketchy sketches hiding chemistry in plain sightNextMove Software
 

Mais de NextMove Software (20)

DeepSMILES
DeepSMILESDeepSMILES
DeepSMILES
 
Building a bridge between human-readable and machine-readable representations...
Building a bridge between human-readable and machine-readable representations...Building a bridge between human-readable and machine-readable representations...
Building a bridge between human-readable and machine-readable representations...
 
A de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILESA de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILES
 
Recent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs RevolutionRecent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
 
Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...
 
Comparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule ImplementationsComparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule Implementations
 
Recent improvements to the RDKit
Recent improvements to the RDKitRecent improvements to the RDKit
Recent improvements to the RDKit
 
Digital Chemical Representations
Digital Chemical RepresentationsDigital Chemical Representations
Digital Chemical Representations
 
Challenges and successes in machine interpretation of Markush descriptions
Challenges and successes in machine interpretation of Markush descriptionsChallenges and successes in machine interpretation of Markush descriptions
Challenges and successes in machine interpretation of Markush descriptions
 
PubChem as a Biologics Database
PubChem as a Biologics DatabasePubChem as a Biologics Database
PubChem as a Biologics Database
 
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
 
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
 
Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)
 
Challenges in Chemical Information Exchange
Challenges in Chemical Information ExchangeChallenges in Chemical Information Exchange
Challenges in Chemical Information Exchange
 
Automatic extraction of bioactivity data from patents
Automatic extraction of bioactivity data from patentsAutomatic extraction of bioactivity data from patents
Automatic extraction of bioactivity data from patents
 
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
 
Chemical structure representation in PubChem
Chemical structure representation in PubChemChemical structure representation in PubChem
Chemical structure representation in PubChem
 
GHS and NFPA diamonds: where they come from and how they can be useful
GHS and NFPA diamonds: where they come from and how they can be usefulGHS and NFPA diamonds: where they come from and how they can be useful
GHS and NFPA diamonds: where they come from and how they can be useful
 
Line notations for nucleic acids (both natural and therapeutic)
Line notations for nucleic acids (both natural and therapeutic)Line notations for nucleic acids (both natural and therapeutic)
Line notations for nucleic acids (both natural and therapeutic)
 
Sketchy sketches hiding chemistry in plain sight
Sketchy sketches hiding chemistry in plain sightSketchy sketches hiding chemistry in plain sight
Sketchy sketches hiding chemistry in plain sight
 

Último

How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 

Último (20)

How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 

Extraction, Analysis, Atom Mapping, Classification and Naming of Reactions from Pharmaceutical ELNs

  • 1. Extraction, Analysis, Atom Mapping, Classification and Naming of Reactions from Pharmaceutical ELNs Roger Sayle, Daniel Lowe, Noel O’Boyle NextMove Software, Cambridge, UK Michael Kappler, Hoffmann-La Roche, Nutley, NJ, USA Nick Tomkinson, AstraZeneca, Alderley Park, UK 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 2. introduction • Pharmaceutical ELNs contain a wealth of synthetic chemistry knowledge, particularly on failed reactions, often not described in the literature or available from public sources. • This presentation describes several of the technical informatics challenges encountered during the process of exploiting reaction information from ELNs. 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 3. motivation #1: economics • The primary motivation for reaction informatics is reducing cost of goods, through higher yields, fewer failed reactions and more direct synthetic routes. Yuta Fujiwara et al., “Practical and innate carbon-hydrogen functionalization of heterocycles”, Nature Vol. 492, pp. 95-99, 6th December 2012. 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 4. motivation #2: yield prediction • Nadine et al.[1] hypothesize that low LogP is a major cause of synthesis failure in parallel synthesis of combinatorial libraries. • GSK data set from Pickett et al.[2] of 2500 Suzuki couplings in a 50x50 library of MMP-12 inhibitors. – 1704 compounds measured, mean logP = 3.56 (1.44) – 566 compounds not made, mean logP = 2.83 (1.52) – Student’s t-test for different distributions, p<2x10-22. 1. Nadine, Hattotuwagama and Churcher ,“Lead-Oriented Synthesis: A New Opportunity for Synthetic Chemistry”, Angew. Chem. Int. Ed, 51:1114 2012. 2. Pickett et al., ACS Med. Chem. Lett. 2(1):28, 2011 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 5. motivation #2: yield prediction The clear trend between Suzuki coupling success rate and predicted octanol-water partition co-efficient. 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013 97 232 340 525 474 202 63 55 119 127 141 80 36 8 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% <1.0 1.0-2.0 2.0-3.0 3.0-4.0 4.0-5.0 5.0-6.0 >6.0 Reaction Product Predicted cLogP Sucessful Reactions Failed Reactions
  • 6. motivation #3: virtual libraries • Statistics of ELN reactions may be used to redefine RECAP-style rules for retro synthesis [1,2] • For example, reduction of nitro groups to amines is one of the top 5 most common ELN reactions, but nitration reactions are amongst the rarest. • This implies that nitro containing compounds are purchased as reagents rather made in-house. 1. Xiao Qing Lewell, Duncan B. Judd, Stephen P. Watson and Michael M. Hann, “RECAP – Retrosynthetic Combinatorial Analysis Procedure”, JCICS 38(3):511-522, 1998. 2. Vainio, Kogej and Raubacher, “Automated Recycling of Chemistry for Virtual Screening and Library Design”, J. Chem. Inf. Model. 52(7):1777-1786, 2012. 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 7. The challenges 1. Export of the data from the ELN. 2. High fidelity conversion to other file formats. 3. Reaction normalization/standardization. 4. Reaction identity (canonicalization). 5. Reaction naming and classification. 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 8. typical eln experiment 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 9. Database export • Typically, ELNs are implemented as complex schemas within relational databases (Oracle), supporting transactions, auditing and security privileges. • Not uncommonly the vendor provided functionality or APIs for data export are slow and/or buggy. • In addition to reactions and structures, there is often a requirement to export all associated data, including textual and numeric data, tables, even LCMS and NMR spectra. 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 10. typical eln fields in RD format $DTYPE REACTION:REACTION.CONDITIONS:TEMPERATURE:VALUE $DATUM 120 &#xB0;C $DTYPE REACTION:REACTION.CONDITIONS:TEMPERATURE:MINVAL $DATUM 393.15 $DTYPE REACTION:REACTION.CONDITIONS:TEMPERATURE:MAXVAL $DATUM 393.15 $DTYPE REACTION:REACTION.CONDITIONS:PRESSURE:VALUE $DATUM 5 bar $DTYPE REACTION:REACTANTS(1):NAME $DATUM nicotinoyl chloride $DTYPE REACTION:REACTANTS(1):CHEMICAL.STRUCTURE $DATUM c1cc(cnc1)C(=O)Cl $DTYPE REACTION:REACTANTS(1):FORMULA.MASS:VALUE $DATUM 141.56 $DTYPE REACTION:REACTANTS(2):NAME $DATUM (2R,3S)-3-methylpentan-2-ol $DTYPE REACTION:REACTANTS(2):CHEMICAL.STRUCTURE $DATUM CC[C@@H](C)[C@H](C)O $DTYPE REACTION:REACTANTS(2):FORMULA.MASS:VALUE $DATUM 102.17 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013 $DTYPE REACTION:PRODUCTS(1):CHEMICAL.STRUCTURE $DATUM CC[C@@H](C)[C@H](C)OC(=O)c1cccnc1 $DTYPE REACTION:PRODUCTS(1):NAME $DATUM (2R,3S)-3-methylpentan-2-yl nicotinate $DTYPE REACTION:PRODUCTS(1):MOLECULAR.WEIGHT:VALUE $DATUM 207.27 $DTYPE REACTION:PRODUCTS(1):MOLECULAR.FORMULA $DATUM C12H17NO2 $DTYPE REACTION:PRODUCTS(1):ACTUAL.MOLES:VALUE $DATUM 2.16 mmol $DTYPE REACTION:SOLVENTS(1):NAME $DATUM 1-pentanol $DTYPE REACTION:SOLVENTS(1):VOLUME:VALUE $DATUM 5 mL $DTYPE REACTION:SOLVENTS(1):R.VALUES $DATUM R10,R20
  • 11. File format conversion • The source format for many reactions is typically a sketch, in either CDX, CDXML, ISIS Sketch or Marvin file format. • For data processing reactions are much easier to handle as reaction SMILES, MDL RXN or RD files and possibly even variants of MOL and SD file formats. • Alas handling of reaction file formats is generally poorly handled by many cheminformatics tools. • Additionally, reaction file formats can rarely encode all of the same information. 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 12. decrypting cdx & cDXML files • CambridgeSoft are to be congratulated for publicly documenting their CDX and CDXML file formats. • Unfortunately this online ChemDraw developer resource is no longer being kept up to date. • New tags: object 0x802b encodes “annotation”. • Mistakes: “arrow” is encoded by object 0x8021. • Proprietary property tags: USPTO’s “PageDefinition”. • Support for reading and writing isotopic information in CDXML files has been contributed to Open Babel. 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 13. agents and catalysts How to express reaction role, i.e. molecules drawn above and below the reaction arrow. CSc1ccccc1(F)>OO>CS(=O)c1ccc(cc1)F Although standard MDL RXN files only capture reactants and products, a useful ChemAxon extension is to add a third count for agents. Downstream tools can optionally remove or reclassify these agents to produce strictly compliant MDL files. 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 14. Valence issues A tricky problem is working around the different MDL valence interpretations used by cheminformatics tools. For example, it is not uncommon for the alchemical reaction [Pb]>>[Au] to become reinterpreted as [PbH2]>>[Au] after writing and re-reading from MDL formats. Such errors lose the distinction between sodium metal and sodium hydride, resulting in incorrect molecular formulae, or to distinguish metals from radicals, causing problems for substructure searching. 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 15. rich text format • Exporting formatted text from electronic lab note books, such as experimental/preparation write-ups, requires converting Microsoft Rich Text Format (RTF). • This can be translated into HTML or ASCII, and then long lines wrapped for inclusion in SD/RD data fields. • Special code is required for handling non-U.S. character sets, and whilst Western European, Russian, Chinese and Japanese were expected, finding Arabic, Thai and Vietnamese text in major pharmaceutical ELNs came as a surprise. 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 16. salt/component grouping • Keeping track of the intended number and formulae of reactants, products and agents in a reaction, requires preserving salt form associations. • This is implemented by honoring the “group” information from the sketch as single disconnected components in MDL RXN and RD file output. • These associations are traditionally lost in SMILES... ...>CC(=O)[O-].[Cl-].[Cl-].[Fe].[K+].[Pd+2]...>... but can be retained via ChemAxon/GGA extensions. 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 17. three types of superatom/Label 1. Recognized superatoms are expanded to their explicit all atom representations. 2. Unrecognized superatoms/labels that are bonded to a molecule are encoded as dummy asterisk atoms where the text is preserved as an MDL atom alias. – Support for writing MDL aliases contributed to RDKit. 3. Disconnected unrecognized labels are preserved as supplementary data fields in SD and RD file formats. 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 18. superatom expansion becomes 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 19. text label preservation is converted to an MDL SD file as NextMove07171309292D 5 4 0 0 0 0 0 0 0999 V2000 0.0000 0.0000 0.0000 * 0 0 0 0 0 0 0 0 0 0 0 0 1.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 1.5000 -0.8660 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 1.5000 0.8660 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 2.5000 0.8660 0.0000 * 0 0 0 0 0 0 0 0 0 0 0 0 1 2 1 0 0 0 0 2 3 2 0 0 0 0 2 4 1 0 0 0 0 4 5 1 0 0 0 0 A 1 Alexa555 A 5 LH-RH M END > <LABELS> NMS-123456 $$$$
  • 20. chemdraw superatom correction Chemist drew Chemist Intended Chemist got DMF K2CO3 HBTU mCPBA 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 21. Reaction standardization Chemists may draw the reaction components in multiple ways (e.g. tautomers and protonation states) 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 22. Reaction standardization • An often overlooked aspect of ELNs is the need to enforce “business rules” to consistently represent a reaction, in the same way that normalized molecules are stored in registration systems. • Pharmaceutical ELNs contain structures where nitros are represented arbitrarily, and even cases where azide representations differ on each side of an arrow. • Unfortunately, the rules used for molecules (such as InChI) may be inappropriate for reactions, where metal co-ordination and radicals play a major role. 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 23. Reaction identity ELNs frequently contain repeated reactions, duplicates. We need define when two reactions are the same. 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 24. Reaction identity • When translating from the experiment-centric view of an ELN to the reaction-centric view of a reaction database one asks when are two reactions the same. • A pragmatic/operational definition might be that two experiments with identical sets of reactants and products, but differing quantities, conditions, catalysts and solvents are variations of each other. • Whether a component is a reactant, catalyst, solvent or reagent may be consistently defined by atom- mapping; reactants contribute atoms to the product. 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 25. Reaction classification • For searching and analysis it is often convenient to algorithmically assign each reaction to a type, often a named reaction such as Negishi coupling, Diels-Alder cycloaddition, nitro reduction or chiral separation. • This is implemented using a database of SMIRKS-like transforms that may be pre-compiled for efficient matching and portability across informatics toolkits. • This approach provides both classification under the RSC’s RXNO ontology and reaction atom mapping for component role assignment. 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 26. simple classifications • Examples of simple reaction classifications A.B>>C Regular reaction A.B>> Failed reaction >>C Compound purchase A>>A Purification A.B>>A Separation (and chiral separation) A.B>>C.A Catalysts or unreacted reagents 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 27. categorization of ELN reactions 1. J. Carey, D. Laffan, C. Thomson, M. Williams, Org. Biomol. Chem. 2337, 2006. 2. S. Roughley and A. Jordan, J. Med. Chem. 54:3451-3479, 2011. 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013 34% 17% 5% 2% 3% 6% 10% 1% 15% 2% 5% Heteroatom alkylation and arylation Acylation and related processes C-C bond formations Heterocycle formation Protections Deprotections Reductions Oxidations Functional group conversion Functional group addition Resolution
  • 28. reaction ontology • Reactions are classified into a common subset of the Carey et al. classes and the RSC’s RXNO ontology. • There are 12 super-classes – e.g. 3 C-C bond formation (RXNO:0000002). • These contain 84 class/categories. – e.g. 3.5 Pd-catalyzed C-C bond formation (RXNO:0000316) • These contain ~300 named reactions/types. – e.g. 3.5.3 Negishi coupling (RXNO:0000088) • These require >400 SMIRKS-like transformations. 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 29. example smirks-like transform • Reactions are specified as SMIRKS transformations: [BrD1h0+0:1][#6:2].[#7X3v3+0:3][H]>>[#6:2][#7:3] 1.6.2 BROMO_N_ALKYLATION • As demonstrated in the example above, these patterns may operate on explicit hydrogen atoms for brevity, but these are “compiled” via more efficient SMARTS-like patterns for matching during naming. • The nitrogen match becomes “[#7X3v3h>0+0:3]”. 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 30. eTl* summary statistics 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013 ELN Experiment Category Fraction A NULL CLOBs 0.30% B Empty sketches 0.24% C No reaction (molecule?) 8.52% D No reactants 0.63% E No products 0.06% F Regular Reactions 88.93% G Markush Reactions 1.32% Total 100.0% Export success for a typical pharmaceutical ELN is currently about 90.94% (see D+E+F+G above). * Extract-Transform-Load (ETL), A data warehousing term.
  • 31. analysis of eln reaction yields 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013 Data courtesy of Nick Tomkinson, AstraZeneca RDI, Alderley Park, UK.
  • 32. analysis of eln reaction yields 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013 Data courtesy of Nick Tomkinson, AstraZeneca RDI, Alderley Park, UK.
  • 33. reaction yield distributions 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 34. conclusions • In an attempt to better understand and hopefully improve the productivity of synthetic chemists, new computational methods have been developed to process “real world” organic reaction data. • The fruits of this work now enable medicinal chemists and informaticians to make greater use of the wealth of information in their in-house ELNs. 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 35. acknowledgements • Anna Paola Pellicioli and Greg Landrum, Novartis,CH. • Ethan Hoff and Manli Zheng, AbbVie, IL, USA. • Colin Batchelor, RSC, Cambridge, UK. • David Drake, AstraZeneca, Alderley Park, UK. • Plamen Petrov, AstraZeneca, Molndal, SE. • Daniel Stoffler, Hoffman-La Roche, Basel, CH. • Pat Walters, Vertex Pharmaceuticals, MA, USA. • Andrew Wooster, GSK, RTP, NC, USA. • Thank you for your time. 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 36. co-ordinate handling • Several approaches for handling 2D co-ordinates – Preserve original co-ordinates as drawn by chemist – Center and/or rescale for downstream depiction tools – Regenerate all 2D co-ordinates algorithmically – Clean-up long short/bonds introduced by superatom expansion, attempting to preserve original orientation. 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 37. nested reactions • Some ELN configurations allow more than one reaction or experiment per lab notebook page, i.e. multiple reaction sketches in different “tabs”. • Two solutions to this breaking of the 1-to-1 mapping between COLLECTION_ID and reaction include: 1. Nested Reactions: Using the MDL RD file’s ability to embed each reaction step as data fields in a single record. 2. Splitting Reactions: Where each reaction has its own record, possibly duplicating shared data. 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 38. meta data fields • In addition to the data explicitly recorded by the chemist and captured by the ELN, it is also often useful to export meta-data from an ELN schema. • This includes data fields such as experiment creation data, creation modification date, experiment status (open/closed), chemist name, chemist user id, etc. 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 39. incremental updates • An important feature for a production data warehouse for ELN reaction data is the ability to keep its contents valid/fresh with live data. • The can be implemented by supporting incremental updates that export only those creations that have been modified or created since a given date or in a range of dates. • One subtlety is the handling of “closed” experiments (i.e. those signed off by a supervisor), whose status change date need not match the last modified date. 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 40. reaction role assignment • Normalized reaction roles may be assigned via reaction atom mapping algorithms. • Reaction components that contribute atoms to the product are defined to be reactants, and the remaining components as catalysts and solvents. • Hence, c1cccc1[N+](=O)[O-].[Ni]>>c1ccccc1N may be canonicalized as c1ccccc1[N+](=O)[O-]>[Ni]>c1ccccc1N 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013
  • 41. future work • Preserving superatom definitions as S groups in V2000 and V3000 format Mol/RXN files. • Enhanced stereochemistry (non-tetrahedral and axial chiralities for catalyst optimization). • Improved support for Marvin and ISIS sketches. • Support for IDBS e-Workbook ELN. 6th Joint Sheffield Conference on Chemoinformatics, Tuesday 23rd July 2013