We have previously described the extraction of reactions from US and European patents. This talk will discuss the assembly of more than six million extracted reactions, each with connection tables, procedure text, quantities, solvents, catalysts and yields, into a searchable "read-only" Electronic Lab Notebook.
In addition to reaction details, concepts including diseases, drug targets, and assignees are recognised in the patent documents and normalised to appropriate ontologies. Each normalised term is paired with the reaction details found in the same document to allow intuitive cross-concept querying (e.g. "GlaxoSmithKline C-C Bond Formation greater than 80% yield Myocardial Infarction"). Reactions are classified and assigned to leaves in the RXNO Ontology. The ontologies are used to provide organisation, faceting, and filtering of results. The reaction classification also provides a precise atom mapping that facilitates structural transformation queries and can improve reaction diagram layout.
Building on improvements in substructure search technology, we will demonstrate several types of chemical synthesis query that can be answered efficiently. The combination of high-performance chemical searching and additional document terms provides a powerful exploratory and trend analysis tool for chemists.
CINF 13: Pistachio - Search and Faceting of Large Reaction Databases
1. CINF 13, ACS Fall 2017, Washington, D.C.
pistachio
Search and Faceting of Large Reaction Databases
John Mayfield, Daniel Lowe, Roger Sayle
2. What do Synthetic Chemists Want from Their Reaction Systems?
Data, Classification, Diagrams, Search
4. Infrastructure for liberating and processing reactions from Electronic Lab Notebooks (ELNs)
HazELNut, Filbert, NameRXN, Cobnut
• Accelrys Pipeline Pilot (AstraZeneca, AbbVie & Hoffmann-La Roche)
• ChemAxon JChem Cartridge (GlaxoSmithKline & Novartis)
• Elsevier Reaxys (Hoffmann-La Roche, AstraZeneca, Merck)
• Perkin Elmer Informatics (formerly CambridgeSoft) eNotebook v9, v11 or v13, or Symyx ELN v5.x or v6.x
• Oracle Server version 10, 11 or
• Microsoft Windows, Linux or Mac OS
5. To 7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (Peakdale) (220 mg, 1.025 mmol) and (3,4-dimethoxyphenyl)boronic acid (187 mg, 1.025 mmol) in 1,4-dioxane (3 mL) and water (1.5 mL) was added sodium carbonate (435 mg, 4.10 mmol) and tetrakis(triphenylphosphine)palladium(0) (110 mg, 0.095 mmol). The reaction was heated in the microwave at 80° C. for 2 hours and at 100° C. for a further 2 hours. The solvent was removed and the residue was suspended in DMSO, filtered and purified by MDAP. Appropriate fractions were combined and the solvent removed to give 7-(3,4-dimethoxyphenyl)-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid (25 mg, 7%) as a yellow solid.
US 2016/16966 A1, paragraph [0517]
Daniel M. Lowe. Extraction of chemical structures and reactions from the literature. Ph.D. Thesis, University of Cambridge, 2012
6. Unstructured text to a structured reaction table
LeadMine + Chemical Tagger
Source: US 2016/16966 A1, paragraph [0517]
Product Properties
• 7-(3,4-dimethoxyphenyl)-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid: 25 mg, 7% yield, yellow solid
Reactant Properties
• 7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid: 220 mg, 1.025 mmol
• (3,4-dimethoxyphenyl)boronic acid: 187 mg, 1.025 mmol
Agent Properties
• 1,4-dioxane: 3 mL
• water: 1.5 mL
• sodium carbonate: 435 mg, 4.10 mmol
• tetrakis(triphenylphosphine)palladium(0): 110 mg, 0.095 mmol
• DMSO
Daniel M. Lowe. Extraction of chemical structures and reactions from the literature. Ph.D. Thesis, University of Cambridge, 2012
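Once quantities are held as structured fields, simple derived values such as molar equivalents become easy to compute. The following is a minimal sketch of that idea, not the LeadMine or ChemicalTagger implementation: the `MOLES` pattern and the `parse_mmol` and `equivalents` helpers are hypothetical names, and the amounts are those given in the patent paragraph.

```python
import re

# Amounts as stated in the patent paragraph (US 2016/16966 A1, [0517]).
AMOUNTS = {
    "7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid": "220 mg, 1.025 mmol",
    "(3,4-dimethoxyphenyl)boronic acid": "187 mg, 1.025 mmol",
    "sodium carbonate": "435 mg, 4.10 mmol",
    "tetrakis(triphenylphosphine)palladium(0)": "110 mg, 0.095 mmol",
}

# Hypothetical pattern: a number followed by "mmol" or "mol".
MOLES = re.compile(r"(\d+(?:\.\d+)?)\s*(mmol|mol)\b")


def parse_mmol(text):
    """Return the molar amount in mmol, or None if no molar amount is given."""
    match = MOLES.search(text)
    if match is None:
        return None
    value, unit = float(match.group(1)), match.group(2)
    return value * 1000.0 if unit == "mol" else value


def equivalents(amounts, reference):
    """Molar equivalents of each compound relative to the reference compound."""
    ref = parse_mmol(amounts[reference])
    return {name: round(parse_mmol(text) / ref, 3) for name, text in amounts.items()}


EQUIVALENTS = equivalents(
    AMOUNTS, "7-chloro-4-oxo-4,5-dihydrofuro[2,3-d]pyridazine-2-carboxylic acid"
)
```

With the chloro acid as reference, this gives 4 equivalents of sodium carbonate and a palladium loading of roughly 9 mol%. A unit slip such as "4.10 mol" in place of "4.10 mmol" would show up immediately as an implausible equivalents value.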
7. Data impact
Public subset released in 2014 as CC-Zero
Christos Nicolaou et al. The Proximal Lilly Collection: Mapping, Exploring and Exploiting Feasible Chemical Space. J. Chem. Inf. Model., 2016, 56 (7), pp 1253–1266
Nadine Schneider et al. Big Data from Pharmaceutical Patents: A Computational Analysis of Medicinal Chemists’ Bread and Butter. J. Med. Chem., 2016, 59 (9), pp 4385–4402
Nadine Schneider et al. Development of a Novel Fingerprint for Chemical Reactions and Its Application to Large-Scale Reaction Classification and Similarity. J. Chem. Inf. Model., 2015, 55 (1), pp 39–53
Nadine Schneider et al. What’s What: The (Nearly) Definitive Guide to Reaction Role Assignment. J. Chem. Inf. Model., 2016, 56 (12), pp 2336–2346
Connor Coley et al. Prediction of Organic Reaction Outcomes Using Machine Learning. ACS Cent. Sci., 2017, 3 (5), pp 434–443
Pistachio expands the scope of the data and uses Atom-Atom Maps from NameRxn
11. reaction DIAGRAMS
Good reaction diagrams are essential in communicating synthetic chemistry
Layout can be stored or generated
• When extracting from text, layout must be generated
• Generated diagrams can be unsatisfactory for display
14. diagram improvements
Typical workarounds:
• Separately render molecules
• Hide agents and list separately
What humans do:
• Wrap products below
• Abbreviate functional groups and agents
• Orientate reactants to products and vice versa
• Hide agents and list as text
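Two of the points above, abbreviating agents and listing them as text, can be sketched in a few lines. This is an illustration only and not how Pistachio lays out diagrams: the `ABBREVIATIONS` table, the `agent_label` function and the `max_width` parameter are all hypothetical, and a real system would need a much larger curated abbreviation list.

```python
# Hypothetical abbreviation table keyed by lower-case agent name.
ABBREVIATIONS = {
    "tetrakis(triphenylphosphine)palladium(0)": "Pd(PPh3)4",
    "sodium carbonate": "Na2CO3",
    "water": "H2O",
    "dimethyl sulfoxide": "DMSO",
}


def agent_label(agents, max_width=30):
    """Abbreviate agents and wrap them into short text lines for the reaction arrow."""
    lines, current = [], ""
    for agent in agents:
        short = ABBREVIATIONS.get(agent.lower(), agent)
        candidate = short if not current else current + ", " + short
        if current and len(candidate) > max_width:
            # The line is full: start a new one with this agent.
            lines.append(current)
            current = short
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


LABEL = agent_label(
    ["1,4-dioxane", "water", "sodium carbonate", "tetrakis(triphenylphosphine)palladium(0)"]
)
```

Rendering the agents as wrapped text rather than as drawn structures keeps the depiction focused on the reactants and products.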
18. namerxn
Assigns names to 900+ reactions using transformations
Can guarantee perfect Atom-Atom Mapping
• Atom-Atom Mapping is an output not an input
• MCS mappers struggle with rearrangements
Example shown: 4.1.6 Cyclic Beckmann rearrangement
20. concepts and rxno
1 Heteroatom alkylation and arylation
  .7 O-substitution
    .1 Chan-Lam ether coupling
    .2 Diazomethane esterification
    .3 Ethyl esterification
    .4 Hydroxy to methoxy
    .5 Hydroxy to triflyloxy
    .6 Methyl esterification
    .n
2 Acylation and related processes
  .6 O-acylation to ester
    .1 Ester Schotten-Baumann
    .2 Esterification (generic)
    .3 Fischer-Speier esterification
    .4 Baeyer-Villiger oxidation
    .5 Yamaguchi esterification
    .6 Hydroxy to imidazolecarbonyloxy
    .7 Imidazolecarbonyl to ester
    .8 Hydroxy to acetoxy
    .9 Steglich esterification
    .n
RXNO concepts: Esterification (7), Chan-Lam coupling (3), Schotten-Baumann Reaction (9)
RXNO: http://github.com/rsc-ontologies/rxno
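Reading the tree as dotted codes (for example 1.7.1 for Chan-Lam ether coupling), a leaf assignment implies membership of every class above it, which is what lets results be organised and counted at any level. The sketch below illustrates that roll-up under stated assumptions: `ancestors` and `roll_up` are hypothetical helper names and the leaf assignments are made-up sample data, not Pistachio results.

```python
from collections import Counter

# Class names taken from the tree; only a few entries are included here.
NAMES = {
    "1": "Heteroatom alkylation and arylation",
    "1.7": "O-substitution",
    "1.7.1": "Chan-Lam ether coupling",
    "1.7.6": "Methyl esterification",
    "2": "Acylation and related processes",
    "2.6": "O-acylation to ester",
    "2.6.1": "Ester Schotten-Baumann",
    "2.6.3": "Fischer-Speier esterification",
}


def ancestors(code):
    """'2.6.3' -> ['2', '2.6', '2.6.3']: the leaf and every class above it."""
    parts = code.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def roll_up(leaf_codes):
    """Count reactions at every level of the tree from their leaf assignments."""
    counts = Counter()
    for code in leaf_codes:
        counts.update(ancestors(code))
    return counts


# Made-up sample: five reactions, each assigned to one leaf class.
COUNTS = roll_up(["1.7.1", "1.7.6", "2.6.1", "2.6.3", "2.6.3"])
```

The same roll-up would apply to any tree-shaped vocabulary with hierarchical identifiers.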
21. result FACETS
Provides summary over the key concepts of results
Cut through information deluge and refine search
• Reaction Types (NextMove ontology tree)
• Drug Targets (ChEMBL ontology tree)
• Disease Targets (MeSH ontology tree)
• Yields
• Affiliation (NextMove ontology tree)
• Publication Date, Documents, Authors
22. facet calculation
Resource expensive: O(n) in the size of the result set
• Client, server, or database?
• Overhead copying and transferring data that is not needed
• Calculate when requested or up-front?
Custom cartridge: 2.9 seconds to summarise all 6.6 million rows (Intel(R) Core(TM) i7-6900K CPU @ 3.20GHz)
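The O(n) cost comes from having to visit every row of the result set once to tally each facet. The sketch below shows that single pass in plain in-memory Python; it is not the custom cartridge, and the field names, the 20% yield bins and the sample rows are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def yield_bin(percent, width=20):
    """Bucket a percentage yield into a range label such as '80-100%'."""
    low = min(int(percent // width) * width, 100 - width)
    return f"{low}-{low + width}%"


def facet_counts(rows, fields):
    """Tally every facet in one pass; cost grows linearly with the number of rows."""
    facets = defaultdict(Counter)
    for row in rows:
        for field in fields:
            value = row.get(field)
            if value is not None:
                facets[field][value] += 1
        # Yields are numeric, so they are binned rather than counted by exact value.
        if row.get("yield") is not None:
            facets["yield"][yield_bin(row["yield"])] += 1
    return facets


# Made-up sample rows; the keys are hypothetical column names.
ROWS = [
    {"assignee": "A", "reaction_type": "2.6.3", "yield": 85},
    {"assignee": "A", "reaction_type": "1.7.1", "yield": 7},
    {"assignee": "B", "reaction_type": "2.6.3", "yield": None},
]
FACETS = facet_counts(ROWS, ["assignee", "reaction_type"])
```

Running the tally where the rows already live avoids the copying and transfer overhead noted in the bullets, which is one argument for doing it in the database.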
24. one entry point
Systematic Name, Date Range, Trivial Name, Yield Range, Affiliation, Reaction SMARTS, Disease Target, Document, Line Formula, SMILES, InChI, Author, Protein Target, Collection, Reaction Type (NameRxn), SMARTS, Source
…and logical combinations thereof
26. structure search technology
NextMove’s Arthor Technology
Up to 100x faster than state-of-the-art
Combination of SMARTS compilation and efficient storage
Preliminary PostgreSQL integration
36s        Arthor
56m        BIOVIA Direct (Oracle)
1h         Bingo (NoSQL)
1h54m      Bingo (PostgreSQL)
2h6m       Bingo (Oracle)
2h41m      JChem (Oracle)
5h9m       RDCart (PostgreSQL)
13h54m     pgchem (PostgreSQL)
1d1h52m    mychem (MySQL)
3d1h13m    orchem (Oracle)
Benchmark: ~3.5K queries against ~7M structures (eMolecules 2014), all on the same hardware.
John May and Roger Sayle, Substructure Search Face-off, May 2015
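The timings are easier to compare as ratios. The following sketch converts the compact durations listed on the slide into seconds and divides by the Arthor time; the `to_seconds` and `relative_to` helpers are hypothetical names, and the figures are the ones quoted from the 2015 benchmark.

```python
import re

# Timings as listed on the slide.
TIMINGS = {
    "Arthor": "36s",
    "BIOVIA Direct (Oracle)": "56m",
    "Bingo (NoSQL)": "1h",
    "Bingo (PostgreSQL)": "1h54m",
    "Bingo (Oracle)": "2h6m",
    "JChem (Oracle)": "2h41m",
    "RDCart (PostgreSQL)": "5h9m",
    "pgchem (PostgreSQL)": "13h54m",
    "mychem (MySQL)": "1d1h52m",
    "orchem (Oracle)": "3d1h13m",
}

UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def to_seconds(duration):
    """Convert a compact duration such as '1d1h52m' into seconds."""
    pairs = re.findall(r"(\d+)([dhms])", duration)
    return sum(int(number) * UNIT_SECONDS[unit] for number, unit in pairs)


def relative_to(baseline, timings):
    """How many times longer each system took than the baseline system."""
    base = to_seconds(timings[baseline])
    return {name: to_seconds(text) / base for name, text in timings.items()}


RATIOS = relative_to("Arthor", TIMINGS)
```

On these figures the closest system, BIOVIA Direct, took about 93 times as long as Arthor, and Bingo (NoSQL) exactly 100 times as long.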
27. refining structure search
Intention can be refined by qualifiers
Role
• {structure} product
Substructure
• {structure} substructure
• {structure} substructure product
Make/Break
• Synthesis of {structure}
Combined with other terms
• {structure} substructure product and yield of 80%
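The qualifier forms listed on this slide can be read as a small query grammar. The sketch below parses just those forms and is not Pistachio's query parser: `parse_query`, the `ROLES` set and the returned dictionary keys are hypothetical. Treating "Synthesis of {structure}" as a product query is an assumption, and the yield is only recorded, since the slide does not say how it is compared.

```python
import re

ROLES = {"product"}  # the slide shows only "product"; other roles would be assumptions
YIELD = re.compile(r"yield of (\d+(?:\.\d+)?)%", re.IGNORECASE)


def parse_query(text):
    """Split a query into a structure term, its qualifiers and an optional yield."""
    query = {"structure": None, "substructure": False, "role": None, "yield": None}
    for clause in re.split(r"\s+and\s+", text.strip()):
        match = YIELD.fullmatch(clause)
        if match:
            query["yield"] = float(match.group(1))
            continue
        if clause.lower().startswith("synthesis of "):
            # Assumption: asking for a synthesis means the structure is the product.
            query["structure"] = clause[len("synthesis of "):]
            query["role"] = "product"
            continue
        words = clause.split()
        # Peel qualifiers off the end; what remains is the structure (a SMILES or a name).
        while words and words[-1].lower() in ROLES | {"substructure"}:
            word = words.pop().lower()
            if word == "substructure":
                query["substructure"] = True
            else:
                query["role"] = word
        query["structure"] = " ".join(words)
    return query


EXAMPLE = parse_query("c1ccccc1O substructure product and yield of 80%")
```

Here the structure term is a SMILES string for phenol, chosen only as sample input.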
30. Acknowledgements
Noel O’Boyle (NextMove Software), Egon Willighagen (CDK)
James Davison, Matt Swain (Vernalis)
pistachio
http://www.nextmovesoftware.com/pistachio.html
Come find me around ACS for a demo!
See also: CINF 90