The anatomy of a chemical reaction: Dissection by machine learning algorithms
Alex M. Clark, Ph.D.
August 2014
© 2015 Molecular Materials Informatics, Inc.
http://molmatinf.com
MOLECULAR MATERIALS INFORMATICS
21st Century Publishing
[Diagram: chemist, experiment, write up, confirm, μpub, URI, viewing, searching, machine learning]
All your byte are belong to us
• Just because a reaction scheme is digital…
• … doesn’t mean it’s of any use to a computer.
Production Raster Graphics
Generic molfile
 15 16  0  0  0  0  0  0  0  0999 V2000
   -3.9510    4.0500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -5.2500    3.3000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.6519    3.3000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -5.2500    1.8000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.9510    1.0500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.6519    1.8000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.2306    3.7694    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.3482    2.5611    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.2240    1.3407    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   -6.5490    4.0500    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
   -7.8481    3.3000    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0
   -6.5490    5.5500    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.1518    2.5673    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9072    1.2714    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.8964    3.8695    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  3  2  0  0  0  0
  2  4  2  0  0  0  0
  4  5  1  0  0  0  0
  5  6  2  0  0  0  0
  6  3  1  0  0  0  0
  3  7  1  0  0  0  0
  7  8  2  0  0  0  0
  8  9  1  0  0  0  0
  9  6  1  0  0  0  0
  2 10  1  0  0  0  0
 10 11  1  0  0  0  0
 10 12  2  0  0  0  0
  8 13  1  0  0  0  0
 13 14  1  0  0  0  0
 13 15  1  0  0  0  0
M  CHG  2  10   1  11  -1
M  END
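Unlike the raster and vector renderings, a connection table like the one above can be read directly by software. The following is a minimal Python sketch, not a complete V2000 reader, that extracts atom symbols, bonds and formal charges and derives a molecular formula. The helper names `parse_v2000` and `formula` are hypothetical; the sketch locates the counts line by its `V2000` tag (the molfile as shown begins at its counts line rather than with the usual three header lines), splits on whitespace (so it assumes fewer than 100 atoms and bonds), assumes Kekulé bond orders 1 to 3, and only estimates implicit hydrogens for C, N and O.

```python
from collections import Counter

# Default valences for the few elements this sketch handles (assumption:
# real toolkits use much fuller valence models).
BASE_VALENCE = {"C": 4, "N": 3, "O": 2}
# Atom-block charge codes from the CTfile V2000 convention (4 = radical, skipped).
ATOM_BLOCK_CHARGE = {1: 3, 2: 2, 3: 1, 5: -1, 6: -2, 7: -3}


def parse_v2000(text):
    """Return (atoms, bonds, charges): symbols, (i, j, order) with 1-based
    indices, and a dict of atom index -> formal charge."""
    lines = text.splitlines()
    start = next((i for i, ln in enumerate(lines)
                  if ln.rstrip().endswith("V2000")), None)
    if start is None:
        raise ValueError("no V2000 counts line found")
    fields = lines[start].split()
    n_atoms, n_bonds = int(fields[0]), int(fields[1])
    atoms, block_charges = [], {}
    for k, ln in enumerate(lines[start + 1:start + 1 + n_atoms], start=1):
        parts = ln.split()
        atoms.append(parts[3])
        code = int(parts[5]) if len(parts) > 5 else 0
        if code in ATOM_BLOCK_CHARGE:
            block_charges[k] = ATOM_BLOCK_CHARGE[code]
    bond_start = start + 1 + n_atoms
    bonds = [tuple(int(x) for x in ln.split()[:3])
             for ln in lines[bond_start:bond_start + n_bonds]]
    chg_charges = {}
    for ln in lines[bond_start + n_bonds:]:
        if ln.split()[:2] == ["M", "CHG"]:
            vals = [int(x) for x in ln.split()[3:]]
            chg_charges.update(zip(vals[0::2], vals[1::2]))
    # When M  CHG lines are present they supersede the atom-block charge field.
    return atoms, bonds, (chg_charges or block_charges)


def formula(atoms, bonds, charges):
    """Hill-order formula, adding implicit H from simple default valences."""
    bond_sum = Counter()
    for a, b, order in bonds:
        bond_sum[a] += order
        bond_sum[b] += order
    counts = Counter()
    for idx, sym in enumerate(atoms, start=1):
        counts[sym] += 1
        if sym in BASE_VALENCE:
            q = charges.get(idx, 0)
            # Carbon loses a valence for either sign of charge; N and O gain
            # one per positive charge and lose one per negative charge.
            valence = (BASE_VALENCE[sym] - abs(q) if sym == "C"
                       else BASE_VALENCE[sym] + q)
            counts["H"] += max(0, valence - bond_sum[idx])
    if counts["C"]:
        keys = ["C", "H"] + sorted(s for s in counts if s not in ("C", "H"))
    else:
        keys = sorted(counts)
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "")
                   for s in keys if counts[s])


# Hand-written example: nitromethane, with the nitro group drawn as N+/O-.
MOLFILE = """nitromethane
  hand-written example

  4  3  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
    2.2500    1.2990    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0
    2.2500   -1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
  2  4  2  0  0  0  0
M  CHG  2   2   1   3  -1
M  END
"""
print(formula(*parse_v2000(MOLFILE)))  # CH3NO2
```

Applied to the 15-atom molfile above, the same two calls give C11H12N2O2.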
Production Vector Graphics
• Manuscripts usually delivered as PDFs:
Spreadsheets
• Data gives the impression of organisation
• Very high degrees of freedom, no provision for chemical structures
Common Scheme
Digitally Friendly
[Diagram labels: primary reactant, secondary reactants, catalyst, solvent, intermediate, byproducts, final product, reagent]
Representation
• For machines: representation must be very rigidly defined
• For humans: diagrams can be generated programmatically
• MDL RXN/RDfile: ~50% of the way there
• DataSheet XML with Experiment aspect
http://molmatinf.com/fmtaspect.html
[Table with columns Structure, Step, Role, Stoich.: eight components, six in step 1 and two in step 2; roles: two Reactant, four Reagent, two Product; stoichiometry 1 for each]
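The table records each component with a step, a role and a stoichiometry. A rigidly defined representation of that kind can be sketched as one record per component drawn from a closed vocabulary of roles. The names `Role`, `Component` and `validate` are hypothetical illustrations, not the DataSheet XML Experiment aspect schema, and the role list is simply the set of labels that appear on the slides.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    """Closed vocabulary of component roles (taken from the slide labels;
    a production schema may define more or differently named roles)."""
    PRIMARY_REACTANT = "primary reactant"
    REACTANT = "reactant"
    REAGENT = "reagent"
    CATALYST = "catalyst"
    SOLVENT = "solvent"
    INTERMEDIATE = "intermediate"
    BYPRODUCT = "byproduct"
    PRODUCT = "product"


@dataclass(frozen=True)
class Component:
    structure: str       # e.g. molfile text; the encoding is an assumption
    step: int            # 1-based step number in a multistep scheme
    role: Role
    stoich: float = 1.0  # stoichiometric coefficient


def validate(components):
    """List the problems that would leave the scheme ambiguous to software."""
    problems = []
    steps = sorted({c.step for c in components})
    if steps != list(range(1, len(steps) + 1)):
        problems.append(f"steps not numbered 1..n: {steps}")
    problems += [f"non-positive stoichiometry: {c.structure!r}"
                 for c in components if c.stoich <= 0]
    first = {c.role for c in components if steps and c.step == steps[0]}
    last = {c.role for c in components if steps and c.step == steps[-1]}
    if not first & {Role.PRIMARY_REACTANT, Role.REACTANT}:
        problems.append("first step has no reactant")
    if Role.PRODUCT not in last:
        problems.append("last step has no product")
    return problems


# Table labels such as "Reagent" map onto the vocabulary via Role(label.lower()).
scheme = [
    Component("A", 1, Role("reactant")),
    Component("R1", 1, Role("reagent")),
    Component("I", 1, Role("product")),
    Component("R2", 2, Role("reagent")),
    Component("P", 2, Role("product")),
]
print(validate(scheme))  # []
```

Because roles come from an enumeration rather than free text, a misspelt or invented role fails at construction time instead of silently producing a record that software cannot interpret.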
Balancing
Quantities
Green Metrics
• Totals for reactants, products & waste
• For each non-waste product: yield, PMI, E-factor, Atom-E… always calculated, always recorded
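The metrics listed here have standard textbook definitions, so once every component carries a mass and molecular weight they can be computed mechanically. The sketch below assumes that process mass intensity (PMI) is total input mass divided by product mass, that E-factor is waste mass (all inputs not isolated as product) divided by product mass, so it equals PMI minus 1, and that atom economy is the product's molecular weight over the summed molecular weights of the stoichiometric reactants. `Entry` and `green_metrics` are hypothetical names, not the author's software.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    mass_g: float        # mass charged (reactants) or isolated (product)
    mw: float            # molecular weight, g/mol
    stoich: float = 1.0  # coefficient in the balanced equation


def green_metrics(reactants, product, other_masses_g=()):
    """Yield, PMI, E-factor and atom economy for one product.

    reactants: Entry objects for the stoichiometric reactants.
    product: Entry for the isolated desired product.
    other_masses_g: masses of non-stoichiometric inputs (reagents, catalysts,
    solvents, work-up materials), which count towards PMI and waste.
    """
    # The limiting reactant supplies the fewest equivalents (mol / coefficient).
    limiting = min(r.mass_g / r.mw / r.stoich for r in reactants)
    theoretical_mol = limiting * product.stoich
    total_in = sum(r.mass_g for r in reactants) + sum(other_masses_g)
    return {
        "yield": (product.mass_g / product.mw) / theoretical_mol,
        "PMI": total_in / product.mass_g,
        # Waste = everything that went in but did not end up as product.
        "E-factor": (total_in - product.mass_g) / product.mass_g,
        "atom economy": product.stoich * product.mw
        / sum(r.stoich * r.mw for r in reactants),
    }


# Hypothetical A + B -> P: 0.1 mol of each reactant, 50 g of solvent, 16 g of P.
m = green_metrics([Entry(10, 100), Entry(12, 120)], Entry(16, 200), [50])
print({k: round(v, 3) for k, v in m.items()})
# {'yield': 0.8, 'PMI': 4.5, 'E-factor': 3.5, 'atom economy': 0.909}
```

Definitions vary in practice (for example, whether water or recovered solvent counts towards PMI), so a real system would need to record which convention was used alongside the numbers.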
Reaction Transforms
• Reaction = specific description of experiment
[Figure: reaction scheme with atoms numbered 1 to 4, shown on both sides]
• Transform = the generic form of a reaction
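One common way to derive a generic transform from a specific reaction, assuming atoms are mapped between reactants and products (as the numbered atoms in the scheme suggest), is to keep only the reaction centre: the mapped atoms whose bonds are formed, broken or change order. The sketch below does this on bond lists keyed by atom-map numbers. It is a simplification, not necessarily the method used in the talk: practical transform extraction usually also keeps some neighbouring atoms as context, and `reaction_centre` is a hypothetical name.

```python
def bond_table(bonds):
    """Map each unordered pair of atom-map numbers to its bond order."""
    return {frozenset((a, b)): order for a, b, order in bonds}


def reaction_centre(reactant_bonds, product_bonds):
    """Return ({(i, j): (old_order, new_order)}, sorted centre atoms).

    A bond order of 0 means "no bond" on that side of the reaction.
    """
    before, after = bond_table(reactant_bonds), bond_table(product_bonds)
    changed = {}
    for pair in before.keys() | after.keys():
        old, new = before.get(pair, 0), after.get(pair, 0)
        if old != new:
            changed[tuple(sorted(pair))] = (old, new)
    centre = sorted({atom for pair in changed for atom in pair})
    return dict(sorted(changed.items())), centre


# Illustrative mapping for acetic acid + methanol -> methyl acetate + water
# (heavy atoms only): 1 carbonyl C, 2 carbonyl O, 3 acid OH oxygen,
# 4 methanol O, 5 acetyl CH3, 6 methanol CH3.
reactants = [(5, 1, 1), (1, 2, 2), (1, 3, 1), (4, 6, 1)]
products = [(5, 1, 1), (1, 2, 2), (1, 4, 1), (4, 6, 1)]
changed, centre = reaction_centre(reactants, products)
print(changed)  # {(1, 3): (1, 0), (1, 4): (0, 1)}
print(centre)   # [1, 3, 4]
```

Everything outside the centre (the methyl groups here) is what the generic form leaves free to vary.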
Convenience
• Apply to a molecule...
[Figure annotation: 10 g]
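Once a transform is stored with its stoichiometry, applying it to a chosen amount of starting material reduces the bench arithmetic to a few lines: moles of starting material are mass over molecular weight, and every other component scales by its equivalents. The sketch below assumes each component is stored as a hypothetical (name, molecular weight, equivalents) tuple; `scale_quantities` and the numbers in the example are illustrative only.

```python
def scale_quantities(start_mass_g, start_mw, components):
    """Scale a stored transform to a chosen amount of starting material.

    components: (name, mw_g_per_mol, equivalents) tuples, with equivalents
    relative to the starting material (1.0 for a 1:1 product).
    Returns (name, mmol, mass_g) rows, starting material first.
    """
    start_mol = start_mass_g / start_mw
    rows = [("starting material", start_mol * 1000, start_mass_g)]
    for name, mw, equiv in components:
        mol = start_mol * equiv
        rows.append((name, mol * 1000, mol * mw))
    return rows


# Hypothetical molecular weights and equivalents, scaled to 10 g of a
# starting material with MW 200 g/mol.
for name, mmol, mass in scale_quantities(
    10, 200, [("reagent", 100, 1.5), ("product (theoretical)", 250, 1.0)]
):
    print(f"{name}: {mmol:.1f} mmol, {mass:.2f} g")
# starting material: 50.0 mmol, 10.00 g
# reagent: 75.0 mmol, 7.50 g
# product (theoretical): 50.0 mmol, 12.50 g
```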
Decision Making
[Screenshot: product search results]
Yield   PMI     E-factor   Atom Economy
100%    2.18     1.18      100%
84%     12.49   11.49      93.3%
82%     19.17   18.17      87.4%
100%    8.26     7.26      56.3%
63%     8.93     7.93      73.3%
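With the metrics stored for every candidate, choosing between routes becomes a query. The sketch below loads the five rows of the table, checks that each E-factor equals PMI minus 1 (it does for every row shown), and ranks candidates by PMI, lowest first, after an optional yield cut-off. Ranking by PMI with atom economy as a tie-breaker is an illustrative choice, not the criterion used in the original software; `Route` and `rank` are hypothetical names.

```python
from typing import NamedTuple


class Route(NamedTuple):
    yield_pct: float
    pmi: float
    e_factor: float
    atom_economy_pct: float


# The five search results from the table.
RESULTS = [
    Route(100, 2.18, 1.18, 100.0),
    Route(84, 12.49, 11.49, 93.3),
    Route(82, 19.17, 18.17, 87.4),
    Route(100, 8.26, 7.26, 56.3),
    Route(63, 8.93, 7.93, 73.3),
]

# If everything except the product counts as waste, E-factor = PMI - 1.
assert all(abs(r.e_factor - (r.pmi - 1)) < 1e-9 for r in RESULTS)


def rank(results, min_yield=0):
    """Sort candidates by PMI (ascending), ties broken by higher atom economy."""
    eligible = [r for r in results if r.yield_pct >= min_yield]
    return sorted(eligible, key=lambda r: (r.pmi, -r.atom_economy_pct))


for r in rank(RESULTS, min_yield=80):
    print(f"yield {r.yield_pct}%  PMI {r.pmi}  atom economy {r.atom_economy_pct}%")
# PMI order printed: 2.18, 8.26, 12.49, 19.17
```

The second row by PMI has the lowest atom economy of the set, which is why a single metric is rarely enough and why having all of them recorded matters.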
Model Building
• Most reaction data is noisy and incomplete
• Imagine opportunities with quantity & quality...
• For example: model solvent substitution
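As a toy illustration of the kind of model that complete, well-annotated reaction records would enable, the sketch below suggests replacement solvents for a given transform by averaging recorded yields per solvent. The records, transform identifiers, solvent labels and the function name `suggest_solvents` are all invented for illustration; a real solvent-substitution model would need far more data and would have to control for the other reaction conditions.

```python
from collections import defaultdict
from statistics import mean


def suggest_solvents(records, transform, exclude):
    """Rank alternative solvents for one transform by mean recorded yield.

    records: (transform_id, solvent, yield_pct) tuples.
    Returns (solvent, mean_yield, n_reactions), best first.
    """
    by_solvent = defaultdict(list)
    for t, solvent, yield_pct in records:
        if t == transform and solvent != exclude:
            by_solvent[solvent].append(yield_pct)
    ranked = sorted(by_solvent.items(), key=lambda kv: mean(kv[1]), reverse=True)
    return [(s, mean(ys), len(ys)) for s, ys in ranked]


# Hypothetical records: (transform id, solvent, isolated yield %).
RECORDS = [
    ("T1", "solvent-A", 85),
    ("T1", "solvent-A", 78),
    ("T1", "solvent-B", 80),
    ("T1", "solvent-B", 76),
    ("T1", "solvent-C", 70),
    ("T2", "solvent-D", 90),
]

for solvent, avg, n in suggest_solvents(RECORDS, "T1", exclude="solvent-A"):
    print(f"{solvent}: mean yield {avg:.1f}% (n={n})")
# solvent-B: mean yield 78.0% (n=2)
# solvent-C: mean yield 70.0% (n=1)
```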
Conclusions & Future
• Most published reactions intractable to machines
• Most reaction informatics formats only ~50% complete
• Full description has immediate benefits...
• ... eventual large scale machine learning.
• μPublications with provenance: the path to open repositories, but this requires attention to content
Acknowledgments
http://molmatinf.com
http://molsync.com
http://cheminf20.org
@aclarkxyz
• Antony Williams
• Sean Ekins
• Leah McEwen
• Open data advocates
• Inquiries to info@molmatinf.com
