Alex M. Clark
Mixtures as first class citizens in the realm of informatics
alex@collaborativedrug.com
February 2021
Premise
✤ Most mixtures stored as text or custom table layouts

✤ Value of upgrading to cheminformatics is well established...

✤ ... with the right data structure, always just one script away from what you need

✤ If you can represent it, you can model it

[Figure: Molfile and InChI (1980's) alongside Mixfile and MInChI (2020's)]
Mixfile/MInChI
✤ Format needs to be:

‣ hierarchical

‣ embed structures when possible

‣ include concentration information

‣ tolerate uncertainty

✤ More verbose ELN-friendly form is Mixfile (JSON-serialised)

✤ Concise form with canonical components is MInChI (mixtures InChI)

MInChI=0.00.1S/C4H8O/c1-2-4-5-3-1/h1-4H2&C6H12/c1-6-4-2-3-5-6/h6H,2-5H2,1H3&C6H14/c1-3-5-6-4-2/h3-6H2,1-2H3&C6H14/c1-4-5-6(2)3/h6H,4-5H2,1-3H3&C6H14/c1-4-6(3)5-2/h6H,4-5H2,1-3H3&C6H14N.Li/c1-5(2)7-6(3)4;/h5-6H,1-4H3;/q-1;+1/n{6&{1&{3&2&4&5}}}/g{1mr0&{1vp0&{5:7pp1&1:2pp1&1:5pp0&1:5pp0}7vp0}}
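
To make the layering of the MInChI string on this slide easier to see, here is a minimal Python sketch that splits it into a header, the per-component InChI bodies, and the trailing `/n` and `/g` layers. The function names `split_minchi` and `parse_hierarchy` are hypothetical, and the reading of `/n` as a nesting of component indices and `/g` as the matching concentration layer is an assumption based on the format requirements listed above, not on the official specification.

```python
# Sketch only: splits the MInChI string from the slide into its layers.
# Layer meanings noted in the comments are assumptions, not the official spec.
MINCHI = (
    "MInChI=0.00.1S/C4H8O/c1-2-4-5-3-1/h1-4H2&C6H12/"
    "c1-6-4-2-3-5-6/h6H,2-5H2,1H3&C6H14/c1-3-5-6-4-2/"
    "h3-6H2,1-2H3&C6H14/c1-4-5-6(2)3/h6H,4-5H2,1-3H3&C6H14/"
    "c1-4-6(3)5-2/h6H,4-5H2,1-3H3&C6H14N.Li/c1-5(2)7-6(3)4;/"
    "h5-6H,1-4H3;/q-1;+1/n{6&{1&{3&2&4&5}}}/"
    "g{1mr0&{1vp0&{5:7pp1&1:2pp1&1:5pp0&1:5pp0}7vp0}}"
)


def split_minchi(minchi):
    """Return (header, component bodies, /n layer, /g layer)."""
    header, _, rest = minchi.partition("/")
    body, _, conc = rest.partition("/g{")
    comps, _, hier = body.partition("/n{")
    # Components are separated by '&'; each keeps its own InChI sub-layers.
    return header, comps.split("&"), "{" + hier, "{" + conc


def parse_hierarchy(layer):
    """Turn '{6&{1&{3&2&4&5}}}' into nested lists of component indices."""
    pos = 0

    def parse_group():
        nonlocal pos
        pos += 1  # skip the opening '{'
        items = []
        while layer[pos] != "}":
            if layer[pos] == "{":
                items.append(parse_group())
            elif layer[pos] == "&":
                pos += 1
            else:
                start = pos
                while layer[pos].isdigit():
                    pos += 1
                items.append(int(layer[start:pos]))
        pos += 1  # skip the closing '}'
        return items

    return parse_group()


header, components, hierarchy, concentrations = split_minchi(MINCHI)
tree = parse_hierarchy(hierarchy)
print(header)
for index, component in enumerate(components, start=1):
    print(index, component)
print(tree)
print(concentrations)
```

The nested list mirrors the "hierarchical" requirement: each inner list is a sub-mixture, and the integers are taken here to be 1-based positions in the component list.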
Data Creation
github.com/cdd/mixtures collaborativedrug.com
Formulation Example
✤ Many consumer products are well described from a chemical perspective

✤ Some components are more easily defined than others

✤ When structure is not available, can use external identifiers

✤ Hierarchy encodes information about the design of the product

✤ Concentrations can be expressed with uncertainties
Comparisons with Structures
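[Figure: example comparisons between components]
‣ InChI=1S/C7H6O3/c8-6-4-2-1-3-5(6)7(9)10/h1-4,8H,(H,9,10) ≡ [drawn structure]
‣ [drawn structure] substructure of [drawn structure]
‣ [polymer, MW 300-500] ≅ [polymer, MW 400-700]
‣ Y1.2Ba0.8CuO4 ≅ YBa2Cu3O7−δ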
Search Queries
✤ Looking for a certain subset of external cleaning surfactants, phosphate-free
[Figure: query tree with the terms ">40%", "has", "has INCI: COCAMIDE DEA" and "has not substructure"]
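
A query like the one on this slide can be sketched as a set of predicates evaluated over every node of a mixture hierarchy. The sketch below is an illustration under stated assumptions: the nested-dict layout (`name`, `quantity`, `units`, `identifiers`, `contents`), the `groups` field and all helper names are hypothetical, the sample product is invented, and the `groups` lookup is only a stand-in for a real substructure search with a cheminformatics toolkit.

```python
from typing import Callable, Iterator

Component = dict  # hypothetical record: name, quantity, units, identifiers, contents


def walk(component: Component) -> Iterator[Component]:
    """Yield a component and every component nested beneath it."""
    yield component
    for child in component.get("contents", []):
        yield from walk(child)


def has(mixture: Component, predicate: Callable[[Component], bool]) -> bool:
    """True if any node in the hierarchy satisfies the predicate."""
    return any(predicate(c) for c in walk(mixture))


def above_percent(threshold: float) -> Callable[[Component], bool]:
    return lambda c: c.get("units") == "%" and (c.get("quantity") or 0) > threshold


def has_inci(name: str) -> Callable[[Component], bool]:
    return lambda c: c.get("identifiers", {}).get("INCI") == name


def has_group(label: str) -> Callable[[Component], bool]:
    # Stand-in for a substructure match; a real query would compare structures.
    return lambda c: label in c.get("groups", ())


def matches(mixture: Component) -> bool:
    """'>40%' AND 'has INCI: COCAMIDE DEA' AND 'has not substructure'."""
    return (
        has(mixture, above_percent(40))
        and has(mixture, has_inci("COCAMIDE DEA"))
        and not has(mixture, has_group("phosphate"))
    )


# Invented example product, for illustration only.
cleaner = {
    "name": "example cleaner",
    "contents": [
        {"name": "water", "quantity": 60, "units": "%"},
        {
            "name": "surfactant blend",
            "contents": [
                {
                    "name": "cocamide DEA",
                    "quantity": 5,
                    "units": "%",
                    "identifiers": {"INCI": "COCAMIDE DEA"},
                }
            ],
        },
    ],
}
with_phosphate = {
    "name": "example cleaner with builder",
    "contents": cleaner["contents"]
    + [{"name": "builder", "quantity": 3, "units": "%", "groups": ["phosphate"]}],
}

print(matches(cleaner), matches(with_phosphate))
```

Because the predicates walk the whole tree, an identifier buried in a sub-mixture is found just as readily as a top-level one.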
Informatics Example
✤ Solubility of theophylline

✤ Often delivered in liquid form with mixed solvents: optimising proportion of drug is important

✤ Consider a scenario where:

‣ all data was provided in Mixtures InChI form

‣ these data exist in openly available repositories

✤ Query:

‣ check that theophylline is present and has concentration

‣ check that other ingredients are solvents

✤ Consider 4 papers with relevant solubility, published over 20 years...
[Structure: theophylline, labelled "nasal anti-inflammatory"]
https://tinyurl.com/y3svytfp ➫ 4:13:00
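
The two checks in the query above (theophylline present with a concentration, everything else a solvent) could look roughly like the following Python sketch. Everything here is assumed for illustration: components are matched by a plain `name` field rather than by structure, `SOLVENTS` is a hypothetical whitelist, and the sample records and units are invented rather than taken from the four papers.

```python
SOLVENTS = {"water", "ethanol", "propylene glycol"}  # hypothetical whitelist


def leaves(component):
    """Yield the components that have no nested contents."""
    children = component.get("contents", [])
    if not children:
        yield component
    for child in children:
        yield from leaves(child)


def is_solubility_record(mixture, solute="theophylline"):
    """Check 1: solute present with a concentration. Check 2: the rest are solvents."""
    comps = list(leaves(mixture))
    solutes = [c for c in comps if c.get("name") == solute]
    others = [c for c in comps if c.get("name") != solute]
    return (
        len(solutes) == 1
        and solutes[0].get("quantity") is not None
        and all(c.get("name") in SOLVENTS for c in others)
    )


# Invented records, for illustration only.
usable = {
    "contents": [
        {"name": "theophylline", "quantity": 8.2, "units": "g/L"},
        {
            "name": "solvent",
            "contents": [
                {"name": "water", "quantity": 80, "units": "%"},
                {"name": "ethanol", "quantity": 20, "units": "%"},
            ],
        },
    ]
}
no_concentration = {
    "contents": [{"name": "theophylline"}, {"name": "water"}]
}
extra_solute = {
    "contents": [
        {"name": "theophylline", "quantity": 8.2, "units": "g/L"},
        {"name": "caffeine", "quantity": 1.0, "units": "g/L"},
        {"name": "water"},
    ]
}

for record in (usable, no_concentration, extra_solute):
    print(is_solubility_record(record))
```

In practice the name comparison would be replaced by a match on the canonical component identifiers that MInChI carries, which is what makes the query robust across papers.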
Papers (x4)
All Together for QSAR
Solubility   Proportions of the solvents present
0.699        1
15.19        1
1.04         1
3.142        1
0.784        1
0.91         1
6.3          1
13.7         1
11.6         1
13.58        1
6.73         1
9.3          1
8.20         0.8    0.2
16.38        0.5    0.5
13.60        0.2    0.8
(+8 more similar)
15.39        0.333  0.667
26.6         0.5    0.5
17.97        0.083  0.584  0.333
19.06        0.708  0.292
22.59        0.417  0.25   0.333
26.52        0.283  0.25   0.3    0.167
(+14 more similar)
Deep Eutectics / Carbon Capture
✤ Mixtures of ionic & neutral solvents can absorb gases like CO2

DOI: 10.1021/ef5028873

✤ Want to model?

‣ finding data in literature is hard

‣ curating is extremely laborious

✤ Each datapoint is a mixture...
Mixturfication
✤ Data entry easier as tabular content

✤ Mixture hierarchy is implicit
Mixtures
✤ Convert each row into a self-contained Mixfile...

✤ ... have solubility of CO2 and SO2 in various solvent combinations

✤ Could be looked up in a database of mixtures from many different sources
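
As a rough illustration of converting one table row into a self-contained hierarchical record, the Python sketch below makes the implicit hierarchy explicit: the solvents are grouped into a sub-mixture and the absorbed gas sits beside it. The field names (`name`, `quantity`, `units`, `contents`) are assumptions loosely modelled on the Mixfile idea and should be checked against the schema at github.com/cdd/mixtures; the row layout, the function `row_to_mixture` and the sample values are invented.

```python
import json


def row_to_mixture(row: dict) -> dict:
    """Build a nested, JSON-serialisable record from one flat table row."""
    solvents = [
        {"name": name, "quantity": fraction, "units": "mol/mol"}
        for name, fraction in row["solvents"].items()
    ]
    return {
        "name": f"{row['gas']} in {' + '.join(row['solvents'])}",
        "contents": [
            # The hierarchy that was implicit in the table becomes explicit here.
            {"name": "solvent", "contents": solvents},
            {"name": row["gas"], "quantity": row["absorption"], "units": row["units"]},
        ],
    }


# Invented row, for illustration only.
row = {
    "gas": "CO2",
    "solvents": {"ionic liquid A": 0.5, "neutral solvent B": 0.5},
    "absorption": 0.9,
    "units": "mol/kg",
}
mixture = row_to_mixture(row)
print(json.dumps(mixture, indent=2))
```

Each record carries everything needed to interpret it, so rows from different tables and sources can be pooled without keeping the original column headings.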
Free Wilson-esque
✤ Tabulate components as column headings, with proportions - scriptable

✤ Build a basic regression model (PLS)...
[Plot: Gas Absorption, predicted vs. measured absorption, R² = 0.9523]
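
The "components as column headings" step is easy to script. The sketch below, in plain Python, builds such a table from a list of mixtures; the input layout (`proportions`, `absorption`), the function name `tabulate` and the sample values are all hypothetical. The resulting matrix and target vector could then be handed to any regression routine, for example a PLS implementation such as scikit-learn's `PLSRegression`, which is not exercised here.

```python
def tabulate(mixtures):
    """Return (column names, proportion matrix, measured values)."""
    # One column per distinct component seen anywhere in the dataset.
    columns = sorted({name for m in mixtures for name in m["proportions"]})
    # Components absent from a mixture get a proportion of zero.
    rows = [[m["proportions"].get(name, 0.0) for name in columns] for m in mixtures]
    targets = [m["absorption"] for m in mixtures]
    return columns, rows, targets


# Invented data, for illustration only.
data = [
    {"proportions": {"A": 1.0}, "absorption": 0.3},
    {"proportions": {"A": 0.5, "B": 0.5}, "absorption": 0.7},
    {"proportions": {"B": 0.25, "C": 0.75}, "absorption": 1.1},
]
columns, rows, targets = tabulate(data)
print(columns)
for row, target in zip(rows, targets):
    print(row, target)
```

Like a Free-Wilson analysis, each column is simply the presence (here, the proportion) of one building block, so the table is sparse whenever mixtures share few components.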
Cross-populate
✤ Empty cells set to max(Tanimoto ECFP4 × concentration)
[Plot: Gas Absorption, predicted vs. measured absorption, R² = 0.9412]
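
One possible reading of "max(Tanimoto ECFP4 × concentration)" is that an empty cell for component X is filled with the largest value of similarity(X, Y) × concentration(Y) over the components Y actually present in that row. The Python sketch below implements that reading only; it is not confirmed as the method used in the talk. Fingerprints are represented as plain sets of integers standing in for ECFP4 bits, which a real workflow would compute with a cheminformatics toolkit, and all names and values are invented.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cross_populate(columns, row, fingerprints):
    """Fill zero cells with max(similarity to a present component x its concentration)."""
    present = {name: value for name, value in zip(columns, row) if value > 0}
    filled = []
    for name, value in zip(columns, row):
        if value > 0:
            filled.append(value)  # measured proportions are left untouched
        else:
            filled.append(
                max(
                    (
                        tanimoto(fingerprints[name], fingerprints[other]) * conc
                        for other, conc in present.items()
                    ),
                    default=0.0,
                )
            )
    return filled


# Stand-in fingerprints (sets of bit indices), invented for illustration.
fingerprints = {
    "A": frozenset({1, 2, 3, 4}),
    "B": frozenset({3, 4, 5, 6}),
    "C": frozenset({7, 8}),
}
columns = ["A", "B", "C"]
row = [0.5, 0.0, 0.5]
filled = cross_populate(columns, row, fingerprints)
print(filled)
```

The effect is that a mixture lacking a component still receives partial credit in that column when it contains something structurally similar, which densifies the otherwise sparse table.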
10-most Significant Descriptors
[Plot: Gas Absorption, predicted vs. measured, R² = 0.7723]
Labelling Caveat
✤ Mixture data not quite free of implicit assumptions...
‣ saturated

‣ single phase, liquid

✤ Need additional metadata, e.g.

‣ ontologies

‣ IUPAC terminology

✤ Work in progress
Summary
✤ Open protocols, tools & example data available for representing mixtures

✤ New workflows become practical with cheminformaticisation of mixtures

✤ Much work to be done to catch up with structure databases

✤ Community building is key

✤ Many industries: lab chemistry, drug formulations, consumer products, agriculture, analytical standards, safety...
Questions?
✤ Contact: Alex M. Clark alex@collaborativedrug.com (Collaborative Drug Discovery)

Journal of Cheminformatics (2019): 10.1186/s13321-019-0357-4
Open Source: github.com/cdd/mixtures
CDD Vault: collaborativedrug.com

✤ Also: Leah McEwen (Cornell, InChI, IUPAC)