Validation and Standardization of
Molecular Structures in General
and Sugars in Particular: a Case
Study
Colin Batchelor,
Ken Karapetyan, Valery Tkachenko, Antony Williams
6th Joint Sheffield Conference
on Chemoinformatics
2013-07-24
Overview
Open PHACTS and chemical validation and
standardization
RDF for chemoinformatics calculations
General case study: ChEMBL and DrugBank
Sugar case study: Perspective perception
Who is involved?
28 Consortium Members
>45 Associated Partners
3-year European project funded by:
• European Pharmaceutical Industry
• Innovative Medicines Initiative
Open PHACTS API
Applications using the Open PHACTS API
dev.openphacts.org
Explorer
www.openphacts.org Twitter: @open_phacts
How do we fit in?
We integrate and standardize the chemical
compound collection underpinning Open
PHACTS and provide regular updates and
ongoing data curation.
The validation and standardization rules are
derived from the FDA structure guidelines,
adapted for consistency and in response to
input from members of EFPIA.
"Open PHACTS provides an integrated platform of publicly
available pharmacological and physicochemical data"
Data accessible via:
• Free application programming interface (API): dev.openphacts.org
• Third-party applications built to use the API
Open PHACTS app ecosystem
How does Open PHACTS work?
Currently integrated databases
Database                  Millions of triples
ACD Labs / ChemSpider     161.3
ChEBI                     0.9
ChEMBL                    146.1
ConceptWiki               3.7
DrugBank                  0.5
Enzyme                    0.1
Gene Ontology             0.9
SwissProt                 156.6
WikiPathways              0.1
TOTAL                     470.2
CVSP and the OPS CRS
Standardization workflows
(CVSP, FDA, OPS, custom) using
modules such as:
• SMIRKS transformations
• layout (GGA)
• canonical tautomers (ChemAxon)
• sugar interpretation (RSC)
 
RDF and Open PHACTS
The underlying language of Open PHACTS is RDF.
There are few constraints as such, only guidelines
for which classes of identifier to use and accounts of
best practice.
This RDF goes into the data cache and we access
the results through user interfaces built on RESTful
JSON web services.
What does RDF look like?
In the Turtle format below, each line is a triple, in
which a binary predicate links a subject and an
object.
:CSID1execution obo:OBO_0000299 :CSID1prop11 .
:CSID1prop11 obo:IAO_0000136 ops:OPS1 .
:CSID1prop11 rdf:type cheminf:CHEMINF_000349 .
:CSID1prop11 qudt:numericValue "1.049E-17"^^xsd:double .
:CSID1prop11 qudt:unit obo:UO_0000324 .
There is also RDF/XML, which is less human-readable.
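To make the subject, predicate, object structure concrete, here is a minimal sketch that splits the five lines from the slide into triples. It is not a Turtle parser: it assumes one complete triple per line, whitespace-separated terms and a terminating full stop, and it does not resolve prefixes. The function name `split_triples` is made up for this illustration.

```python
# The five triples shown on the slide, copied verbatim.
TURTLE_LINES = """\
:CSID1execution obo:OBO_0000299 :CSID1prop11 .
:CSID1prop11 obo:IAO_0000136 ops:OPS1 .
:CSID1prop11 rdf:type cheminf:CHEMINF_000349 .
:CSID1prop11 qudt:numericValue "1.049E-17"^^xsd:double .
:CSID1prop11 qudt:unit obo:UO_0000324 .
"""


def split_triples(text):
    """Naively split one-triple-per-line Turtle into (subject, predicate, object).

    Assumption: every non-empty line holds exactly one triple and ends with '.'.
    Real Turtle allows much more (prefix declarations, ';' and ',' shorthand).
    """
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if not line.endswith("."):
            raise ValueError(f"not a complete triple: {line!r}")
        body = line[:-1].strip()
        # Split on the first two runs of whitespace; the rest is the object.
        subject, predicate, obj = body.split(None, 2)
        triples.append((subject, predicate, obj))
    return triples


triples = split_triples(TURTLE_LINES)
# Everything stated about the calculated-property node.
about_result = [(p, o) for s, p, o in triples if s == ":CSID1prop11"]
for predicate, obj in about_result:
    print(predicate, "->", obj)
```

Four of the five triples have the result node as subject; the remaining one links the calculation execution to that result.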
Royal Society of Chemistry
data in Open PHACTS
1. Molecule synonyms and identifiers
2. Linksets between
ChEBI, ChEMBL, DrugBank and OPS
identifiers
3. Molecule–molecule relations ("parent–child")
of interest for drug discovery
4. Calculated physicochemical properties
for compounds (both molecular and
macroscopic)
Calculated physicochemical
properties (ACD 12.0)
• log P
• log D (at pH 5.5, at pH 7.4)
• bioconcentration factor
• KOC (at pH 5.5, at pH 7.4)
• index of refraction
• polar surface area
• molar refractivity
• molar volume
• polarizability
• surface tension
• density at STP
• boiling point at 1 atm
• flash point at 1 atm
• enthalpy of vaporization at STP
• vapour pressure at STP
RDF for calculated properties:
vocabularies
Two dozen calculated properties for each of
>10^6 molecules.
CHEMINF ontology for kinds of calculation and
chemical data
QUDT for results
OPS IDs for molecules
OBI and IAO to connect calculations to results
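RDF for calculated properties:
schema
[Diagram: the log P calculation for benzene drawn as a graph. Its nodes and edges, as far as they can be recovered from the slide labels:]
• calculation process: rdf:type CHEMINF "execution of ACD/Labs PhysChem software library version 12.01"
• calculation process, OBI "has specified input": benzene's connection table (rdf:type CHEMINF "connection table")
• calculation process, OBI "has specified output": calculation result (rdf:type CHEMINF "calculated log P")
• calculation result, IAO "is about": OPS benzene
• calculation result, QUDT "has value": "2.17"^^xsd:float
• calculation result, QUDT "has standard uncertainty": "0.234"^^xsd:float
• calculation result, QUDT "has unit": QUDT dimensionless quantity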
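As a rough sketch of how these vocabularies fit together, the function below emits the triples for one calculated property as Turtle lines. The predicates are copied from the Turtle example earlier in the talk. The node-naming pattern (`:CSID<n>execution`, `:CSID<n>prop<k>`) is generalized from that single example and is an assumption, as is the function name `property_triples`. The standard-uncertainty and input (connection table) edges of the schema are left out because the talk does not show their predicates in Turtle.

```python
def property_triples(csid, prop_index, ops_id, result_class, value, unit):
    """Return Turtle lines for one calculated property of one compound.

    Assumptions (not confirmed by the talk): local names follow the pattern
    seen in the single example, and `value` is passed as a string so that
    its lexical form is preserved exactly.
    """
    execution = f":CSID{csid}execution"         # the calculation process
    result = f":CSID{csid}prop{prop_index}"     # the calculation result
    return [
        f"{execution} obo:OBO_0000299 {result} .",             # process -> output
        f"{result} obo:IAO_0000136 {ops_id} .",                # result is about molecule
        f"{result} rdf:type {result_class} .",                 # kind of calculated value
        f'{result} qudt:numericValue "{value}"^^xsd:double .', # the number itself
        f"{result} qudt:unit {unit} .",                        # its unit
    ]


lines = property_triples(
    csid=1,
    prop_index=11,
    ops_id="ops:OPS1",
    result_class="cheminf:CHEMINF_000349",
    value="1.049E-17",
    unit="obo:UO_0000324",
)
print("\n".join(lines))
```

Called with the values from the earlier example, this reproduces those five lines, which is a quick check that the sketch matches the pattern on the slide.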
ChEMBL and DrugBank
analysed
ChEMBL 16 (http://www.ebi.ac.uk/chembl/) contains
1 295 510 distinct molecules; CVSP found something to
say about 456 250 of them (35%).
DrugBank 3.0 (http://www.drugbank.ca/) contains 6510
distinct molecules; CVSP found something to say about
662 of them (10%).
(We haven't run CVSP over all of ChemSpider yet; we will.)
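The headline percentages follow directly from the counts quoted above. A small check, using only those four numbers (the helper name `flagged_share` is made up here):

```python
def flagged_share(flagged, total):
    """Percentage of molecules for which CVSP raised at least one message."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * flagged / total


# Counts as quoted in the talk.
chembl_share = flagged_share(456_250, 1_295_510)
drugbank_share = flagged_share(662, 6_510)

print(f"ChEMBL 16:    {chembl_share:.1f}% of molecules flagged")
print(f"DrugBank 3.0: {drugbank_share:.1f}% of molecules flagged")
```

Rounded to whole numbers, the two shares agree with the 35% and 10% given in the talk.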
Potentially serious things
ChEMBL           DrugBank       Issue
14218  1.09%     202  3.10%     Not an overall neutral system
485    0.04%     21   0.32%     Forbidden-valence atoms
44     —         0    —         Has adjacent atoms with like charges
4      —         0    —         Has more than one radical centre
Aesthetics
ChEMBL           DrugBank       Issue
57275  4.42%     70   1.08%     Uneven-length bonds
25736  1.99%     78   1.20%     Congested layout
23622  1.82%     24   0.37%     Containing not-quite-linear cyano groups
167    0.01%     1    —         Zero-dimensional structures
70     0.01%     0    —         Containing not-quite-linear isocyano groups
Artwork molecules
ChEMBL   DrugBank   Issue
0        0          Cyclobutane
8        0          Ethane molecules in the structure
6        0          Sulfur atoms with no explicit bonds
4        0          Boron atoms with no explicit bonds
1        0          Ethyne molecule (in the ChEMBL case it actually is acetylene)
3        0          Stray methane molecules
FDA tautomer and metal rules
ChEMBL           DrugBank       Issue
17508  1.35%     80   1.29%     In enol form (or chalcogenoenol form)
9526   0.74%     4    0.07%     N=C–OH tautomer of a carbonyl compound
2      —         1    —         Nitroso-form oximes
1104   0.09%     6    0.09%     Metal–nitrogen bond
845    0.06%     10   0.15%     Non-metal–transition-metal bond
432    0.03%     10   0.15%     Metal–oxygen bond
3      —         2    —         Aluminium–non-metal bond
2      —         0    —         Metal–fluorine bond
Stereochemistry
ChEMBL            DrugBank       Issue
185742  14.3%     39   0.60%     G2-4: Has a single unknown stereocentre and no defined stereocentres: probably a racemate
68572   5.3%      13   0.20%     G2-42: Has more than one unknown stereocentre and no defined stereocentres: probably problematic. Could indicate relative stereochemistry?
36572   2.8%      27   0.44%     G2-44: At least one defined stereocentre, and one stereocentre is undefined or unknown: probably an epimer or mixture of anomers
26076   2.0%      11   0.17%     G2-46: Has more than one unknown stereocentre and more than one defined stereocentre: probably problematic again
23113   1.8%      13   0.20%     Unknown double bond arrangement
883     0.1%      1    —         At least one ring containing stereobonds
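The four G2 rules depend only on how many stereocentres are defined and how many are unknown. The sketch below restates them as a lookup on those two counts. It is an illustration of the rule wording in this talk, not CVSP's implementation. Combinations the talk does not mention (for example one defined and several unknown stereocentres) return `None` rather than a guessed rule, and the function name `classify_stereo` is made up.

```python
def classify_stereo(defined, unknown):
    """Map (defined, unknown) stereocentre counts to a G2 rule, if any.

    Returns a (rule_code, message) tuple, or None when none of the four
    rules quoted in the talk applies.
    """
    if defined < 0 or unknown < 0:
        raise ValueError("counts must be non-negative")
    if defined == 0 and unknown == 1:
        return ("G2-4", "probably a racemate")
    if defined == 0 and unknown > 1:
        return ("G2-42", "probably problematic; relative stereochemistry?")
    if defined >= 1 and unknown == 1:
        return ("G2-44", "probably an epimer or mixture of anomers")
    if defined > 1 and unknown > 1:
        return ("G2-46", "probably problematic")
    return None


# A few illustrative count pairs: (defined, unknown).
examples = [(0, 1), (0, 3), (4, 1), (2, 2), (3, 0)]
for defined, unknown in examples:
    print(defined, unknown, classify_stereo(defined, unknown))
```

A sugar with four defined ring stereocentres and an undrawn anomeric centre, for instance, falls under G2-44, which matches the "mixture of anomers" reading.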
Sugar depiction challenges
Stereochemistry implied by a perspective drawing is
not stored in the V2000 format (though it is present in .cdx).
Consequences
Sugar questions
ChEMBL (19275)   DrugBank (153)   Issue
5359  27.8%      138  90.2%       At least one L-pyranose ring (often antibiotics contain these)
4748  24.6%      0    —           At least one perspective chair
416   2.16%      0    —           At least one Haworth ring
52    0.27%      0    —           At least one perspective boat or twist boat
Sugar ring redepiction
algorithm
1. Identify perspective conformation
(boat, chair, Haworth)
2. Determine perspective stereo
3. Assign wedge or hash to bonds
accordingly
4. Reconstruct sugar ring so as to minimize
disruption to the rest of molecule
5. Tidy
Take the x-axis as parallel to
the line through the top two
chair atoms or through the
bottom two chair atoms.
Δy positive: wedge
Δy negative: hash
Then remap chair to
homotropous hexagon.
In the boat case, the
substituent further up the
page is the wedge, while
the one further down the
page is the
hash, regardless of
whether bridgehead or
not.
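The chair rule can be sketched in a few lines of geometry: measure the substituent bond against a y-axis perpendicular to the line through the top (or bottom) two chair atoms, then read off the sign. This is an illustrative sketch, not the RSC implementation. It assumes 2D coordinates with y increasing up the page, orients the reference line left to right, and returns "plain" when Δy is effectively zero, a case the talk does not cover. The name `perspective_bond_style` is made up.

```python
import math


def perspective_bond_style(axis_a, axis_b, ring_atom, substituent, tol=1e-6):
    """Return 'wedge', 'hash' or 'plain' for a substituent on a perspective chair.

    axis_a, axis_b: the top two (or bottom two) chair atoms defining the x-axis.
    ring_atom, substituent: the two ends of the substituent bond.
    All points are (x, y) tuples with y increasing up the page (assumption).
    """
    ax = axis_b[0] - axis_a[0]
    ay = axis_b[1] - axis_a[1]
    if ax < 0:
        # Orient the x-axis left to right so the result does not depend on atom order.
        ax, ay = -ax, -ay
    length = math.hypot(ax, ay)
    if length == 0:
        raise ValueError("axis atoms coincide")
    ux, uy = ax / length, ay / length
    # The y-axis of the tilted frame is the x-axis rotated by +90 degrees: (-uy, ux).
    bx = substituent[0] - ring_atom[0]
    by = substituent[1] - ring_atom[1]
    delta_y = bx * (-uy) + by * ux
    if abs(delta_y) < tol:
        return "plain"
    return "wedge" if delta_y > 0 else "hash"


# Horizontal reference line: page "up" and frame "up" coincide.
up = perspective_bond_style((0, 0), (1, 0), (0, 0), (0, 1))
down = perspective_bond_style((0, 0), (1, 0), (0, 0), (0.5, -0.8))
# Tilted reference line: the bond rises on the page but falls in the tilted frame.
tilted = perspective_bond_style((0, 0), (1, 1), (0, 0), (1, 0.5))
print(up, down, tilted)
```

The third call shows why the tilted frame matters: a bond that points up the page can still have negative Δy relative to the chair's own axis.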
Depiction
1. Identify mean bond length and chair centroid.
2. Snap ring atoms to a regular-hexagonal grid.
3. Remove superfluous hydrogen atoms.
4. Only mark stereo on a single substituent if they
are paired (cf. Grice).
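Steps 1 and 2 can be sketched as follows: compute the ring centroid and mean bond length, then place the six atoms on a regular hexagon of that bond length centred on the centroid. This is a simplification under stated assumptions: it snaps to a single hexagon rather than a full hexagonal grid, fixes the vertices at 30° plus multiples of 60° (one vertex pointing straight up), keeps the original winding direction, and starts from the vertex nearest the first atom. The name `snap_ring_to_hexagon` and the example coordinates are made up.

```python
import math


def snap_ring_to_hexagon(ring):
    """Replace six ring-atom positions by a regular hexagon.

    ring: six (x, y) tuples in ring order.
    Returns (hexagon, mean_bond_length, centroid).
    """
    n = 6
    if len(ring) != n:
        raise ValueError("expected exactly six ring atoms")
    cx = sum(x for x, _ in ring) / n
    cy = sum(y for _, y in ring) / n
    mean_bond = sum(math.dist(ring[i], ring[(i + 1) % n]) for i in range(n)) / n
    # Shoelace formula: positive means the atoms run anticlockwise.
    twice_area = sum(
        ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
        for i in range(n)
    )
    direction = 1 if twice_area >= 0 else -1
    step = math.pi / 3
    # Snap the first atom's direction to the nearest allowed vertex angle.
    first_angle = math.atan2(ring[0][1] - cy, ring[0][0] - cx)
    start = round((first_angle - math.pi / 6) / step) * step + math.pi / 6
    # In a regular hexagon the circumradius equals the side length.
    hexagon = [
        (
            cx + mean_bond * math.cos(start + direction * i * step),
            cy + mean_bond * math.sin(start + direction * i * step),
        )
        for i in range(n)
    ]
    return hexagon, mean_bond, (cx, cy)


# A skewed six-membered ring standing in for a perspective drawing (made-up coordinates).
SKEWED_RING = [(0.0, 0.0), (1.2, -0.4), (2.4, 0.0), (2.8, 0.9), (1.6, 1.3), (0.4, 0.9)]
hexagon, bond, centre = snap_ring_to_hexagon(SKEWED_RING)
for x, y in hexagon:
    print(f"{x:.3f} {y:.3f}")
```

Keeping the centroid and mean bond length fixed is one way to limit how far the rest of the molecule has to move when the ring is redrawn.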
Tidying: desiderata
Different problem from structure layout in
general.
The structure we end up with is, in many
important respects, fine.
Preserve drawing conventions: aglycones
on the top right-hand side.
Next steps
Stable user-facing URI for CVSP
(currently http://cvsp.beta.rsc-us.org/,
but subject to change)
Apply CVSP to all of ChemSpider.
Investigate fused rings.
Acknowledgements
In particular,
Jon Steele (RSC)
David Sharpe (RSC)
John Blunt (Canterbury, NZ)
Any questions?
batchelorc@rsc.org
@documentvector

20130724 cisrg sugars_batchelor

  • 1. Validation and Standardization of Molecular Structures in General and Sugars in Particular: a Case Study Colin Batchelor, Ken Karapetyan, Valery Tkachenko, Antony Williams 6th Joint Sheffield Conference on Chemoinformatics 2013-07-24
  • 2. Overview Open PHACTS and chemical validation and standardization RDF for chemoinformatics calculations General case study: ChEMBL and DrugBank Sugar case study: Perspective perception
  • 3. Overview Open PHACTS and chemical validation and standardization RDF for chemoinformatics calculations General case study: ChEMBL and DrugBank Sugar case study: Perspective perception
  • 4. Who is involved? 28 Consortium Members >45 Associated Partners 3-year European project funded by: • European Pharmaceutical Industry • Innovative Medicines Initiative Open PHACTS API Applications using the Open PHACTS API dev.openphacts.org Explorer www.openphacts.org Twitter: @open_phacts
  • 5. How do we fit in? We integrate and standardize the chemical compound collection underpinning Open PHACTS and provide regular updates and on- going data curation. The validation and standardization rules have been derived from the FDA structure guidelines and have been changed for consistency and input from members of EFPIA.
  • 6. Open PHACTS provides an integrated platform of publicly available pharmacological and physicochemical data. Data accessible via: • Free application programming interface (API) dev.openphacts.org • Third-party applications built to use the API Open PHACTS app ecosystem
  • 7. How does Open PHACTS work?
  • 8. Currently integrated databases Database Millions of triples ACD Labs / ChemSpider 161.3 ChEBI 0.9 ChEMBL 146.1 ConceptWiki 3.7 DrugBank 0.5 Enzyme 0.1 Gene Ontology 0.9 SwissProt 156.6 WikiPathways 0.1 TOTAL 470.2
  • 9. CVSP and the OPS CRS Standardization workflows (CVSP, FDA, OPS, custom) using modules such as: • SMIRKS transformations • layout (GGA) • canonical tautomers (ChemAxon) • sugar interpretation (RSC)  
  • 10. Overview Open PHACTS and chemical validation and standardization RDF for chemoinformatics calculations General case study: ChEMBL and DrugBank Sugar case study: Perspective perception
  • 11. RDF and Open PHACTS The underlying language of Open PHACTS is RDF. There are few constraints as such, only guidelines for which classes of identifier to use and accounts of best practice. This RDF goes into the data cache and we access the results through user interfaces built on RESTful JSON web services.
  • 12. What does RDF look like? In the Turtle format below, each line is a triple, in which a binary predicate links a subject and an object. :CSID1execution obo:OBO_0000299 :CSID1prop11 . :CSID1prop11 obo:IAO_0000136 ops:OPS1 . :CSID1prop11 rdf:type cheminf:CHEMINF_000349 . :CSID1prop11 qudt:numericValue "1.049E-17"^^xsd:double . :CSID1prop11 qudt:unit obo:UO_0000324 . There is also RDF/XML, which is less human-readable.
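To make the subject, predicate, object shape concrete, here is a minimal Python sketch that splits the five example triples from the slide into tuples and reads the numeric literal back out. The function names `parse_simple_triples` and `objects` are hypothetical helpers, not part of Open PHACTS, and the splitter only handles one-triple-per-line input like this example; real Turtle needs a proper RDF parser.

```python
# The five triples from the slide, one per line, as in Turtle.
TURTLE = """\
:CSID1execution obo:OBO_0000299 :CSID1prop11 .
:CSID1prop11 obo:IAO_0000136 ops:OPS1 .
:CSID1prop11 rdf:type cheminf:CHEMINF_000349 .
:CSID1prop11 qudt:numericValue "1.049E-17"^^xsd:double .
:CSID1prop11 qudt:unit obo:UO_0000324 .
"""


def parse_simple_triples(text):
    """Split 'subject predicate object .' lines into 3-tuples.

    Assumption: no prefixes block, no ';' or ',' shorthand, no spaces
    inside literals. This is NOT a general Turtle parser.
    """
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if not line.endswith(" ."):
            raise ValueError(f"not a simple triple line: {line!r}")
        subject, predicate, obj = line[:-2].split(None, 2)
        triples.append((subject, predicate, obj.strip()))
    return triples


def objects(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


triples = parse_simple_triples(TURTLE)
value_literal = objects(triples, ":CSID1prop11", "qudt:numericValue")[0]
# Strip the ^^xsd:double datatype and the quotes to get a Python float.
numeric_value = float(value_literal.split("^^")[0].strip('"'))
```

Each tuple corresponds to one line of the Turtle, which is why the format is easier to read than RDF/XML.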
  • 13. Royal Society of Chemistry data in Open PHACTS 1. Molecule synonyms and identifiers 2. Linksets between ChEBI, ChEMBL, DrugBank and OPS identifiers 3. Molecule–molecule relations (“parent–child”) of interest for drug discovery 4. Calculated physicochemical properties for compounds (both molecular and macroscopic)
  • 14. Royal Society of Chemistry data in Open PHACTS 1. Molecule synonyms and identifiers 2. Linksets between ChEBI, ChEMBL, DrugBank and OPS identifiers 3. Molecule–molecule relations (“parent–child”) of interest for drug discovery 4. Calculated physicochemical properties for compounds (both molecular and macroscopic)
  • 15. Calculated physicochemical properties (ACD 12.0) log P log D (at pH 5.5, at pH 7.4) bioconcentration factor KOC (at pH 5.5, at pH 7.4) index of refraction polar surface area molar refractivity molar volume polarizability surface tension density at STP boiling point at 1 atm flash point at 1 atm enthalpy of vaporization at STP vapour pressure at STP
  • 16. RDF for calculated properties: vocabularies Two dozen calculated properties for each of >10^6 molecules. CHEMINF ontology for kinds of calculation and chemical data; QUDT for results; OPS IDs for molecules; OBI and IAO to connect calculations to results.
  • 17. RDF for calculated properties: schema [Diagram; labels as extracted] Nodes: benzene’s connection table; OPS benzene; calculation process; calculation result; QUDT dimensionless quantity; "2.17"^^xsd:float; "0.234"^^xsd:float. Edges: IAO is about; OBI has specified input; OBI has specified output; QUDT has value; QUDT has standard uncertainty; QUDT has unit. rdf:type targets: CHEMINF calculated log P; CHEMINF connection table; CHEMINF execution of ACD/Labs PhysChem software library version 12.01.
  • 18. Overview Open PHACTS and chemical validation and standardization RDF for chemoinformatics calculations General case study: ChEMBL and DrugBank Sugar case study: Perspective perception
  • 19. ChEMBL and DrugBank analysed ChEMBL 16 (http://www.ebi.ac.uk/chembl/) contains 1 295 510 distinct molecules, and CVSP found something to say about 456 250 of them (35%). DrugBank 3.0 (http://www.drugbank.ca/) contains 6510 distinct molecules, and CVSP found something to say about 662 of them (10%). (We haven’t done all of CS yet; we will.)
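As a quick arithmetic check of the two headline percentages, the following sketch recomputes them from the counts quoted on the slide. The helper name `flagged_share` is hypothetical.

```python
def flagged_share(flagged, total):
    """Percentage of a collection for which CVSP raised at least one message."""
    return 100.0 * flagged / total


# Counts as quoted on the slide.
chembl_share = flagged_share(456_250, 1_295_510)
drugbank_share = flagged_share(662, 6_510)

print(f"ChEMBL 16: {chembl_share:.1f}%  DrugBank 3.0: {drugbank_share:.1f}%")
```

Both values round to the whole-number figures given on the slide.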
  • 20. Potentially serious things (count and share of collection, ChEMBL | DrugBank; percentages where given): Not an overall neutral system: 14218 (1.09%) | 202 (3.10%); Forbidden-valence atoms: 485 (0.04%) | 21 (0.32%); Has adjacent atoms with like charges: 44 | 0; Has more than one radical centre: 4 | 0
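Two of the checks in this table are simple enough to sketch on a toy data structure. The code below is an illustration only, not CVSP's implementation: `ToyMolecule`, `is_overall_neutral` and `like_charged_neighbours` are hypothetical names, and a molecule is reduced to a list of formal charges plus a list of bonded atom-index pairs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMolecule:
    """Hypothetical minimal molecule: formal charges plus bonded index pairs."""
    charges: list
    bonds: list = field(default_factory=list)


def is_overall_neutral(mol):
    """'Not an overall neutral system' fires when the formal charges do not sum to zero."""
    return sum(mol.charges) == 0


def like_charged_neighbours(mol):
    """Return bonded pairs whose formal charges share a sign (both + or both -)."""
    return [(i, j) for i, j in mol.bonds if mol.charges[i] * mol.charges[j] > 0]


# Charge-separated nitro group on a carbon: C, N+, O-, O (neutral, no like-charge pair).
nitro = ToyMolecule(charges=[0, 1, -1, 0], bonds=[(0, 1), (1, 2), (1, 3)])

# Deliberately bad toy input: two bonded cations and no counter-charge.
suspicious = ToyMolecule(charges=[1, 1, 0], bonds=[(0, 1), (1, 2)])
```

A real validator would also need valence tables for the forbidden-valence check and radical annotations for the radical-centre check, which this sketch leaves out.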
  • 21. Aesthetics (count and share of collection, ChEMBL | DrugBank; percentages where given): Uneven-length bonds: 57275 (4.42%) | 70 (1.08%); Congested layout: 25736 (1.99%) | 78 (1.20%); Containing not-quite-linear cyano groups: 23622 (1.82%) | 24 (0.37%); Zero-dimensional structures: 167 (0.01%) | 1; Containing not-quite-linear isocyano groups: 70 (0.01%) | 0
  • 22. Artwork molecules (counts, ChEMBL | DrugBank): Cyclobutane: 0 | 0; Ethane molecules in the structure: 8 | 0; Sulfur atoms with no explicit bonds: 6 | 0; Boron atoms with no explicit bonds: 4 | 0; Ethyne molecule (in the ChEMBL case it actually is acetylene): 1 | 0; Stray methane molecules: 3 | 0
  • 23. FDA tautomer and metal rules (count and share of collection, ChEMBL | DrugBank; percentages where given): In enol form (or chalcogenoenol form): 17508 (1.35%) | 80 (1.29%); N=C–OH tautomer of a carbonyl compound: 9526 (0.74%) | 4 (0.07%); Nitroso-form oximes: 2 | 1; Metal–nitrogen bond: 1104 (0.09%) | 6 (0.09%); Non-metal–transition-metal bond: 845 (0.06%) | 10 (0.15%); Metal–oxygen bond: 432 (0.03%) | 10 (0.15%); Aluminium–non-metal bond: 3 | 2; Metal–fluorine bond: 2 | 0
  • 24. Stereochemistry (count and share of collection, ChEMBL | DrugBank; percentages where given): G2-4, has a single unknown stereocentre and no defined stereocentres, probably a racemate: 185742 (14.3%) | 39 (0.60%); G2-42, has more than one unknown stereocentre and no defined stereocentres, probably problematic (could indicate relative stereochemistry?): 68572 (5.3%) | 13 (0.20%); G2-44, at least one defined stereocentre and one stereocentre undefined or unknown, probably an epimer or mixture of anomers: 36572 (2.8%) | 27 (0.44%); G2-46, has more than one unknown stereocentre and more than one defined stereocentre, probably problematic again: 26076 (2.0%) | 11 (0.17%); Unknown double bond arrangement: 23113 (1.8%) | 13 (0.20%); At least one ring containing stereobonds: 883 (0.1%) | 1
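The four G2 rules depend only on how many stereocentres are defined and how many are unknown, so they can be sketched as a small decision function. This is an interpretation of the slide wording, not CVSP's code: the function name `classify_stereo` is hypothetical, and the slide does not say how to treat the overlap between G2-44 and G2-46, so the sketch assumes G2-46 takes precedence and G2-44 catches the remaining mixed cases.

```python
def classify_stereo(defined, unknown):
    """Map counts of defined and unknown stereocentres to a G2 rule code.

    Returns None when no unknown stereocentre is present (nothing to report).
    Assumption: G2-46 is tested before G2-44, so G2-44 covers every other
    molecule with at least one defined and at least one unknown centre.
    """
    if defined < 0 or unknown < 0:
        raise ValueError("stereocentre counts must be non-negative")
    if unknown == 0:
        return None
    if defined == 0:
        # No defined centres: one unknown looks like a racemate (G2-4),
        # several unknowns are flagged as probably problematic (G2-42).
        return "G2-4" if unknown == 1 else "G2-42"
    if defined > 1 and unknown > 1:
        return "G2-46"
    return "G2-44"


examples = {
    (0, 1): classify_stereo(0, 1),
    (0, 3): classify_stereo(0, 3),
    (4, 1): classify_stereo(4, 1),
    (3, 2): classify_stereo(3, 2),
}
```

The (4, 1) case is the typical sugar with an undefined anomeric centre, which the slide describes as an epimer or mixture of anomers.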
  • 25. Overview Open PHACTS and chemical validation and standardization RDF for chemoinformatics calculations General case study: ChEMBL Sugar case study: Perspective perception
  • 26. Sugar depiction challenges Stereochemistry not stored in V2000 format (though present in .cdx).
  • 28. Sugar questions (count and share, ChEMBL (19275) | DrugBank (153); percentages where given): At least one L-pyranose ring (often antibiotics contain these): 5359 (27.8%) | 138 (90.2%); At least one perspective chair: 4748 (24.6%) | 0; At least one Haworth ring: 416 (2.16%) | 0; At least one perspective boat or twist boat: 52 (0.03%) | 0
  • 29. Sugar ring redepiction algorithm 1. Identify perspective conformation (boat, chair, Haworth) 2. Determine perspective stereo 3. Assign wedge or hash to bonds accordingly 4. Reconstruct sugar ring so as to minimize disruption to the rest of molecule 5. Tidy
  • 30.
  • 31.
  • 32. Take the x-axis as parallel to the line through the top two chair atoms or through the bottom two chair atoms. Δy positive: wedge Δy negative: hash Then remap chair to homotropous hexagon.
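The Δy rule can be sketched with a little 2D vector arithmetic. The code below is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, page coordinates are assumed to have y increasing up the page, and the x-axis direction along the chosen pair of chair atoms is assumed to be the one pointing rightwards.

```python
import math


def chair_frame_dy(axis_a, axis_b, ring_atom, substituent):
    """Δy of a substituent bond in a frame whose x-axis runs through two chair atoms.

    axis_a, axis_b: the top two (or bottom two) chair atoms, as (x, y).
    Assumption: the x-axis is oriented to point rightwards on the page.
    """
    ax, ay = axis_b[0] - axis_a[0], axis_b[1] - axis_a[1]
    if ax < 0 or (ax == 0 and ay < 0):
        ax, ay = -ax, -ay
    norm = math.hypot(ax, ay)
    if norm == 0:
        raise ValueError("the two axis atoms coincide")
    ux, uy = ax / norm, ay / norm
    vx, vy = substituent[0] - ring_atom[0], substituent[1] - ring_atom[1]
    # Project the bond onto the frame's y-axis, (-uy, ux).
    return -uy * vx + ux * vy


def bond_style(dy, tolerance=1e-6):
    """Δy positive: wedge. Δy negative: hash. Near zero: left plain (assumption)."""
    if dy > tolerance:
        return "wedge"
    if dy < -tolerance:
        return "hash"
    return "plain"


# Chair drawn with its top edge tilted 45 degrees; substituent drawn to the right.
tilted = bond_style(chair_frame_dy((0, 0), (1, 1), (0, 0), (1, 0)))
# Untilted chair; substituent drawn straight up the page.
upright = bond_style(chair_frame_dy((0, 0), (1, 0), (0, 0), (0, 1)))
```

Working in the tilted frame matters: in the first example the substituent is level on the page but falls below the chair's own x-axis, so it becomes a hash.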
  • 33.
  • 34. In the boat case, the substituent further up the page is the wedge, while the one further down the page is the hash, regardless of whether bridgehead or not.
  • 35. Depiction 1. Identify mean bond length and chair centroid. 2. Snap ring atoms to a regular-hexagonal grid. 3. Remove superfluous hydrogen atoms. 4. Only mark stereo on a single substituent if they are paired (cf. Grice).
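Steps 1 and 2 can be approximated in a few lines. The sketch below is a simplification and not the authors' method: instead of snapping to a shared hexagonal grid, it rebuilds a single six-membered ring as a regular hexagon around its own centroid, with side length equal to the ring's mean bond length. The function name is hypothetical.

```python
import math


def snap_ring_to_regular_hexagon(ring):
    """Rebuild six ring atoms (given in ring order) as a regular hexagon.

    The centroid, the mean bond length, the winding direction and the
    bearing of the first atom are kept from the original drawing.
    """
    if len(ring) != 6:
        raise ValueError("expected exactly six ring atoms")
    cx = sum(x for x, _ in ring) / 6
    cy = sum(y for _, y in ring) / 6
    mean_len = sum(math.dist(ring[i], ring[(i + 1) % 6]) for i in range(6)) / 6
    # Shoelace sum: positive for counter-clockwise atom order.
    twice_area = sum(
        ring[i][0] * ring[(i + 1) % 6][1] - ring[(i + 1) % 6][0] * ring[i][1]
        for i in range(6)
    )
    step = math.copysign(math.pi / 3, twice_area)
    start = math.atan2(ring[0][1] - cy, ring[0][0] - cx)
    # In a regular hexagon the circumradius equals the side length.
    return [
        (cx + mean_len * math.cos(start + i * step),
         cy + mean_len * math.sin(start + i * step))
        for i in range(6)
    ]


distorted = [(1.0, 0.1), (0.6, 0.9), (-0.4, 0.8), (-1.1, 0.0), (-0.5, -0.9), (0.5, -0.8)]
snapped = snap_ring_to_regular_hexagon(distorted)
```

Keeping the centroid and mean bond length fixed is one way to limit disruption to the rest of the molecule, in the spirit of step 4 of the redepiction algorithm.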
  • 36. Tidying: desiderata Different problem from structure layout in general. The structure we end up with is, in many important respects, fine. Preserve drawing conventions: aglycones being on the top right-hand side.
  • 37. Next steps Stable user-facing URI for CVSP (currently http://cvsp.beta.rsc-us.org/, but subject to change) Apply CVSP to all of ChemSpider. Investigate fused rings.
  • 38. Acknowledgements In particular, Jon Steele (RSC) David Sharpe (RSC) John Blunt (Canterbury, NZ)