The Royal Society of Chemistry was pleased to contribute to the Open PHACTS project, a 3-year project funded by the European Union's Innovative Medicines Initiative. Over those three years we developed our existing platforms, created new and innovative widgets and data platforms to handle chemistry data, extended existing chemistry ontologies and embraced the open standards of the semantic web. As a result, RSC served as the centralized chemistry data hub for the project. Now that the Open PHACTS project has concluded, we will report on our experiences from participating in it and provide an overview of the tools, capabilities and data released to the community as a result of our participation, and of how these may influence future projects. This will include the Open PHACTS open chemistry data dump, which provides the chemistry-related data in formats consumable by both chemistry and semantic web tools, as well as some of the resulting chemistry software released to the community. The Open PHACTS project made significant contributions to the chemistry community as well as to the supporting pharmaceutical companies and the biomedical community.
Open innovation contributions from RSC resulting from the Open PHACTS project
1. Open innovation and chemistry data management contributions from RSC resulting from the Open PHACTS project
Antony Williams, Valery Tkachenko, Ken Karapetyan, Alexey Pshenichnov, Colin Batchelor, Jon Steele & David Sharpe
ACS San Francisco, August 2014
2. What’s the structure? Are they in our file? What’s similar? What’s the target? Pharmacology data? Known pathways? Working on now? Connections to disease? Expressed in right cell type? Competitors? IP?
3. Fundamental issue:
• There is a LOT of science online!
• Chaotic, varying quality and very valuable!
• Scientists want to find information quickly and easily
• Often they just “can’t get there” (or don’t even know where “there” is)
• And you have to manage it all (or not)
4. Pre-competitive Informatics:
Pharma are all accessing, processing, storing & re-processing external research data
[Figure: external sources (Literature, PubChem, GenBank, Patents, Databases, Downloads) → Data Integration → Data Analysis → Firewalled Databases, repeated at each company]
Lowering industry firewalls: pre-competitive informatics in drug discovery
Nature Reviews Drug Discovery (2009) 8, 701-708 doi:10.1038/nrd2944
7. • 3-year Innovative Medicines Initiative project
• Integrating chemistry and biology data using semantic web technologies
• Open source code, open data and open standards
• Academics, Pharmas, Publishers…
• To put medicines in the pipeline…
10. Open PHACTS Deliverables
• Many details but overall…
• Deliver an Open Source chemical registry service, independent of ChemSpider
• Development of the Open Source Chemical Validation and Standardization Platform (CVSP)
• Deliver widgets and APIs to the project
• Deliver high quality, standardized Open Data
• Deliver structure data in RDF format
11. Standardize
• Use the SRS as guidance for standardization
• Adjust as necessary to our needs
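The slide does not list the individual rules. As a hedged illustration of "adjust as necessary", standardization can be treated as an ordered, editable list of rules. The one rule shown (keep the largest fragment, dropping counter-ions) is a common standardization step but is not necessarily part of the SRS guidance. Atom counting here is a crude regular-expression tokenizer over SMILES, not a cheminformatics toolkit, and all names are hypothetical.

```python
import re

# Crude SMILES atom tokenizer: bracket atoms such as [Na+] or [nH], then the
# organic-subset symbols (Br and Cl are tried before single letters).
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def atom_count(fragment: str) -> int:
    """Number of atom tokens in a SMILES fragment (an explicit [H] also counts)."""
    return len(ATOM_TOKEN.findall(fragment))


def keep_largest_fragment(smiles: str) -> str:
    """Drop counter-ions and solvents by keeping the component with most atoms."""
    return max(smiles.split("."), key=atom_count)


# Rules run in order; a project can add, remove or reorder them to suit its needs.
DEFAULT_RULES = [str.strip, keep_largest_fragment]


def standardize(smiles: str, rules=DEFAULT_RULES) -> str:
    for rule in rules:
        smiles = rule(smiles)
    return smiles


print(standardize("CC(=O)[O-].[Na+]"))  # CC(=O)[O-]
```

The acetate keeps its charge after this step; a fuller pipeline would typically add further rules, such as charge neutralization.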
18. ChEMBL (1.3 million records)
• 11,020 records with 4 bonds and zero charge, e.g. CHEMBL501101 or CHEMBL501973
• 271 records with hypervalent oxygen (e.g. CHEMBL2219679), carbon (e.g. 1005895), boron, chlorine, iodine or phosphine
• 6,177 records where the direction of a bond makes no sense, e.g. CHEMBL12760 and CHEMBL34704
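Counts like these come from automated checks over each record's connection table. Below is a simplified sketch of one such check, flagging neutral atoms whose bond orders sum to more than a fixed maximum. The valence table, data structures and function name are assumptions for illustration, not the checks actually used. The check is deliberately strict and would also flag legitimate expanded-valence species, so flagged records still need review.

```python
from dataclasses import dataclass

# Hypothetical, simplified maximum total bond order for a neutral atom.
# Real validators use fuller rules (charges, expanded valences, radicals).
MAX_NEUTRAL_VALENCE = {"C": 4, "N": 3, "O": 2, "B": 3, "Cl": 1, "I": 1, "P": 5}


@dataclass
class Atom:
    element: str
    charge: int = 0


def valence_problems(atoms, bonds):
    """Return (atom index, message) for neutral atoms with too many bonds.

    atoms: list of Atom; bonds: list of (i, j, bond_order) tuples.
    """
    totals = [0] * len(atoms)
    for i, j, order in bonds:
        totals[i] += order
        totals[j] += order
    problems = []
    for idx, (atom, total) in enumerate(zip(atoms, totals)):
        limit = MAX_NEUTRAL_VALENCE.get(atom.element)
        if limit is None or atom.charge != 0:
            continue  # unknown element or charged atom: outside this sketch
        if total > limit:
            problems.append((idx, f"{atom.element} with {total} bonds and zero charge"))
    return problems


# A nitrogen bonded to four carbons but drawn without its positive charge.
atoms = [Atom("N")] + [Atom("C") for _ in range(4)]
bonds = [(0, k, 1) for k in range(1, 5)]
print(valence_problems(atoms, bonds))  # [(0, 'N with 4 bonds and zero charge')]
```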
20. Open Sourcing Data and Code
• All Open PHACTS data is licensed as Open Data and available from the Open PHACTS website: ca. 2 million chemicals
• The Chemical Registration Service, including the Chemical Validation and Standardization Platform, is being prepared for Open Source release now!
21. RSC data in Open PHACTS
1. Molecule synonyms and identifiers
2. Linksets between ChEBI, ChEMBL, DrugBank and OPS identifiers
3. Molecule–molecule relations (“parent–child”) of interest for drug discovery
4. Calculated physicochemical properties for compounds (both molecular and macroscopic)
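Linksets are what let a query step from one database's identifier to another's through a shared OPS identifier. A minimal sketch, assuming each linkset can be treated as a plain mapping from source identifiers to OPS identifiers; every identifier below is a made-up placeholder, and the function name is hypothetical.

```python
# Hypothetical linksets: source-database identifier -> OPS identifier.
# All identifiers are placeholders, not real records.
LINKSETS = {
    "chembl": {"CHEMBL_A": "OPS_1", "CHEMBL_B": "OPS_2"},
    "drugbank": {"DB_X": "OPS_1"},
    "chebi": {"CHEBI_P": "OPS_2", "CHEBI_Q": "OPS_1"},
}


def equivalents(source: str, identifier: str) -> dict:
    """Identifiers in the other sources that map to the same OPS identifier."""
    ops_id = LINKSETS[source].get(identifier)
    if ops_id is None:
        return {}
    result = {}
    for other, mapping in LINKSETS.items():
        if other == source:
            continue
        matches = sorted(i for i, o in mapping.items() if o == ops_id)
        if matches:
            result[other] = matches
    return result


print(equivalents("chembl", "CHEMBL_A"))  # {'drugbank': ['DB_X'], 'chebi': ['CHEBI_Q']}
```

Routing every cross-reference through one hub identifier means each database needs only one linkset rather than one per pair of databases.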
22. Our RDF schema
Two dozen calculated properties for >10⁶ molecules
• CHEMINF ontology for cheminformatics
• QUDT for units and numeric values
• ChemSpider IDs for molecules
23. Synonyms and identifiers
Newly added to the CHEMINF ontology:
•Validated ChemSpider synonyms
•Unvalidated ChemSpider synonyms
•Validated database identifiers
•Unvalidated database identifiers
•InChI, InChIKey, SMILES
•Preferred ChemSpider name
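A format check is a cheap first filter before identifiers are curated as validated or unvalidated. The sketch below assumes a simple prefix test for InChI and the standard 27-character InChIKey layout. It is not how ChemSpider validates synonyms and cannot confirm that a key belongs to a given structure. SMILES is left out because a reliable SMILES check needs a parser. The function name is hypothetical.

```python
import re

# Standard InChIKey layout: 14 uppercase letters (connectivity hash), a hyphen,
# 10 uppercase letters (8-letter hash, standard flag, version), a hyphen, and
# one protonation letter.
INCHIKEY = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def identifier_kind(value: str) -> str:
    """Classify a string by format only; says nothing about whether it is correct."""
    value = value.strip()
    if value.startswith("InChI="):
        return "InChI"
    if INCHIKEY.fullmatch(value):
        return "InChIKey"
    return "other"


print(identifier_kind("UHOVQNZJYSORNB-UHFFFAOYSA-N"))  # InChIKey (benzene)
```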
24. Physicochemical properties
• log P
• log D (at pH 5.5 and 7.4)
• bioconcentration factor (at pH 5.5 and 7.4)
• KOC (at pH 5.5 and 7.4)
• index of refraction
• polar surface area
• molar refractivity
• molar volume
• polarizability
• surface tension
• density at STP
• flash point at 1 atm
• boiling point at 1 atm
• enthalpy of vaporization at STP
• vapour pressure at STP
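Log D is listed at two pH values because ionization changes how much of a compound partitions into the organic phase. For a compound with a single ionizable group, and assuming only the neutral form partitions, the textbook relation is log D = log P − log10(1 + 10^(pH − pKa)) for an acid, with (pKa − pH) in the exponent for a base. This is not necessarily how the ACD/Labs software computes log D; the function name and input values below are illustrative.

```python
import math


def log_d(log_p: float, pka: float, ph: float, acid: bool = True) -> float:
    """log D for one ionizable group, assuming only the neutral species partitions."""
    exponent = ph - pka if acid else pka - ph
    return log_p - math.log10(1 + 10 ** exponent)


# Hypothetical carboxylic acid: log P 3.0, pKa 4.5.
for ph in (5.5, 7.4):
    print(ph, round(log_d(3.0, 4.5, ph), 2))  # prints 5.5 1.96, then 7.4 0.1
```

For an acid, log D falls as the pH rises above the pKa, which is why the two reference pH values can give very different numbers for the same compound.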
26. [Figure: RDF graph for benzene’s calculated log P. Nodes: OPS benzene; benzene’s connection table (rdf:type CHEMINF connection table); calculation result (rdf:type CHEMINF calculated log P) with QUDT has value “2.17”^^xsd:float, QUDT has standard uncertainty “0.234”^^xsd:float and QUDT has unit QUDT dimensionless quantity; calculation process (rdf:type CHEMINF execution of ACD/Labs PhysChem software library version 12.01) with OBI has specified input and OBI has specified output; IAO is about.]
It is actually more complicated…
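The graph on this slide can be written out as subject–predicate–object triples. A minimal sketch with several assumptions: the prefixes and local names (ex:, cheminf:, iao:, obi:, qudt:) are readable stand-ins, not the real ontology IRIs; "is about" is assumed to link the calculation result to the connection table; and the OPS benzene node is omitted because its link is not recoverable from the slide. Function names are hypothetical.

```python
# Readable stand-ins for ontology terms (hypothetical prefixes and local names);
# the real CHEMINF, IAO, OBI and QUDT terms use their own IRIs.
TRIPLES = [
    ("ex:benzene_ct", "rdf:type", "cheminf:connection_table"),
    ("ex:logp_result", "rdf:type", "cheminf:calculated_logP"),
    ("ex:logp_result", "iao:is_about", "ex:benzene_ct"),
    ("ex:logp_result", "qudt:has_value", '"2.17"^^xsd:float'),
    ("ex:logp_result", "qudt:has_standard_uncertainty", '"0.234"^^xsd:float'),
    ("ex:logp_result", "qudt:has_unit", "qudt:dimensionless"),
    ("ex:calc_run", "rdf:type", "cheminf:execution_of_ACD_PhysChem_12_01"),
    ("ex:calc_run", "obi:has_specified_input", "ex:benzene_ct"),
    ("ex:calc_run", "obi:has_specified_output", "ex:logp_result"),
]


def objects(triples, subject, predicate):
    """All objects for a subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def to_turtle(triples) -> str:
    """Group triples by subject into Turtle-style statements (prefix declarations omitted)."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append(f"{p} {o}")
    return "\n\n".join(
        f"{s} " + " ;\n    ".join(pairs) + " ." for s, pairs in by_subject.items()
    )


print(to_turtle(TRIPLES))
print(objects(TRIPLES, "ex:logp_result", "qudt:has_value"))
```

Modelling the calculation as a separate process node keeps provenance (software and version) apart from the value itself, so results from different software versions can sit side by side.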
28. Important for other projects
• Multiple outputs from the project available for reuse to underpin other projects:
• Chemical registry service
• Chemical validation and standardization
• APIs and visualization widgets
30. New Repository Architecture
• Data tier: Compounds, Reactions, Spectra, Materials, Documents
• Data access tier: Compounds API, Reactions API, Spectra API, Materials API, Documents API
• User interface components tier: Compounds Widgets, Reactions Widgets, Spectra Widgets, Materials Widgets, Documents Widgets
• User interface tier (examples): Analytical Laboratory application, Electronic Laboratory Notebook, Chemical Inventory application, paid 3rd party integrations (various platforms – SharePoint, Google, etc)
31. Input data pipeline
• Inputs: Web UI for unified depositions; DropBox, Google Drive, SkyDrive, etc; LabTrove and other templated data; Documents; API, FTP, etc
• Deposition Gateway
• Staging databases (raw data): Compounds, Reactions, Spectra, Materials, Articles / CSSP
• Processing modules: Compounds Module, Spectra Module, Reactions Module, Materials Module, Textmining Module, … Module
• Staging databases (validated data)
All databases are sliced by data sources/data collections and have a simple security model where each data slice/source is private, public or embargoed.
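The slide names three visibility states for each data slice. A sketch of how a read check over them might look; the rule that owners always see their own slice, the embargo end date, and all class and function names are assumptions, not the platform's actual security model.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Visibility(Enum):
    PRIVATE = "private"
    PUBLIC = "public"
    EMBARGOED = "embargoed"


@dataclass
class DataSlice:
    name: str
    owner: str
    visibility: Visibility
    embargo_until: Optional[date] = None  # only meaningful for EMBARGOED slices


def can_read(data_slice: DataSlice, user: str, today: date) -> bool:
    """Hypothetical read rule for one data slice."""
    if user == data_slice.owner:
        return True  # owners always see their own data
    if data_slice.visibility is Visibility.PUBLIC:
        return True
    if data_slice.visibility is Visibility.EMBARGOED:
        # Readable by others on or after the embargo end date.
        return data_slice.embargo_until is not None and today >= data_slice.embargo_until
    return False  # PRIVATE: owner only


spectra = DataSlice("lab-spectra", "alice", Visibility.EMBARGOED, date(2015, 1, 1))
print(can_read(spectra, "bob", date(2014, 8, 10)))  # False
print(can_read(spectra, "bob", date(2015, 1, 1)))   # True
```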
35. For Deposition of Data
• Quality of data at source
• ensuring chemicals are correct - VALIDATION
• reactions map and balance as appropriate – VALIDATION and STANDARDIZATION
• file format handling for analytical data types – binary file formats are proprietary - STANDARDIZATION
• valid interpretation of data – VALIDATION and ANNOTATION
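The "balance" half of checking that reactions map and balance can be automated by comparing element counts on each side. A minimal sketch, assuming plain molecular formulas without parentheses, hydrates or charges; atom mapping is not covered, and the function names are hypothetical.

```python
import re
from collections import Counter

# Element symbol followed by an optional count, e.g. "C6", "H", "Cl2".
# Parentheses, hydrates and charges are not handled in this sketch.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def formula_counts(formula: str) -> Counter:
    counts = Counter()
    for element, number in TOKEN.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def side_counts(side) -> Counter:
    """side: list of (stoichiometric coefficient, formula) pairs."""
    total = Counter()
    for coeff, formula in side:
        for element, n in formula_counts(formula).items():
            total[element] += coeff * n
    return total


def imbalance(reactants, products) -> dict:
    """Per-element surplus on the product side (negative means missing atoms)."""
    diff = side_counts(products)
    diff.subtract(side_counts(reactants))
    return {element: n for element, n in diff.items() if n != 0}


# Acetic acid + ethanol -> ethyl acetate, with the water left out.
print(imbalance([(1, "C2H4O2"), (1, "C2H6O")], [(1, "C4H8O2")]))  # {'H': -2, 'O': -1}
```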
48. Open PHACTS Project Partners
Pfizer Limited – Coordinator
Universität Wien – Managing entity
Technical University of Denmark
University of Hamburg, Center for Bioinformatics
BioSolveIT GmbH
Consorci Mar Parc de Salut de Barcelona
Leiden University Medical Centre
Royal Society of Chemistry
Vrije Universiteit Amsterdam
Spanish National Cancer Research Centre
University of Manchester
Maastricht University
Aqnowledge
University of Santiago de Compostela
Rheinische Friedrich-Wilhelms-Universität Bonn
AstraZeneca
GlaxoSmithKline
Esteve
Novartis
Merck Serono
H. Lundbeck A/S
Eli Lilly
Netherlands Bioinformatics Centre
Swiss Institute of Bioinformatics
ConnectedDiscovery
EMBL-European Bioinformatics Institute
Janssen
OpenLink
49. Thank you
Email: williamsa@rsc.org
ORCID: 0000-0002-2668-4821
Twitter: @ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams
Editor's Notes
Mx/psa, how calculated who did it?
Mash up. With your data too,
- top layer join together but need them all
commercial
10
Can go get everything
OPS not a repo of the world, specific sources