7. Where do we Round Up data?
Where can I find the molfile for Roundup?
Papers/Patents about Roundup?
What are the side effects of Roundup?
Where can I order Roundup?
What are the physicochemical properties?
Metabolic pathways?
Different synonyms of Roundup?
Synthesis of Roundup?
Etc.
12. ChemSpider
Takes on the role of a structure-centric hub:
Connecting, validating, qualifying data
Enhancing data with connections to services
Provides access to data and services for others to use (Thermo, Agilent, Bruker, Waters, ACD/Labs, Accelrys, etc.)
Uses available services to integrate, connect and enhance the offering
23. How did we build it?
We deal in Molfiles or SDF files – with coordinates
Deposit anything that has an InChI: we support what InChI can handle, good and bad
Standardization based on “InChI standardization”
InChIs aggregate (certain) tautomers
How much of ChemSpider is “on ChemSpider”?
24. Connecting Chemistry across the web
So much of what is seen on ChemSpider is retrieved in real time using services
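One way to picture a record page assembled from services at request time is the minimal Python sketch below. It is not ChemSpider's implementation: the service functions, their names, and the returned placeholder strings are all hypothetical stand-ins for real web-service calls, and the only real identifier is the standard InChIKey for ethanol.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for remote services; a real hub would make HTTP calls here.
def fetch_properties(inchikey: str) -> str:
    return f"predicted properties for {inchikey}"


def fetch_vendors(inchikey: str) -> str:
    return f"vendor links for {inchikey}"


def fetch_spectra(inchikey: str) -> str:
    raise ConnectionError("spectra service unavailable")  # simulate an outage


SERVICES = {
    "properties": fetch_properties,
    "vendors": fetch_vendors,
    "spectra": fetch_spectra,
}


def assemble_record(inchikey: str, services: dict) -> dict:
    """Query every service in parallel; a failing service leaves a None slot."""
    page = {"inchikey": inchikey}
    with ThreadPoolExecutor(max_workers=max(1, len(services))) as pool:
        futures = {name: pool.submit(fn, inchikey) for name, fn in services.items()}
        for name, future in futures.items():
            try:
                page[name] = future.result(timeout=5)
            except Exception:
                page[name] = None  # the page still renders without this section
    return page


page = assemble_record("LFQSCWFLJHTTHZ-UHFFFAOYSA-N", SERVICES)  # ethanol
```

The point of the sketch is the design choice: the hub stores the structure and its identifier, while each section of the page is filled in by a service that may be slow or down, so failures are isolated per section.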
27. A Comment on Quality
For >28 million chemical compounds there are some errors:
“Incorrect” structure representations
Mismatched name-structure relationships
Experimental properties (the values, the units)
Real vs. virtual compounds: text-mining and conversion
We have deprecated a LOT of data…
28. Downsides of InChI
Good for small molecules, but no polymers; issues with inorganics, organometallics, and imperfect stereochemistry
ChemSpider is “small molecules”
InChI used as the “deduplicator”: the FIRST version of a compound into the database becomes THE structure to deduplicate against…
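The first-in-wins behaviour can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual ChemSpider code: the `Record` class, the `deposit` function, and the placeholder molfile strings are hypothetical, and the InChI is the standard one for ethanol.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One database entry; the first deposited structure is kept as canonical."""
    inchi: str
    molfile: str                                  # structure shown for the record
    sources: list = field(default_factory=list)   # everyone who deposited it


def deposit(db: dict, inchi: str, molfile: str, source: str) -> Record:
    """First-in-wins: a later deposit with the same InChI only adds a source."""
    record = db.get(inchi)
    if record is None:
        record = Record(inchi=inchi, molfile=molfile)
        db[inchi] = record
    record.sources.append(source)
    return record


db = {}
ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
deposit(db, ethanol, "<molfile from depositor A>", "depositor A")
deposit(db, ethanol, "<molfile from depositor B>", "depositor B")
```

After both deposits there is one record, it carries depositor A's molfile, and depositor B appears only in the source list. If A's drawing was poor, that drawing is still what every later deposit is merged into.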
33. Downsides of Overall Approach
Meshing data together based on InChIs worked for simple molecules
2D layout errors inherited or limited by algorithm
Complex molecules that are meant to be the same thing were NOT deduplicated
Compounds that differ by one stereocenter, but are named the same and meant to be the same, are not treated as the same
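A rough way to surface records that differ only in stereochemistry is to drop the stereo layers (`/b`, `/t`, `/m`, `/s`) from each InChI and group on what remains. The Python sketch below is a string-level approximation for flagging candidates for curation, assuming standard InChI input; it is not a substitute for regenerating identifiers with an InChI toolkit, and the function names are hypothetical.

```python
STEREO_PREFIXES = ("b", "t", "m", "s")  # double-bond, tetrahedral, and stereo-type layers


def strip_stereo(inchi: str) -> str:
    """Return the InChI with its stereo layers removed."""
    head, *layers = inchi.split("/")
    kept = [layer for layer in layers if not layer.startswith(STEREO_PREFIXES)]
    return "/".join([head, *kept])


def group_by_skeleton(inchis) -> dict:
    """Group InChIs that share formula, connectivity, and hydrogen layers."""
    groups = {}
    for inchi in inchis:
        groups.setdefault(strip_stereo(inchi), []).append(inchi)
    return groups


l_alanine = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
d_alanine = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"

groups = group_by_skeleton([l_alanine, d_alanine, ethanol])
# Groups with more than one member are candidates for a curator to review.
suspects = {key: members for key, members in groups.items() if len(members) > 1}
```

Here the two alanine enantiomers have different full InChIs, so InChI-based deduplication keeps them apart, but they fall into the same stereo-stripped group and can be flagged for review.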
37. What needs to happen?
If we could validate
Catch errors in databases (and clean)
Proactively catch errors in publications/patents
Reduce junk in the ether – improve QUALITY!
If we collectively standardized
Interlinking between databases should improve
CVSP: a separate presentation… stick around
38. Crowdsourcing ChemSpider
ChemSpider is crowdsourced
Community deposition, annotation and curation
Anyone can “Leave Feedback”
Registered users can add data
39. ChemSpider and Global Chemistry Hub
Small organic molecules, Undefined materials, Organometallics, Nanomaterials, Polymers, Minerals, Particle bound, Links to Biologicals
Internet Data, Commercial Software, Pre-competitive Data, Open Science, Open Data, Publishers, Educators, Open Databases, Chemical Vendors
40. Delivering a Prediction Platform
Experimental data will be used as the basis of model generation: a predictive platform…
41. The Future of ChemSpider
Continued focus on quality over quantity, but more data is good too!
ChemSpider Reactions: a work in progress that includes >300,000 reactions
Plugging in a validation and standardization platform
Delivering personal and institutional repository capabilities