oreChem: Planning and
Enacting Chemistry on the
Semantic Web
Microsoft Research eScience Workshop 2010
Berkeley, CA USA
Mark Borkum, Simon Coles and Jeremy Frey
12 October 2010
Overview
• Introduction
• Ontology
• Case Study: X-ray Crystallography
• Future Work
• Summary
The Scientific Method
• A systematic process
for knowledge
acquisition
• Becoming increasingly
data-intensive
Planning
Enactment
Analysis
Publication
The Data Deluge
• In Haiku:
– Lots of producers;
Generating more data
than ever before.
• 40 years ago, a PhD
student would
determine 3 structures
over the entire course
of their study!
The Great Wave off Kanagawa by Katsushika Hokusai
The Scientific Method (on the Web)
Provenance (The Elephant in the Room)
• The 7 W’s [Goble 2002]
– Who, What, Where,
Why, When, Which, &
(W)How
• The Why aspect is
often ignored
Diagram: Why → Planning; Who → Authorship; What & (W)How → Enactment; Where & When → Annotations
The oreChem Project
• Funded by Microsoft
Research
• Investigating the design and
deployment of a semantic-based
eScience infrastructure
for Chemistry
• Project website:
– http://research.microsoft.com/en-us/projects/orechem/
Diagram: the same four aspects (Why/Planning, Who/Authorship, What & (W)How/Enactment, Where & When/Annotations), labelled with oreChem and with Dublin Core, FOAF, SIOC, OWL Time, GeoNames, etc.
oreChem Core Ontology
Planning
• Prospective provenance
• Describes a scientific
experiment that will be
enacted (in the future)
• Three entity types:
– Plan
– Plan Stage
– Plan Object
Enactment
• Retrospective provenance
• Describes a scientific
experiment that was
enacted
• Three entity types:
– Run
– Stage
– Object
“In theory, there is no difference
between theory and practice.
But, in practice, there is.”
Unknown (possibly Yogi Berra)
Realisation (is not Instantiation)
• Each ‘run thing’ is
linked to zero or one
‘plan thing’
– Deviation from the plan
is allowed
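The realisation relationship can be sketched as a small data model. This is only an illustration in Python, not the oreChem ontology itself: the class names mirror the entity types named on the slides, while the field names (`plan_stage`, `stages`), the helper `unplanned_stages` and the example data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class PlanStage:
    """A step that the plan says should happen (prospective provenance)."""
    name: str


@dataclass
class Plan:
    name: str
    stages: List[PlanStage] = field(default_factory=list)


@dataclass
class Stage:
    """A step that actually happened (retrospective provenance)."""
    name: str
    # Zero or one 'plan thing': None records a deviation from the plan.
    plan_stage: Optional[PlanStage] = None


@dataclass
class Run:
    name: str
    plan: Optional[Plan] = None
    stages: List[Stage] = field(default_factory=list)


def unplanned_stages(run: Run) -> List[Stage]:
    """Return the stages of a run that realise no plan stage."""
    return [s for s in run.stages if s.plan_stage is None]


# Hypothetical example data, for illustration only.
measure = PlanStage("Measure diffraction")
solve = PlanStage("Solve structure")
plan = Plan("Example plan", [measure, solve])

run = Run(
    "Example run",
    plan,
    [
        Stage("Measured diffraction", measure),
        Stage("Re-mounted the sample"),  # not in the plan
        Stage("Solved structure", solve),
    ],
)

deviations = [s.name for s in unplanned_stages(run)]
print(deviations)
```

Making the link optional, rather than requiring every run stage to instantiate a plan stage, is what lets a run record steps the plan never anticipated.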
X-RAY CRYSTALLOGRAPHY
Case Study
Current Practice in Crystallography
• Crystallography data is
highly structured
– The de facto standard
adopted by the
community is the CIF
(Crystallographic
Information File)
• Relatively few crystal
structures are openly
available online
http://www.rin.ac.uk/our-work/data-management-and-curation/share-or-not-share-research-data-outputs
Crystallography and Fraud
The eCrystals Federation
• JISC project
• Network of
crystallography
resources
• All published records
are available as
Open Data
• Based on EPrints
repository
http://ecrystals.chem.soton.ac.uk/
eCrystal #20
• Each eCrystals record
contains:
– Bibliographic metadata
– Fundamental and
derived data (excluding
raw images)
– Final structure solution
Single Crystal Structure Determination
1. Take powder
specimen of chemical
substance
2. Measure diffraction of
X-rays
3. Compute electron
densities
4. Solve for crystal
structure
oreChem Plan for eCrystals
• Machine-readable
representation of
methodology
• Describes requirements
for software and data
products
• Available online at:
– http://ecrystals.chem.soton.ac.uk/plan.rdf
oreChem Run for eCrystal #20
• Exported by “oreChem”
plug-in for EPrints 3.1
– RDF/XML serialisation
– Uses SWRL rules to infer
causal relationships
• Describes:
– Software
– Data products
http://ecrystals.chem.soton.ac.uk/cgi/export/20/ORE_Chem/ecrystals-eprint-20.xml?include_xsl=1
Retrospective Provenance
Graphs for eCrystal #20
Panels: Stages and Objects; Objects
Legend: used (dashed), emitted (solid), derivedFrom (solid)
used(?s, ?o1) & emitted(?s, ?o2) → derivedFrom(?o2, ?o1)
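The inference rule on this slide can be illustrated in a few lines of Python. This is a sketch of the rule's logic only, not the SWRL rules used by the plug-in: the function name `infer_derived_from` and the stage and object names are hypothetical.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]  # (stage, object)


def infer_derived_from(used: Set[Edge], emitted: Set[Edge]) -> Set[Tuple[str, str]]:
    """Apply used(s, o1) & emitted(s, o2) -> derivedFrom(o2, o1)."""
    return {
        (o2, o1)
        for (s_used, o1) in used
        for (s_emitted, o2) in emitted
        if s_used == s_emitted  # both facts must share the same stage
    }


# Hypothetical stages and objects, for illustration only.
used = {("reduce", "frames"), ("solve", "intensities")}
emitted = {("reduce", "intensities"), ("solve", "structure")}

derived = infer_derived_from(used, emitted)
print(sorted(derived))
```

Applying the rule removes the stages from the picture and leaves a graph over objects only, which matches the second panel ("Objects") of the slide.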
Crystallography and Fraud – SPARQL
PREFIX orechem: <http://www.openarchives.org/2010/05/24-orechem-ns#>
PREFIX ecrystals: <http://ecrystals.chem.soton.ac.uk/plan.rdf#>

SELECT ?run ?raw ?derived ?reported
WHERE {
  # A run of the eCrystals plan that contains all three files
  ?run a orechem:Run ;
    orechem:hasPlan ecrystals:Ecrystals ;
    orechem:containsObject ?raw ;
    orechem:containsObject ?derived ;
    orechem:containsObject ?reported .
  # The raw file realises the HKL plan object
  ?raw a orechem:File ;
    orechem:hasPlanObject ecrystals:HKL .
  # An intermediate file derived from the raw file
  ?derived a orechem:File ;
    orechem:derivedFrom ?raw .
  # The reported file realises the CIF plan object and derives from the intermediate file
  ?reported a orechem:File ;
    orechem:hasPlanObject ecrystals:CIF ;
    orechem:derivedFrom ?derived .
}
Crystallography and Fraud – SPARQL (2)
Crystallography and Fraud – SPARQL (3)
Figure: graph with nodes ?run, ?raw, ?derived and ?reported
http://ecrystals.chem.soton.ac.uk/cgi/export/20/ORE_Chem/ecrystals-eprint-20.xml?include_xsl=1
Crystallography and Fraud – SPARQL (4)
?run               ?raw          ?derived      ?reported
_:eCrystal_20_Run  02sot126.hkl  02sot126.prp  02sot126.cif
_:eCrystal_20_Run  02sot126.hkl  02sot126.lst  02sot126.cif
_:eCrystal_20_Run  02sot126.hkl  02sot126.res  02sot126.cif
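The chain that the query looks for (a reported CIF derived from an intermediate file, itself derived from a raw HKL file) can be mimicked over plain Python tuples. This is a sketch, not a SPARQL engine: the file names come from the results table, but the `derived_from` edges and the `plan_object` mapping are assumptions chosen to be consistent with that table, and `provenance_chains` is a hypothetical helper.

```python
from typing import Dict, List, Set, Tuple

# Assumed derivedFrom edges (derived file, source file), chosen to be
# consistent with the results table; not read from the real record.
derived_from: Set[Tuple[str, str]] = {
    ("02sot126.prp", "02sot126.hkl"),
    ("02sot126.lst", "02sot126.hkl"),
    ("02sot126.res", "02sot126.hkl"),
    ("02sot126.cif", "02sot126.prp"),
    ("02sot126.cif", "02sot126.lst"),
    ("02sot126.cif", "02sot126.res"),
}

# Assumed mapping from files to the plan objects they realise.
plan_object: Dict[str, str] = {
    "02sot126.hkl": "HKL",
    "02sot126.cif": "CIF",
}


def provenance_chains() -> List[Tuple[str, str, str]]:
    """Find (raw, derived, reported) where reported <- derived <- raw."""
    chains = []
    for derived, raw in derived_from:
        if plan_object.get(raw) != "HKL":
            continue  # the chain must start at an HKL file
        for reported, source in derived_from:
            if source == derived and plan_object.get(reported) == "CIF":
                chains.append((raw, derived, reported))
    return sorted(chains)


chains = provenance_chains()
for row in chains:
    print(*row)
```

Under these assumptions the sketch finds the same three chains as the table. A reported CIF with no such chain back to an HKL file would be one that cannot be traced to measured data, which is presumably what the "fraud" framing of these slides refers to.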
Future Work
• oreChem Core Ontology
– Support for conditionals and continuations
• oreChem Lower Ontology
– Specialised for Physical and Computational Chemistry
• Applications and Services
– oreChem Plan Designer and Enactor
– oreChem Run Inspector
Summary
• <summary/>
Acknowledgements
• Microsoft Research
– Tony Hey
– Lee Dirks
– Savas Parastatidis
– Alex Wade
• oreChem Project
– Carl Lagoze, Theresa Velden
– Jeremy Frey, Simon Coles
– Peter Murray-Rust, Nick
Day, Jim Downing
– C. Lee Giles, Prasenjit Mitra,
William Brouwer, Na Li
– Marlon Pierce, Sashi Kiran
Challa
Thank You
• Questions?