1. The Open Provenance Model Vocabulary
Jun Zhao
University of Oxford
Jun.zhao@zoo.ox.ac.uk
2. Outline
Background about data.gov.uk
The use cases
  XML serialization
  Data transformation on the fly
  Complex and nested processes
  Provenance of non-digital artifacts
The Open Provenance Model Vocabulary (OPMV)
  The rationale
  An overview
  Examples
Future work
Summary
3. data.gov.uk
Linking UK government data
Aims:
  Provide a set of best practices for government agencies
  Provide the minimum set of tooling and specifications to facilitate the publication and consumption of data
  Encourage “responsible” data publishing
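4. [Figure: provenance of an XSLT transformation. An XSLT processor takes an input (downloaded from, unzipped from, etc.), an XSLT stylesheet with its XSLT templates, and parameter bindings, and produces an output RDF file that is made accessible. Provenance to record: who, when, which version, how.]
Contributed by Jeni Tennison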
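A minimal sketch of how the XSLT use case might be described with OPMV terms that appear later in this deck. All ex: resources, the label, and the stylesheet version are hypothetical; the opmv: namespace URI is not stated on the slides and is assumed here.

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix opmv: <http://purl.org/net/opmv/ns#> .   # assumed namespace URI
@prefix ex:   <http://example.org/> .            # hypothetical resources

# The published RDF file, derived from the XML input by one transformation run.
ex:outputRdf
    rdf:type opmv:Artifact ;
    opmv:wasDerivedFrom ex:inputXml ;
    opmv:wasGeneratedBy ex:transformation1 .

# The transformation run: what it used ("how") and who controlled it ("who").
ex:transformation1
    rdf:type opmv:Process ;
    opmv:used ex:inputXml, ex:stylesheet ;
    opmv:wasControlledBy ex:publisher .

ex:inputXml
    rdf:type opmv:Artifact .

# "Which version" is recorded here only as a label (hypothetical value).
ex:stylesheet
    rdf:type opmv:Artifact ;
    rdfs:label "XSLT stylesheet, version 1.2" .

ex:publisher
    rdf:type opmv:Agent .
```

The "when" part is left out here; it can be attached through the time-related properties shown in the examples on slides 15 and 16.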
5. On-the-fly Transformation
[Figure: a data transformation wrapper and the resource http://mytransportatio.db/j10. Provenance to record: who, when, which version, how.]
Contributed by Stuart Williams
6. Complex Data Creation Pipeline
[Figure: a chain of transformations (GateXMLRegressionTransformation, GateXMLRdfaTransformation, RdfaRdfXmlTransformation) shown alongside a GATE pipeline. Pipeline components, in the order extracted: Document Reset PR, ANNIE English Tokeniser, ANNIE English Splitter, ANNIE POS Tagger, Data.gov.uk Morphological Analyzer, Data.gov.uk Flexible Roof Gazetteer, Data.gov.uk Generic Gazetteer, GATE Noun Phrase Chunker, Data.gov.uk Generic Transducer, TSO Coreference.]
Courtesy of Paul Appleby from TSO (Data Enrichment Service)
7. [Figure: provenance of executions at two levels of detail. Services used by executions: S1, S2, S3. Level 1, provenance of execution at a higher level: processes p1, p2, p3. Level 0, provenance of execution at a detailed level: processes p4, p5, p21, p22. Data: d1 to d6, including an artifact and a data collection. Edge labels: accessedService, wasTriggeredBy, iterationOfProcess, hasParentProcess, followed, wasGeneratedBy, wasDerivedFrom.]
8. Non-digital Data Objects
Organizations
Organizational structure changes over time
Origin organization, resulting organization
Boundary
Legislation
An organization ontology: http://www.epimorphics.com/public/vocabulary/org.html
9. The Challenges
Data of different representations, physical forms, and granularity
No tooling support
Provenance across different types of systems
Identification
Different terminologies
10. The Gaps
A vocabulary able to describe the provenance of all types of data, from different systems
A vocabulary providing enough terms to describe provenance accurately
Guidance on creating and publishing provenance on the Web
Tool support for creating and publishing provenance on the Web
Provenance access
11. The Open Provenance Model Vocabulary
Based on the Open Provenance Model
Enables “responsible” data publication, so that responsible agents can be traced and results reproduced
Enables describing the provenance of any type of data
An alternative implementation of the OWL OPM Serialization
12. The Rationale
Grounded upon existing SW technologies
  Does not explicitly define a graph (OPMGraph); relies on Named Graphs instead
  Reuses existing vocabularies
Lightweight
  3 classes and 12 properties
  Reuses 3 classes from the W3C Time Ontology
Easy to use and extend
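To illustrate the "easy to extend" point, a sketch of how a publisher might specialise the vocabulary for its own processes. The ex: terms below are hypothetical and are not part of OPMV or its published modules; the opmv: namespace URI is assumed, since the slides do not state it.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix opmv: <http://purl.org/net/opmv/ns#> .    # assumed namespace URI
@prefix ex:   <http://example.org/vocab#> .       # hypothetical extension namespace

# A more specific kind of process; any ex:XsltTransformation is also an opmv:Process.
ex:XsltTransformation
    a owl:Class ;
    rdfs:subClassOf opmv:Process ;
    rdfs:label "XSLT transformation" .

# A more specific kind of usage; any ex:usedStylesheet statement also implies opmv:used.
ex:usedStylesheet
    a owl:ObjectProperty ;
    rdfs:subPropertyOf opmv:used ;
    rdfs:label "used stylesheet" .
```

Because the extension only adds subclass and subproperty links, consumers that understand just the core terms can still interpret the data under RDFS inference.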
13. Overview of the Vocabulary
Defined as a vocabulary expressed using OWL
Implement the core concepts of the Open Provenance Model
No specific granularity prescribed
Partitioned into:
The Core Module
Other typed modules: common, xml, gate, sparql
14. Overview of OPMV
[Figure: class diagram. Classes: Agent, Artifact, Process, and time:TemporalEntity with subclasses time:Interval and time:Instant. Properties shown: wasDerivedFrom, used, wasGeneratedBy, wasControlledBy, wasTriggeredBy, wasUsedAt, wasGeneratedAt, wasPerformedAt, wasStartedAt, wasEndedAt, withRespectOf. Legend: object properties implementing OPM; object properties not exactly as defined in OPM; rdfs:subClassOf relationships.]
Prefix time: http://www.w3.org/2006/time#
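A small sketch showing several of the properties from the overview used together: derivation between artifacts, one process triggering another, and start and end instants on a process. The ex: resources and timestamps are hypothetical, and the opmv: namespace URI is assumed rather than taken from the slides.

```turtle
@prefix opmv: <http://purl.org/net/opmv/ns#> .    # assumed namespace URI
@prefix time: <http://www.w3.org/2006/time#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .             # hypothetical resources

# A cleaned dataset derived from a raw download.
ex:cleaned
    a opmv:Artifact ;
    opmv:wasDerivedFrom ex:raw ;
    opmv:wasGeneratedBy ex:clean .

ex:raw
    a opmv:Artifact ;
    opmv:wasGeneratedBy ex:download .

# The cleaning step was triggered by the download step and controlled by a curator.
ex:clean
    a opmv:Process ;
    opmv:used ex:raw ;
    opmv:wasTriggeredBy ex:download ;
    opmv:wasControlledBy ex:curator ;
    opmv:wasStartedAt [
        a time:Instant ;
        time:inXSDDateTime "2010-10-07T12:00:00Z"^^xsd:dateTime ] ;
    opmv:wasEndedAt [
        a time:Instant ;
        time:inXSDDateTime "2010-10-07T12:09:00Z"^^xsd:dateTime ] .

ex:download
    a opmv:Process .

ex:curator
    a opmv:Agent .
```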
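15. The When and Who of an Artifact
# Prefix declarations added so the snippet parses on its own
# (not on the original slide; the opmv: URI is assumed).
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix opmv: <http://purl.org/net/opmv/ns#> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# The artifact: when it was generated, and by which process.
_:d0
    rdf:type opmv:Artifact ;
    opmv:wasGeneratedAt _:t0 ;
    opmv:wasGeneratedBy [
        rdf:type opmv:Process ;
        opmv:wasPerformedBy _:p0
    ]
.
# The "when": a time instant.
_:t0
    rdf:type time:Instant ;
    time:inXSDDateTime "2010-10-07T12:09:00Z"^^xsd:dateTime ;
.
# The "who": the agent that performed the process.
_:p0
    rdf:type opmv:Agent, foaf:Agent ;
.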
16. The Creation of an Artifact (PC 3)
pc1:p5 rdf:type opmv:Process ; rdfs:label "Reslice 1" .
pc1:a3 rdf:type opmv:Artifact ;
    opmv:wasGeneratedBy [
        rdf:type opmv:Process ;
        opmv:used pc1:p1 ;
        opmv:wasPerformedAt [
            rdf:type time:Interval ;
            time:hasBeginning [
                a time:Instant ;
                time:inXSDDateTime "{PROCESS START TIME}"^^xsd:dateTime ] ;
            time:hasEnd [
                a time:Instant ;
                time:inXSDDateTime "{PROCESS END TIME}"^^xsd:dateTime ]
        ]
    ] .
19. Comparison with OPM OWL
A more intuitive OWL ontology and RDF representation
Take full advantage of SW technologies
Lack of explicit semantics for graph membership
Less expressivity, e.g. no cardinality constraints
20. Future Development
More typed modules
A guide on how to publish provenance
Where and how much
What is the minimum provenance
How to represent the information
21. Summary
The vocabulary is well accepted by the data.gov.uk team and easy for them to understand
Adoption is experimental, not yet large-scale production
Guidance is missing on what provenance information should be created and published, and how
Lack of ideas about how provenance information will be used
22. This work is created by Jun Zhao and licensed under a Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/)