1. Web Science & Technologies
University of Koblenz-Landau, Germany
Provenance in the Semantic Web
Steffen Staab
Joint work with
Simon Schenk, Renata Dividino, Christoph Ringelstein
2. Institut WeST – Web Science & Technologies
Semantic Web, Web Retrieval, Interactive Web, Multimedia Web, Software Web
eGovernment, eMedia, eScience, eOrganizations, ePerson
Institute for Computer Science, Institute for Information Systems, Leibniz Institute for Social Sciences (GESIS)
WeST Steffen Staab Summer School Semantic Web
staab@uni-koblenz.de 2
3. Do you care where your data comes from?
4. How to lose 1,000,000,000 US$ in half a day
Via @Bauckhage
5. +++ "Los Angeles (dpa) – In the small Californian town of Bluewater,
a suicide attack is said to have taken place, according to a report
by the local station vpk-tv. There were reportedly two explosions
in a restaurant..." +++
German Press Agency DPA, 10 Sep 2009
7. Losing your reputation quickly…
Hoax
Better check who said what and when, and
whether you actually want to trust
a piece of information
8. Defining Provenance
Provenance … means
the origin… of something, or
the history of the ownership or location of an object.
The term was …used for works of art, but is now …
including science and computing. …
In most fields, the primary purpose of provenance is to
confirm or gather evidence as to the time, place, and—
when appropriate—the person responsible for the creation,
production, or discovery of the object.
This will typically be accomplished by tracing the whole
history of the object up to the present
http://en.wikipedia.org/wiki/Provenance
May 31, 2011
9. The situation…
Call to Ontoprise from an insurance company:
"Can you integrate our 5000 databases?"
EU IP experience (large engineering company):
"Oh, we just found another PC that has several tens of
thousands of relevant documents."
Linked Open Data cloud
10. Some of the problems…
I have this piece of data.
Can I actually believe it?
Default answer: Find some expert and ask him.
I have this inconsistency in my data.
Who has introduced it and why?
Default answer: Try to find it in the sources.
I have this piece of data.
How can I use it? Can I show it to anyone?
Default answer:
• You are not allowed to do anything with it.
Just throw it away.
11. Two Types of Provenance Knowledge
Provenance labels for facts
Bluewater is a City: Who? When? Which authority? Which confidence? Which privileges?
12. Two Types of Provenance Knowledge
Provenance labels for facts
Bluewater is a City: Who? When? Which authority? Which confidence? Which privileges?
Open Provenance Model
RDF graph representing who did
• what
• when
• why
• …
to a data item
"Ex post" workflow instance: 1 admission, 2 examination, 3 asking permit, 4 examination, 5 prepare share
Audit/re-enact
13. SPARQL QUERYING
USING PROVENANCE
R. Dividino, S. Sizov, S. Staab, B. Schüler.
Querying for Provenance, Trust, Uncertainty and other Meta
Knowledge in RDF.
In: Journal of Web Semantics. Elsevier, 7(3), 2009, pp. 204-219.
14. Representing Provenance Using URIs
http://bluewater.us a dbpedia:city.
http://bluewater.us assertedBy http://neverest.de
BUT:
http://bluewater.us a dbpedia:fakecity.
http://bluewater.us assertedBy http://dpa.de
Who said what?
15. Representing Provenance Using URIs - 2
http://neverest.de/bluewater a dbpedia:city.
http://neverest.de/bluewater assertedBy
http://neverest.de
http://dpa.de/bluewater owl:sameAs
http://neverest.de/bluewater.
http://dpa.de/bluewater a dbpedia:fakecity.
http://dpa.de/bluewater assertedBy http://dpa.de
What does owl:sameAs now mean for
provenance?
16. Representing Provenance Using Named Graphs
http://dpa.de/ontology
{ dbpedia:locatedIn rdfs:domain dbpedia:company.
dbpedia:locatedIn rdfs:range dbpedia:city. … }
http://neverest.de/kb
{ http://bluewater.us a dbpedia:city.
http://vkptv.com a dbpedia:company. }
http://dpa.de/provenance
{ http://dpa.de/ontology dpa:lrm "2000-01-01".
http://dpa.de/ontology dpa:trust "highest".
http://neverest.de/kb dpa:lrm "2009-09-09".
http://neverest.de/kb dpa:trust "lowest". }
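A minimal sketch, in plain Python without an RDF library, of how the named graphs and the provenance graph on this slide could be looked up together. The dictionary layout and the function name `metadata_for` are hypothetical, and the sketch assumes the distributive reading, in which every triple inherits the annotations of each named graph that contains it.

```python
# Hypothetical in-memory copy of the named graphs on the slide:
# graph name -> set of (subject, predicate, object) triples.
NAMED_GRAPHS = {
    "http://dpa.de/ontology": {
        ("dbpedia:locatedIn", "rdfs:domain", "dbpedia:company"),
        ("dbpedia:locatedIn", "rdfs:range", "dbpedia:city"),
    },
    "http://neverest.de/kb": {
        ("http://bluewater.us", "a", "dbpedia:city"),
        ("http://vkptv.com", "a", "dbpedia:company"),
    },
}

# Contents of the provenance graph http://dpa.de/provenance,
# keyed by the named graph each statement describes.
PROVENANCE = {
    "http://dpa.de/ontology": {"dpa:lrm": "2000-01-01", "dpa:trust": "highest"},
    "http://neverest.de/kb": {"dpa:lrm": "2009-09-09", "dpa:trust": "lowest"},
}


def metadata_for(triple):
    """Map every named graph that contains `triple` to that graph's
    provenance annotations (distributive reading)."""
    return {
        graph: PROVENANCE.get(graph, {})
        for graph, triples in NAMED_GRAPHS.items()
        if triple in triples
    }


print(metadata_for(("http://bluewater.us", "a", "dbpedia:city")))
# {'http://neverest.de/kb': {'dpa:lrm': '2009-09-09', 'dpa:trust': 'lowest'}}
```

Under the cumulative reading the annotation would belong to the set of triples as a whole, which a per-triple lookup like this one cannot express.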
17. Ambiguity in Representing Provenance
http://neverest.de/kb
{ http://bluewater.us a dbpedia:city.
http://vkptv.com a dbpedia:company. }
What does { … http://neverest.de/kb dpa:trust "lowest". … } mean?
Distributive reading: each of the two facts is assigned low trust.
Cumulative reading: taken together, the two facts are assigned low trust.
Both readings are plausible under appropriate circumstances, but
• the cumulative reading is harder to specify
• the cumulative reading requires closing sets of facts
(in contrast to RDF's open-world semantics)
18. Meta Knowledge: When?
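Meta knowledge dimension
Set of values $D$ plus two operators $\vee_D$ and $\wedge_D$ such that
$(D, \vee_D)$ and $(D, \wedge_D)$ are partial orders with a maximum
Least Recently Modified Date
$\mathcal{L} = (\text{xsd:dateTime}, \vee_L, \wedge_L)$ s.t.
$\mathrm{lrm}(a \wedge_L b) = \max(\mathrm{lrm}(a), \mathrm{lrm}(b))$
$\mathrm{lrm}(a \vee_L b) = \min(\mathrm{lrm}(a), \mathrm{lrm}(b))$
Total order; $\wedge_L$ and $\vee_L$ are dual operators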
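The two lrm equations can be read as an evaluator over and/or formulas: a conjunction was last modified when its most recently modified part was, and a disjunction takes the earliest of its alternatives. A minimal sketch, assuming facts are named by strings and formulas are nested tuples such as `("and", "a", "b")` (a hypothetical encoding, not taken from the slides):

```python
from datetime import date

# A formula is an atom (a fact identifier, str) or a tuple
# ("and", f1, f2, ...) / ("or", f1, f2, ...).


def lrm(formula, dates):
    """Least-recently-modified dimension:
    conjunction -> max of the operands' dates, disjunction -> min."""
    if isinstance(formula, str):
        return dates[formula]
    op, *args = formula
    values = [lrm(arg, dates) for arg in args]
    if op == "and":
        return max(values)
    if op == "or":
        return min(values)
    raise ValueError(f"unknown operator: {op!r}")


dates = {"a": date(2009, 9, 9), "b": date(2009, 9, 8)}
print(lrm(("and", "a", "b"), dates))  # 2009-09-09
print(lrm(("or", "a", "b"), dates))   # 2009-09-08
```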
19. Meta Knowledge: Who?
Meta knowledge dimension
Set of values plus two operators and such that
and are partial orders with a maximum
D D , D D
Provenance
P 2^SOURCES , P Ps.t.
prov(a P b) = prov(a) prov(b) Partial order
prov(a P b) = prov(a) prov(b) and are same operator
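Because both provenance operators are set union, evaluating a formula in this dimension simply collects every source mentioned anywhere in it. A minimal sketch, using the same hypothetical tuple encoding of formulas as an assumption and made-up source names:

```python
# Formulas: an atom (str) or a tuple ("and", f1, ...) / ("or", f1, ...)
# (a hypothetical encoding).


def prov(formula, sources):
    """Provenance dimension: conjunction and disjunction are the same
    operator, the union of the operands' source sets."""
    if isinstance(formula, str):
        return frozenset(sources[formula])
    op, *args = formula
    if op not in ("and", "or"):
        raise ValueError(f"unknown operator: {op!r}")
    result = frozenset()
    for arg in args:
        result |= prov(arg, sources)
    return result


sources = {"a": {"dpa.de"}, "b": {"neverest.de"}}
print(sorted(prov(("or", "a", ("and", "a", "b")), sources)))
# ['dpa.de', 'neverest.de']
```

Unlike the lrm dimension, the result here is only partially ordered (by set inclusion), so two formulas can yield incomparable source sets.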
20. SPARQL: Algebraic Graph Query Languages [WWW08, JoWS09]
SELECT ?city ?broadcaster
WHERE {
?city a ex:city.
{ ?broadcaster ex:activeIn ?city }
UNION
{ ?broadcaster ex:locatedIn ?city }
}
21. SPARQL: Algebraic Graph Query Languages [WWW08, JoWS09]
SELECT ?city ?broadcaster
WHERE {
?city a ex:city.                        # (1)
{ ?broadcaster ex:activeIn ?city }      # (2)
UNION
{ ?broadcaster ex:locatedIn ?city }     # (3)
}
Algebra tree: $(2 \cup 3) \bowtie 1$
22. SPARQL: Algebraic Graph Query Languages [WWW08, JoWS09]
SELECT ?city ?broadcaster
WHERE {
?city a ex:city.                        # (1) 2009-09-09
{ ?broadcaster ex:activeIn ?city }      # (2) 2009-09-09
UNION
{ ?broadcaster ex:locatedIn ?city }     # (3) 2009-09-08
}
Evaluation of $(2 \cup 3) \bowtie 1$ (union takes min, join takes max):
(2) UNION (3): min(2009-09-09, 2009-09-08) = 2009-09-08
JOIN (1): max(2009-09-08, 2009-09-09) = 2009-09-09
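The annotation propagation on this slide can be reproduced with a tiny annotated evaluator: solution mappings carry an lrm date, join takes the max and union the min. A minimal sketch under stated assumptions: the store, the resource names `ex:bluewater` and `ex:vpktv`, and the rule that a mapping produced twice keeps the earlier date are all hypothetical, chosen only to mirror the three dated triple patterns.

```python
from datetime import date

# Hypothetical annotated store: each triple carries its lrm date,
# numbered like the triple patterns (1)-(3) on the slide.
STORE = {
    ("ex:bluewater", "a", "ex:city"): date(2009, 9, 9),             # (1)
    ("ex:vpktv", "ex:activeIn", "ex:bluewater"): date(2009, 9, 9),  # (2)
    ("ex:vpktv", "ex:locatedIn", "ex:bluewater"): date(2009, 9, 8), # (3)
}


def add(results, mapping, stamp):
    """Record an annotated solution; repeated mappings are alternatives (min)."""
    results[mapping] = min(stamp, results[mapping]) if mapping in results else stamp


def match(pattern):
    """Evaluate one triple pattern (variables start with '?')."""
    results = {}
    for triple, stamp in STORE.items():
        binding = {}
        for p, t in zip(pattern, triple):
            if p.startswith("?"):
                if binding.setdefault(p, t) != t:
                    break
            elif p != t:
                break
        else:  # no break: the triple matches the pattern
            add(results, frozenset(binding.items()), stamp)
    return results


def union(left, right):
    """Alternatives: a mapping found on both sides keeps the min date."""
    results = dict(left)
    for mapping, stamp in right.items():
        add(results, mapping, stamp)
    return results


def join(left, right):
    """Compatible mappings are merged; both are needed, so take the max."""
    results = {}
    for m1, s1 in left.items():
        for m2, s2 in right.items():
            d1, d2 = dict(m1), dict(m2)
            if all(d1[v] == d2[v] for v in d1.keys() & d2.keys()):
                add(results, frozenset({**d1, **d2}.items()), max(s1, s2))
    return results


# (2 UNION 3) JOIN 1, as in the algebra tree
answers = join(
    union(match(("?broadcaster", "ex:activeIn", "?city")),
          match(("?broadcaster", "ex:locatedIn", "?city"))),
    match(("?city", "a", "ex:city")),
)
for mapping, stamp in answers.items():
    print(dict(sorted(mapping)), stamp)
# {'?broadcaster': 'ex:vpktv', '?city': 'ex:bluewater'} 2009-09-09
```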
23. OWL REASONING
USING PROVENANCE
S. Schenk, R. Dividino, S. Staab.
Ontology Debugging Using Provenance.
In: Journal of Web Semantics, Elsevier, accepted for publication.
24. Do we trust that Bluewater is a real city?
German Press Agency: highest trust, 2001-01-03
Neverest: low trust, 2009-09-09
25. Explanation (Pinpointing)
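Given ontology $O$, axiom $\alpha$, $O' \subseteq O$:
$O'$ is an explanation (pinpoint) for $\alpha$ w.r.t. $O$ iff
$O' \models \alpha$ and
$O^* \not\models \alpha$ for all $O^* \subset O'$
Example (figure): axioms 1, 2, 3, 4 and the entailment in question
Explanation formula: $(1 \wedge 2) \vee (3 \wedge 4)$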
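The definition can be turned directly into a brute-force search: test subsets in increasing size and keep those that entail the axiom but contain no smaller pinpoint. A minimal sketch under loud assumptions: a toy reasoner over atomic subsumptions (sub, sup), where entailment is reachability, stands in for a real OWL reasoner, and the four axioms are made up so that the explanation formula comes out as $(1 \wedge 2) \vee (3 \wedge 4)$. This is exponential and only illustrates the definition.

```python
from itertools import combinations


def entails(axioms, goal):
    """Toy reasoner (hypothetical stand-in for an OWL reasoner): axioms are
    atomic subsumptions (sub, sup); goal X ⊑ Y holds iff Y is reachable from X."""
    start, target = goal
    reached, frontier = {start}, [start]
    while frontier:
        x = frontier.pop()
        for sub, sup in axioms:
            if sub == x and sup not in reached:
                reached.add(sup)
                frontier.append(sup)
    return target in reached


def pinpoints(ontology, goal):
    """All minimal O' ⊆ O with O' |= goal, by brute force over subsets in
    increasing size (relies on entailment being monotone)."""
    found = []
    ids = sorted(ontology)
    for size in range(len(ids) + 1):
        for subset in combinations(ids, size):
            candidate = set(subset)
            if any(p <= candidate for p in found):
                continue  # contains a smaller pinpoint -> not minimal
            if entails([ontology[i] for i in subset], goal):
                found.append(candidate)
    return found


O = {1: ("A", "B"), 2: ("B", "C"), 3: ("A", "D"), 4: ("D", "C")}
print(pinpoints(O, ("A", "C")))  # [{1, 2}, {3, 4}]
```

Each returned set is one conjunction of the explanation formula, and the list as a whole is their disjunction.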
26. Finding Pinpoints
$O' \subseteq O$
27. Computation of meta knowledge for OWL
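Query: meta knowledge for $\alpha$
Compute pinpointing formula for $\alpha$ w.r.t. $O$:
$(A_1 \wedge \dots \wedge A_m) \vee \dots \vee (Z_1 \wedge \dots \wedge Z_n)$
Insert meta knowledge degrees and operators:
$\min(\max(\mathrm{lrm}(A_1), \dots, \mathrm{lrm}(A_m)), \dots, \max(\mathrm{lrm}(Z_1), \dots, \mathrm{lrm}(Z_n)))$
Evaluate [KI 2009, SWPM2009]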
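Once the pinpoints are known, inserting the lrm operators is a one-liner: max inside each conjunction, min across the disjunction. A minimal sketch, assuming pinpoints are given as sets of axiom IDs; the dates below are hypothetical values for the four axioms of the explanation example.

```python
from datetime import date


def lrm_of_entailment(pinpoints, lrm):
    """Evaluate (A1 ∧ … ∧ Am) ∨ … ∨ (Z1 ∧ … ∧ Zn) in the lrm dimension:
    max within each (non-empty) pinpoint, min across pinpoints."""
    if not pinpoints:
        raise ValueError("no pinpoints: the axiom is not entailed")
    return min(max(lrm[axiom] for axiom in p) for p in pinpoints)


# Hypothetical dates for axioms 1-4
lrm = {1: date(2001, 1, 3), 2: date(2001, 1, 3),
       3: date(2009, 9, 9), 4: date(2001, 1, 3)}
print(lrm_of_entailment([{1, 2}, {3, 4}], lrm))  # 2001-01-03
```

Swapping in another dimension only means swapping the two operators, for example set union for both when evaluating provenance.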
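29. "Least recently modified?"
$(A_1 \wedge \dots \wedge A_m) \vee \dots \vee (Z_1 \wedge \dots \wedge Z_n)$
$\min(\max(\mathrm{lrm}(A_1), \dots, \mathrm{lrm}(A_m)), \dots, \max(\mathrm{lrm}(Z_1), \dots, \mathrm{lrm}(Z_n)))$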
31. Optimized Computation of Provenance
[Figure: axioms labeled with modification times 2 to 9, color-coded by time order, syntactic relevance and reachability; callout "Oracle for you: relevant pinpoint"]
32. Optimized Computation of Provenance
36. Optimized Computation of Provenance
[Figure: axioms labeled with modification times; callout "Relevant pinpoint only contained"]
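One way to exploit the time order for the lrm dimension, sketched here as an assumption rather than as the algorithm behind these slides: the minimum over pinpoints of the maximum lrm equals the earliest time $t$ at which the axioms modified no later than $t$ already entail $\alpha$, so axioms can be added oldest first without enumerating pinpoints. The toy reachability reasoner and the timestamps are hypothetical.

```python
from itertools import groupby


def entails(axioms, goal):
    """Toy reasoner (hypothetical): atomic subsumptions (sub, sup),
    goal X ⊑ Y holds iff Y is reachable from X."""
    start, target = goal
    reached, frontier = {start}, [start]
    while frontier:
        x = frontier.pop()
        for sub, sup in axioms:
            if sub == x and sup not in reached:
                reached.add(sup)
                frontier.append(sup)
    return target in reached


def lrm_in_time_order(ontology, lrm, goal):
    """min over pinpoints of max lrm, without enumerating pinpoints:
    add axioms oldest first; since entailment is monotone, the first
    timestamp at which the goal is entailed is the answer."""
    ordered = sorted(ontology, key=lambda i: lrm[i])
    active = []
    for stamp, group in groupby(ordered, key=lambda i: lrm[i]):
        active.extend(ontology[i] for i in group)
        if entails(active, goal):
            return stamp
    return None  # goal not entailed by the whole ontology


O = {1: ("A", "B"), 2: ("B", "C"), 3: ("A", "D"), 4: ("D", "C")}
times = {1: 3, 2: 9, 3: 5, 4: 7}  # hypothetical timestamps
print(lrm_in_time_order(O, times, ("A", "C")))  # 7
```

Here the pinpoints are {1, 2} (latest change 9) and {3, 4} (latest change 7), and the time-ordered scan stops at 7 without ever touching axiom 2.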
37. Evaluation: Computing Provenance in Milliseconds
Real-world provenance!
38. PROVENANCE AWARE
POLICY LANGUAGE
40. Middle Rhine Hospital
[Figure: step 1 admission; Health Record: create; Policies: create; Sticky Log: create]
(P1): ukob is allowed to process health records for research purposes.
However, ukob is not allowed to transfer patients' health records to
other organizations.
(P2): The mrh demands that ukob accesses the record only after the
patient has approved the sharing of the health records and a doctor
has confirmed this approval.
41. Middle Rhine Hospital
[Figure: workflow steps 1 admission, 2 examination, 3 asking permit, 4 examination, 5 prepare share, 6 share for research; Health Record: create, update, update, de-id., transfer; Policies: create, update, fulfill check, transfer; Sticky Log: create, update, update, update, update, encrypt, transfer; label: You]
Sticky Log:
step (record, {mrh}, {}, create, patient_treatment, 1, {0})
step (record, {mrh}, {}, update, examination, 2, {1})
reduced (record, hidden, hidden, update, hidden, 4, {2})
step (record, {mrh}, {}, de-identified, privacy, 5, {4})
attribute (record, de-identified, true, 5)
step (record, {mrh}, {ukob}, transfer, research, 6, {5})
42. Middle Rhine Hospital
[Figure: workflow steps 1 admission, 2 examination, 3 asking permit, 4 examination, 5 prepare share, 6 share for research; Health Record: create, update, update, de-id., transfer; Policies: create, update, fulfill check, transfer; Sticky Log: create, update, update, update, update, encrypt, transfer; label: You]
permit (6)?
(P3): permit (ID) IF (step (record, _, _, transfer, _, ID, _) AND
attribute (record, de-identified, true, ID)).
Sticky Log:
step (record, {mrh}, {}, create, patient_treatment, 1, {0})
step (record, {mrh}, {}, update, examination, 2, {1})
reduced (record, hidden, hidden, update, hidden, 4, {2})
step (record, {mrh}, {}, de-identified, privacy, 5, {4})
attribute (record, de-identified, true, 5)
step (record, {mrh}, {ukob}, transfer, research, 6, {5})
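Policy (P3) can be checked mechanically against the sticky log. A minimal sketch under stated assumptions: the field names given to the positional arguments of `step` and `attribute`, the encoding of the `reduced` entry's hidden fields, and the `inherit` flag are all hypothetical and not taken from the slide.

```python
from typing import NamedTuple


class Step(NamedTuple):
    # Hypothetical field names for the slide's positional arguments
    obj: str
    actors: frozenset
    recipients: frozenset
    action: str
    purpose: str
    step_id: int
    predecessors: frozenset


class Attribute(NamedTuple):
    obj: str
    name: str
    value: bool
    step_id: int


def ancestors_and_self(step_id, steps):
    """Step IDs reachable from `step_id` via predecessor links."""
    preds = {s.step_id: s.predecessors for s in steps}
    seen, stack = set(), [step_id]
    while stack:
        i = stack.pop()
        if i not in seen:
            seen.add(i)
            stack.extend(preds.get(i, ()))
    return seen


def permit(step_id, steps, attributes, obj="record", inherit=False):
    """(P3): permit(ID) IF step(obj, _, _, transfer, _, ID, _)
    AND attribute(obj, de-identified, true, ID).
    inherit=True (an assumption, not on the slide) also accepts the attribute
    at an earlier step reachable via predecessor links."""
    is_transfer = any(s.obj == obj and s.action == "transfer" and s.step_id == step_id
                      for s in steps)
    ids = ancestors_and_self(step_id, steps) if inherit else {step_id}
    de_identified = any(a.obj == obj and a.name == "de-identified" and a.value is True
                        and a.step_id in ids for a in attributes)
    return is_transfer and de_identified


S = frozenset
LOG = [
    Step("record", S({"mrh"}), S(), "create", "patient_treatment", 1, S({0})),
    Step("record", S({"mrh"}), S(), "update", "examination", 2, S({1})),
    # 'reduced' entry: hidden fields encoded as the placeholder "hidden"
    Step("record", S({"hidden"}), S({"hidden"}), "update", "hidden", 4, S({2})),
    Step("record", S({"mrh"}), S(), "de-identified", "privacy", 5, S({4})),
    Step("record", S({"mrh"}), S({"ukob"}), "transfer", "research", 6, S({5})),
]
ATTRS = [Attribute("record", "de-identified", True, 5)]

print(permit(6, LOG, ATTRS))                # False: attribute carries ID 5
print(permit(6, LOG, ATTRS, inherit=True))  # True
```

Read literally, P3 needs an attribute entry with the same ID as the transfer step, so with this log it denies step 6. The slide does not say whether attributes carry over to later steps, which is why the hypothetical `inherit` variant is kept as a separate option.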
44. CONCLUSION
45. Data Value lies in
Past
Knowing what happened to your data
Knowing why it happened to your data
Present
Drawing the right conclusions from your data
Future
Deciding upon the destiny of your data
Your Strategy is based on Provenance!
You better take care!
46. Core References
W3C working group:
http://www.w3.org/2011/prov/wiki/Main_Page
IEEE Internet Computing, Vol. 15, Issue 1, Jan/Feb 2011
Special Issue on "Provenance in Web Applications"
http://www.computer.org/portal/web/csdl/abs/html/mags/ic/2011/01/mic2011010017.htm
Journal of Web Semantics, Volume 9, Issue 2, 2011
Special Issue on "Provenance in the Semantic Web"
http://www.sciencedirect.com/science/journal/15708268
http://websemanticsjournal.org
47. Core References of Our Own Work
Provenance in RDF
R. Dividino, S. Sizov, S. Staab, B. Schüler. Querying for Provenance, Trust,
Uncertainty and other Meta Knowledge in RDF. In: Journal of Web Semantics.
Special issue on "The Web of Data". Elsevier, 7(3), 2009, pp. 204-219.
Provenance in OWL
S. Schenk, R. Dividino, S. Staab, N. Kurz. Ontology Debugging Using
Provenance. In: Journal of Web Semantics. Special issue on "Ontology
Dynamics", Elsevier, 9(3), 2011.
Provenance for Policy Languages
C. Ringelstein, S. Staab. Provenance-aware Policy Definition and Execution. In:
IEEE Internet Computing, special issue on Provenance in Web Applications,
Jan/Feb 2011, pp. 49-58.
Capturing Provenance in Distributed Workflows
C. Ringelstein, S. Staab. DiALog: A Distributed Model for Capturing Provenance
and Auditing Information. International Journal of Web Services Research
(JWSR), Idea Group Publishing, 7(2): 1-20, 2010.
48. Thank You!
http://west.uni-koblenz.de
See you again at…