This document discusses using provenance information to help assess data quality. It proposes representing sensor observations and their provenance as linked data and using this information to evaluate quality metrics like accuracy, timeliness, and relevance. The work done so far involves representing observations and quality requirements as linked data and generating initial quality scores. Future work will focus on implementing quality rules that examine provenance information and enabling quality scores to be reused.
3. Motivation
“we don’t know whether the information we find [on the Web] is accurate or not. So we have to teach people how to assess what they’ve found”
Vint Cerf, 2010
The Web of Documents has become a Web of documents, services, data, and people.
Anyone can publish anything, so we need a way to evaluate quality.
We are investigating these issues within the Internet of Things
Sensors are now at the centre of many applications
c.baillie@abdn.ac.uk
5. Evaluating Data Quality
Quality Scores
- Quality is a multi-dimensional construct, e.g. accuracy, timeliness, relevance
- To evaluate quality, we must examine the context around data: the entity (and its context) is assessed against requirements, F(E, R) = Q
WIQA Framework
- Examines data content, context, and external ratings (Bizer et al. 2009)
Data Requirements
- Fürber and Hepp (2011) use rules to identify quality problems
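The quality function F(E, R) = Q can be sketched as follows; the function names, the dict shapes, and the per-dimension scoring rules (linear falloff with distance, a 60-second timeliness threshold) are illustrative assumptions, not the framework's actual API:

```python
# Minimal sketch of F(E, R) = Q: apply each requirement rule to an
# entity and collect per-dimension quality scores. All names are illustrative.

def assess(entity, requirements):
    """Return a dict of per-dimension quality scores in [0, 1]."""
    return {dim: rule(entity) for dim, rule in requirements.items()}

# Example entity: an observation with its context (distance from the
# bus route in metres, age in seconds).
observation = {"distanceFromRoute": 20, "ageSeconds": 30}

requirements = {
    # Relevance falls off linearly with distance from the route.
    "relevance": lambda o: max(0.0, 1 - o["distanceFromRoute"] / 100),
    # Timeliness: observations older than 60 s score 0 (assumed threshold).
    "timeliness": lambda o: max(0.0, 1 - o["ageSeconds"] / 60),
}

scores = assess(observation, requirements)
print(scores)  # per-dimension scores in [0, 1]
```

Combining the per-dimension scores into a single Q (e.g. a weighted mean) is left open here, since how dimensions should be weighted depends on the application.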
6. Representing Sensor Observations
Linked Data: “recommended best practice for exposing,
sharing, and connecting pieces of data using URIs and RDF”
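As an illustration of such a representation (the prefixes and identifiers here are hypothetical), a bus-location observation might be expressed in Turtle using the W3C SSN ontology:

```turtle
@prefix ssn: <http://purl.org/NET/ssnx/ssn#> .
@prefix ex:  <http://example.org/> .

# A hypothetical GPS observation of a bus, as linked data.
ex:obs42 a ssn:Observation ;
    ssn:observedBy ex:phoneGPS1 ;          # the sensor that produced it
    ssn:featureOfInterest ex:bus17 ;       # what was observed
    ssn:observationResult ex:result42 .    # link to the measured value
```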
7. Performing Quality Assessment
Relevance rule: given ( E distanceFromRoute X ),
  Rrelevance = 1 - X/100

CONSTRUCT {
  _:b0 a QualityScore .
  _:b0 score ?qs .
  _:b0 dqm:ruleViolation _:b1 .
  _:b1 a DataRequirementViolation .
  _:b1 dqm:affectedInstance ?instance .
} WHERE {
  ?instance a Observation .
  ?instance distanceFromRoute ?distance .
  LET (?qs := (1 - (?distance / 100))) .
}
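Outside a triple store, the same relevance computation can be sketched in plain Python. The dict shapes below are an illustrative stand-in for the QualityScore and DataRequirementViolation resources, not the dqm: vocabulary itself, and treating a score of 0 as the violation condition is an assumption:

```python
# Sketch of the relevance rule: derive a quality score from an
# observation's distance to the route, recording a violation at score 0.

def relevance_score(distance_from_route):
    """Rrelevance = 1 - X/100, clamped to [0, 1]."""
    return max(0.0, min(1.0, 1 - distance_from_route / 100))

def assess_relevance(instance, distance_from_route):
    score = relevance_score(distance_from_route)
    result = {"type": "QualityScore", "score": score, "instance": instance}
    if score == 0.0:  # assumed violation condition for this sketch
        result["ruleViolation"] = {"type": "DataRequirementViolation",
                                   "affectedInstance": instance}
    return result

print(assess_relevance("ex:obs42", 20))   # high score, no violation
print(assess_relevance("ex:obs42", 150))  # score 0.0, violation recorded
```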
9. Observation Provenance
Provenance is a critical part of observation context
Describes the entities, agents, and activities involved in
data creation:
How was the observation value measured?
Who controlled the sensing process?
How has the observation been transformed since it was
created?
The W3C PROV-O ontology provides a linked data
representation of provenance
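For instance (identifiers hypothetical), the questions above could be answered by recording an observation's provenance with PROV-O terms:

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ex:   <http://example.org/> .

ex:obs42 a prov:Entity ;
    prov:wasGeneratedBy ex:sensingActivity1 ;   # how the value was measured
    prov:wasAttributedTo ex:phoneGPS1 ;         # which agent it is attributed to
    prov:wasDerivedFrom ex:rawGpsReading7 .     # what it was transformed from

ex:sensingActivity1 a prov:Activity ;
    prov:wasAssociatedWith ex:userAlice .       # who controlled the sensing
```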
12. Work To Date
Developed Quality Assessment Framework that enables:
Linked data representation of sensor observations
Definition of quality requirements using SPARQL rules
Generation of quality scores via reasoning
Future Work
Implementation of quality rules that examine provenance
Investigate quality score re-use
14. Implementation
[Architecture diagram: an Observation Service and a Quality Service run on Apache Tomcat over a Triple Store; a Reasoner (SPIN) applies the Quality Rules (Relevance, Timeliness, Accuracy, Availability) to observations.]
Editor's Notes
In this talk I will outline why the need for quality assessment exists, describe how quality is perceived, outline our approach to quality assessment, provide an example scenario, and outline our future work.
We don't know whether information is accurate: we need to assess it! The Web has evolved into an open platform. The Web is big, so we need a smaller setting for evaluation.
Consider mobile phones providing passenger information about the location of buses. Sometimes we get lucky and observations land right on the bus route. However, there are many sources of low-quality data: inaccurate GPS readings; malicious users (someone playing with the app while at home); and people who make mistakes (someone perhaps on the wrong bus).
Animate this: ObservationValue -> [Motivate SSN here] Observation + feature of interest (foi) -> disruption report