The document summarizes lessons learned from analyzing over 100 Semantic Web applications from challenge competitions over the past decade. It finds that while standards like RDF, OWL and SPARQL are widely used, there remain gaps in publishing and updating Linked Data. Most applications require human intervention for data integration due to noisy RDF data. There is also a mismatch between graph-based data models and relational/object-oriented components. The document recommends addressing these issues through more guidelines, libraries, and software frameworks to improve the software engineering process for building Semantic Web applications.
2. Input for this workshop
Digital Enterprise Research Institute www.deri.ie
LEDP workshop CfP calls for:
requirements
patterns
gaps in Linked Data
standards + guidelines
Where should this input come from?
Enabling Networked Knowledge
Benjamin Heitmann, slide: 2/17
3. The Semantic Web: a decade is a long time
2001 to 2011
4. Choice of methodology?
Goal: patterns, requirements and gaps regarding Linked Data (LD)
Data: 10 years of Semantic Web research
Which scientific approach fits? Empirical software engineering
Full IEEE transactions journal paper: http://tinyurl.com/semweblessons
5. Overview
Empirical survey
Architecture: arch. pattern
LD standards: gaps
Software Eng. process: shortcomings
Software engineering solutions
6. Empirical survey
Sources: 124 apps total
Semantic Web Challenge (ISWC), 2003-2009: 101 apps
Scripting for SemWeb Challenge (ESWC), 2006-2009: 23 apps
includes industry & research apps
Checklist (12 questions)
Data collection:
1. own analysis of paper
2. validation by email
7. Empirical survey results
widespread support for SemWeb specific features
clear difference to database-driven apps
big uptake of Linked Data principles and ecosystem
integration requires human intervention
top 3 standards: RDF, OWL, SPARQL
top 3 vocabularies: FOAF, DC, SIOC
8. Conceptual architecture
Conceptual architecture:
describes major design elements of a system (+ relations)
domain specific (e.g. the Semantic Web)
provides architectural pattern
documents community consensus
9. Components of conceptual architecture
Starting point: decouple + specialise
RDF data handling: Graph access layer (100%), RDF store (88%), Graph query language service (77%)
Data integration: Data homogenisation service (74%), Data discovery service (30%)
User interface: Graph-based navigation interface (91%), Structured data authoring interface (29%)
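The "decouple + specialise" idea behind these components can be sketched in a few lines. This is a minimal illustration only, assuming an in-memory set of triples in place of a real RDF store; the class `TripleStore`, the function `homogenise`, the `ex:` identifiers and the predicate mapping are hypothetical names introduced for this example, not part of the surveyed applications.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Stand-in for the RDF store: keeps triples in an in-memory set."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add_all(self, triples: Iterable[Triple]) -> None:
        self._triples.update(triples)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Graph access layer: a None argument acts as a wildcard."""
        for triple in sorted(self._triples):
            if all(want is None or want == got
                   for want, got in zip((s, p, o), triple)):
                yield triple


def homogenise(triples: Iterable[Triple],
               mapping: Dict[str, str]) -> List[Triple]:
    """Data homogenisation service: rewrite predicates to one vocabulary."""
    return [(s, mapping.get(p, p), o) for s, p, o in triples]


# Assumed mapping for illustration: treat dc:creator like foaf:maker.
PREDICATE_MAP = {"dc:creator": "foaf:maker"}

store = TripleStore()
store.add_all(homogenise([
    ("ex:doc1", "dc:creator", "ex:alice"),
    ("ex:doc2", "foaf:maker", "ex:alice"),
], PREDICATE_MAP))

docs_by_alice = [s for s, _, _ in store.match(p="foaf:maker", o="ex:alice")]
print(docs_by_alice)
```

Because homogenisation runs before storage, the query side only needs to know one predicate, which is the kind of separation the conceptual architecture describes.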
10. LD gaps: publishing/consuming
all applications consume RDF
73% import API, 69% export API
but: incompatible implementations
LD principles in 2006 led to consolidation
embedding RDF:
web for humans vs. web for machines
2008: introduction of RDFa
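To make the "web for humans vs. web for machines" point concrete, the sketch below pulls statements out of RDFa-style attributes in an HTML page using only the Python standard library. It is a simplified illustration and not a conformant RDFa processor: it only looks at `about`, `property` and `content`, ignores prefixes and nesting scope, and the class name `RDFaLiteExtractor` and the sample page are assumptions made for this example.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class RDFaLiteExtractor(HTMLParser):
    """Collects (subject, property, value) from about/property attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.subject: Optional[str] = None
        self.pending: Optional[str] = None  # property waiting for text content
        self.triples: List[Tuple[Optional[str], str, str]] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "about" in attributes:
            self.subject = attributes["about"]
        if "property" in attributes:
            if "content" in attributes:
                # Machine-readable value given directly in the attribute.
                self.triples.append(
                    (self.subject, attributes["property"], attributes["content"]))
            else:
                # Value is the human-readable text inside the element.
                self.pending = attributes["property"]

    def handle_data(self, data):
        if self.pending and data.strip():
            self.triples.append((self.subject, self.pending, data.strip()))
            self.pending = None


PAGE = """
<div about="http://example.org/alice#me">
  <span property="foaf:name">Alice</span>
  <meta property="dc:date" content="2011-06-01">
</div>
"""

extractor = RDFaLiteExtractor()
extractor.feed(PAGE)
for triple in extractor.triples:
    print(triple)
```

The same markup serves both audiences: a browser renders "Alice", while a consumer reads the name and date as data.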
11. LD gaps: beyond open data
writing/changing/updating RDF data is difficult
71% of apps do not support data changes
Writing to remote RDF store:
draft status in 2011: SPARQL Update
Restricting access (read/write):
no standards
no interoperability
closest ideas (?): R/W design note, WebID
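As an illustration of writing to a remote RDF store, the sketch below builds a SPARQL Update `INSERT DATA` request. SPARQL Update was a draft when these slides were written and was later standardised as part of SPARQL 1.1. The endpoint URL and the helper names `build_insert` and `build_update_request` are hypothetical, the request is only constructed and never sent, and access control is left out entirely, since the slide notes that no standard covers it.

```python
from urllib.request import Request

# Hypothetical endpoint; real stores expose their own update URL.
ENDPOINT = "http://example.org/sparql/update"


def build_insert(subject: str, predicate: str, literal: str) -> str:
    """Build an INSERT DATA operation for one triple with a plain literal."""
    # Escape backslashes and double quotes; other control characters
    # (e.g. newlines) are not handled in this sketch.
    escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
    return f'INSERT DATA {{ <{subject}> <{predicate}> "{escaped}" . }}'


def build_update_request(update: str, endpoint: str = ENDPOINT) -> Request:
    """Wrap the update in an HTTP POST using the SPARQL Update media type."""
    return Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )


update = build_insert(
    "http://example.org/alice#me",
    "http://xmlns.com/foaf/0.1/name",
    'Alice "Al" Smith',
)
request = build_update_request(update)
print(request.get_method(), request.full_url)
print(update)
```

Sending the request would be a call to `urllib.request.urlopen(request)`; whether the store accepts it depends on its own authentication setup.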
12. Software Eng. process shortcomings (1)
Integrating noisy RDF data:
60% semi-automatic integration
this involves human intervention
only 20% use automatic heuristics
major part of Semantic Web specific code
Distribution of application logic:
multiple components and standards
queries (41%), rules (52%) or formal vocabularies
hard to maintain
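The difference between automatic heuristics and semi-automatic integration of noisy data can be sketched as follows. This is one possible heuristic chosen for illustration, not the method used by the surveyed applications: resources sharing a `foaf:mbox` value are merged automatically (FOAF declares it an inverse functional property), while resources that only share a normalised name are queued for a human decision. The function name `classify_pairs` and the sample records are assumptions.

```python
from typing import Dict, List, Tuple

Person = Dict[str, str]  # keys: "uri", "name", optionally "mbox"
Pair = Tuple[str, str]


def classify_pairs(people: List[Person]) -> Tuple[List[Pair], List[Pair]]:
    """Split candidate duplicates into auto-merge and needs-review lists."""
    auto_merge: List[Pair] = []
    needs_review: List[Pair] = []
    for i, a in enumerate(people):
        for b in people[i + 1:]:
            if a.get("mbox") and a.get("mbox") == b.get("mbox"):
                # Same mailbox: strong evidence, merge without asking.
                auto_merge.append((a["uri"], b["uri"]))
            elif a["name"].strip().lower() == b["name"].strip().lower():
                # Same name only: weak evidence, ask a human.
                needs_review.append((a["uri"], b["uri"]))
    return auto_merge, needs_review


people = [
    {"uri": "ex:a1", "name": "Alice Smith", "mbox": "mailto:alice@example.org"},
    {"uri": "ex:a2", "name": "A. Smith", "mbox": "mailto:alice@example.org"},
    {"uri": "ex:a3", "name": "alice smith "},
    {"uri": "ex:b1", "name": "Bob Jones", "mbox": "mailto:bob@example.org"},
]

auto_merge, needs_review = classify_pairs(people)
print("merge automatically:", auto_merge)
print("ask a human:", needs_review)
```

The review queue is where the human intervention reported in the survey would enter such a pipeline.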
13. Software Eng. process shortcomings (2)
Mismatch of data models between components
graph versus relational or object oriented (90%)
overhead in communication
inconsistent round-trip conversion
3 way ORM needed?
[Figure: three data models: graph-based, relational, object oriented]
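A small example shows why round-trip conversion between a graph and an object model can be inconsistent. The sketch assumes a hypothetical `Person` class with a single-valued `name` field and hand-written mapping functions; the `ex:` identifiers are made up. RDF allows several values per property and arbitrary extra properties, so converting to an object and back silently drops data.

```python
from dataclasses import dataclass
from typing import List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Person:
    uri: str
    name: str  # single-valued, although RDF permits several foaf:name values


def to_object(uri: str, triples: List[Triple]) -> Person:
    """Graph -> object: keeps only what the class has fields for."""
    names = sorted(o for s, p, o in triples if s == uri and p == "foaf:name")
    return Person(uri=uri, name=names[0])


def to_triples(person: Person) -> List[Triple]:
    """Object -> graph: can only emit what the object still holds."""
    return [(person.uri, "foaf:name", person.name)]


graph = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:name", "Alice Smith"),
    ("ex:alice", "ex:unmapped", "value with no matching field"),
]

round_tripped = to_triples(to_object("ex:alice", graph))
lost = [t for t in graph if t not in round_tripped]
print("kept:", round_tripped)
print("lost:", lost)
```

A mapping layer that covers graph, relational and object models at once would have to carry such unmapped data along, which is what the "3 way ORM" question points at.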
14. Software Eng. solutions (1)
More guidelines, best practices and design patterns:
current examples:
– Linked Data principles and publishing guidelines
– guidelines for naming of URIs
– Linked Data patterns collection
result: more interoperability, more coherent Web of Data
15. Software Eng. solutions (2)
More software libraries (beyond RDF storage!)
guidelines can be hardcoded in reusable libraries
good libraries can make complicated guidelines easy to use (see HTTP, SSL, SMTP and DNS lookups)
current examples:
– any23, d2r server, Semantic Web Client Library
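As a sketch of what "hardcoding a guideline in a library" could look like, the function below mints resource URIs under one fixed naming convention. The convention itself (lowercase, hyphen-separated slug, no file extension) is an assumption chosen for illustration rather than a quotation from any published guideline, and `mint_uri` is a hypothetical helper, not an API of the libraries listed above.

```python
import re
from urllib.parse import quote


def mint_uri(base: str, collection: str, label: str) -> str:
    """Mint a URI of the form <base>/<collection>/<slug> from a label."""
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")
    if not slug:
        raise ValueError("label has no usable characters")
    return f"{base.rstrip('/')}/{quote(collection)}/{slug}"


uri = mint_uri("http://example.org/", "person", "Alice  Smith, PhD")
print(uri)
```

Callers never see the rules; they get conforming URIs by construction, which is the benefit the slide attributes to good libraries.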
16. Software Eng. solutions (3)
More software factories:
create complete applications
requires patterns + libraries
or: “opinionated software”
components can be customised for domain
Interface, homogenisation and data discovery usually made from scratch
https://developers.facebook.com/docs/beta/opengraph/tutorial/
17. Summary
Empirical survey
Architecture: arch. pattern
LD standards: gaps
Software Eng. process: shortcomings
Software engineering solutions
Full article: http://tinyurl.com/semweblessons
18. Appendix: threats to validity
Representativeness:
only complete applications were part of the challenges (not tools or libraries)
apps needed to use real-world data
submission of a paper describing the app was required
the challenges extend over multiple years, which allows trends to be seen
Number of authors who verified checklist (65%):
academic email addresses expire quickly
we manually tried to find new email addresses
No source code was used:
source code was not required for challenges due to e.g. IP issues