An Introduction to Semantic Web Technology

SEMANTIC WEB
UNDERSTANDING IN BRIEF

INTRODUCTION
WEB OF DOCUMENTS VS. WEB OF DATA
Ankur Biswas, 4/4/2016

A Walk Through a Brief History of the World Wide Web
• 1969: ARPANET (Advanced Research Projects Agency Network) launched.
• In 1980, Tim Berners-Lee built ENQUIRE as a personal database of people and software models and as a way to play with hypertext; each new page of information in ENQUIRE had to be linked to an existing page.
• In 1990, Berners-Lee built all the tools necessary for a working Web: HTTP 0.9, HTML, the first web browser (WorldWideWeb, which was also a web editor), the first HTTP server software (CERN httpd), the first web server (http://info.cern.ch), and the first web pages, which described the project itself.
WWW's historical logo, designed by Robert Cailliau.
The NeXTcube used by Tim Berners-Lee at CERN became the first Web server.

How big is the Web?
• According to http://www.worldwidewebsize.com/, the indexed Web contained at least 4.84 billion pages (Thursday, 25 February 2016).
• Early estimates suggested that the deep web is 400 to 550 times larger than the surface web.
• Since more information and sites are always being added, the deep web can be assumed to keep growing, at a rate that cannot be quantified.

Understanding Information in the WWW
• What is important and how do you know?
• What is information, what is advertisement?
• What does information mean?
• How credible or trustworthy is the information?
• What is redundant?

Understanding the Importance of Meaning
• SEMANTICS: the branch of linguistics concerned with the sense and meaning of language or of the symbols of a language.
• It is the study of the interpretation of signs or symbols as used by agents or communities within particular circumstances and contexts.
• Semantics asks how the sense and meaning of complex concepts can be derived from simple concepts based on the rules of syntax.
• The semantics of a message depends on its context and pragmatics†.
†Dealing with things sensibly and realistically, in a way that is based on practical rather than theoretical considerations.

• SYNTAX: in grammar, the study of the principles and processes by which sentences are constructed in a particular language.
• In formal languages, syntax is just a set of rules by which well-formed expressions can be created from a fundamental set of symbols (alphabet).
• In computer science, syntax defines the normative structure of data.

• CONTEXT: the surrounding expressions (concepts) of an expression; it represents the expression's relationship with those expressions and with further related elements.
• Context denotes all elements of any sort of communication that define the interpretation of the communicated content, e.g.
• General contexts: place, time, interrelation of actions in the message.
• Personal or social contexts: the relation between the sender and the receiver of a message.
• PRAGMATICS: reflects the intention with which language is used to communicate a message.
• In linguistics, pragmatics denotes the study of applying language in different situations. It also denotes the intended purpose of the speaker.
• Pragmatics studies the ways in which context contributes to meaning.

The limits of the web
• Traditional keyword-based search leads to many irrelevant results.
• E.g. from the simple term "Jaguar" it is not clear whether the user means the car, the animal or the operating system (Mac OS X Jaguar).
• POLYSEMY: the same or a similar name carries different meanings, so a search returns results for the intended meaning mixed with results for the other meanings.

Problem 1: Information Retrieval
• Jaguar (animal): Panthera onca
• Traditional keyword-based search does not find all relevant results.
• Synonyms and metaphors are not always handled properly, which leads to missing or undesired results.
Characteristics of the Web of documents:
• Primary objects: documents
• Degree of structure in data: fairly low
• Semantics of contents: implicit
• Designed for: human consumption
Figure: HTML pages and an API/XML source (A, B, C, D) connected by untyped links.

Problem 2: Information Extraction
• Identifying content written in other languages, e.g. Japanese or Bengali.
• Pictures do not tell search engines what they show.
• Example: Google takes the caption or name embedded with a picture and uses it as a reference keyword.

Problem 2: Information Extraction (Cont.)
Figure: documents A to D (HTML pages and an API/XML source) connected by untyped links, each describing "Things". Are two documents talking about the same "Thing"?

Problem 2: Information Extraction (Cont.)
• The problem can only be solved correctly by a human agent.
• Information is distributed and ordered heterogeneously.
• A software agent does not have sufficient
• knowledge of contexts,
• world knowledge, and
• experience
to solve the problem.
• Hence it will not be able to solve the problem unless the semantics are made explicit.
• Implicit knowledge: information that is not specified explicitly but must be derived via logical deduction from the available information.

Problem 3: Maintenance
The more complex and voluminous a website is, the more complicated it is to maintain its only weakly structured data.
Problems:
• Syntactic consistency error: you linked your web page to another page with related content, but that page has moved and the link still points to the old address (HTTP 404 error: file/page not found).
• Semantic (link) consistency error: even more dangerous; the content at the hyperlinked destination keeps changing, so the link may no longer point to what it was meant to.
• Correctness: it is tough to maintain correctness over time in an automated manner.
• Timeliness: tracking the changes over time is really tough.

Problem 4: Personalization
• Adaptation of the presented information content to personal requirements:
Users normally password-protect their details, which makes such information hard to access.
• Problems:
• Where do we get the required (personal) information from?
• Personalization vs. data security

INTRODUCTION TO
SEMANTIC WEB TECHNOLOGIES
THE VISION OF THE SEMANTIC WEB

The vision of the Semantic Web
Precondition:
• Content can be read and interpreted correctly (understood) by machines.
Natural Language Processing:
• Technologies of traditional information retrieval (search engines)
Semantic Web:
• Natural-language web content will be explicitly annotated with semantic metadata.
• Semantic metadata encode the meaning (semantics) of the content, which can then be read and interpreted correctly by machines.
The Semantic Web concept was introduced in the 1990s by Tim Berners-Lee, the inventor of the World Wide Web.

How Can We Achieve the Semantic Web? – The Original Vision
• Instead of publishing information to be consumed by
humans, publish machine-processable data and metadata
using terms/languages that can be understood by machines.
• Build machines (agents) that will search for, query, integrate
etc. this data.
• Make sure all agents understand your terms/languages.

The Semantic Web and Linked Data Vision Today
• The Semantic Web is a web of data. There is lots of data we all use every day, and it is not part of the web.
• The Semantic Web is about two things:
• It is about common formats for the integration and combination of data drawn from diverse sources, whereas the original Web mainly concentrated on the interchange of documents.
• It is also about a language for recording how the data relates to real-world objects.
• That allows a person, or a machine, to start off in one database and then move through an unending set of databases which are connected not by wires but by being about the same thing.

Semantic Web Technology Stack
• Most apps use only a subset of
the stack
• Querying allows fine-grained
data access
• Standardized information exchange is key
• Formats are necessary but not
too important
• The semantic web is based on
the web

Basic Layer of the Semantic Web Technology Stack
• The foundation of the stack is the World Wide Web; hence the Semantic Web relies on all World Wide Web technologies.
• DBpedia can be seen as the semantic version of Wikipedia.
• Since Wikipedia articles use templates (infoboxes), their data is already partly structured.
• DBpedia extracts data from Wikipedia infoboxes.
• DBpedia uses the machine-readable language RDF.
• DBpedia stores and publishes the results in RDF and a few other formats.
• It also hosts a community effort to define extractors for the data that can be used well beyond Wikipedia.
• It provides a number of services around the extracted data, like DBpedia Mobile, a SPARQL endpoint, a faceted browser, a number of mappings to external ontologies, an ontology itself, etc.

Semantic Web Technologies
• A set of technologies and frameworks that enable the Web of Data:
• Resource Description Framework (RDF)
• A variety of data interchange formats (e.g. RDF/XML, N3, Turtle, N-Triples)
• Notations such as RDF Schema (RDFS) and the Web Ontology Language (OWL)
• All are intended to provide a formal description of concepts, terms, and relationships within a given knowledge domain
• A specialized query language (SPARQL), similar to SQL but based on matching graph patterns against RDF data, which can make queries more complicated

Application in Web of Data
• Linked Data
• Linked Open Data (LOD) denotes publicly available (RDF) data on the web, identified via URIs and accessible via HTTP.
Linked Data Web of Data (Oct 2011):
• >31 billion facts
• >500 million links

What is so special about the BBC Music website?
• Information is dynamically aggregated from external, publicly available data (Wikipedia, MusicBrainz, …)
• No screen scraping
• No specialized API
• Data available as Linked Open Data
• Data access via simple HTTP requests
• Data is always up to date without manual interaction

How to build such a site: option 1
• Site editors roam the Web for new facts
• may discover further links while roaming
• They update the site manually
• And the site soon gets out of date

How to build such a site: option 2
• Editors roam the Web for new data published on Web sites
• "Scrape" the sites with a program to extract the information
• i.e., write some code to incorporate the new data
• Easily gets out of date again…

How to build such a site: option 3
• Editors roam the Web for new data via APIs
• Understand those…
• input, output arguments, datatypes used, etc.
• Write some code to incorporate the new data
• Easily gets out of date again…

The choice of the BBC
• Use external, public datasets
• Wikipedia, MusicBrainz, …
• They are available as data
• not APIs or hidden on a Web site
• data can be extracted using, e.g., HTTP requests or
standard queries

It's all documented

Search Engines – Document Retrieval
• General problems:
• Correct interpretation of the query string
• Somehow the context of the user has to be considered,
• e.g. the user's previous query or their usual preferences
• Correct identification of entities
• Automatic disambiguation
• Usability
• Personalization

Intelligent Agents in Semantic Web
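Figure: on the World Wide Web, the user works through a presentation service (e.g. Firefox) and a retrieval service (e.g. Google) on top of WWW documents; on the Semantic Web, the user works through a personal assistant backed by intelligent infrastructure services on top of WWW documents.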

3 Generations of Web Documents
Figure: three generations of web documents. 1st generation: static web pages (HTML / CSS). 2nd generation: dynamic web pages (virtual and interactive web pages, JavaScript / applets). 3rd generation: adaptive web pages. Techniques shown: netbots, information extraction, presentation planning, database access, template-based generation, user model, machine learning, online layout.

Toolbox for the Semantic Web
• Standardized languages to express the semantics of information content on the web (XML/XSD, RDF(S), OWL, RIF)
• Technologies for attaching and extracting semantic information in web documents (RDFa, GRDDL, …)
• Various fields of computer science:
• Artificial Intelligence
• Linguistics
• Cryptography
• Database
• Theoretical Computer Science
• Computer Architecture
• Software Engineering
• Systems Theory
• Computer Networks

Basic Architecture of Semantic Web - I
• Uniform: different types of resource identifiers, all constructed according to a uniform schema.
• Resource: whatever may be identified by a URI.
• Identifier: distinguishes one resource from another.

Uniform Resource Identifier (URI)
• A Uniform Resource Identifier (URI) defines a simple and extensible scheme for the worldwide unique identification of abstract or physical resources.
• Resources can be any object with a clear identity (according to the context of the application),
• e.g. web pages, books, locations, persons, relations among objects, abstract concepts, etc.
• The concept of URI is already established in various domains, e.g.
• The Web: URL (Uniform Resource Locator), URN (Uniform Resource Name), PURL (Persistent Uniform Resource Locator)
• Books & publications: ISBN, ISSN
• Digital Object Identifier (DOI)

Uniform Resource Identifier (URI)
• URIs comprise
• Addresses (locators): Uniform Resource Locator (URL, RFC 1738)
• Denotes where a resource can be found on the web by stating its primary access mechanism
• Might change during its lifetime
• Identities (names): Uniform Resource Name (URN, RFC 2141)
• Persistent identifier for a web resource
• Remains unchanged during its life cycle
• URI generic syntax:
• Scheme: e.g. http, ftp, mailto
• Userinfo: e.g. username:password
• Host: e.g. domain name, IPv4/IPv6 address
• Port: e.g. :80, the default HTTP port
• Path: e.g. path in the file system of the WWW server
• Query: e.g. parameters to be passed over to applications
• Fragment: e.g. identifies a specific fragment of a document
URI = scheme "://" [userinfo "@"] host [":" port] [path] ["?" query] ["#" fragment]
(simplified form for URIs that have a host part)
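To see the generic syntax in action, Python's standard `urllib.parse.urlsplit` splits a URI into these components. The URI below is a made-up example chosen to exercise every part; putting a password in the userinfo is deprecated by RFC 3986 and appears here only for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical URI that uses every component of the generic syntax
uri = "http://user:secret@www.example.org:8080/books/index.html?isbn=000651409X#reviews"

parts = urlsplit(uri)
print("scheme:  ", parts.scheme)                    # http
print("userinfo:", parts.username, parts.password)  # user secret
print("host:    ", parts.hostname)                  # www.example.org
print("port:    ", parts.port)                      # 8080 (an int; None when absent)
print("path:    ", parts.path)                      # /books/index.html
print("query:   ", parts.query)                     # isbn=000651409X
print("fragment:", parts.fragment)                  # reviews
```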

Data on the Web is not enough…
• We need a proper infrastructure for a real Web of
Data
• data is available on the Web
• accessible via standard Web technologies
• data are interlinked over the Web
• i.e., data can be integrated over the Web
• This is where Semantic Web technologies come in
• We will use a simplistic example to introduce the
main Semantic Web concepts

The rough structure of data integration
• Map the various data onto an abstract data
representation
• make the data independent of its internal
representation…
• Merge the resulting representations
• Start making queries on the whole!
• queries not possible on the individual data sets

We start with a book...

A simplified bookstore data
(dataset “A”)
ID | Author | Title | Publisher | Year
ISBN 0-00-651409-X | id_xyz | The Glass Palace | id_qpr | 2000

ID | Name | Homepage
id_xyz | Ghosh, Amitav | http://www.amitavghosh.com

ID | Publisher's name | City
id_qpr | Harper Collins | London

1st: we export our data as a set of relations
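Figure: the exported graph of dataset "A". The node http://…isbn/000651409X is linked via a:author to an author node with a:name "Ghosh, Amitav" and a:homepage http://www.amitavghosh.com; further edges lead to "The Glass Palace", 2000, and a publisher node with "Harper Collins" and "London".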

Some notes on exporting the data
• Relations form a graph
• the nodes refer to the "real" data or contain some literal
• how the graph is represented in the machine is immaterial for now
• Data export does not necessarily mean physical conversion of the data
• relations can be generated on-the-fly at query time
• via SQL “bridges”
• scraping HTML pages
• extracting data from Excel sheets
• etc.
• One can export part of the data
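The "SQL bridge" idea above can be sketched with Python's built-in `sqlite3`: the relational tables stay as they are, and triples are generated only when someone iterates over them. This is a minimal sketch, assuming dataset "A" lives in three tables; the table and column names, the base URIs, and the predicates a:title, a:year, a:publisher, a:p_name and a:city are hypothetical (only a:author, a:name and a:homepage appear on the slides).

```python
import sqlite3

# In-memory stand-in for bookstore dataset "A" (table/column names are illustrative)
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE book(isbn TEXT, author TEXT, title TEXT, publisher TEXT, year INTEGER);
    CREATE TABLE author(id TEXT, name TEXT, homepage TEXT);
    CREATE TABLE publisher(id TEXT, name TEXT, city TEXT);
    INSERT INTO book VALUES ('000651409X', 'id_xyz', 'The Glass Palace', 'id_qpr', 2000);
    INSERT INTO author VALUES ('id_xyz', 'Ghosh, Amitav', 'http://www.amitavghosh.com');
    INSERT INTO publisher VALUES ('id_qpr', 'Harper Collins', 'London');
""")

# Hypothetical base URIs; the slides elide the real ones
BOOK = "http://example.org/isbn/"
NODE = "http://example.org/node/"

def export_triples(con):
    """Yield (subject, predicate, object) triples on the fly, without copying the tables."""
    for isbn, author, title, publisher, year in con.execute("SELECT * FROM book"):
        book = BOOK + isbn
        yield (book, "a:title", title)
        yield (book, "a:year", year)
        yield (book, "a:author", NODE + author)
        yield (book, "a:publisher", NODE + publisher)
    for aid, name, homepage in con.execute("SELECT * FROM author"):
        yield (NODE + aid, "a:name", name)
        yield (NODE + aid, "a:homepage", homepage)
    for pid, name, city in con.execute("SELECT * FROM publisher"):
        yield (NODE + pid, "a:p_name", name)
        yield (NODE + pid, "a:city", city)

for triple in export_triples(con):
    print(triple)
```

Because `export_triples` is a generator, exporting only part of the data is a matter of consuming or filtering some of its output.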

Same book in French…

Another bookstore data
(dataset “F”)
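Spreadsheet (columns A to D; $A11$ and $A12$ are references to cells A11 and A12):
Row 1: ID | Titre (title) | Traducteur (translator) | Original
Row 2: ISBN 2020386682 | Le Palais des Miroirs | $A12$ | ISBN 0-00-651409-X
Row 6: ID | Auteur (author)
Row 7: ISBN 0-00-651409-X | $A11$
Row 10: Nom (name)
Row 11: Ghosh, Amitav
Row 12: Besse, Christianne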

2nd: export your second set of data
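Figure: the exported graph of dataset "F". The node http://…isbn/2020386682 ("Le palais des miroirs") has an f:traducteur whose f:nom is "Besse, Christianne"; the node http://…isbn/000651409X has an f:auteur whose f:nom is "Ghosh, Amitav".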

3rd: start merging your data
Figure: the graphs of datasets "F" and "A" side by side. Both contain the node http://…isbn/000651409X: same URI!

3rd: start merging your data
Figure: the merged graph. The two copies of http://…isbn/000651409X have become one node, so the French edition http://…isbn/2020386682 now reaches "The Glass Palace", 2000, Harper Collins and London through f:original.

Start making queries…
• User of data “F” can now ask queries like:
• “give me the title of the original”
• well, … « donnes-moi le titre de l’original »
• This information is not in the dataset “F”…
• …but can be retrieved by merging with dataset “A”!
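The merge and the query can be sketched with plain Python sets of triples. This is a minimal sketch, assuming the hypothetical base URI http://example.org/isbn/ in place of the elided "http://…isbn/" and the hypothetical predicates a:title, a:year and f:titre; blank-node labels such as _:ghosh_a stand for the unnamed author nodes.

```python
# Hypothetical base URI standing in for the elided "http://…isbn/"
ISBN_A = "http://example.org/isbn/000651409X"
ISBN_F = "http://example.org/isbn/2020386682"

# Export of dataset "A" (a:title and a:year are assumed predicate names)
dataset_a = {
    (ISBN_A, "a:title", "The Glass Palace"),
    (ISBN_A, "a:year", 2000),
    (ISBN_A, "a:author", "_:ghosh_a"),
    ("_:ghosh_a", "a:name", "Ghosh, Amitav"),
    ("_:ghosh_a", "a:homepage", "http://www.amitavghosh.com"),
}

# Export of dataset "F" (f:titre is an assumed predicate name)
dataset_f = {
    (ISBN_F, "f:titre", "Le palais des miroirs"),
    (ISBN_F, "f:original", ISBN_A),
    (ISBN_F, "f:traducteur", "_:besse"),
    ("_:besse", "f:nom", "Besse, Christianne"),
    (ISBN_A, "f:auteur", "_:ghosh_f"),
    ("_:ghosh_f", "f:nom", "Ghosh, Amitav"),
}

# Merging is a set union: triples that use the same URI (ISBN_A) now share one node.
# Blank-node labels are kept distinct (_:ghosh_a vs _:ghosh_f) so they cannot clash.
merged = dataset_a | dataset_f

def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]

# "Give me the title of the original" for the French edition
title_of_original = [title
                     for original in objects(merged, ISBN_F, "f:original")
                     for title in objects(merged, original, "a:title")]
print(title_of_original)  # ['The Glass Palace']
```

Run against `dataset_f` alone, the same query returns an empty list, which is exactly the point of the slide.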

However, more can be achieved…
• We “feel” that a:author and f:auteur should be the same
• But an automatic merge does not know that!
• Let us add some extra information to the merged data:
• a:author same as f:auteur
• both identify a “Person”
• a term that a community may have already defined:
• a “Person” is uniquely identified by his/her name and, say, homepage
• it can be used as a “category” for certain type of resources

3rd revisited: use the extra knowledge
Figure: the merged graph extended with the extra knowledge: f:auteur and a:author are treated as the same relation, and the author nodes get an r:type link to http://…foaf/Person.

Start making richer queries!
• User of dataset “F” can now query:
• “donnes-moi la page d’accueil de l’auteur de l’original”
• well… “give me the home page of the original’s ‘auteur’”
• The information is not in datasets “F” or “A”…
• …but was made available by:
• merging datasets “A” and “F”
• adding three simple extra statements as an extra “glue”
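A minimal sketch of how one piece of glue, "a:author same as f:auteur", changes the answer: the equivalence is applied as a simple rule that copies every a:author triple to f:auteur and vice versa. The base URI is a hypothetical stand-in as before, and the "Person" statements (which would let the two author nodes be recognized as one person) are not modelled here.

```python
ISBN_A = "http://example.org/isbn/000651409X"   # hypothetical base URI
ISBN_F = "http://example.org/isbn/2020386682"

# The already merged graph of datasets "A" and "F" (only the triples needed here)
merged = {
    (ISBN_A, "a:author", "_:ghosh_a"),
    ("_:ghosh_a", "a:name", "Ghosh, Amitav"),
    ("_:ghosh_a", "a:homepage", "http://www.amitavghosh.com"),
    (ISBN_F, "f:original", ISBN_A),
    (ISBN_A, "f:auteur", "_:ghosh_f"),
    ("_:ghosh_f", "f:nom", "Ghosh, Amitav"),
}

# Glue: a:author and f:auteur are the same property
SAME_PROPERTY = [("a:author", "f:auteur")]

def apply_equivalence(graph, pairs):
    """Copy every triple that uses one property of an equivalent pair to the other property."""
    result = set(graph)
    for p1, p2 in pairs:
        for s, p, o in graph:
            if p == p1:
                result.add((s, p2, o))
            elif p == p2:
                result.add((s, p1, o))
    return result

def objects(graph, subject, predicate):
    return [o for (s, p, o) in graph if s == subject and p == predicate]

def homepage_of_original_auteur(graph):
    """« donnes-moi la page d'accueil de l'auteur de l'original »"""
    return [page
            for original in objects(graph, ISBN_F, "f:original")
            for auteur in objects(graph, original, "f:auteur")
            for page in objects(graph, auteur, "a:homepage")]

print(homepage_of_original_auteur(merged))                                    # []
print(homepage_of_original_auteur(apply_equivalence(merged, SAME_PROPERTY)))  # ['http://www.amitavghosh.com']
```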

Combine with different datasets
• Using, e.g., the “Person”, the dataset can be combined with
other sources
• For example, data in Wikipedia can be extracted using
dedicated tools
• e.g., the “dbpedia” project can extract the “infobox” information
from Wikipedia already…

Merge with Wikipedia data
Figure: the merged graph extended with DBpedia data. http://dbpedia.org/../Amitav_Ghosh (r:type Person, with a foaf:name) is w:author_of http://dbpedia.org/../The_Glass_Palace (linked via w:isbn to http://…isbn/000651409X), http://dbpedia.org/../The_Hungry_Tide and http://dbpedia.org/../The_Calcutta_Chromosome, and w:born_in http://dbpedia.org/../Kolkata, which has w:lat and w:long coordinates.

Search Engines – Fact Retrieval
Query string: International Space Station - 17th March 2016
• What is the International Space Station?
• Is it orbiting on 17th March 2016?
• How can the position of the satellite on that date be computed?
• External data to be considered:
• Constellation data
• Planet data
• Satellite data

RDF
RDF stands for
• Resource: pages, dogs, ideas...
everything that can have a URI
• Description: attributes, features, and
relations of the resources
• Framework: model, languages and
syntaxes for these descriptions
• RDF is a triple model, i.e. every piece of knowledge is broken down into
( subject , predicate , object )

RDF
• doc.html has for author Ankur
and has for theme Research
• doc.html has for author Ankur
doc.html has for theme Research
• ( doc.html , author , Ankur )
( doc.html , theme , Research )
( subject , predicate , object )

RDF is also a graph model to link the descriptions of resources
RDF triples can be seen as arcs of a graph (vertex, edge, vertex)
Figure: doc.html --author--> Ankur; doc.html --theme--> Research
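The graph reading of triples can be sketched in a few lines of Python: subjects and objects become vertices, and each predicate labels a directed edge. This is only an in-memory illustration using the doc.html example, not an RDF library.

```python
from collections import defaultdict

# The two statements about doc.html as (subject, predicate, object) triples
triples = [
    ("doc.html", "author", "Ankur"),
    ("doc.html", "theme", "Research"),
]

# Subjects and objects are vertices; each predicate labels a directed edge
edges = defaultdict(list)
vertices = set()
for s, p, o in triples:
    edges[s].append((p, o))
    vertices.update((s, o))

for s, out in edges.items():
    for p, o in out:
        print(f"{s} --{p}--> {o}")
```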

Resource Description Framework (RDF)
• Another triple model:
Subject | Predicate | Object
Renee Miller | teaches | CSC433
Renee Miller | lives in | Toronto
<URI> | <URI> | <URI> or "Literal"
<http://cs.toronto.edu/~miller> <http://xmlns.com/foaf/0.1/based_near> <http://dbpedia.org/resource/Toronto>
<http://cs.toronto.edu/~miller> <http://xmlns.com/foaf/0.1/based_near> "Toronto"
Figure: bb:renee-j-miller has rdf:type foaf:Person, foaf:name "Renee J. Miller" and foaf:based_near dbpedia:Toronto (FOAF: Friend of a Friend).

A Simple RDF Example (in RDF/XML)
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:bb="http://data.bibbase.org/ontology/">
  <rdf:Description rdf:about="http://.../author/renee-j-miller/">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name xml:lang="en">Renée J. Miller</foaf:name>
    <foaf:based_near rdf:resource="http://dbpedia.org/resource/Toronto"/>
  </rdf:Description>
</rdf:RDF>
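To show that the RDF/XML above really encodes three triples, here is a minimal sketch that reads it with Python's standard `xml.etree.ElementTree`. It handles only flat `rdf:Description` elements (a small subset of RDF/XML, not a conforming parser), drops the language tag, and takes the subject URI from the Turtle version of the same data, because the RDF/XML slide elides it.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The example above, with the subject URI taken from the Turtle version of the same data
DOC = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://data.bibbase.org/author/renee-j-miller/">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name xml:lang="en">Renée J. Miller</foaf:name>
    <foaf:based_near rdf:resource="http://dbpedia.org/resource/Toronto"/>
  </rdf:Description>
</rdf:RDF>"""

def triples_from_rdfxml(text):
    """Handle only flat rdf:Description elements (a small subset of RDF/XML)."""
    root = ET.fromstring(text)
    for desc in root.findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            # ElementTree writes tags as "{namespace}local"; the property URI is namespace + local
            namespace, local = prop.tag[1:].split("}")
            resource = prop.get(f"{{{RDF}}}resource")
            obj = resource if resource is not None else prop.text  # URI or literal text
            yield (subject, namespace + local, obj)

for triple in triples_from_rdfxml(DOC):
    print(triple)
```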

A Simple RDF Example (in Turtle)
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix bb: <http://data.bibbase.org/ontology/> .
<http://data.bibbase.org/author/renee-j-miller/>
    rdf:type foaf:Person ;
    foaf:name "Renée J. Miller"@en ;
    foaf:based_near <http://dbpedia.org/resource/Toronto> .
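The same three triples can also be written in N-Triples, the line-based format listed among the interchange formats: no prefixes, one full triple per line, each ending in " .". The following is a minimal serializer sketch; the `IRI` and `Literal` classes are hypothetical helpers, and only the escapes needed for simple string literals are handled.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass(frozen=True)
class IRI:
    value: str

@dataclass(frozen=True)
class Literal:
    text: str
    lang: Optional[str] = None

def escape(text: str) -> str:
    """Escape characters N-Triples does not allow verbatim inside a string literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))

def term(t: Union[IRI, Literal]) -> str:
    if isinstance(t, IRI):
        return f"<{t.value}>"
    suffix = f"@{t.lang}" if t.lang else ""
    return f'"{escape(t.text)}"{suffix}'

def to_ntriples(triples) -> str:
    """One line per triple: subject predicate object, terminated by ' .'"""
    return "".join(f"{term(s)} {term(p)} {term(o)} .\n" for s, p, o in triples)

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
renee = IRI("http://data.bibbase.org/author/renee-j-miller/")

triples = [
    (renee, IRI(RDF + "type"), IRI(FOAF + "Person")),
    (renee, IRI(FOAF + "name"), Literal("Renée J. Miller", "en")),
    (renee, IRI(FOAF + "based_near"), IRI("http://dbpedia.org/resource/Toronto")),
]
print(to_ntriples(triples), end="")
```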

A Simple RDF Example (in RDFa)
…
The author “Renée J. Miller” lives in the city “Toronto”.
…

• SPARQL stands for “SPARQL Protocol
and RDF Query Language”.
• It is the standard query language for
RDF data proposed by the W3C.
• It is based on matching graph
patterns against RDF graphs.
• The simplest kind of graph pattern is
a triple pattern.
– A triple pattern is like an RDF
triple, but with the option of a
variable in the subject, predicate or
object positions.

Example Dataset
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix bb: <http://data.bibbase.org/ontology/> .
<http://data.bibbase.org/author/renee-j-miller/>
    rdf:type foaf:Person ;
    foaf:name "Renée J. Miller"@en ;
    foaf:based_near [ rdf:type foaf:Place ;
                      foaf:name "Toronto" ] .

Example SPARQL Query
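PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE { ?x foaf:name ?name .
        ?x rdf:type foaf:Person .
        ?x foaf:based_near ?y .
        ?y foaf:name "Toronto" .
}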
• Result
?name
“Renée J. Miller”
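How a SPARQL processor arrives at this result can be sketched as basic graph pattern matching: each triple pattern is matched against every triple, and variable bindings are joined from one pattern to the next. This is a minimal, unoptimized sketch, not how a real triple store evaluates queries; it assumes prefixed names and literals are plain strings, "_:b0" stands for the blank node, and the language tag is dropped.

```python
# The example dataset as (subject, predicate, object) tuples; "_:b0" is the blank node
RENEE = "http://data.bibbase.org/author/renee-j-miller/"
graph = [
    (RENEE, "rdf:type", "foaf:Person"),
    (RENEE, "foaf:name", "Renée J. Miller"),
    (RENEE, "foaf:based_near", "_:b0"),
    ("_:b0", "rdf:type", "foaf:Place"),
    ("_:b0", "foaf:name", "Toronto"),
]

def is_var(term):
    # Simplification: any pattern term starting with "?" is a variable
    return term.startswith("?")

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None   # variable already bound to something else
            new[p] = t
        elif p != t:
            return None       # constant does not match
    return new

def evaluate(patterns, graph):
    """Evaluate a basic graph pattern by joining the triple patterns one after another."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions

query = [
    ("?x", "foaf:name", "?name"),
    ("?x", "rdf:type", "foaf:Person"),
    ("?x", "foaf:based_near", "?y"),
    ("?y", "foaf:name", "Toronto"),
]
names = [s["?name"] for s in evaluate(query, graph)]
print(names)  # ['Renée J. Miller']
```

The first pattern also matches the blank node (its name is "Toronto"), but that candidate is discarded by the second pattern, because the blank node is a foaf:Place rather than a foaf:Person.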


SPARQL 1.0 allows
• Extraction of Data as
• URIs, Blank Nodes, typed & un-typed Literals
• RDF Subgraphs
• Exploration of data via Query for unknown relations.
• Execution of complex join operations across heterogeneous databases in a single query
• Transformation of RDF data from one vocabulary to another
• Construction of new RDF graphs based on RDF query subgraphs

SPARQL 1.1 (a W3C Recommendation since March 2013) allows
• Additional query features:
• Aggregate functions, subqueries, negation, expressions in the SELECT clause, property paths
• Logical entailment for
• RDF, RDFS, OWL Direct and RDF-Based Semantics, and RIF Core entailment
• Updating RDF graphs, as a full data manipulation language
• Discovery of information about the SPARQL service
• Federated queries distributed over different SPARQL endpoints

SPARQL usage in practice
• SPARQL is usually used over the network
• Separate documents define the protocol and the result format
• SPARQL Protocol for RDF with HTTP and SOAP bindings
• SPARQL results in XML or JSON formats
• Big datasets often offer “SPARQL endpoints” using this
protocol
• Typical example: SPARQL endpoint to DBpedia

SPARQL as a unifying point
Figure: applications send queries to a SPARQL processor, which reaches RDF graphs directly, triple stores and relational databases (through SQL⇔RDF mappings) via SPARQL endpoints, and HTML, XML/XHTML and unstructured text (through NLP techniques).
Based on presentation by Ivan Herman, available at http://www.w3.org/2010/Talks/0622-SemTech-IH/

Other Semantic Web Technologies
• Web Ontology Language (OWL)
• A family of knowledge representation languages for authoring ontologies for
the Web
• RDF Schema (RDFS)
• RDF Vocabulary Description Language
• http://www.w3.org/TR/rdf-schema/
• How to use RDF to describe RDF vocabularies
• Other RDF Vocabularies
• Simple Knowledge Organization System (SKOS)
• Designed for representation of thesauri, classification schemes, taxonomies,
subject-heading systems, or any other type of structured controlled
vocabulary
• FOAF (Friend of a friend)
• A machine-readable ontology describing persons, their activities and their relations to other people and objects

ONTOLOGIES
EXISTENCE OF BEING

Ontologies
• An ontology is a formal, explicit, shared specification of a
conceptualization of a domain (Gruber, 1993).
• Conceptualization: the objects, concepts, and other entities that are
assumed to exist in some area of interest and the relationships that
hold among them. A conceptualization is an abstract, simplified view
of the world that we wish to represent for some purpose.
• The term ontology is borrowed from Philosophy, where ontology is a
systematic account of existence (what things exist, how they can be
differentiated from each other etc.).
• In computer science today, the word ontology is often used as a synonym for a shared knowledge base.

Ontologies – Components & Models
• Classes, relations & instances
• Classes represent concepts
• Classes are described by attributes
• Attributes are name-value pairs
Informal description:
The address contains the name, title and place of address of a person.
Semi-informal description:
Address
 First name <string>
 Family name <string>
 Street <string>
 PIN Code <int>
 City <string>
 …
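The semi-informal "Address" description maps naturally onto a class whose attributes are name-value pairs. A minimal sketch using a Python dataclass, assuming only the attributes listed above (the ones elided by "…" are not modelled) and a hypothetical instance; note that Python does not enforce the declared types at runtime, unlike an ontology language's datatype restrictions.

```python
from dataclasses import dataclass, asdict

@dataclass
class Address:
    """The 'Address' concept as a class; each attribute is a (name, value) pair."""
    first_name: str
    family_name: str
    street: str
    pin_code: int
    city: str
    # further attributes elided ("…") on the slide are not modelled

# A hypothetical instance of the class
home = Address("Jane", "Doe", "1 Example Street", 700001, "Kolkata")
for name, value in asdict(home).items():
    print(name, "=", value)
```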

Learning Ontologies

Very Large Ontologies
• Recently there has been a lot of work on developing very large
ontologies that capture various areas of human knowledge and
deploying this knowledge in applications such as search engines or
question answering.
• Example: Watson, IBM’s question answering system that beat humans in the quiz show Jeopardy (http://www-03.ibm.com/innovation/us/watson/index.html).

5-Star Open Data – by Tim Berners-Lee
• Tim Berners-Lee, the inventor of the Web and Linked Data initiator,
suggested a 5-star deployment scheme for Open Data. Here, we give
examples for each step of the stars and explain costs and benefits that
come along with it.

BY EXAMPLE …
★ make your stuff available on the Web (whatever format) under an open license
★★ make it available as structured data (e.g., Excel instead of an image scan of a table)
★★★ make it available in a non-proprietary open format (e.g., CSV as well as Excel)
★★★★ use URIs to denote things, so that people can point at your stuff
★★★★★ link your data to other data to provide context
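Because each star builds on the previous ones, a rating can be read as "count the criteria from the top until the first one that is not met". A minimal sketch of that reading; the short criterion labels are hypothetical paraphrases of the list above, and real assessments are a matter of judgment rather than a boolean checklist.

```python
# Hypothetical short labels for the five criteria, in star order
CRITERIA = [
    "on the Web under an open license",
    "structured data",
    "non-proprietary open format",
    "URIs denote things",
    "linked to other data",
]

def star_rating(met):
    """Each star requires all previous ones, so stop counting at the first unmet criterion."""
    stars = 0
    for criterion in CRITERIA:
        if criterion not in met:
            break
        stars += 1
    return stars

# Hypothetical dataset: an openly licensed CSV file that uses no URIs
print(star_rating({"on the Web under an open license", "structured data",
                   "non-proprietary open format"}))  # 3
```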

What are the costs & benefits of ★ Web data?
• As a consumer …
• You can look at it.
• You can print it.
• You can store it locally (on your hard drive or on a USB stick).
• You can enter the data into any other system.
• You can change the data as you wish.
• You can share the data with anyone you like.
• As a publisher …
• It’s simple to publish.
• You do not have to explain repeatedly to others that they can use your data.

What are the costs & benefits of ★★ Web data?
• As a consumer, you can do all that you can do with ★ Web data and additionally:
• You can directly process it with proprietary software to aggregate it, perform calculations, visualize it, etc.
• You can export it into another (structured) format.
• As a publisher …
• It’s still simple to publish.

What are the costs & benefits of ★★★ Web data?
• As a consumer, you can do all that you can do with ★★ Web data and additionally:
• You can manipulate the data in any way you like, without the need to own any proprietary software package.
• As a publisher …
• You might need converters or plug-ins to export the data from the proprietary format.
• It’s still rather simple to publish.

What are the costs & benefits of ★★★★ Web data?
• As a consumer, you can do all that you can do with ★★★ Web data and additionally:
• You can link to it from any other place (on the Web or locally).
• You can bookmark it.
• You can reuse parts of the data.
• You may be able to reuse existing tools and libraries, even if they only understand parts of the pattern the publisher used.
• Understanding the structure of an RDF “Graph” of data can take more effort than tabular (Excel/CSV) or tree (XML/JSON) data.
• You can combine the data safely with other data. URIs are a global scheme, so if two things have the same URI then it’s intentional, and if so that’s well on its way to being 5-star data!
• As a publisher …
• You have fine-grained control over the data items and can optimize their access (load balancing, caching, etc.)
• Other data publishers can now link into your data, promoting it to 5 stars!
• You typically invest some time slicing and dicing your data.
• You’ll need to assign URIs to data items and think about how to represent the data.
• You need to either find existing patterns to reuse or create your own.

What are the costs & benefits of ★★★★★ Web data?
• As a consumer, you can do all that you can do with ★★★★ Web data and additionally:
• You can discover more (related) data while consuming the data.
• You can directly learn about the data schema.
• You now have to deal with broken data links, just like 404 errors in web pages.
• Presenting data from an arbitrary link as fact is as risky as letting people include content from any website in your pages. Caution, trust and common sense are all still necessary.
• As a publisher …
• You make your data discoverable.
• You increase the value of your data.
• Your own organization will gain the same benefits from the links as the consumers.
• You’ll need to invest resources to link your data to other data on the Web.
• You may need to repair broken or incorrect links.

Applications
• Data integration (e.g., see project Optique http://www.optique-project.eu/)
• E-government (e.g., open data)
• E-commerce
• Tourism
• Medicine
• Biology
• Earth Observation (see the work of my group in projects TELEIOS
http://www.earthobservatory.eu/ and LEO
http://www.linkedeodata.eu/ ).
• …

References:
• Books:
• Antoniou, Grigoris, and Frank van Harmelen. A Semantic Web Primer. Cambridge, MA: The MIT Press, 2004.
• Segaran, Toby, Colin Evans, and Jamie Taylor. Programming the Semantic Web. O'Reilly Media, Inc., 2009.
• Davies, John, Dieter Fensel, and Frank van Harmelen. Towards the Semantic Web: Ontology-Driven Knowledge Management. Chichester, 2003.
• Scientific Papers:
• Maedche, Alexander. Ontology Learning for the Semantic Web. Vol. 665. Springer Science & Business Media, 2012.
• Schmachtenberg, Max, Christian Bizer, and Heiko Paulheim. "Adoption of the Linked Data Best Practices in Different Topical Domains." The Semantic Web – ISWC 2014. Springer International Publishing, 2014, pp. 245-260.
• Video Lectures & Slides
• Video lectures on Semantic Web by Dr. Harald Sack, Hasso Plattner Institute, University of Potsdam, Germany
• www.cs.toronto.edu/~oktie/slides/web-of-data-intro.pdf
• https://www.w3.org/2010/Talks/0622-SemTech-IH/
• Websites
• http://dbpedia.org/snorql/
• http://5stardata.info/en/

Thank You
