This document provides an introduction and overview of a course on Semantic Web Technologies taught by Professor Mariano Rodriguez-Muro. It outlines the course organization, topics to be covered, reading materials, and grading structure. It also introduces key concepts in the Semantic Web and semantic technologies, including using semantic data integration to merge datasets and answer complex queries.
2. +
Disclaimer
License
This work is licensed under the
Creative Commons Attribution-Share Alike 3.0 License
http://creativecommons.org/licenses/by-sa/3.0/
5. +
About me
Mariano Rodríguez-Muro
Assistant Professor at KRDB
Faculty of computer Science (POS Building, 202)
Tel. +390471016228
rodriguez =at= inf.unibz.it
Research interests:
Techniques for query answering optimization
SPARQL, Big RDFS, virtual RDF
Data integration with Semantic Tech and SemTech in the
enterprise.
6. +
About you
Which program?
Which semester?
Why are you here?
Topic relates to my area
Looking for project/thesis?
Just Interesting?
Topic is mandatory
Need some credits?
Special interests?
7. +
Course organization (Part I)
Website:
Moodle
…
Schedule
http://rodriguez-muro.com/courses/index.php?title=SWT12
Lecture: Tuesday:10:30 am to 12:30 pm
Lecture: Thursday 8:30 am to 10:30 am
Lab: Tuesday 2:00 to 4:00 pm
Office Hours
With appointment
Please use forums as main means of comunication
8. +
Reference Material
Slides, Papers
Foundations of Semantic Web. Pascal HItzler, Markus
Krotzsch and Sebastian Rudolph. Chapman & Hall/CRC, 2010.
(Code FSW)
Semantic Web Programming. John Hebeler et. al. Wiley.
2009. (Code SWP)
Programming the Semantic Web. Toby Segaran, Colin Evans
and Jamie Taylor. O‟Reilly. 2009. (Code PTSW)
Available at the library. SWP and PTSW available as ebooks.
9. +
Grading
Part I 50%, Part II 50%
Grading Part I
Lab exercises: 15%
Mid-term: 35%
Exercises: Each week a new assignment. All assignments are
graded. All assignments are mandatory. Delivery must be
done by the next week. Java and SQL/JDBC is required.
Projects must be packaged with Maven.
Midterm. Covers all material seen during the lectures. From
slides, presentation and selected book chapters/readings
(marked at the end of each slide)
11. +
Web of Documents
Primary objects: documents
Degree of structure in data: low
Semantics of content:Implicit
Designed for: human consumption
Links between documents
16. +
Web of data: The problem
How about this query:
How many romantic comedy Hollywood movies are directed by a
person who is born in a city that has average temperature above 15
degrees!?
You need to:
Find reliable sources containing facts about movies (genre &
director), birthplaces of famous artists/directors, average
temperature of cities across the world, etc.
The result: several lists of thousands of facts
Integrate all the data, join the facts that come from heterogeneous
sources
Even if possible, it may take days to answer just a single query!
17. +
The Vision
I have a dream for the Web in which computers
become capable of analyzing all the data on the
Web - the content, links, and transactions
between people and computers. A Semantic
Web, which should make this possible, has yet
to emerge, but when it does, the day-to-day
mechanisms of trade, bureaucracy and our daily
lives will be handled by machines talking to machines. The intelligent agents people have
touted for ages will finally materialize.
Barners-Lee, 1999
18. +
The semantic web
Primary objects: things
Links between: things
Degree of Structure: high
Explicit semantics of contents and links
Designed for both machines and humans
21. +
Not only about the web
The semantic web vision has generated technologies that are
applied outside the web context including:
Government
Research (Bio, Geo, Cultural heritage, etc.)
Software development
Enterprise intelligence
…
Semantic technologies provide flexible and powerful tools to
accomplish things that were not possible or not practical in the
past.
22. +
22
Introduction to the Semantic Web
approach
How does a Semantic Web approach help us
merge data sets, infer new relations, and
integrate outside data sources?
23. +
23
The rough structure of data
integration with SWT
Map the various data onto an abstract data representation
1.
•
Make the data independent of its internal representation…
2.
Merge the resulting representations
3.
Start making queries on the whole
•
Queries not possible on the individual data sets
24. +
Data set “A”: A simplified book store
Books
ID
Author
ISBN0-00-651409-X id_xyz
Authors
ID
Title
The Glass Palace
Name
id_xyz
Ghosh, Amitav
Publishers
ID
Harper Collins
id_qpr
Home page
Publisher Name
id_qpr
Publisher
http://www.amitavghosh.com
City
London
Year
2000
24
26. +
26
Some notes on the data export
Data export does not necessarily mean physical conversion of
the data
Relations can be virtual, generated on-the-fly at query time
via SQL “bridges”
scraping HTML pages
extracting data from Excel sheets
etc.
One can export part of the data
27. +
27
Data set “F”: Another book store‟s
data
A
1
7
11
12
13
D
E
ID
ISBN0 2020386682
Traducteur
Titre
Original
Le Palais A13
ISBN-0-00-651409-X
des
miroirs
ID
ISBN-0-00-651409-X
Auteur
A12
2
3
6
B
Nom
Ghosh, Amitav
Besse, Christianne
32. +
32
Start making queries…
User of data set “F” can now ask queries like:
“What is the title of the original version of Le Palais des miroirs?”
This information is not in the data set “F”...
…but can be retrieved after merging with data set “A”!
34. +
34
However, more can be achieved…
We “know” that a:author and f:auteur are really the same
But our automatic merge does not know that!
Let us add some extra information to the merged data:
a:author is equivalent to f:auteur
Both identify a Person, a category (type) for certain resources
a:name and f:nom are equivalent to foaf:name
36. +
36
Start making richer queries!
User of data set “F” can now query:
“What is the home page of Le Palais des miroirs’s „auteur‟?”
The information is not in data set “F” or “A”…
…but was made available by:
Merging data sets “A” and “F”
Adding three simple “glue” statements
38. +
38
Bring in other data sources
We can integrate new information into our merged data set
from other sources
e.g. additional information about author Amitav Ghosh
Perhaps the largest public source of general knowledge is
Wikipedia
Structured data can be extracted from Wikipedia using dedicated
tools
May 12, 2009
42. +
42
Is that surprising?
It may look like it but, in fact, it should not be…
What happened via automatic means is done every day by
Web users!
The difference: a bit of extra rigour so that machines could do
this, too
43. +
43
What did we do?
We combined different data sets that
...are of different formats (RDBMS, Excel spreadsheet, (X)HTML, etc)
...may be internal or somewhere on the Web
...have different names for the same relations
We could combine the data because some URIs were identical
i.e. the ISBNs in this case
We could add some simple additional information (the “glue”) to
help further merge data sets
The result? Answer queries that could not previously be asked
45. +
45
The abstraction pays off because…
…the graph representation is independent of the details of the
native structures
…a change in local database schemas, HTML structures, etc.
do not affect the whole
“schema independence”
…new data, new connections can be added seamlessly &
incrementally
… it doesn‟t matter if you are at the enterprise level or at the
web level
46. +
46
So where is the Semantic Web?
Semantic Web technologies make such integration possible
50. +
Semantic Web Technologies
A set of technologies and frameworks that enable semantic
data management, data integration and the web of data
Resource Description Framework (RDF)
A variety of data interchange formats (e.g., RDF/XML, N3, Turtle, NTriples)
Semantic languages such as RDF Schema (RDFS) and the Web
Ontology Language (OWL) and Rules (SWRL)
Query language (SPARQL)
Software infrastructure (RDF/SPARQL frameworks, Triple stores,
Data integrators, Query engines, Reasoners)
Publicly available connected dataset and open data initiatives
(LOD)
51. +
SWT Part I
The Data Model (RDF)
The query language (SPARQL)
Software Development (Architecture, Frameworks and Tools)
A little more semantics (RDFS, inference techniques, tools and
data integration)
Interacting with the enterprise (Legacy sources, XML, DBMS,
mappings)
More complex semantics (Rules, data integration and
reasoning with rules)
http://creativecommons.org/licenses/by-sa/3.0/You are free:to Share — to copy, distribute and transmit the workto Remix — to adapt the workto make commercial use of the workUnder the following conditions:Attribution — You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).Share Alike — If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.With the understanding that:Waiver — Any of the above conditions can be waived if you get permission from the copyright holder.Public Domain — Where the work or any of its elements is in the public domain under applicable law, that status is in no way affected by the license.Other Rights — In no way are any of the following rights affected by the license:Your fair dealing or fair use rights, or other applicable copyright exceptions and limitations;The author's moral rights;Rights other persons may have either in the work itself or in how the work is used, such as publicity or privacy rights.Notice — For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to this web page.
We need A data modelA query languageStandards and tools to publish the dataStandards and tools to consume the data