Semantic Web Technologies 2012-2013 Part I

+

Semantic Web
Technologies 2012-2013
Part I
Mariano Rodriguez-Muro,
Free University of Bozen-Bolzano

+

Disclaimer

License

This work is licensed under the
Creative Commons Attribution-Share Alike 3.0 License
http://creativecommons.org/licenses/by-sa/3.0/

+

Intro


Course organization



Intro to Semantic Web



Intro to Semantic Technologies

+

About me
Mariano Rodríguez-Muro
Assistant Professor at KRDB
Faculty of computer Science (POS Building, 202)
Tel. +390471016228
rodriguez =at= inf.unibz.it

Research interests:


Techniques for query answering optimization



SPARQL, Big RDFS, virtual RDF



Data integration with Semantic Tech and SemTech in the
enterprise.

+

About you


Which program?



Which semester?



Why are you here?




Topic relates to my area



Looking for project/thesis?



Just Interesting?





Topic is mandatory

Need some credits?

Special interests?

+

Course organization (Part I)


Website:




Moodle




…

Schedule






http://rodriguez-muro.com/courses/index.php?title=SWT12

Lecture: Tuesday:10:30 am to 12:30 pm
Lecture: Thursday 8:30 am to 10:30 am
Lab: Tuesday 2:00 to 4:00 pm

Office Hours



With appointment
Please use forums as main means of comunication

+

Reference Material


Slides, Papers



Foundations of Semantic Web. Pascal HItzler, Markus
Krotzsch and Sebastian Rudolph. Chapman & Hall/CRC, 2010.
(Code FSW)



Semantic Web Programming. John Hebeler et. al. Wiley.
2009. (Code SWP)



Programming the Semantic Web. Toby Segaran, Colin Evans
and Jamie Taylor. O‟Reilly. 2009. (Code PTSW)

Available at the library. SWP and PTSW available as ebooks.

+

Grading


Part I 50%, Part II 50%



Grading Part I



Lab exercises: 15%
Mid-term: 35%



Exercises: Each week a new assignment. All assignments are
graded. All assignments are mandatory. Delivery must be
done by the next week. Java and SQL/JDBC is required.
Projects must be packaged with Maven.



Midterm. Covers all material seen during the lectures. From
slides, presentation and selected book chapters/readings
(marked at the end of each slide)

+

Web of Documents



Primary objects: documents



Degree of structure in data: low



Semantics of content:Implicit



Designed for: human consumption

Links between documents

+

Web of documents: The problem

+

Web of data: The problem


How about this query:




How many romantic comedy Hollywood movies are directed by a
person who is born in a city that has average temperature above 15
degrees!?

You need to:




Find reliable sources containing facts about movies (genre &
director), birthplaces of famous artists/directors, average
temperature of cities across the world, etc.
 The result: several lists of thousands of facts
Integrate all the data, join the facts that come from heterogeneous
sources

Even if possible, it may take days to answer just a single query!

+

The Vision
I have a dream for the Web in which computers
become capable of analyzing all the data on the
Web - the content, links, and transactions
between people and computers. A Semantic
Web, which should make this possible, has yet
to emerge, but when it does, the day-to-day
mechanisms of trade, bureaucracy and our daily
lives will be handled by machines talking to machines. The intelligent agents people have
touted for ages will finally materialize.
Barners-Lee, 1999

+

The semantic web


Primary objects: things

Links between: things



Degree of Structure: high



Explicit semantics of contents and links



Designed for both machines and humans

+

Not only about the web


The semantic web vision has generated technologies that are
applied outside the web context including:



Government



Research (Bio, Geo, Cultural heritage, etc.)



Software development





Enterprise intelligence

…

Semantic technologies provide flexible and powerful tools to
accomplish things that were not possible or not practical in the
past.

+

22

Introduction to the Semantic Web
approach

How does a Semantic Web approach help us
merge data sets, infer new relations, and
integrate outside data sources?

+

23

The rough structure of data
integration with SWT
Map the various data onto an abstract data representation

1.
•

Make the data independent of its internal representation…

2.

Merge the resulting representations

3.

Start making queries on the whole
•

Queries not possible on the individual data sets

+

Data set “A”: A simplified book store

Books
ID

Author

ISBN0-00-651409-X id_xyz

Authors
ID

Title
The Glass Palace

Name

id_xyz

Ghosh, Amitav

Publishers
ID

Harper Collins

id_qpr

Home page

Publisher Name

id_qpr

Publisher

http://www.amitavghosh.com

City
London

Year
2000

24

+

25

1st: Export your data as a set of
relations

+

26

Some notes on the data export
Data export does not necessarily mean physical conversion of
the data
Relations can be virtual, generated on-the-fly at query time
via SQL “bridges”
scraping HTML pages

extracting data from Excel sheets
etc.

One can export part of the data

+

27

Data set “F”: Another book store‟s
data
A

1

7

11
12
13

D

E

ID
ISBN0 2020386682

Traducteur
Titre
Original
Le Palais A13
ISBN-0-00-651409-X
des
miroirs

ID
ISBN-0-00-651409-X

Auteur
A12

2
3

6

B

Nom
Ghosh, Amitav
Besse, Christianne

2nd: Export your second set of data
+

28

3rd: start merging your data
+

29

3rd: start merging your data (cont‟d)
+

30

4th: Merge identical resources
+

31

+

32

Start making queries…


User of data set “F” can now ask queries like:


“What is the title of the original version of Le Palais des miroirs?”



This information is not in the data set “F”...



…but can be retrieved after merging with data set “A”!

5th: Query the merged data set
+

33

+

34

However, more can be achieved…


We “know” that a:author and f:auteur are really the same



But our automatic merge does not know that!



Let us add some extra information to the merged data:


a:author is equivalent to f:auteur



Both identify a Person, a category (type) for certain resources



a:name and f:nom are equivalent to foaf:name

3rd revisited: Use the extra knowledge
+

35

+

36

Start making richer queries!


User of data set “F” can now query:


“What is the home page of Le Palais des miroirs’s „auteur‟?”



The information is not in data set “F” or “A”…



…but was made available by:


Merging data sets “A” and “F”



Adding three simple “glue” statements

+

38

Bring in other data sources


We can integrate new information into our merged data set
from other sources




e.g. additional information about author Amitav Ghosh

Perhaps the largest public source of general knowledge is
Wikipedia


Structured data can be extracted from Wikipedia using dedicated
tools

May 12, 2009

7th: Merge with Wikipedia data
+

owl:sameAs

39

7th (cont‟d): Merge with Wikipedia data
+

owl:sameAs

40

7th (cont‟d): Merge with Wikipedia data
+

owl:sameAs

41

+

42

Is that surprising?


It may look like it but, in fact, it should not be…



What happened via automatic means is done every day by
Web users!



The difference: a bit of extra rigour so that machines could do
this, too

+

43

What did we do?


We combined different data sets that



...are of different formats (RDBMS, Excel spreadsheet, (X)HTML, etc)





...may be internal or somewhere on the Web
...have different names for the same relations

We could combine the data because some URIs were identical


i.e. the ISBNs in this case



We could add some simple additional information (the “glue”) to
help further merge data sets



The result? Answer queries that could not previously be asked

+

44

What did we do? (cont‟d)

+

45

The abstraction pays off because…


…the graph representation is independent of the details of the
native structures



…a change in local database schemas, HTML structures, etc.
do not affect the whole


“schema independence”



…new data, new connections can be added seamlessly &
incrementally



… it doesn‟t matter if you are at the enterprise level or at the
web level

+

46

So where is the Semantic Web?

Semantic Web technologies make such integration possible

+
Semantic Technologies
Today: Applications, Use cases, Technologies, Systems

+

Semantics today


Linked-in



Schema.org



Good-relations



Oracle (Server)



IBM (DB2, Watson)



Apple (Siri)



SAP



Evri, Linked-in, many startups



Many deployed systems

+

Semantic Web Technologies


A set of technologies and frameworks that enable semantic
data management, data integration and the web of data


Resource Description Framework (RDF)



A variety of data interchange formats (e.g., RDF/XML, N3, Turtle, NTriples)



Semantic languages such as RDF Schema (RDFS) and the Web
Ontology Language (OWL) and Rules (SWRL)



Query language (SPARQL)



Software infrastructure (RDF/SPARQL frameworks, Triple stores,
Data integrators, Query engines, Reasoners)



Publicly available connected dataset and open data initiatives
(LOD)

+

SWT Part I


The Data Model (RDF)



The query language (SPARQL)



Software Development (Architecture, Frameworks and Tools)



A little more semantics (RDFS, inference techniques, tools and
data integration)



Interacting with the enterprise (Legacy sources, XML, DBMS,
mappings)



More complex semantics (Rules, data integration and
reasoning with rules)

+

Reading material


PTSW Chapter 1



SWP Part I, Chapter 1



FTW Section 1.4

Semantic Web Technologies 2012-2013 Part I

Recomendados

Recomendados

Mais conteúdo relacionado

Mais procurados

Mais procurados (18)

Semelhante a Semantic Web Technologies 2012-2013 Part I

Semelhante a Semantic Web Technologies 2012-2013 Part I (20)

Mais de Mariano Rodriguez-Muro

Mais de Mariano Rodriguez-Muro (20)

Último

Último (20)

Semantic Web Technologies 2012-2013 Part I

Notas do Editor